* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
@ 2018-09-20 19:50 0% ` Michel Machado
0 siblings, 0 replies; 200+ results
From: Michel Machado @ 2018-09-20 19:50 UTC (permalink / raw)
To: Honnappa Nagarahalli, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, yipeng1.wang
On 09/12/2018 04:37 PM, Honnappa Nagarahalli wrote:
>>> +int32_t
>>> +rte_hash_iterator_init(const struct rte_hash *h,
>>> + struct rte_hash_iterator_state *state) {
>>> + struct rte_hash_iterator_istate *__state;
>>> '__state' can be replaced by 's'.
>>>
>>> +
>>> + RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>>> +
>>> + __state = (struct rte_hash_iterator_istate *)state;
>>> + __state->h = h;
>>> + __state->next = 0;
>>> + __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>>> +
>>> + return 0;
>>> +}
>>> IMO, creating this API can be avoided if the initialization is handled in the 'rte_hash_iterate' function. The cost of doing this is trivial (one extra 'if' statement in 'rte_hash_iterate'). It will help keep the number of APIs to a minimum.
>>
>> Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initialize the fields of a state only once?
>>
>> My concern is about creating another API for every iterator API. You have a valid point on saving cycles, as this API applies to the data plane. Have you done any performance benchmarking with and without this API? Maybe we can guide our decision based on that.
>
> It's not just about creating one init function for each iterator, because an iterator may have a couple of init functions. For example, someone may eventually find it useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. A possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state, similar to what is already done in the rte_hash_lookup*() functions. In spite of possibly having multiple init functions, there will be a single iterator function.
>
> About the performance benchmarking, the current API only requires applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.
>
> As my tests showed, I do not see any impact of this.
Ok, we are going to eliminate the init functions in v4.
>>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>>> index 9e7d9315f..fdb01023e 100644
>>> --- a/lib/librte_hash/rte_hash.h
>>> +++ b/lib/librte_hash/rte_hash.h
>>> @@ -14,6 +14,8 @@
>>> #include <stdint.h>
>>> #include <stddef.h>
>>>
>>> +#include <rte_compat.h>
>>> +
>>> #ifdef __cplusplus
>>> extern "C" {
>>> #endif
>>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>> /** @internal A hash table structure. */ struct rte_hash;
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * @internal A hash table iterator state structure.
>>> + */
>>> +struct rte_hash_iterator_state {
>>> + uint8_t space[64];
>>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
>>
>> Okay.
>
> I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.
>
> Ok. Maybe have a #define for 64?
Ok.
[ ]'s
Michel Machado
^ permalink raw reply [relevance 0%]
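The two outcomes of this exchange (no separate init function, and a named
constant rather than RTE_CACHE_LINE_SIZE for the size of the opaque state) can
be sketched as follows. This is only an illustration under stated assumptions,
not the v4 patch: toy_table, toy_iter_state, toy_iter_istate, toy_iterate and
TOY_ITER_STATE_SIZE are hypothetical names, and treating a zero-filled state as
"not yet initialized" is an assumed convention. The sketch copies the internal
view in and out with memcpy() instead of casting, to sidestep alignment
concerns on the byte array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed size: a named constant keeps the layout identical on
 * every architecture, which RTE_CACHE_LINE_SIZE would not. */
#define TOY_ITER_STATE_SIZE 64

/* Opaque state as seen by applications. */
struct toy_iter_state {
	uint8_t space[TOY_ITER_STATE_SIZE];
};

/* Toy table standing in for struct rte_hash; a 0 entry is an empty slot. */
struct toy_table {
	uint32_t num_entries;
	const int *entries;
};

/* Internal view of the opaque state (hypothetical layout). */
struct toy_iter_istate {
	const struct toy_table *t;
	uint32_t next;
	uint32_t total_entries;
};

static_assert(sizeof(struct toy_iter_istate) <= TOY_ITER_STATE_SIZE,
	"internal state must fit in the opaque state");

/*
 * Return the slot index of the next used entry and store its value, or
 * -ENOENT when the table is exhausted. The caller zero-fills the state once;
 * the first call binds it to the table (the "one extra if").
 * Assumes a null pointer is represented by all-zero bytes.
 */
int32_t
toy_iterate(const struct toy_table *t, struct toy_iter_state *state,
	int *value)
{
	struct toy_iter_istate s;
	int32_t idx;

	if (t == NULL || state == NULL || value == NULL)
		return -EINVAL;

	memcpy(&s, state->space, sizeof(s));

	/* Lazy initialization replaces a dedicated init function. */
	if (s.t == NULL) {
		s.t = t;
		s.next = 0;
		s.total_entries = t->num_entries;
	}

	/* Skip empty slots. */
	while (s.next < s.total_entries && s.t->entries[s.next] == 0)
		s.next++;

	if (s.next >= s.total_entries) {
		memcpy(state->space, &s, sizeof(s));
		return -ENOENT;
	}

	*value = s.t->entries[s.next];
	idx = (int32_t)s.next++;
	memcpy(state->space, &s, sizeof(s));
	return idx;
}
```

The caller only has to memset() the state to zero before the first call, which
keeps the number of entry points at one per iterator.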
* [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID
2018-09-21 16:13 16% ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-21 16:13 4% ` Anatoly Burakov
1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
^ permalink raw reply [relevance 4%]
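The socket ID handling that the memzone hunk above ends up with can be
sketched in isolation as follows. This is a simplified model, not the EAL
code: toy_resolve_socket, TOY_SOCKET_ID_ANY and TOY_MAX_NUMA_NODES are
hypothetical stand-ins (the value 8 is a placeholder for RTE_MAX_NUMA_NODES),
and the sketch does not model the allocator later rejecting an ID that matches
no heap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SOCKET_ID_ANY   (-1) /* stand-in for SOCKET_ID_ANY */
#define TOY_MAX_NUMA_NODES  8    /* placeholder for RTE_MAX_NUMA_NODES */

/*
 * Store the socket ID that would be passed on to the heap code and return 0,
 * or return -EINVAL for a negative ID other than "any".
 * IDs at or above TOY_MAX_NUMA_NODES are no longer rejected here: they may
 * name an external heap, so the decision is left to the memory subsystem.
 */
int
toy_resolve_socket(int socket_id, bool has_hugepages, int *effective)
{
	if (effective == NULL)
		return -EINVAL;

	if (socket_id != TOY_SOCKET_ID_ANY && socket_id < 0)
		return -EINVAL;

	/* Without hugepages, only native socket IDs collapse to "any";
	 * an external heap ID is kept as requested. */
	if (!has_hugepages && socket_id < TOY_MAX_NUMA_NODES)
		socket_id = TOY_SOCKET_ID_ANY;

	*effective = socket_id;
	return 0;
}
```

With these placeholder values, a request for socket 8 keeps its ID whether or
not hugepages are available, while a request for socket 1 without hugepages is
rewritten to "any".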
* [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external
@ 2018-09-21 16:13 16% ` Anatoly Burakov
2018-09-21 16:13 4% ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 12 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
24 files changed, 133 insertions(+), 44 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,9 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK.
+
Removed Items
-------------
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
^ permalink raw reply [relevance 16%]
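The two callback patterns used throughout the patch above (skip external lists
outright, or match on socket ID as the mempool code does) can be tried outside
DPDK with a small mock. This is a sketch under stated assumptions: toy_msl,
toy_msl_walk and the other toy_* names are hypothetical stand-ins for
struct rte_memseg_list and rte_memseg_list_walk(), the walk takes an explicit
array instead of reading the shared memory config, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* Reduced stand-in for struct rte_memseg_list. */
struct toy_msl {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if this list points to external memory */
};

typedef int (*toy_msl_walk_t)(const struct toy_msl *msl, void *arg);

/* Call func on every list; a non-zero return stops the walk. */
int
toy_msl_walk(const struct toy_msl *lists, size_t n, toy_msl_walk_t func,
	void *arg)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = func(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Pattern 1: a callback that only cares about internal memory. */
int
toy_count_internal(const struct toy_msl *msl, void *arg)
{
	unsigned int *count = arg;

	if (msl->external)
		return 0;

	(*count)++;
	return 0;
}

struct toy_pagesz_arg {
	int socket_id;
	size_t min;
};

/* Pattern 2: exact socket match (native or external), or any native list
 * when TOY_SOCKET_ID_ANY was requested. */
int
toy_find_min_pagesz(const struct toy_msl *msl, void *arg)
{
	struct toy_pagesz_arg *wa = arg;
	bool valid;

	valid = msl->socket_id == wa->socket_id;
	valid |= wa->socket_id == TOY_SOCKET_ID_ANY && msl->external == 0;

	if (valid && msl->page_sz < wa->min)
		wa->min = (size_t)msl->page_sz;

	return 0;
}

/* Smallest page size usable for socket_id, or fallback if none matched. */
size_t
toy_min_page_size(const struct toy_msl *lists, size_t n, int socket_id,
	size_t fallback)
{
	struct toy_pagesz_arg wa;

	wa.min = SIZE_MAX;
	wa.socket_id = socket_id;

	toy_msl_walk(lists, n, toy_find_min_pagesz, &wa);

	return wa.min == SIZE_MAX ? fallback : wa.min;
}
```

The fallback parameter plays the role of getpagesize() in the real function;
it is passed in here only so that the sketch has no platform dependency.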
* [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes
@ 2018-09-24 17:31 4% ` Ferruh Yigit
2018-09-24 17:12 0% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-09-24 17:31 UTC (permalink / raw)
To: John McNamara, Marko Kovacevic
Cc: dev, Ferruh Yigit, Thomas Monjalon, david.marchand
Document changes done in
commit 323e7b667f18 ("ethdev: make default behavior CRC strip on Rx")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2f53564a9..41b9cd8d5 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -112,6 +112,12 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* The default behaviour of CRC strip offload changed. Without any specific Rx
+ offload flag, default behavior by PMD is now to strip CRC.
+ DEV_RX_OFFLOAD_CRC_STRIP offload flag has been removed.
+ To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
+ offload.
+
ABI Changes
-----------
--
2.17.1
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes
2018-09-24 17:31 4% ` [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes Ferruh Yigit
@ 2018-09-24 17:12 0% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2018-09-24 17:12 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, dev, Thomas Monjalon
On Mon, Sep 24, 2018 at 7:31 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> Document changes done in
> commit 323e7b667f18 ("ethdev: make default behavior CRC strip on Rx")
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> doc/guides/rel_notes/release_18_11.rst | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index 2f53564a9..41b9cd8d5 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -112,6 +112,12 @@ API Changes
> flag the MAC can be properly configured in any case. This is particularly
> important for bonding.
>
> +* The default behaviour of CRC strip offload changed. Without any specific Rx
> + offload flag, default behavior by PMD is now to strip CRC.
> + DEV_RX_OFFLOAD_CRC_STRIP offload flag has been removed.
> + To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
> + offload.
> +
>
> ABI Changes
> -----------
Reviewed-by: David Marchand <david.marchand@6wind.com>
--
David Marchand
^ permalink raw reply [relevance 0%]
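For application authors, the release note above boils down to "no flag means
the CRC is stripped; set the keep flag to retain it". A minimal model of that
rule is sketched below. TOY_RX_OFFLOAD_KEEP_CRC is a stand-in bit, not the
value from rte_ethdev.h, and the capability check against a mask reported by
the device is an assumption about how an application would guard the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in bit for DEV_RX_OFFLOAD_KEEP_CRC (not the real value). */
#define TOY_RX_OFFLOAD_KEEP_CRC (UINT64_C(1) << 16)

/* With no Rx offload flag set, the default is now to strip the CRC. */
bool
toy_crc_is_stripped(uint64_t rx_offloads)
{
	return (rx_offloads & TOY_RX_OFFLOAD_KEEP_CRC) == 0;
}

/*
 * Build the Rx offload mask for a port. There is no "strip" flag to set any
 * more; keeping the CRC is the only explicit request, and it is made only
 * if the (assumed) capability mask advertises it.
 */
uint64_t
toy_rx_offloads(bool want_crc, uint64_t rx_offload_capa)
{
	uint64_t offloads = 0;

	if (want_crc && (rx_offload_capa & TOY_RX_OFFLOAD_KEEP_CRC) != 0)
		offloads |= TOY_RX_OFFLOAD_KEEP_CRC;

	return offloads;
}
```

Code that previously set the removed strip flag would, under this model,
simply drop it and rely on the default.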
* [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
@ 2018-09-25 8:49 4% ` Nikhil Rao
2018-09-25 9:15 0% ` Jerin Jacob
2018-09-25 9:49 4% ` [dpdk-dev] [PATCH v3] " Nikhil Rao
1 sibling, 1 reply; 200+ results
From: Nikhil Rao @ 2018-09-25 8:49 UTC (permalink / raw)
To: jerin.jacob; +Cc: dev, Nikhil Rao, stable
Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.
Also, update the event rx adapter test to use 16 bit
ethernet port ids.
Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v2:
* squash changes to autotest and library into a single patch (Jerin Jacob)
* add update to release notes (Jerin Jacob)
lib/librte_eventdev/rte_eventdev.h | 2 +-
lib/librte_eventdev/rte_eventdev.c | 2 +-
test/test/test_event_eth_rx_adapter.c | 6 +++---
doc/guides/rel_notes/release_18_11.rst | 4 +++-
lib/librte_eventdev/Makefile | 2 +-
5 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index a24213e..8541109 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1112,7 +1112,7 @@ struct rte_event {
*
*/
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps);
#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 0a8572b..b1914dc 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -109,7 +109,7 @@
}
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps)
{
struct rte_eventdev *dev;
diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c
index 4641640..592bcaa 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -32,7 +32,7 @@ struct event_eth_rx_adapter_test_params {
static struct event_eth_rx_adapter_test_params default_params;
static inline int
-port_init_common(uint8_t port, const struct rte_eth_conf *port_conf,
+port_init_common(uint16_t port, const struct rte_eth_conf *port_conf,
struct rte_mempool *mp)
{
const uint16_t rx_ring_size = 512, tx_ring_size = 512;
@@ -94,7 +94,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init_rx_intr(uint8_t port, struct rte_mempool *mp)
+port_init_rx_intr(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
@@ -110,7 +110,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init(uint8_t port, struct rte_mempool *mp)
+port_init(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 97daad1..842b46b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -99,6 +99,8 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
+ has been changed from uint8_t to uint16_t.
ABI Changes
-----------
@@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_distributor.so.1
librte_eal.so.8
librte_ethdev.so.10
- librte_eventdev.so.4
+ + librte_eventdev.so.6
librte_flow_classify.so.1
librte_gro.so.1
librte_gso.so.1
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..ce800ea 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_eventdev.a
# library version
-LIBABIVER := 5
+LIBABIVER := 6
# build flags
CFLAGS += -DALLOW_EXPERIMENTAL_API
--
1.8.3.1
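
A minimal sketch of why the parameter width matters, assuming hypothetical stand-in functions (they are not the eventdev API; only the width of the port id parameter is modelled). A 16-bit ethdev port id passed through an 8-bit parameter is reduced modulo 256, so ids of 256 and above would refer to a different port.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pre-patch prototype: the port id is
 * narrowed to 8 bits at the call boundary. */
static uint16_t port_seen_with_u8(uint8_t eth_port_id)
{
	return eth_port_id;
}

/* Hypothetical stand-in for the post-patch prototype. */
static uint16_t port_seen_with_u16(uint16_t eth_port_id)
{
	return eth_port_id;
}

/* Returns 1 when a 16-bit port id survives the 8-bit prototype intact. */
static int u8_prototype_preserves(uint16_t port)
{
	return port_seen_with_u8((uint8_t)port) == port;
}
```

With these stand-ins, port id 300 arrives as 44 through the 8-bit version and as 300 through the 16-bit one.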
* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 8:49 4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
@ 2018-09-25 9:15 0% ` Jerin Jacob
2018-09-25 9:50 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-25 9:15 UTC (permalink / raw)
To: Nikhil Rao; +Cc: dev, stable, thomas
-----Original Message-----
> Date: Tue, 25 Sep 2018 14:19:12 +0530
> From: Nikhil Rao <nikhil.rao@intel.com>
> To: jerin.jacob@caviumnetworks.com
> CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> X-Mailer: git-send-email 1.8.3.1
>
>
> Make the ethernet port id passed into
> rte_event_eth_rx_adapter_caps_get() 16 bit.
>
> Also, update the event rx adapter test to use 16 bit
> ethernet port ids.
>
> Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> Cc: stable@dpdk.org
>
> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>
> v2:
> * squash changes to autotest and library into a single patch (Jerin Jacob)
> * add update to release notes (Jerin Jacob)
>
> lib/librte_eventdev/rte_eventdev.h | 2 +-
> lib/librte_eventdev/rte_eventdev.c | 2 +-
> test/test/test_event_eth_rx_adapter.c | 6 +++---
> doc/guides/rel_notes/release_18_11.rst | 4 +++-
> lib/librte_eventdev/Makefile | 2 +-
Missing version update in lib/librte_eventdev/meson.build. See version=
> 5 files changed, 9 insertions(+), 7 deletions(-)
>
> ABI Changes
> -----------
> @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> librte_distributor.so.1
> librte_eal.so.8
> librte_ethdev.so.10
> - librte_eventdev.so.4
> + + librte_eventdev.so.6
Can you send a separate standalone patch to fix up the
doc/guides/rel_notes/release_18_08.rst release notes? The version (change to
librte_eventdev.so.5) should have been updated in change set 3810ae4357.
+Thomas,
in case he has a different opinion on updating the release notes file of an
already released version.
* [dpdk-dev] [PATCH v3] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 8:49 4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
@ 2018-09-25 9:49 4% ` Nikhil Rao
1 sibling, 0 replies; 200+ results
From: Nikhil Rao @ 2018-09-25 9:49 UTC (permalink / raw)
To: jerin.jacob; +Cc: dev, Nikhil Rao, stable
Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.
Also, update the event rx adapter test to use 16 bit
ethernet port ids.
Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v2:
* squash changes to autotest and library into a single patch (Jerin Jacob)
* add update to release notes (Jerin Jacob)
v3:
* update meson.build (Jerin Jacob)
lib/librte_eventdev/rte_eventdev.h | 2 +-
lib/librte_eventdev/rte_eventdev.c | 2 +-
test/test/test_event_eth_rx_adapter.c | 6 +++---
doc/guides/rel_notes/release_18_11.rst | 4 +++-
lib/librte_eventdev/Makefile | 2 +-
lib/librte_eventdev/meson.build | 2 +-
6 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index a24213e..8541109 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1112,7 +1112,7 @@ struct rte_event {
*
*/
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps);
#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 0a8572b..b1914dc 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -109,7 +109,7 @@
}
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps)
{
struct rte_eventdev *dev;
diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c
index 4641640..592bcaa 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -32,7 +32,7 @@ struct event_eth_rx_adapter_test_params {
static struct event_eth_rx_adapter_test_params default_params;
static inline int
-port_init_common(uint8_t port, const struct rte_eth_conf *port_conf,
+port_init_common(uint16_t port, const struct rte_eth_conf *port_conf,
struct rte_mempool *mp)
{
const uint16_t rx_ring_size = 512, tx_ring_size = 512;
@@ -94,7 +94,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init_rx_intr(uint8_t port, struct rte_mempool *mp)
+port_init_rx_intr(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
@@ -110,7 +110,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init(uint8_t port, struct rte_mempool *mp)
+port_init(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 97daad1..842b46b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -99,6 +99,8 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
+ has been changed from uint8_t to uint16_t.
ABI Changes
-----------
@@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_distributor.so.1
librte_eal.so.8
librte_ethdev.so.10
- librte_eventdev.so.4
+ + librte_eventdev.so.6
librte_flow_classify.so.1
librte_gro.so.1
librte_gso.so.1
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..ce800ea 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_eventdev.a
# library version
-LIBABIVER := 5
+LIBABIVER := 6
# build flags
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/lib/librte_eventdev/meson.build b/lib/librte_eventdev/meson.build
index 3cbaf29..3c4e510 100644
--- a/lib/librte_eventdev/meson.build
+++ b/lib/librte_eventdev/meson.build
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
-version = 5
+version = 6
allow_experimental_apis = true
if host_machine.system() == 'linux'
--
1.8.3.1
* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 9:15 0% ` Jerin Jacob
@ 2018-09-25 9:50 0% ` Thomas Monjalon
2018-09-25 9:56 0% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-09-25 9:50 UTC (permalink / raw)
To: Jerin Jacob; +Cc: Nikhil Rao, dev, stable
25/09/2018 11:15, Jerin Jacob:
> -----Original Message-----
> > Date: Tue, 25 Sep 2018 14:19:12 +0530
> > From: Nikhil Rao <nikhil.rao@intel.com>
> > To: jerin.jacob@caviumnetworks.com
> > CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> > Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> > X-Mailer: git-send-email 1.8.3.1
> >
> >
> > Make the ethernet port id passed into
> > rte_event_eth_rx_adapter_caps_get() 16 bit.
> >
> > Also, update the event rx adapter test to use 16 bit
> > ethernet port ids.
> >
> > Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >
> > v2:
> > * squash changes to autotest and library into a single patch (Jerin Jacob)
> > * add update to release notes (Jerin Jacob)
> >
> > lib/librte_eventdev/rte_eventdev.h | 2 +-
> > lib/librte_eventdev/rte_eventdev.c | 2 +-
> > test/test/test_event_eth_rx_adapter.c | 6 +++---
> > doc/guides/rel_notes/release_18_11.rst | 4 +++-
> > lib/librte_eventdev/Makefile | 2 +-
>
> Missing version update in lib/librte_eventdev/meson.build. See version=
>
> > 5 files changed, 9 insertions(+), 7 deletions(-)
> >
> > ABI Changes
> > -----------
> > @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> > librte_distributor.so.1
> > librte_eal.so.8
> > librte_ethdev.so.10
> > - librte_eventdev.so.4
> > + + librte_eventdev.so.6
>
> Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
> release notes. The version(change to librte_eventdev.so.5) should have been
> updated in change set in 3810ae4357.
>
> +Thomas,
> In case if he has difference in opinion on updating released release note file.
I prefer such changes being atomic.
* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 9:50 0% ` Thomas Monjalon
@ 2018-09-25 9:56 0% ` Jerin Jacob
0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-09-25 9:56 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: Nikhil Rao, dev, stable
-----Original Message-----
> Date: Tue, 25 Sep 2018 11:50:06 +0200
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: Nikhil Rao <nikhil.rao@intel.com>, dev@dpdk.org, stable@dpdk.org
> Subject: Re: [PATCH v2] eventdev: fix port id argument in Rx adapter caps
> API
>
>
> 25/09/2018 11:15, Jerin Jacob:
> > -----Original Message-----
> > > Date: Tue, 25 Sep 2018 14:19:12 +0530
> > > From: Nikhil Rao <nikhil.rao@intel.com>
> > > To: jerin.jacob@caviumnetworks.com
> > > CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> > > Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> > > X-Mailer: git-send-email 1.8.3.1
> > >
> > >
> > > Make the ethernet port id passed into
> > > rte_event_eth_rx_adapter_caps_get() 16 bit.
> > >
> > > Also, update the event rx adapter test to use 16 bit
> > > ethernet port ids.
> > >
> > > Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > >
> > > v2:
> > > * squash changes to autotest and library into a single patch (Jerin Jacob)
> > > * add update to release notes (Jerin Jacob)
> > >
> > > lib/librte_eventdev/rte_eventdev.h | 2 +-
> > > lib/librte_eventdev/rte_eventdev.c | 2 +-
> > > test/test/test_event_eth_rx_adapter.c | 6 +++---
> > > doc/guides/rel_notes/release_18_11.rst | 4 +++-
> > > lib/librte_eventdev/Makefile | 2 +-
> >
> > Missing version update in lib/librte_eventdev/meson.build. See version=
> >
> > > 5 files changed, 9 insertions(+), 7 deletions(-)
> > >
> > > ABI Changes
> > > -----------
> > > @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> > > librte_distributor.so.1
> > > librte_eal.so.8
> > > librte_ethdev.so.10
> > > - librte_eventdev.so.4
> > > + + librte_eventdev.so.6
> >
> > Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
> > release notes. The version(change to librte_eventdev.so.5) should have been
> > updated in change set in 3810ae4357.
> >
> > +Thomas,
> > In case if he has difference in opinion on updating released release note file.
>
> I prefer such changes being atomic.
Me too. But the offending change set (3810ae4357) is old:
➜ [master][dpdk.org] $ git describe 3810ae4357
v18.05-389-g3810ae435
Do you prefer to have a patch to update the release_18_08.rst file, or to ignore it?
>
>
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
@ 2018-09-25 12:22 3% ` Luca Boccassi
2018-09-25 12:57 3% ` Thomas Monjalon
2018-09-25 14:34 0% ` Ananyev, Konstantin
0 siblings, 2 replies; 200+ results
From: Luca Boccassi @ 2018-09-25 12:22 UTC (permalink / raw)
To: Thomas Monjalon, Konstantin Ananyev; +Cc: dev
On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> 24/08/2018 18:47, Konstantin Ananyev:
> > If user specifies priority=0 for some of ACL rules
> > that can cause rte_acl_classify to return wrong results.
> > The reason is that priority zero is used internally for no-match
> > nodes.
> > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > The simplest way to overcome the issue is just not allow zero
> > to be a valid priority for the rule.
> >
> > Fixes: dc276b5780c2 ("acl: new library")
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
> Cc: stable@dpdk.org
>
> Applied with below title, thanks
> acl: forbid rule with priority zero
Hi,
This patch is marked for stable, but it changes an enum in a public
header so it looks like an ABI breakage? Have I got it wrong?
--
Kind regards,
Luca Boccassi
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
2018-09-25 12:22 3% ` Luca Boccassi
@ 2018-09-25 12:57 3% ` Thomas Monjalon
2018-09-25 14:34 0% ` Ananyev, Konstantin
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-25 12:57 UTC (permalink / raw)
To: Luca Boccassi, Konstantin Ananyev; +Cc: dev
25/09/2018 14:22, Luca Boccassi:
> On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > 24/08/2018 18:47, Konstantin Ananyev:
> > > If user specifies priority=0 for some of ACL rules
> > > that can cause rte_acl_classify to return wrong results.
> > > The reason is that priority zero is used internally for no-match
> > > nodes.
> > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > The simplest way to overcome the issue is just not allow zero
> > > to be a valid priority for the rule.
> > >
> > > Fixes: dc276b5780c2 ("acl: new library")
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Cc: stable@dpdk.org
> >
> > Applied with below title, thanks
> > acl: forbid rule with priority zero
>
> Hi,
>
> This patch is marked for stable, but it changes an enum in a public
> header so it looks like an ABI breakage? Have I got it wrong?
- RTE_ACL_MIN_PRIORITY = 0,
+ RTE_ACL_MIN_PRIORITY = 1,
In my understanding, the change is not breaking the ABI because
the old minimal value (0) can still be used, with the same side effect.
The new value is just removing a side effect for newly compiled apps.
Konstantin, am I right?
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
2018-09-25 12:22 3% ` Luca Boccassi
2018-09-25 12:57 3% ` Thomas Monjalon
@ 2018-09-25 14:34 0% ` Ananyev, Konstantin
2018-10-03 16:18 0% ` Luca Boccassi
1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-09-25 14:34 UTC (permalink / raw)
To: Luca Boccassi, Thomas Monjalon; +Cc: dev
Hi Luca,
>
> On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > 24/08/2018 18:47, Konstantin Ananyev:
> > > If user specifies priority=0 for some of ACL rules
> > > that can cause rte_acl_classify to return wrong results.
> > > The reason is that priority zero is used internally for no-match
> > > nodes.
> > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > The simplest way to overcome the issue is just not allow zero
> > > to be a valid priority for the rule.
> > >
> > > Fixes: dc276b5780c2 ("acl: new library")
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Cc: stable@dpdk.org
> >
> > Applied with below title, thanks
> > acl: forbid rule with priority zero
>
> Hi,
>
> This patch is marked for stable, but it changes an enum in a public header
Yes it does.
> so it looks like an ABI breakage? Have I got it wrong?
Strictly speaking, yes, but priority=0 is an invalid value with the current implementation.
I don't think anyone uses it, as in that case the ACL library simply wouldn't work
correctly.
Konstantin
* [dpdk-dev] [PATCH v1] doc: remove unused release note file
@ 2018-09-25 15:25 3% John McNamara
0 siblings, 0 replies; 200+ results
From: John McNamara @ 2018-09-25 15:25 UTC (permalink / raw)
To: dev; +Cc: John McNamara
Remove unused file from the release notes docs. This file was
used to display a hierarchy in older releases, circa 2015, but
doesn't seem useful in the current structure.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
doc/guides/rel_notes/index.rst | 1 -
doc/guides/rel_notes/rel_description.rst | 12 ------------
2 files changed, 13 deletions(-)
delete mode 100644 doc/guides/rel_notes/rel_description.rst
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 89fdb4b..1243e98 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,7 +8,6 @@ Release Notes
:maxdepth: 1
:numbered:
- rel_description
release_18_11
release_18_08
release_18_05
diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst
deleted file mode 100644
index 8f28556..0000000
--- a/doc/guides/rel_notes/rel_description.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
- Copyright(c) 2010-2015 Intel Corporation.
-
-Description of Release
-======================
-
-This document contains the release notes for Data Plane Development Kit (DPDK)
-release version |release| and previous releases.
-
-It lists new features, fixed bugs, API and ABI changes and known issues.
-
-For instructions on compiling and running the release, see the :ref:`DPDK Getting Started Guide <linux_gsg>`.
--
2.7.5
* [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps
` (3 preceding siblings ...)
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-26 11:22 9% ` Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly. So, we will be using a string to uniquely
identify a heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 1 +
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
* eal: EAL library ABI version was changed due to previously announced work on
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
+ Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1020,6 +1019,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
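
A minimal sketch of the default heap naming scheme used above ("socket_<id>" in a fixed-size buffer), assuming a local HEAP_NAME_MAX_LEN that mirrors RTE_HEAP_NAME_MAX_LEN; the helper default_heap_name() is hypothetical and not part of the patch. It writes directly with snprintf(), which already reserves room for the terminating NUL, and reports truncation.

```c
#include <stdio.h>
#include <string.h>

/* Mirrors RTE_HEAP_NAME_MAX_LEN from the patch (assumed value: 32). */
#define HEAP_NAME_MAX_LEN 32

/* Writes "socket_<id>" into name.
 * Returns 0 on success, -1 if the name did not fit or formatting failed. */
static int default_heap_name(char name[HEAP_NAME_MAX_LEN], int socket_id)
{
	int n = snprintf(name, HEAP_NAME_MAX_LEN, "socket_%i", socket_id);

	return (n < 0 || n >= HEAP_NAME_MAX_LEN) ? -1 : 0;
}
```

For socket 0 this yields the name "socket_0", which is the string that would show up next to "Heap name:" in the stats dump.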
* [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps
` (4 preceding siblings ...)
2018-09-26 11:22 9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-26 11:22 4% ` Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
Add an API to allow creating new malloc heaps. They will be created
with socket IDs above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
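
A minimal sketch of the socket ID scheme described above, assuming a stand-in MAX_NUMA_NODES value in place of RTE_MAX_NUMA_NODES and a hypothetical helper next_external_socket_id(); locking is left out, whereas the patch increments the counter under the memory hotplug write lock.

```c
#include <stdint.h>

/* Assumed build-time value; stands in for RTE_MAX_NUMA_NODES. */
#define MAX_NUMA_NODES 8
#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
/* External heaps start well above any real NUMA node id. */
#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), MAX_NUMA_NODES))

/* Hands out the next external socket ID and advances the counter.
 * Returns -1 once the counter has moved past INT32_MAX. */
static int next_external_socket_id(uint32_t *next)
{
	if (*next > (uint32_t)INT32_MAX)
		return -1;
	return (int)(*next)++;
}
```

Starting the counter at EXTERNAL_HEAP_MIN_SOCKET_ID, the first two external heaps would get IDs 256 and 257 under these assumptions.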
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+ Structure ``rte_eal_memconfig`` has been extended to contain next socket
+ ID for externally allocated memory segments.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
* [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK
@ 2018-09-26 11:21 2% ` Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
` (4 more replies)
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 5 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:21 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
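As a rough illustration of the socket ID scheme in the list above, the
following standalone C sketch models how external heaps could receive
IDs that sit above the NUMA node range. It is a simplified model and
not the DPDK code: MAX_NUMA_NODES (standing in for RTE_MAX_NUMA_NODES),
struct heap_ids, heap_ids_init(), assign_external_socket_id() and
socket_id_is_external() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RTE_MAX_NUMA_NODES; the real value is a build-time setting. */
#define MAX_NUMA_NODES 8

/* Constant-expression max with fully parenthesized arguments. */
#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b))

/* First socket ID handed out to an external heap. */
#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), MAX_NUMA_NODES))

/* Hypothetical model of the shared counter kept in the memory config. */
struct heap_ids {
	uint32_t next_socket_id;
};

static void
heap_ids_init(struct heap_ids *ids)
{
	assert(ids != NULL);
	ids->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
}

/* Returns the new socket ID, or -1 if the counter no longer fits in an int. */
static int
assign_external_socket_id(struct heap_ids *ids)
{
	assert(ids != NULL);
	if (ids->next_socket_id > INT32_MAX)
		return -1;
	return (int)ids->next_socket_id++;
}

/* Simplification: any ID at or above the minimum is treated as external. */
static int
socket_id_is_external(int socket_id)
{
	return socket_id >= EXTERNAL_HEAP_MIN_SOCKET_ID;
}
```

In this model the caller is assumed to serialize access to the counter,
much as the patch relies on the memory hotplug write lock.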
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
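To make the "create a heap" step more concrete, here is a minimal
standalone sketch of the name validation and slot search that a heap
creation call has to perform. It is modelled on the logic of
rte_malloc_heap_create() in this series, but it is not that function:
struct heap_slot, heap_table_create(), MAX_HEAPS and HEAP_NAME_MAX_LEN
are hypothetical stand-ins, and locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEAPS 4          /* stand-in for RTE_MAX_HEAPS */
#define HEAP_NAME_MAX_LEN 32 /* stand-in for RTE_HEAP_NAME_MAX_LEN */

struct heap_slot {
	char name[HEAP_NAME_MAX_LEN]; /* an empty string marks a free slot */
};

/*
 * Returns the index of the slot claimed for heap_name, or a negative errno
 * value: -EINVAL for a bad name, -EEXIST for a duplicate, -ENOSPC if full.
 */
static int
heap_table_create(struct heap_slot *table, const char *heap_name)
{
	const char *end;
	size_t len;
	int i;

	assert(table != NULL);
	if (heap_name == NULL)
		return -EINVAL;

	/* the name must be non-empty and leave room for the terminator */
	end = memchr(heap_name, '\0', HEAP_NAME_MAX_LEN);
	if (end == NULL || end == heap_name)
		return -EINVAL;
	len = (size_t)(end - heap_name);

	for (i = 0; i < MAX_HEAPS; i++) {
		/* existing heap with the same name */
		if (strncmp(heap_name, table[i].name, HEAP_NAME_MAX_LEN) == 0)
			return -EEXIST;
		/* first free slot: claim it and stop scanning */
		if (table[i].name[0] == '\0') {
			memcpy(table[i].name, heap_name, len + 1);
			return i;
		}
	}
	return -ENOSPC;
}
```

As in the patch, the scan stops at the first free slot, so the sketch
assumes that used slots are packed at the front of the table.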
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better: we don't want
creation of an external heap in the primary to fail because something
in the secondary has failed, when in fact we may not even have wanted
this memory to be accessible in the secondary in the first place.
Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 305 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 28 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 14 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx4/mlx4_mr.c | 3 +
drivers/net/mlx5/mlx5.c | 5 +-
drivers/net/mlx5/mlx5_mr.c | 3 +
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
.../net/virtio/virtio_user/virtio_user_dev.c | 8 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 316 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
47 files changed, 1913 insertions(+), 139 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
* [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID
` (2 preceding siblings ...)
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-26 11:22 4% ` Anatoly Burakov
2018-09-26 11:22 9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
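The changed semantics can be sketched in isolation. The helper below
mirrors the two conditions this patch puts into the memzone reserve
path: only negative IDs other than SOCKET_ID_ANY are rejected up front,
and the no-hugepages override applies only to IDs inside the NUMA
range. resolve_socket_id() and MAX_NUMA_NODES are hypothetical names
for this example; whether a large ID really belongs to a heap is left
to the allocator and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SOCKET_ID_ANY (-1)
#define MAX_NUMA_NODES 8 /* stand-in for RTE_MAX_NUMA_NODES */

/*
 * Stores the socket to allocate from in *out and returns true, or returns
 * false if the argument is rejected before the heap lookup.
 */
static bool
resolve_socket_id(int socket_id, bool has_hugepages, int *out)
{
	assert(out != NULL);

	/* only negative values other than SOCKET_ID_ANY are rejected here */
	if (socket_id != SOCKET_ID_ANY && socket_id < 0)
		return false;

	/*
	 * without hugepages, NUMA sockets collapse to "any"; IDs beyond the
	 * NUMA range may name an external heap and are passed through
	 */
	if (!has_hugepages && socket_id < MAX_NUMA_NODES)
		socket_id = SOCKET_ID_ANY;

	*out = socket_id;
	return true;
}
```

With these rules, a request for socket 256 on a system without
hugepages keeps its ID, while a request for socket 3 is turned into
SOCKET_ID_ANY.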
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
* [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-26 11:22 16% ` Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
` (2 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
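The callback pattern used for those adjustments can be shown with a
small standalone model. struct memseg_list below is a trimmed-down,
hypothetical stand-in for struct rte_memseg_list, and list_walk() is a
simplified walker written for this example (the real walk functions
also take the memory hotplug lock). internal_mem_size() follows the
shape of physmem_size() in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct memseg_list {
	uint64_t page_sz;      /* page size for all segments in this list */
	size_t count;          /* number of used segments */
	unsigned int external; /* 1 if the list points to external memory */
};

typedef int (*list_walk_t)(const struct memseg_list *msl, void *arg);

/* Calls func on every list; stops on a non-zero return and passes it on. */
static int
list_walk(const struct memseg_list *lists, size_t n, list_walk_t func,
		void *arg)
{
	size_t i;
	int ret;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		ret = func(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Sums internal memory only; external lists are skipped. */
static int
internal_mem_size(const struct memseg_list *msl, void *arg)
{
	uint64_t *total_len = arg;

	if (msl->external)
		return 0; /* skip this list, but keep walking */

	*total_len += msl->count * msl->page_sz;
	return 0;
}
```

Returning 0 for an external list skips it without stopping the walk,
which is why the early return can be added to existing callbacks
without changing their behaviour for internal memory.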
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
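A minimal sketch of that matching rule, assuming a flat array of lists
instead of a memseg list walk: a list counts if its socket ID matches
exactly (internal or external), or if SOCKET_ID_ANY was requested and
the list is internal. struct memseg_list and
min_page_size_for_socket() are hypothetical names for this example.
The sketch returns SIZE_MAX when nothing matches; how the patch handles
that case is not covered here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOCKET_ID_ANY (-1)

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct memseg_list {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if the list points to external memory */
};

/* Returns the smallest matching page size, or SIZE_MAX if none matched. */
static size_t
min_page_size_for_socket(const struct memseg_list *lists, size_t n,
		int socket_id)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* exact match covers both internal and external memory */
		bool valid = lists[i].socket_id == socket_id;

		/* "any socket" only considers internal memory */
		valid |= socket_id == SOCKET_ID_ANY && lists[i].external == 0;

		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	return min;
}
```

With one 2M list on socket 0, one 1G list on socket 1 and one external
4K list on socket 256, SOCKET_ID_ANY yields 2M in this model, while
asking for socket 256 yields 4K.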
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
24 files changed, 134 insertions(+), 44 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,10 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+ a new flag indicating whether the memseg list refers to external memory.
+
Removed Items
-------------
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
^ permalink raw reply [relevance 16%]
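The page-size selection rule in find_min_pagesz() above can be sketched in isolation. This is a minimal sketch, not the DPDK code: the sketch_msl type, SKETCH_SOCKET_ID_ANY and sketch_min_pagesz() are hypothetical stand-ins, and a plain array replaces rte_memseg_list_walk(). A list counts when its socket ID matches exactly (native or external), or when any socket was requested and the list is native memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SOCKET_ID_ANY; the value -1 is an assumption. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical miniature of a memseg list: only the fields the rule uses. */
struct sketch_msl {
	size_t page_sz;
	int socket_id;
	unsigned int external; /* non-zero for externally allocated memory */
};

/*
 * Return the smallest page size usable for socket_id, or SIZE_MAX if no
 * list qualifies (the patch falls back to getpagesize() in that case).
 */
static size_t
sketch_min_pagesz(const struct sketch_msl *lists, size_t n, int socket_id)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct sketch_msl *msl = &lists[i];
		/* exact socket match: native and external lists both count */
		int valid = msl->socket_id == socket_id;

		/* "any socket": only native memory counts */
		valid |= socket_id == SKETCH_SOCKET_ID_ANY && !msl->external;

		if (valid && msl->page_sz < min)
			min = msl->page_sz;
	}
	return min;
}
```

With one 2 MB native list, one 1 GB native list and one 4 KB external list on its own socket, a request for any socket ignores the external 4 KB pages, while a request for the external socket ID sees them.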
* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
@ 2018-09-26 12:22 3% ` Burakov, Anatoly
2018-09-29 6:15 3% ` Jeff Guo
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-26 12:22 UTC (permalink / raw)
To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
bernard.iremonger, arybchenko
Cc: jblunck, shreyansh.jain, dev, helin.zhang
On 17-Aug-18 11:51 AM, Jeff Guo wrote:
> There are some extended interrupt types in vfio pci device except from the
> existing interrupts, such as err and req notifier, it could be useful for
> device error monitoring. And these corresponding interrupt handler is
> different from the other interrupt handler that register in PMDs, so a new
> interrupt handler should be added. This patch will add specific req handler
> in generic pci device.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> drivers/bus/pci/rte_bus_pci.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index 0d1955f..c45a820 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -66,6 +66,7 @@ struct rte_pci_device {
> uint16_t max_vfs; /**< sriov enable if not zero */
> enum rte_kernel_driver kdrv; /**< Kernel driver passthrough */
> char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
> + struct rte_intr_handle req_notifier_handler;/**< Req notifier handle */
> };
>
> /**
>
Does this break ABI?
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
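The ABI question above can be illustrated with a reduced layout check. This is only a sketch under stated assumptions: sketch_handle, sketch_dev_v1 and sketch_dev_v2 are hypothetical miniatures, not the real rte_intr_handle or rte_pci_device. It shows that appending a member keeps existing offsets but changes the size of the structure, which matters whenever code built against the old header allocates, embeds or indexes arrays of the structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interrupt handle. */
struct sketch_handle {
	int fd;
	uint32_t type;
};

/* Layout before the change (reduced to two members). */
struct sketch_dev_v1 {
	uint16_t max_vfs;
	char name[16];
};

/* Layout after the change: one new trailing member. */
struct sketch_dev_v2 {
	uint16_t max_vfs;
	char name[16];
	struct sketch_handle req_notifier_handler;
};

/* Existing members keep their offsets when a member is appended... */
static_assert(offsetof(struct sketch_dev_v1, name) ==
	      offsetof(struct sketch_dev_v2, name),
	      "existing members must not move");

/* ...but the object itself grows. */
static_assert(sizeof(struct sketch_dev_v2) > sizeof(struct sketch_dev_v1),
	      "appending a member changes sizeof");

/* Returns 1 if an object sized for v1 is too small for v2 code. */
static int
sketch_layout_grew(void)
{
	return sizeof(struct sketch_dev_v2) > sizeof(struct sketch_dev_v1);
}
```

Whether such growth is an ABI break in practice depends on who allocates the structure and whether it is embedded in other structures; the sketch only shows the size change.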
* [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
@ 2018-09-26 18:04 2% ` Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
` (2 more replies)
0 siblings, 3 replies; 200+ results
From: Shreyansh Jain @ 2018-09-26 18:04 UTC (permalink / raw)
To: dev, ferruh.yigit; +Cc: thomas, Shreyansh Jain
About the series:
This series of patches upgrades the DPAA2 driver firmware to
v10.10.10 (MC Firmware).
bus/fslmc is modified by this series, and other drivers such as
net, crypto and qdma depend on it. The changes are also tightly
linked, so the patches include the upgrade as well as the follow-on
changes to the drivers.
Once applied, the DPAA2 driver will not work with any MC FW older
than 10.10.10.
Support for this new firmware is available in the publicly available
LSDK (Layerscape SDK) release [1].
Besides the FW change, there are other subtle changes as well:
- Support reading the MAC address from NIC device, rather than
using a default MAC
- Adding support for QBMan 5.0 FW APIs
- Some patches for NXP's LX2 platform specific features
- And some bug fixes.
Dependency:
* These patches are based on net-next/master 58c3b609699a8c
* Series [2] is logically related to this, but has no git/patch
related dependency. It is the series for the DPAA upgrade.
[1] https://lsdk.github.io/index.html
[2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
Version History:
v1->v2:
- Bumped up the version of the libraries (pmd/bus/crypto/event) as the
first set of patches (MC firmware update) breaks the internal ABI
- Added support for ordered processing APIs. These APIs are expected
to be used in subsequent feature updates on DPAA2 ethernet driver.
- Some internal bug fixes.
(Patches increased from 11 to 15)
Hemant Agrawal (9):
net/dpaa2: fix VLAN filter enablement
bus/fslmc: upgrade mc FW APIs to 10.10.0
net/dpaa2: upgrade dpni to mc FW APIs to 10.10.0
crypto/dpaa2_sec: upgarde mc FW APIs to 10.10.0
net/dpaa2: update RSS value in mbuf for lx2 platform
net/dpaa2: optimize the fd reset in Tx path
net/dpaa2: enhance the queue memory cleanup routines
net/dpaa2: support MBUF VLAN tci population from HW parser
net/dpaa2: support Rx checksum offload in slow parsing
Nipun Gupta (4):
net/dpaa2: fix IOVA conversion for congestion memory
bus/fslmc: support memory backed portals with QBMAN 5.0
bus/fslmc: support 32 enq and deq for LX2 platform
bus/fslmc: disable annotation prefetch for LX2
Shreyansh Jain (2):
net/dpaa2: read hardware provided MAC for DPNI devices
net/dpaa2: add per queue stats get and reset support
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 +++++
drivers/bus/fslmc/mc/dpcon.c | 30 +
drivers/bus/fslmc/mc/dpdmai.c | 14 +
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 +-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 +
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 ++
drivers/bus/fslmc/meson.build | 2 +
drivers/bus/fslmc/portal/dpaa2_hw_dpio.c | 197 +++--
drivers/bus/fslmc/portal/dpaa2_hw_dpio.h | 4 +
drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 32 +-
drivers/bus/fslmc/qbman/include/compat.h | 3 +-
.../fslmc/qbman/include/fsl_qbman_portal.h | 33 +-
drivers/bus/fslmc/qbman/qbman_portal.c | 764 +++++++++++++++---
drivers/bus/fslmc/qbman/qbman_portal.h | 30 +-
drivers/bus/fslmc/qbman/qbman_sys.h | 100 ++-
drivers/bus/fslmc/qbman/qbman_sys_decl.h | 4 +
drivers/bus/fslmc/rte_bus_fslmc_version.map | 12 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 8 +-
drivers/crypto/dpaa2_sec/mc/dpseci.c | 128 ++-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci.h | 25 +-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci_cmd.h | 73 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/dpaa2_eventdev.c | 4 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/base/dpaa2_hw_dpni_annot.h | 40 +
drivers/net/dpaa2/dpaa2_ethdev.c | 173 +++-
drivers/net/dpaa2/dpaa2_rxtx.c | 95 ++-
drivers/net/dpaa2/mc/dpni.c | 134 ++-
drivers/net/dpaa2/mc/fsl_dpkg.h | 71 +-
drivers/net/dpaa2/mc/fsl_dpni.h | 378 +++++----
drivers/net/dpaa2/mc/fsl_dpni_cmd.h | 87 +-
drivers/net/dpaa2/mc/fsl_net.h | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
53 files changed, 2377 insertions(+), 585 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
--
2.17.1
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
@ 2018-09-26 18:04 2% ` Shreyansh Jain
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
2 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-09-26 18:04 UTC (permalink / raw)
To: dev, ferruh.yigit; +Cc: thomas, Hemant Agrawal
From: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch adds support for the new Management Complex
firmware version 10.1x.x. One of the main changes in
the APIs is support for ordered queues.
The fslmc bus lib ABI will need to be bumped to reflect
the MC FW API and structure changes.
This will also result in bumping of ABI version of all dependent
libs as they internally use the MC FW APIs and structures.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 ++++++++++++++++++++
drivers/bus/fslmc/mc/dpcon.c | 30 +++
drivers/bus/fslmc/mc/dpdmai.c | 14 ++
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 ++++-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +++++-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 ++
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 +++++++++
drivers/bus/fslmc/meson.build | 2 +
drivers/bus/fslmc/rte_bus_fslmc_version.map | 10 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
31 files changed, 541 insertions(+), 34 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
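Two of the DPBP changes in the diff below can be checked with a short sketch: the versioned command ID encoding, and the widening of 'options' from 16 to 32 bits with the padding shrunk to match. The SK_* names and sk_* structures are hypothetical copies made for illustration, and the leading depletion_entry member is an assumption, since the hunk only shows the members from depletion_exit onward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copies of the DPBP command macros shown in the diff. */
#define SK_CMD_ID_OFFSET	4
#define SK_CMD_BASE_VERSION	1
#define SK_CMD_VERSION_2	2
#define SK_CMD(id)	(((id) << SK_CMD_ID_OFFSET) | SK_CMD_BASE_VERSION)
#define SK_CMD_V2(id)	(((id) << SK_CMD_ID_OFFSET) | SK_CMD_VERSION_2)

/* Old layout: 16-bit options followed by three 16-bit pads. */
struct sk_set_notif_v1 {
	uint32_t depletion_entry; /* assumed: not visible in the hunk */
	uint32_t depletion_exit;
	uint32_t surplus_entry;
	uint32_t surplus_exit;
	uint16_t options;
	uint16_t pad[3];
	uint64_t message_ctx;
	uint64_t message_iova;
};

/* New layout: 32-bit options followed by two 16-bit pads. */
struct sk_set_notif_v2 {
	uint32_t depletion_entry; /* assumed: not visible in the hunk */
	uint32_t depletion_exit;
	uint32_t surplus_entry;
	uint32_t surplus_exit;
	uint32_t options;
	uint16_t pad[2];
	uint64_t message_ctx;
	uint64_t message_iova;
};

/* The wider field is paid for by the padding: size and offsets agree. */
static_assert(sizeof(struct sk_set_notif_v1) ==
	      sizeof(struct sk_set_notif_v2),
	      "command size must not change");
static_assert(offsetof(struct sk_set_notif_v1, message_ctx) ==
	      offsetof(struct sk_set_notif_v2, message_ctx),
	      "fields after the padding must not move");

/* Extract the version nibble from an encoded command ID. */
static unsigned int
sk_cmd_version(unsigned int cmd)
{
	return cmd & ((1u << SK_CMD_ID_OFFSET) - 1);
}
```

Under this encoding the same command number 0x1b0 becomes 0x1b01 as a version 1 command and 0x1b02 as a version 2 command, so the two variants can be told apart from the low four bits of the ID.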
diff --git a/drivers/bus/fslmc/Makefile b/drivers/bus/fslmc/Makefile
index 515d0f534..e95551980 100644
--- a/drivers/bus/fslmc/Makefile
+++ b/drivers/bus/fslmc/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_ethdev
EXPORT_MAP := rte_bus_fslmc_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += \
qbman/qbman_portal.c \
diff --git a/drivers/bus/fslmc/mc/dpbp.c b/drivers/bus/fslmc/mc/dpbp.c
index 0215d22da..d9103409c 100644
--- a/drivers/bus/fslmc/mc/dpbp.c
+++ b/drivers/bus/fslmc/mc/dpbp.c
@@ -248,6 +248,16 @@ int dpbp_reset(struct fsl_mc_io *mc_io,
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpbp_get_attributes - Retrieve DPBP attributes.
+ *
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPBP object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpbp_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/dpci.c b/drivers/bus/fslmc/mc/dpci.c
index ff366bfa9..95edae9d9 100644
--- a/drivers/bus/fslmc/mc/dpci.c
+++ b/drivers/bus/fslmc/mc/dpci.c
@@ -265,6 +265,15 @@ int dpci_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpci_get_attributes() - Retrieve DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -292,6 +301,94 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpci_get_peer_attributes() - Retrieve peer DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned peer attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr)
+{
+ struct dpci_rsp_get_peer_attr *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_PEER_ATTR,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_peer_attr *)cmd.params;
+ attr->peer_id = le32_to_cpu(rsp_params->id);
+ attr->num_of_priorities = rsp_params->num_of_priorities;
+
+ return 0;
+}
+
+/**
+ * dpci_get_link_state() - Retrieve the DPCI link state.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @up: Returned link state; returns '1' if link is up, '0' otherwise
+ *
+ * DPCI can be connected to another DPCI, together they
+ * create a 'link'. In order to use the DPCI Tx and Rx queues,
+ * both objects must be enabled.
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up)
+{
+ struct dpci_rsp_get_link_state *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_LINK_STATE,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_link_state *)cmd.params;
+ *up = dpci_get_field(rsp_params->up, UP);
+
+ return 0;
+}
+
+/**
+ * dpci_set_rx_queue() - Set Rx queue configuration
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @priority: Select the queue relative to number of
+ * priorities configured at DPCI creation; use
+ * DPCI_ALL_QUEUES to configure all Rx queues
+ * identically.
+ * @cfg: Rx queue configuration
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -314,6 +411,9 @@ int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
dpci_set_field(cmd_params->dest_type,
DEST_TYPE,
cfg->dest_cfg.dest_type);
+ dpci_set_field(cmd_params->dest_type,
+ ORDER_PRESERVATION,
+ cfg->order_preservation_en);
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
@@ -438,3 +538,100 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
return 0;
}
+
+/**
+ * dpci_set_opr() - Set Order Restoration configuration.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @options: Configuration mode options
+ * can be OPR_OPT_CREATE or OPR_OPT_RETIRE
+ * @cfg: Configuration options for the OPR
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg)
+{
+ struct dpci_cmd_set_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_SET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_set_opr *)cmd.params;
+ cmd_params->index = index;
+ cmd_params->options = options;
+ cmd_params->oloe = cfg->oloe;
+ cmd_params->oeane = cfg->oeane;
+ cmd_params->olws = cfg->olws;
+ cmd_params->oa = cfg->oa;
+ cmd_params->oprrws = cfg->oprrws;
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
+/**
+ * dpci_get_opr() - Retrieve Order Restoration config and query.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @cfg: Returned OPR configuration
+ * @qry: Returned OPR query
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry)
+{
+ struct dpci_rsp_get_opr *rsp_params;
+ struct dpci_cmd_get_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_get_opr *)cmd.params;
+ cmd_params->index = index;
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_opr *)cmd.params;
+ cfg->oloe = rsp_params->oloe;
+ cfg->oeane = rsp_params->oeane;
+ cfg->olws = rsp_params->olws;
+ cfg->oa = rsp_params->oa;
+ cfg->oprrws = rsp_params->oprrws;
+ qry->rip = dpci_get_field(rsp_params->flags, RIP);
+ qry->enable = dpci_get_field(rsp_params->flags, OPR_ENABLE);
+ qry->nesn = le16_to_cpu(rsp_params->nesn);
+ qry->ndsn = le16_to_cpu(rsp_params->ndsn);
+ qry->ea_tseq = le16_to_cpu(rsp_params->ea_tseq);
+ qry->tseq_nlis = dpci_get_field(rsp_params->tseq_nlis, TSEQ_NLIS);
+ qry->ea_hseq = le16_to_cpu(rsp_params->ea_hseq);
+ qry->hseq_nlis = dpci_get_field(rsp_params->hseq_nlis, HSEQ_NLIS);
+ qry->ea_hptr = le16_to_cpu(rsp_params->ea_hptr);
+ qry->ea_tptr = le16_to_cpu(rsp_params->ea_tptr);
+ qry->opr_vid = le16_to_cpu(rsp_params->opr_vid);
+ qry->opr_id = le16_to_cpu(rsp_params->opr_id);
+
+ return 0;
+}
diff --git a/drivers/bus/fslmc/mc/dpcon.c b/drivers/bus/fslmc/mc/dpcon.c
index 3f6e04b97..92bd26512 100644
--- a/drivers/bus/fslmc/mc/dpcon.c
+++ b/drivers/bus/fslmc/mc/dpcon.c
@@ -295,6 +295,36 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpcon_set_notification() - Set DPCON notification destination
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCON object
+ * @cfg: Notification parameters
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg)
+{
+ struct dpcon_cmd_set_notification *dpcon_cmd;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCON_CMDID_SET_NOTIFICATION,
+ cmd_flags,
+ token);
+ dpcon_cmd = (struct dpcon_cmd_set_notification *)cmd.params;
+ dpcon_cmd->dpio_id = cpu_to_le32(cfg->dpio_id);
+ dpcon_cmd->priority = cfg->priority;
+ dpcon_cmd->user_ctx = cpu_to_le64(cfg->user_ctx);
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
/**
* dpcon_get_api_version - Get Data Path Concentrator API version
* @mc_io: Pointer to MC portal's DPCON object
diff --git a/drivers/bus/fslmc/mc/dpdmai.c b/drivers/bus/fslmc/mc/dpdmai.c
index 528889df3..dcb9d516a 100644
--- a/drivers/bus/fslmc/mc/dpdmai.c
+++ b/drivers/bus/fslmc/mc/dpdmai.c
@@ -113,6 +113,7 @@ int dpdmai_create(struct fsl_mc_io *mc_io,
cmd_flags,
dprc_token);
cmd_params = (struct dpdmai_cmd_create *)cmd.params;
+ cmd_params->num_queues = cfg->num_queues;
cmd_params->priorities[0] = cfg->priorities[0];
cmd_params->priorities[1] = cfg->priorities[1];
@@ -297,6 +298,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
rsp_params = (struct dpdmai_rsp_get_attr *)cmd.params;
attr->id = le32_to_cpu(rsp_params->id);
attr->num_of_priorities = rsp_params->num_of_priorities;
+ attr->num_of_queues = rsp_params->num_of_queues;
return 0;
}
@@ -306,6 +308,8 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation; use
* DPDMAI_ALL_QUEUES to configure all Rx queues
@@ -317,6 +321,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg)
{
@@ -331,6 +336,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
cmd_params->dest_id = cpu_to_le32(cfg->dest_cfg.dest_id);
cmd_params->dest_priority = cfg->dest_cfg.priority;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
cmd_params->user_ctx = cpu_to_le64(cfg->user_ctx);
cmd_params->options = cpu_to_le32(cfg->options);
dpdmai_set_field(cmd_params->dest_type,
@@ -346,6 +352,8 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Rx queue attributes
@@ -355,6 +363,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr)
{
@@ -369,6 +378,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
@@ -392,6 +402,8 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Tx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Tx queue attributes
@@ -401,6 +413,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr)
{
@@ -415,6 +428,7 @@ int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
diff --git a/drivers/bus/fslmc/mc/dpio.c b/drivers/bus/fslmc/mc/dpio.c
index 966277cc6..a3382ed14 100644
--- a/drivers/bus/fslmc/mc/dpio.c
+++ b/drivers/bus/fslmc/mc/dpio.c
@@ -268,6 +268,15 @@ int dpio_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpio_get_attributes() - Retrieve DPIO attributes
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPIO object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
int dpio_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp.h b/drivers/bus/fslmc/mc/fsl_dpbp.h
index 111836261..9d405b42c 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp.h
@@ -82,6 +82,7 @@ int dpbp_get_attributes(struct fsl_mc_io *mc_io,
/**
* BPSCN write will attempt to allocate into a cache (coherent write)
*/
+#define DPBP_NOTIF_OPT_COHERENT_WRITE 0x00000001
int dpbp_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
index 18402cedf..55c9fc9b4 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
@@ -9,13 +9,15 @@
/* DPBP Version */
#define DPBP_VER_MAJOR 3
-#define DPBP_VER_MINOR 3
+#define DPBP_VER_MINOR 4
/* Command versioning */
#define DPBP_CMD_BASE_VERSION 1
+#define DPBP_CMD_VERSION_2 2
#define DPBP_CMD_ID_OFFSET 4
#define DPBP_CMD(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_BASE_VERSION)
+#define DPBP_CMD_V2(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_VERSION_2)
/* Command IDs */
#define DPBP_CMDID_CLOSE DPBP_CMD(0x800)
@@ -37,8 +39,8 @@
#define DPBP_CMDID_GET_IRQ_STATUS DPBP_CMD(0x016)
#define DPBP_CMDID_CLEAR_IRQ_STATUS DPBP_CMD(0x017)
-#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD(0x1b0)
-#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD(0x1b1)
+#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD_V2(0x1b0)
+#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD_V2(0x1b1)
#define DPBP_CMDID_GET_FREE_BUFFERS_NUM DPBP_CMD(0x1b2)
@@ -68,8 +70,8 @@ struct dpbp_cmd_set_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
@@ -79,8 +81,8 @@ struct dpbp_rsp_get_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
diff --git a/drivers/bus/fslmc/mc/fsl_dpci.h b/drivers/bus/fslmc/mc/fsl_dpci.h
index f69ed3f33..9af9097e5 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci.h
@@ -6,6 +6,8 @@
#ifndef __FSL_DPCI_H
#define __FSL_DPCI_H
+#include <fsl_dpopr.h>
+
/* Data Path Communication Interface API
* Contains initialization APIs and runtime control APIs for DPCI
*/
@@ -17,7 +19,7 @@ struct fsl_mc_io;
/**
* Maximum number of Tx/Rx priorities per DPCI object
*/
-#define DPCI_PRIO_NUM 2
+#define DPCI_PRIO_NUM 4
/**
* Indicates an invalid frame queue
@@ -106,6 +108,27 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpci_attr *attr);
+/**
+ * struct dpci_peer_attr - Structure representing the peer DPCI attributes
+ * @peer_id: DPCI peer id; if no peer is connected returns (-1)
+ * @num_of_priorities: The peer's number of receive priorities; determines the
+ * number of transmit priorities for the local DPCI object
+ */
+struct dpci_peer_attr {
+ int peer_id;
+ uint8_t num_of_priorities;
+};
+
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr);
+
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up);
+
/**
* enum dpci_dest - DPCI destination types
* @DPCI_DEST_NONE: Unassigned destination; The queue is set in parked mode
@@ -153,6 +176,11 @@ struct dpci_dest_cfg {
*/
#define DPCI_QUEUE_OPT_DEST 0x00000002
+/**
+ * Set the queue to hold active mode.
+ */
+#define DPCI_QUEUE_OPT_HOLD_ACTIVE 0x00000004
+
/**
* struct dpci_rx_queue_cfg - Structure representing RX queue configuration
* @options: Flags representing the suggested modifications to the queue;
@@ -163,11 +191,14 @@ struct dpci_dest_cfg {
* 'options'
* @dest_cfg: Queue destination parameters;
* valid only if 'DPCI_QUEUE_OPT_DEST' is contained in 'options'
+ * @order_preservation_en: order preservation configuration for the rx queue
+ * valid only if 'DPCI_QUEUE_OPT_HOLD_ACTIVE' is contained in 'options'
*/
struct dpci_rx_queue_cfg {
uint32_t options;
uint64_t user_ctx;
struct dpci_dest_cfg dest_cfg;
+ int order_preservation_en;
};
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
@@ -217,4 +248,18 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
uint16_t *major_ver,
uint16_t *minor_ver);
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg);
+
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry);
+
#endif /* __FSL_DPCI_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
index 634248ac0..92b85a820 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
@@ -8,7 +8,7 @@
/* DPCI Version */
#define DPCI_VER_MAJOR 3
-#define DPCI_VER_MINOR 3
+#define DPCI_VER_MINOR 4
#define DPCI_CMD_BASE_VERSION 1
#define DPCI_CMD_BASE_VERSION_V2 2
@@ -35,6 +35,8 @@
#define DPCI_CMDID_GET_PEER_ATTR DPCI_CMD_V1(0x0e2)
#define DPCI_CMDID_GET_RX_QUEUE DPCI_CMD_V1(0x0e3)
#define DPCI_CMDID_GET_TX_QUEUE DPCI_CMD_V1(0x0e4)
+#define DPCI_CMDID_SET_OPR DPCI_CMD_V1(0x0e5)
+#define DPCI_CMDID_GET_OPR DPCI_CMD_V1(0x0e6)
/* Macros for accessing command fields smaller than 1byte */
#define DPCI_MASK(field) \
@@ -90,6 +92,8 @@ struct dpci_rsp_get_link_state {
#define DPCI_DEST_TYPE_SHIFT 0
#define DPCI_DEST_TYPE_SIZE 4
+#define DPCI_ORDER_PRESERVATION_SHIFT 4
+#define DPCI_ORDER_PRESERVATION_SIZE 1
struct dpci_cmd_set_rx_queue {
uint32_t dest_id;
@@ -128,5 +132,61 @@ struct dpci_rsp_get_api_version {
uint16_t minor;
};
+struct dpci_cmd_set_opr {
+ uint16_t pad0;
+ uint8_t index;
+ uint8_t options;
+ uint8_t pad1[7];
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+};
+
+struct dpci_cmd_get_opr {
+ uint16_t pad;
+ uint8_t index;
+};
+
+#define DPCI_RIP_SHIFT 0
+#define DPCI_RIP_SIZE 1
+#define DPCI_OPR_ENABLE_SHIFT 1
+#define DPCI_OPR_ENABLE_SIZE 1
+#define DPCI_TSEQ_NLIS_SHIFT 0
+#define DPCI_TSEQ_NLIS_SIZE 1
+#define DPCI_HSEQ_NLIS_SHIFT 0
+#define DPCI_HSEQ_NLIS_SIZE 1
+
+struct dpci_rsp_get_opr {
+ uint64_t pad0;
+ /* from LSB: rip:1 enable:1 */
+ uint8_t flags;
+ uint16_t pad1;
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+ uint16_t nesn;
+ uint16_t pad8;
+ uint16_t ndsn;
+ uint16_t pad2;
+ uint16_t ea_tseq;
+ /* only the LSB */
+ uint8_t tseq_nlis;
+ uint8_t pad3;
+ uint16_t ea_hseq;
+ /* only the LSB */
+ uint8_t hseq_nlis;
+ uint8_t pad4;
+ uint16_t ea_hptr;
+ uint16_t pad5;
+ uint16_t ea_tptr;
+ uint16_t pad6;
+ uint16_t opr_vid;
+ uint16_t pad7;
+ uint16_t opr_id;
+};
#pragma pack(pop)
#endif /* _FSL_DPCI_CMD_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpcon.h b/drivers/bus/fslmc/mc/fsl_dpcon.h
index 36dd5f3c1..fc0430dc1 100644
--- a/drivers/bus/fslmc/mc/fsl_dpcon.h
+++ b/drivers/bus/fslmc/mc/fsl_dpcon.h
@@ -81,6 +81,25 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpcon_attr *attr);
+/**
+ * struct dpcon_notification_cfg - Structure representing notification params
+ * @dpio_id: DPIO object ID; must be configured with a notification channel;
+ * to disable notifications set it to 'DPCON_INVALID_DPIO_ID';
+ * @priority: Priority selection within the DPIO channel; valid values
+ * are 0-7, depending on the number of priorities in that channel
+ * @user_ctx: User context value provided with each CDAN message
+ */
+struct dpcon_notification_cfg {
+ int dpio_id;
+ uint8_t priority;
+ uint64_t user_ctx;
+};
+
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg);
+
int dpcon_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai.h b/drivers/bus/fslmc/mc/fsl_dpdmai.h
index 03e46ec14..40469cc13 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai.h
@@ -39,6 +39,7 @@ int dpdmai_close(struct fsl_mc_io *mc_io,
* should be configured with 0
*/
struct dpdmai_cfg {
+ uint8_t num_queues;
uint8_t priorities[DPDMAI_PRIO_NUM];
};
@@ -78,6 +79,7 @@ int dpdmai_reset(struct fsl_mc_io *mc_io,
struct dpdmai_attr {
int id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
@@ -149,6 +151,7 @@ struct dpdmai_rx_queue_cfg {
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg);
@@ -168,6 +171,7 @@ struct dpdmai_rx_queue_attr {
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr);
@@ -183,6 +187,7 @@ struct dpdmai_tx_queue_attr {
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr);
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
index 618e19eae..7e122de4e 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
@@ -7,30 +7,32 @@
/* DPDMAI Version */
#define DPDMAI_VER_MAJOR 3
-#define DPDMAI_VER_MINOR 2
+#define DPDMAI_VER_MINOR 3
/* Command versioning */
#define DPDMAI_CMD_BASE_VERSION 1
+#define DPDMAI_CMD_VERSION_2 2
#define DPDMAI_CMD_ID_OFFSET 4
#define DPDMAI_CMD(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_BASE_VERSION)
+#define DPDMAI_CMD_V2(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_VERSION_2)
/* Command IDs */
#define DPDMAI_CMDID_CLOSE DPDMAI_CMD(0x800)
#define DPDMAI_CMDID_OPEN DPDMAI_CMD(0x80E)
-#define DPDMAI_CMDID_CREATE DPDMAI_CMD(0x90E)
+#define DPDMAI_CMDID_CREATE DPDMAI_CMD_V2(0x90E)
#define DPDMAI_CMDID_DESTROY DPDMAI_CMD(0x98E)
#define DPDMAI_CMDID_GET_API_VERSION DPDMAI_CMD(0xa0E)
#define DPDMAI_CMDID_ENABLE DPDMAI_CMD(0x002)
#define DPDMAI_CMDID_DISABLE DPDMAI_CMD(0x003)
-#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD(0x004)
+#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD_V2(0x004)
#define DPDMAI_CMDID_RESET DPDMAI_CMD(0x005)
#define DPDMAI_CMDID_IS_ENABLED DPDMAI_CMD(0x006)
-#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD(0x1A0)
-#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD(0x1A1)
-#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD(0x1A2)
+#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD_V2(0x1A0)
+#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD_V2(0x1A1)
+#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD_V2(0x1A2)
/* Macros for accessing command fields smaller than 1byte */
#define DPDMAI_MASK(field) \
@@ -47,7 +49,7 @@ struct dpdmai_cmd_open {
};
struct dpdmai_cmd_create {
- uint8_t pad;
+ uint8_t num_queues;
uint8_t priorities[2];
};
@@ -66,6 +68,7 @@ struct dpdmai_rsp_is_enabled {
struct dpdmai_rsp_get_attr {
uint32_t id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
#define DPDMAI_DEST_TYPE_SHIFT 0
@@ -77,7 +80,7 @@ struct dpdmai_cmd_set_rx_queue {
uint8_t priority;
/* from LSB: dest_type:4 */
uint8_t dest_type;
- uint8_t pad;
+ uint8_t queue_idx;
uint64_t user_ctx;
uint32_t options;
};
@@ -85,6 +88,7 @@ struct dpdmai_cmd_set_rx_queue {
struct dpdmai_cmd_get_queue {
uint8_t pad[5];
uint8_t priority;
+ uint8_t queue_idx;
};
struct dpdmai_rsp_get_rx_queue {
diff --git a/drivers/bus/fslmc/mc/fsl_dpmng.h b/drivers/bus/fslmc/mc/fsl_dpmng.h
index afaf9b711..8559bef87 100644
--- a/drivers/bus/fslmc/mc/fsl_dpmng.h
+++ b/drivers/bus/fslmc/mc/fsl_dpmng.h
@@ -18,7 +18,7 @@ struct fsl_mc_io;
* Management Complex firmware version information
*/
#define MC_VER_MAJOR 10
-#define MC_VER_MINOR 3
+#define MC_VER_MINOR 10
/**
* struct mc_version
diff --git a/drivers/bus/fslmc/mc/fsl_dpopr.h b/drivers/bus/fslmc/mc/fsl_dpopr.h
new file mode 100644
index 000000000..fd727e011
--- /dev/null
+++ b/drivers/bus/fslmc/mc/fsl_dpopr.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0)
+ *
+ * Copyright 2013-2015 Freescale Semiconductor Inc.
+ * Copyright 2018 NXP
+ *
+ */
+#ifndef __FSL_DPOPR_H_
+#define __FSL_DPOPR_H_
+
+/** @addtogroup dpopr Data Path Order Restoration API
+ * Contains initialization APIs and runtime APIs for the Order Restoration
+ * @{
+ */
+
+/** Order Restoration properties */
+
+/**
+ * Create a new Order Point Record option
+ */
+#define OPR_OPT_CREATE 0x1
+/**
+ * Retire an existing Order Point Record option
+ */
+#define OPR_OPT_RETIRE 0x2
+
+/**
+ * struct opr_cfg - Structure representing OPR configuration
+ * @oprrws: Order point record (OPR) restoration window size (0 to 5)
+ * 0 - Window size is 32 frames.
+ * 1 - Window size is 64 frames.
+ * 2 - Window size is 128 frames.
+ * 3 - Window size is 256 frames.
+ * 4 - Window size is 512 frames.
+ * 5 - Window size is 1024 frames.
+ *@oa: OPR auto advance NESN window size (0 disabled, 1 enabled)
+ *@olws: OPR acceptable late arrival window size (0 to 3)
+ * 0 - Disabled. Late arrivals are always rejected.
+ * 1 - Window size is 32 frames.
+ * 2 - Window size is the same as the OPR restoration
+ * window size configured in the OPRRWS field.
+ * 3 - Window size is 8192 frames.
+ * Late arrivals are always accepted.
+ *@oeane: Order restoration list (ORL) resource exhaustion
+ * advance NESN enable (0 disabled, 1 enabled)
+ *@oloe: OPR loose ordering enable (0 disabled, 1 enabled)
+ */
+struct opr_cfg {
+ uint8_t oprrws;
+ uint8_t oa;
+ uint8_t olws;
+ uint8_t oeane;
+ uint8_t oloe;
+};
+
+/**
+ * struct opr_qry - Structure representing OPR query results
+ * @enable: Enabled state
+ * @rip: Retirement In Progress
+ * @ndsn: Next dispensed sequence number
+ * @nesn: Next expected sequence number
+ * @ea_hseq: Early arrival head sequence number
+ * @hseq_nlis: HSEQ not last in sequence
+ * @ea_tseq: Early arrival tail sequence number
+ * @tseq_nlis: TSEQ not last in sequence
+ * @ea_tptr: Early arrival tail pointer
+ * @ea_hptr: Early arrival head pointer
+ * @opr_id: Order Point Record ID
+ * @opr_vid: Order Point Record Virtual ID
+ */
+struct opr_qry {
+ char enable;
+ char rip;
+ uint16_t ndsn;
+ uint16_t nesn;
+ uint16_t ea_hseq;
+ char hseq_nlis;
+ uint16_t ea_tseq;
+ char tseq_nlis;
+ uint16_t ea_tptr;
+ uint16_t ea_hptr;
+ uint16_t opr_id;
+ uint16_t opr_vid;
+};
+
+#endif /* __FSL_DPOPR_H_ */
diff --git a/drivers/bus/fslmc/meson.build b/drivers/bus/fslmc/meson.build
index 22a56a6fc..54ca92d0c 100644
--- a/drivers/bus/fslmc/meson.build
+++ b/drivers/bus/fslmc/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index b4a881704..8717373dd 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -117,3 +117,13 @@ DPDK_18.05 {
rte_dpaa2_memsegs;
} DPDK_18.02;
+
+DPDK_18.11 {
+ global:
+
+ dpci_get_link_state;
+ dpci_get_opr;
+ dpci_get_peer_attributes;
+ dpci_set_opr;
+
+} DPDK_18.05;
diff --git a/drivers/crypto/dpaa2_sec/Makefile b/drivers/crypto/dpaa2_sec/Makefile
index da3d8f84f..a61be49db 100644
--- a/drivers/crypto/dpaa2_sec/Makefile
+++ b/drivers/crypto/dpaa2_sec/Makefile
@@ -41,7 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_sec_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# library source files
SRCS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC) += dpaa2_sec_dpseci.c
diff --git a/drivers/crypto/dpaa2_sec/meson.build b/drivers/crypto/dpaa2_sec/meson.build
index 01afc5877..8fa4827ed 100644
--- a/drivers/crypto/dpaa2_sec/meson.build
+++ b/drivers/crypto/dpaa2_sec/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/event/dpaa2/Makefile b/drivers/event/dpaa2/Makefile
index 5e1a63200..3f85dd2be 100644
--- a/drivers/event/dpaa2/Makefile
+++ b/drivers/event/dpaa2/Makefile
@@ -27,7 +27,7 @@ CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc
# versioning export map
EXPORT_MAP := rte_pmd_dpaa2_event_version.map
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/event/dpaa2/meson.build b/drivers/event/dpaa2/meson.build
index de7a46155..c46b39e9d 100644
--- a/drivers/event/dpaa2/meson.build
+++ b/drivers/event/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/mempool/dpaa2/Makefile b/drivers/mempool/dpaa2/Makefile
index 9e4c87d79..4996a2cd1 100644
--- a/drivers/mempool/dpaa2/Makefile
+++ b/drivers/mempool/dpaa2/Makefile
@@ -19,7 +19,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_mempool_dpaa2_version.map
# Lbrary version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/mempool/dpaa2/meson.build b/drivers/mempool/dpaa2/meson.build
index 90bab6069..6b6ead617 100644
--- a/drivers/mempool/dpaa2/meson.build
+++ b/drivers/mempool/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/net/dpaa2/Makefile b/drivers/net/dpaa2/Makefile
index 9b0b14331..1d46f7f25 100644
--- a/drivers/net/dpaa2/Makefile
+++ b/drivers/net/dpaa2/Makefile
@@ -25,7 +25,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/net/dpaa2/meson.build b/drivers/net/dpaa2/meson.build
index 213f0d72f..b34595258 100644
--- a/drivers/net/dpaa2/meson.build
+++ b/drivers/net/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/raw/dpaa2_cmdif/Makefile b/drivers/raw/dpaa2_cmdif/Makefile
index 9b863dda2..0dbe5c821 100644
--- a/drivers/raw/dpaa2_cmdif/Makefile
+++ b/drivers/raw/dpaa2_cmdif/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_rawdev
EXPORT_MAP := rte_pmd_dpaa2_cmdif_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_cmdif/meson.build b/drivers/raw/dpaa2_cmdif/meson.build
index 1d146872e..37bb24a1b 100644
--- a/drivers/raw/dpaa2_cmdif/meson.build
+++ b/drivers/raw/dpaa2_cmdif/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'bus_vdev']
sources = files('dpaa2_cmdif.c')
diff --git a/drivers/raw/dpaa2_qdma/Makefile b/drivers/raw/dpaa2_qdma/Makefile
index d88809ead..645220772 100644
--- a/drivers/raw/dpaa2_qdma/Makefile
+++ b/drivers/raw/dpaa2_qdma/Makefile
@@ -25,7 +25,7 @@ LDLIBS += -lrte_ring
EXPORT_MAP := rte_pmd_dpaa2_qdma_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index 2787d3028..44503331e 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -805,7 +805,7 @@ dpaa2_dpdmai_dev_uninit(struct rte_rawdev *rawdev)
DPAA2_QDMA_ERR("dmdmai disable failed");
/* Set up the DQRR storage for Rx */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq = &(dpdmai_dev->rx_queue[i]);
if (rxq->q_storage) {
@@ -856,17 +856,17 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
ret);
goto init_err;
}
- dpdmai_dev->num_queues = attr.num_of_priorities;
+ dpdmai_dev->num_queues = attr.num_of_queues;
/* Set up Rx Queues */
- for (i = 0; i < attr.num_of_priorities; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq;
memset(&rx_queue_cfg, 0, sizeof(struct dpdmai_rx_queue_cfg));
ret = dpdmai_set_rx_queue(&dpdmai_dev->dpdmai,
CMD_PRI_LOW,
dpdmai_dev->token,
- i, &rx_queue_cfg);
+ i, 0, &rx_queue_cfg);
if (ret) {
DPAA2_QDMA_ERR("Setting Rx queue failed with err: %d",
ret);
@@ -893,9 +893,9 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
}
/* Get Rx and Tx queues FQID's */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
ret = dpdmai_get_rx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &rx_attr);
+ dpdmai_dev->token, i, 0, &rx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
@@ -904,7 +904,7 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
dpdmai_dev->rx_queue[i].fqid = rx_attr.fqid;
ret = dpdmai_get_tx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &tx_attr);
+ dpdmai_dev->token, i, 0, &tx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
index c6a057806..0cbe90255 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
@@ -11,6 +11,8 @@ struct qdma_io_meta;
#define DPAA2_QDMA_MAX_FLE 3
#define DPAA2_QDMA_MAX_SDD 2
+#define DPAA2_DPDMAI_MAX_QUEUES 8
+
/** FLE pool size: 3 Frame list + 2 source/destination descriptor */
#define QDMA_FLE_POOL_SIZE (sizeof(struct qdma_io_meta) + \
sizeof(struct qbman_fle) * DPAA2_QDMA_MAX_FLE + \
@@ -142,9 +144,9 @@ struct dpaa2_dpdmai_dev {
/** Number of queue in this DPDMAI device */
uint8_t num_queues;
/** RX queues */
- struct dpaa2_queue rx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue rx_queue[DPAA2_DPDMAI_MAX_QUEUES];
/** TX queues */
- struct dpaa2_queue tx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue tx_queue[DPAA2_DPDMAI_MAX_QUEUES];
};
#endif /* __DPAA2_QDMA_H__ */
diff --git a/drivers/raw/dpaa2_qdma/meson.build b/drivers/raw/dpaa2_qdma/meson.build
index b6a081f11..2a4b69c16 100644
--- a/drivers/raw/dpaa2_qdma/meson.build
+++ b/drivers/raw/dpaa2_qdma/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'ring']
sources = files('dpaa2_qdma.c')
--
2.17.1
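A note on the command versioning in the fsl_dpdmai_cmd.h hunk: the DPDMAI_CMD() and DPDMAI_CMD_V2() macros shift the command ID left by DPDMAI_CMD_ID_OFFSET and OR in a version number. The sketch below repeats that arithmetic as plain functions. The helper names are hypothetical and not part of the driver, and the decode helpers assume the version occupies exactly the low four bits, which the offset suggests but the patch does not state.

```c
#include <stdint.h>

/* Values copied from the fsl_dpdmai_cmd.h hunk. */
#define DPDMAI_CMD_BASE_VERSION 1
#define DPDMAI_CMD_VERSION_2    2
#define DPDMAI_CMD_ID_OFFSET    4

/* Hypothetical helper: same arithmetic as DPDMAI_CMD()/DPDMAI_CMD_V2(). */
static inline uint16_t
dpdmai_cmd_encode(uint16_t id, uint8_t version)
{
	return (uint16_t)((id << DPDMAI_CMD_ID_OFFSET) | version);
}

/* Hypothetical helper: recover the command ID (bits above the offset). */
static inline uint16_t
dpdmai_cmd_id(uint16_t cmd)
{
	return (uint16_t)(cmd >> DPDMAI_CMD_ID_OFFSET);
}

/* Hypothetical helper: recover the version, assuming it fills the low
 * DPDMAI_CMD_ID_OFFSET bits. */
static inline uint8_t
dpdmai_cmd_version(uint16_t cmd)
{
	return (uint8_t)(cmd & ((1u << DPDMAI_CMD_ID_OFFSET) - 1));
}
```

Under these assumptions, moving DPDMAI_CMDID_SET_RX_QUEUE from DPDMAI_CMD(0x1A0) to DPDMAI_CMD_V2(0x1A0) changes only the low nibble of the encoded value, so the command ID itself stays the same.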
* [dpdk-dev] [PATCH v6 00/21] Support externally allocated memory in DPDK
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-27 10:40 2% ` Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (4 more replies)
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
4 siblings, 5 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
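To illustrate the first and third points in the list above, here is a minimal, self-contained sketch of how an allocator might map a socket ID to a heap index once heaps are no longer indexed by NUMA node. It is a toy model, not the code from this series: struct toy_heap, toy_socket_to_heap_index() and the example socket ID 256 are all made up for illustration.

```c
#include <stddef.h>

/* Toy stand-in for a malloc heap; only the fields the lookup needs. */
struct toy_heap {
	int in_use;     /* non-zero once the slot holds a heap */
	int socket_id;  /* NUMA socket, or a unique ID for an external heap */
};

/*
 * Hypothetical lookup: scan the heap table for the heap that owns
 * socket_id and return its index, or -1 if no heap has that ID.
 * The index is a position in the table, not a NUMA node number.
 */
static int
toy_socket_to_heap_index(const struct toy_heap *heaps, size_t n_heaps,
		int socket_id)
{
	for (size_t i = 0; i < n_heaps; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

With a table holding internal heaps for sockets 0 and 1 followed by one external heap, the external heap is found by its own socket ID just like the internal ones, and an ID that no heap owns is rejected by the lookup rather than by a range check in the caller.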
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Because of underlying issues with attaching to fbarrays
in secondary processes, this design was deemed better: we don't want
creation of an external heap in the primary to fail because something
failed in a secondary, when in fact we may not even have wanted this
memory to be accessible in the secondary in the first place.
Using external memory in multiprocess is *hard*, because the memory
space not only needs to be preallocated, but also needs to be attached
in each process so that other processes can access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 305 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 28 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 14 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx4/mlx4_mr.c | 3 +
drivers/net/mlx5/mlx5.c | 5 +-
drivers/net/mlx5/mlx5_mr.c | 3 +
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
.../net/virtio/virtio_user/virtio_user_dev.c | 8 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 316 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
47 files changed, 1913 insertions(+), 139 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
* [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 10:41 4% ` Anatoly Burakov
2018-09-27 13:14 0% ` Alejandro Lucero
2018-09-27 10:41 9% ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify whether a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
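For readers following the semantics change, the sketch below restates the two checks as they read after this patch: the relaxed range check from the memzone hunk, and the no-hugepages override that is now skipped for IDs at or above RTE_MAX_NUMA_NODES. The function names are hypothetical, RTE_MAX_NUMA_NODES is set to 8 here only as an assumed build-time value, and 300 in the usage note is an arbitrary stand-in for an external heap's socket ID.

```c
#include <stdbool.h>

#define SOCKET_ID_ANY      (-1) /* same value DPDK uses */
#define RTE_MAX_NUMA_NODES 8    /* assumed build-time setting */

/*
 * Hypothetical helper mirroring the relaxed memzone check: only negative
 * IDs other than SOCKET_ID_ANY are rejected up front. The upper bound is
 * no longer tested here; the memory subsystem decides later whether a
 * heap with this ID exists.
 */
static bool
socket_id_passes_memzone_check(int socket_id)
{
	return socket_id == SOCKET_ID_ANY || socket_id >= 0;
}

/*
 * Hypothetical helper mirroring the no-hugepages override: NUMA-range
 * IDs collapse to SOCKET_ID_ANY, while IDs at or above
 * RTE_MAX_NUMA_NODES (candidate external heaps) are left alone.
 */
static int
effective_socket_id(int socket_id, bool has_hugepages)
{
	if (!has_hugepages && socket_id < RTE_MAX_NUMA_NODES)
		return SOCKET_ID_ANY;
	return socket_id;
}
```

In this model, effective_socket_id(1, false) yields SOCKET_ID_ANY, while effective_socket_id(300, false) returns 300 unchanged, so a request aimed at an external heap is not silently redirected when hugepages are disabled.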
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
* [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
` (2 preceding siblings ...)
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 10:41 9% ` Anatoly Burakov
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 1 +
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)
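As an illustration of the default naming scheme used in rte_eal_malloc_heap_init() below, here is a small standalone sketch that formats a "socket_<id>" name into a fixed-size buffer. The helper name is hypothetical, and unlike the patch it reports truncation to the caller instead of copying through a temporary buffer.

```c
#include <stdio.h>

#define RTE_HEAP_NAME_MAX_LEN 32 /* value introduced by this patch */

/*
 * Hypothetical helper: write the default name for an internal heap into
 * buf. Returns 0 on success, -1 if the name did not fit or formatting
 * failed. snprintf() always NUL-terminates when len > 0.
 */
static int
toy_default_heap_name(char *buf, size_t len, int socket_id)
{
	int n = snprintf(buf, len, "socket_%i", socket_id);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With a buffer of RTE_HEAP_NAME_MAX_LEN bytes the name always fits for any int socket ID, since "socket_" plus the longest decimal int is well under 32 characters.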
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
* eal: EAL library ABI version was changed due to previously announced work on
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
+ Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1020,6 +1019,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
* [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
` (3 preceding siblings ...)
2018-09-27 10:41 9% ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-27 10:41 4% ` Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add an API for creating new malloc heaps. They will be created
with socket IDs above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
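For readers who want the flow without the EAL plumbing, here is a rough,
standalone sketch of the name validation, slot search and socket ID
assignment this patch performs. It is not the DPDK code: heap_table,
heap_slot, MAX_HEAPS and MAX_NUMA_NODES are hypothetical stand-ins, locking
is omitted, and the sketch returns negative errno values where the real API
returns -1 and sets rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN */
#define MAX_HEAPS 8          /* hypothetical stand-in for RTE_MAX_HEAPS */
#define MAX_NUMA_NODES 4     /* hypothetical stand-in for RTE_MAX_NUMA_NODES */
/* external socket IDs start above every possible NUMA node ID */
#define EXT_MIN_SOCKET_ID \
	((1 << 8) > (MAX_NUMA_NODES) ? (1 << 8) : (MAX_NUMA_NODES))

struct heap_slot {
	char name[HEAP_NAME_MAX_LEN]; /* empty name marks a free slot */
	int socket_id;
};

struct heap_table {
	struct heap_slot heaps[MAX_HEAPS];
	int next_socket_id;
};

/* length of s, but never look further than max characters */
static size_t
bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* name the first n_internal heaps "socket_<i>", like the default heaps */
static void
heap_table_init(struct heap_table *t, int n_internal)
{
	int i;

	assert(t != NULL && n_internal >= 0 && n_internal <= MAX_HEAPS);
	memset(t, 0, sizeof(*t));
	t->next_socket_id = EXT_MIN_SOCKET_ID;
	for (i = 0; i < n_internal; i++) {
		snprintf(t->heaps[i].name, sizeof(t->heaps[i].name),
				"socket_%i", i);
		t->heaps[i].socket_id = i;
	}
}

/* returns 0 on success, negative errno value on failure */
static int
heap_table_create(struct heap_table *t, const char *name)
{
	size_t len;
	int i;

	if (name == NULL)
		return -EINVAL;
	len = bounded_len(name, HEAP_NAME_MAX_LEN);
	/* empty, or too long to fit with its terminator */
	if (len == 0 || len == HEAP_NAME_MAX_LEN)
		return -EINVAL;

	for (i = 0; i < MAX_HEAPS; i++) {
		struct heap_slot *slot = &t->heaps[i];

		/* existing heap with the same name */
		if (strncmp(name, slot->name, HEAP_NAME_MAX_LEN) == 0)
			return -EEXIST;
		/* first free slot: claim it and hand out the next socket ID */
		if (slot->name[0] == '\0') {
			memcpy(slot->name, name, len + 1);
			slot->socket_id = t->next_socket_id++;
			return 0;
		}
	}
	return -ENOSPC;
}
```

With two internal heaps set up, the first heap created this way would land
in slot 2 with socket ID 256, and a second attempt with the same name would
be refused.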
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+ Structure ``rte_mem_config`` has been extended to contain the next socket
+ ID for externally allocated memory segments.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b)) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
* [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-09-27 10:40 16% ` Anatoly Burakov
2018-09-27 11:03 0% ` Shreyansh Jain
2018-09-29 0:09 0% ` Yongseok Koh
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
` (2 subsequent siblings)
4 siblings, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current callers of the memseg walk functions were adjusted to
ignore external segments where it made sense.
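As a rough illustration of the pattern applied to the walk callbacks, the
standalone sketch below mimics a memseg list walk with a callback that
skips external lists. The names msl_sketch and msl_walk are hypothetical
and only model the fields used here; the return convention (0 when the
whole list was walked, 1 when the callback stopped the walk, -1 on callback
error) is assumed to match what rte_memseg_list_walk() documents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical, cut-down model of struct rte_memseg_list */
struct msl_sketch {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if this list points to external memory */
	unsigned int count;    /* number of pages in use */
};

typedef int (*msl_walk_fn)(const struct msl_sketch *msl, void *arg);

/* call fn for every list; stop early on a non-zero return value */
static int
msl_walk(const struct msl_sketch *lists, size_t n, msl_walk_fn fn, void *arg)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		int ret = fn(&lists[i], arg);

		if (ret < 0)
			return -1;
		if (ret > 0)
			return 1;
	}
	return 0;
}

/* sum up internal memory only, in the spirit of physmem_size() */
static int
physmem_size_cb(const struct msl_sketch *msl, void *arg)
{
	uint64_t *total_len = arg;

	/* external lists are still walked, so skip them explicitly */
	if (msl->external)
		return 0;

	*total_len += msl->count * msl->page_sz;
	return 0;
}
```

The check goes first in the callback so that nothing derived from an
external list is touched before the decision to skip it is made.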
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, the
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating the minimum page size for a mempool.
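To make the matching rule concrete, here is a small standalone sketch of
the selection logic. The structure pagesz_list and the helper
min_page_size() are hypothetical; the real code walks memseg lists through
rte_memseg_list_walk() and falls back to getpagesize(), which is modelled
here by an explicit fallback argument. SKETCH_SOCKET_ID_ANY stands in for
SOCKET_ID_ANY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* hypothetical, cut-down model of a memseg list */
struct pagesz_list {
	int socket_id;
	size_t page_sz;
	unsigned int external; /* 1 if the list refers to external memory */
};

/*
 * Smallest page size usable for socket_id. An exact socket ID match
 * counts for both native and external memory; the "any socket" request
 * only considers native memory. Returns fallback if nothing matches.
 */
static size_t
min_page_size(const struct pagesz_list *lists, size_t n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int valid = lists[i].socket_id == socket_id;

		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				!lists[i].external;
		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

Note that an external heap backed by small pages no longer drags down the
page size picked for a mempool on a native socket, which is the problem the
paragraph above describes.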
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
24 files changed, 134 insertions(+), 44 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,10 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+ a new flag indicating whether the memseg list refers to external memory.
+
Removed Items
-------------
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 11:03 0% ` Shreyansh Jain
2018-09-27 11:08 0% ` Burakov, Anatoly
2018-09-29 0:09 0% ` Yongseok Koh
1 sibling, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-27 11:03 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>
Specifically for bus/fslmc perspective and generically for others:
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 11:03 0% ` Shreyansh Jain
@ 2018-09-27 11:08 0% ` Burakov, Anatoly
2018-09-27 11:12 0% ` Shreyansh Jain
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-27 11:08 UTC (permalink / raw)
To: Shreyansh Jain, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes. This also breaks a few
>> internal assumptions about memory contiguousness, so adjust
>> malloc code in a few places.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools is a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>>
>
> Specifically for bus/fslmc perspective and generically for others:
>
> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>
>
Actually, this patch may need some further adjustment, since it makes an
assumption about not wanting to map external memory for DMA.
Specifically, there's an fslmc DMA map function that now skips external
memory segments. Are you sure that's how it's supposed to be?
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 11:08 0% ` Burakov, Anatoly
@ 2018-09-27 11:12 0% ` Shreyansh Jain
2018-09-27 11:29 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-27 11:12 UTC (permalink / raw)
To: Burakov, Anatoly, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>> When we allocate and use DPDK memory, we need to be able to
>>> differentiate between DPDK hugepage segments and segments that
>>> were made part of DPDK but are externally allocated. Add such
>>> a property to memseg lists.
>>>
>>> This breaks the ABI, so bump the EAL library ABI version and
>>> document the change in release notes. This also breaks a few
>>> internal assumptions about memory contiguousness, so adjust
>>> malloc code in a few places.
>>>
>>> All current calls for memseg walk functions were adjusted to
>>> ignore external segments where it made sense.
>>>
>>> Mempools is a special case, because we may be asked to allocate
>>> a mempool on a specific socket, and we need to ignore all page
>>> sizes on other heaps or other sockets. Previously, this
>>> assumption of knowing all page sizes was not a problem, but it
>>> will be now, so we have to match socket ID with page size when
>>> calculating minimum page size for a mempool.
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>> ---
>>>
>>
>> Specifically for bus/fslmc perspective and generically for others:
>>
>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>
>>
>
> Actually, this patch may need some further adjustment, since it makes
> assumption about not wanting to map external memory for DMA.
>
> Specifically - there's an fslmc dma map function that now skips external
> memory segments. Are you sure that's how it's supposed to be?
>
I have thought about that.
For now, yes. If we need to map external memory, and an event callback
is triggered for it, that should be handled separately, for example
through a PMD-level API that handles such requests from applications.
The point is that how external memory is handled is use-case
specific - the need to have its events reported back is definitely
there, but its handling is still a grey area.
Once the patches make their way in, I can always come back and tune that.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 11:12 0% ` Shreyansh Jain
@ 2018-09-27 11:29 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-27 11:29 UTC (permalink / raw)
To: Shreyansh Jain, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On 27-Sep-18 12:12 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
>> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>>> When we allocate and use DPDK memory, we need to be able to
>>>> differentiate between DPDK hugepage segments and segments that
>>>> were made part of DPDK but are externally allocated. Add such
>>>> a property to memseg lists.
>>>>
>>>> This breaks the ABI, so bump the EAL library ABI version and
>>>> document the change in release notes. This also breaks a few
>>>> internal assumptions about memory contiguousness, so adjust
>>>> malloc code in a few places.
>>>>
>>>> All current calls for memseg walk functions were adjusted to
>>>> ignore external segments where it made sense.
>>>>
>>>> Mempools is a special case, because we may be asked to allocate
>>>> a mempool on a specific socket, and we need to ignore all page
>>>> sizes on other heaps or other sockets. Previously, this
>>>> assumption of knowing all page sizes was not a problem, but it
>>>> will be now, so we have to match socket ID with page size when
>>>> calculating minimum page size for a mempool.
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> ---
>>>>
>>>
>>> Specifically for bus/fslmc perspective and generically for others:
>>>
>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>>
>>>
>>
>> Actually, this patch may need some further adjustment, since it makes
>> assumption about not wanting to map external memory for DMA.
>>
>> Specifically - there's an fslmc dma map function that now skips
>> external memory segments. Are you sure that's how it's supposed to be?
>>
>
> I thought over that.
> For now yes. If we need to map external memory, and there is an event
> that would be called back, it should be handled separately. So, for
> example, a PMD level API to handle such requests from applications.
Well, technically such an event is already available, now that external
memory allocations trigger mem events :)
>
> The point is that how the external memory is handled is use-case
> specific - the need to have its events reported back is definitely
> there, but its handling is still a grey area.
>
> Once the patches make their way in, I can always come back and tune that.
>
OK, fair enough.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 13:14 0% ` Alejandro Lucero
2018-09-27 13:21 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-09-27 13:14 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:
> We will be assigning "invalid" socket ID's to external heap, and
> malloc will now be able to verify if a supplied socket ID is in
> fact a valid one, rendering parameter checks for sockets
> obsolete.
>
> This changes the semantics of what we understand by "socket ID",
> so document the change in the release notes.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> lib/librte_eal/common/malloc_heap.c | 2 +-
> lib/librte_eal/common/rte_malloc.c | 4 ----
> 4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 5fc71e208..6ee236302 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -98,6 +98,13 @@ API Changes
> users of memseg-walk-related functions, as they will now have to skip
> externally allocated segments in most cases if the intent is to only
> iterate
> over internal DPDK memory.
> + - ``socket_id`` parameter across the entire DPDK has gained additional
> + meaning, as some socket ID's will now be representing externally
> allocated
> + memory. No changes will be required for existing code as backwards
> + compatibility will be kept, and those who do not use this feature
> will not
> + see these extra socket ID's. Any new API's must not check socket ID
> + parameters themselves, and must instead leave it to the memory
> subsystem to
> + decide whether socket ID is a valid one.
>
> ABI Changes
> -----------
> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> b/lib/librte_eal/common/eal_common_memzone.c
> index 7300fe05d..b7081afbf 100644
> --- a/lib/librte_eal/common/eal_common_memzone.c
> +++ b/lib/librte_eal/common/eal_common_memzone.c
> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> *name, size_t len,
> return NULL;
> }
>
> - if ((socket_id != SOCKET_ID_ANY) &&
> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>
Wouldn't it be better to use RTE_MAX_HEAPS instead of removing the check?
> rte_errno = EINVAL;
> return NULL;
> }
>
> - if (!rte_eal_has_hugepages())
> + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
> + * external heap.
> + */
> + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
> socket_id = SOCKET_ID_ANY;
>
> contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index 1d1e35708..73e478076 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
> if (size == 0 || (align && !rte_is_power_of_2(align)))
> return NULL;
>
> - if (!rte_eal_has_hugepages())
> + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
> socket_arg = SOCKET_ID_ANY;
>
> if (socket_arg == SOCKET_ID_ANY)
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 73d6df31d..9ba1472c3 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> unsigned int align,
> if (!rte_eal_has_hugepages())
> socket_arg = SOCKET_ID_ANY;
>
> - /* Check socket parameter */
> - if (socket_arg >= RTE_MAX_NUMA_NODES)
> - return NULL;
> -
>
Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.
> return malloc_heap_alloc(type, size, socket_arg, 0,
> align == 0 ? 1 : align, 0, false);
> }
> --
> 2.17.1
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 13:14 0% ` Alejandro Lucero
@ 2018-09-27 13:21 0% ` Burakov, Anatoly
2018-09-27 13:42 0% ` Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-27 13:21 UTC (permalink / raw)
To: Alejandro Lucero
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
> wrote:
>
>> We will be assigning "invalid" socket ID's to external heap, and
>> malloc will now be able to verify if a supplied socket ID is in
>> fact a valid one, rendering parameter checks for sockets
>> obsolete.
>>
>> This changes the semantics of what we understand by "socket ID",
>> so document the change in the release notes.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
>> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>> lib/librte_eal/common/malloc_heap.c | 2 +-
>> lib/librte_eal/common/rte_malloc.c | 4 ----
>> 4 files changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_11.rst
>> b/doc/guides/rel_notes/release_18_11.rst
>> index 5fc71e208..6ee236302 100644
>> --- a/doc/guides/rel_notes/release_18_11.rst
>> +++ b/doc/guides/rel_notes/release_18_11.rst
>> @@ -98,6 +98,13 @@ API Changes
>> users of memseg-walk-related functions, as they will now have to skip
>> externally allocated segments in most cases if the intent is to only
>> iterate
>> over internal DPDK memory.
>> + - ``socket_id`` parameter across the entire DPDK has gained additional
>> + meaning, as some socket ID's will now be representing externally
>> allocated
>> + memory. No changes will be required for existing code as backwards
>> + compatibility will be kept, and those who do not use this feature
>> will not
>> + see these extra socket ID's. Any new API's must not check socket ID
>> + parameters themselves, and must instead leave it to the memory
>> subsystem to
>> + decide whether socket ID is a valid one.
>>
>> ABI Changes
>> -----------
>> diff --git a/lib/librte_eal/common/eal_common_memzone.c
>> b/lib/librte_eal/common/eal_common_memzone.c
>> index 7300fe05d..b7081afbf 100644
>> --- a/lib/librte_eal/common/eal_common_memzone.c
>> +++ b/lib/librte_eal/common/eal_common_memzone.c
>> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
>> *name, size_t len,
>> return NULL;
>> }
>>
>> - if ((socket_id != SOCKET_ID_ANY) &&
>> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
>> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>>
>
> Should not it be better to use RTE_MAX_HEAP instead of removing the check?
First of all, the maximum number of heaps should not concern the rest of
the code - this is purely an internal detail of rte_malloc.
More importantly, socket ID is completely independent of the number of
heaps. Socket ID is incremented each time a new heap is created, and
socket ID's are not reused. If you create and destroy a heap 100 times,
you'll get 100 different socket ID's, even though the max number of heaps
is less than that.
>
>
>
>> rte_errno = EINVAL;
>> return NULL;
>> }
>>
>> - if (!rte_eal_has_hugepages())
>> + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
>> + * external heap.
>> + */
>> + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
>> socket_id = SOCKET_ID_ANY;
>>
>> contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
>> diff --git a/lib/librte_eal/common/malloc_heap.c
>> b/lib/librte_eal/common/malloc_heap.c
>> index 1d1e35708..73e478076 100644
>> --- a/lib/librte_eal/common/malloc_heap.c
>> +++ b/lib/librte_eal/common/malloc_heap.c
>> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
>> socket_arg,
>> if (size == 0 || (align && !rte_is_power_of_2(align)))
>> return NULL;
>>
>> - if (!rte_eal_has_hugepages())
>> + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
>> socket_arg = SOCKET_ID_ANY;
>>
>> if (socket_arg == SOCKET_ID_ANY)
>> diff --git a/lib/librte_eal/common/rte_malloc.c
>> b/lib/librte_eal/common/rte_malloc.c
>> index 73d6df31d..9ba1472c3 100644
>> --- a/lib/librte_eal/common/rte_malloc.c
>> +++ b/lib/librte_eal/common/rte_malloc.c
>> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
>> unsigned int align,
>> if (!rte_eal_has_hugepages())
>> socket_arg = SOCKET_ID_ANY;
>>
>> - /* Check socket parameter */
>> - if (socket_arg >= RTE_MAX_NUMA_NODES)
>> - return NULL;
>> -
>>
>
> Sane than before. Better to keep the sanity check using RTE_MAX_HEAPS.
same as above :)
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 13:21 0% ` Burakov, Anatoly
@ 2018-09-27 13:42 0% ` Alejandro Lucero
2018-09-27 14:04 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-09-27 13:42 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <
> anatoly.burakov@intel.com>
> > wrote:
> >
> >> We will be assigning "invalid" socket ID's to external heap, and
> >> malloc will now be able to verify if a supplied socket ID is in
> >> fact a valid one, rendering parameter checks for sockets
> >> obsolete.
> >>
> >> This changes the semantics of what we understand by "socket ID",
> >> so document the change in the release notes.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
> >> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> >> lib/librte_eal/common/malloc_heap.c | 2 +-
> >> lib/librte_eal/common/rte_malloc.c | 4 ----
> >> 4 files changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/doc/guides/rel_notes/release_18_11.rst
> >> b/doc/guides/rel_notes/release_18_11.rst
> >> index 5fc71e208..6ee236302 100644
> >> --- a/doc/guides/rel_notes/release_18_11.rst
> >> +++ b/doc/guides/rel_notes/release_18_11.rst
> >> @@ -98,6 +98,13 @@ API Changes
> >> users of memseg-walk-related functions, as they will now have to
> skip
> >> externally allocated segments in most cases if the intent is to
> only
> >> iterate
> >> over internal DPDK memory.
> >> + - ``socket_id`` parameter across the entire DPDK has gained
> additional
> >> + meaning, as some socket ID's will now be representing externally
> >> allocated
> >> + memory. No changes will be required for existing code as backwards
> >> + compatibility will be kept, and those who do not use this feature
> >> will not
> >> + see these extra socket ID's. Any new API's must not check socket ID
> >> + parameters themselves, and must instead leave it to the memory
> >> subsystem to
> >> + decide whether socket ID is a valid one.
> >>
> >> ABI Changes
> >> -----------
> >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> >> b/lib/librte_eal/common/eal_common_memzone.c
> >> index 7300fe05d..b7081afbf 100644
> >> --- a/lib/librte_eal/common/eal_common_memzone.c
> >> +++ b/lib/librte_eal/common/eal_common_memzone.c
> >> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> >> *name, size_t len,
> >> return NULL;
> >> }
> >>
> >> - if ((socket_id != SOCKET_ID_ANY) &&
> >> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> >> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
> >>
> >
> > Should not it be better to use RTE_MAX_HEAP instead of removing the
> check?
>
> First of all, maximum number of heaps should not concern the rest of the
> code - this is purely internal detail of rte_malloc.
>
>
In a previous patch you say that:
"Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation."
If I understand this right, heaps linked to physical sockets get a heap ID,
and then external heaps will get IDs starting from the highest socket/heap
ID + 1.
So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps
(which does not seem to be the case, reading your next paragraph), it would
be a good sanity check to use RTE_MAX_HEAPS for the socket ID.
> More importantly, socket ID is completely independent from number of
> heaps. Socket ID is incremented each time a new heap is created, and
> they are not reused. If you create and destroy a heap 100 times - you'll
> get 100 different socket ID's, even though max number of heaps is less
> than that.
>
>
I do not understand this. It is true there is no check regarding
RTE_MAX_HEAPS when creating new heaps, so I am not sure what the limit
refers to. And then there is code, like dumping heap info or getting info
from a heap based on socket ID, that will not work.
> >
> >
> >
> >> rte_errno = EINVAL;
> >> return NULL;
> >> }
> >>
> >> - if (!rte_eal_has_hugepages())
> >> + /* only set socket to SOCKET_ID_ANY if we aren't allocating for
> an
> >> + * external heap.
> >> + */
> >> + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
> >> socket_id = SOCKET_ID_ANY;
> >>
> >> contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> >> diff --git a/lib/librte_eal/common/malloc_heap.c
> >> b/lib/librte_eal/common/malloc_heap.c
> >> index 1d1e35708..73e478076 100644
> >> --- a/lib/librte_eal/common/malloc_heap.c
> >> +++ b/lib/librte_eal/common/malloc_heap.c
> >> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> >> socket_arg,
> >> if (size == 0 || (align && !rte_is_power_of_2(align)))
> >> return NULL;
> >>
> >> - if (!rte_eal_has_hugepages())
> >> + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
> >> socket_arg = SOCKET_ID_ANY;
> >>
> >> if (socket_arg == SOCKET_ID_ANY)
> >> diff --git a/lib/librte_eal/common/rte_malloc.c
> >> b/lib/librte_eal/common/rte_malloc.c
> >> index 73d6df31d..9ba1472c3 100644
> >> --- a/lib/librte_eal/common/rte_malloc.c
> >> +++ b/lib/librte_eal/common/rte_malloc.c
> >> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> >> unsigned int align,
> >> if (!rte_eal_has_hugepages())
> >> socket_arg = SOCKET_ID_ANY;
> >>
> >> - /* Check socket parameter */
> >> - if (socket_arg >= RTE_MAX_NUMA_NODES)
> >> - return NULL;
> >> -
> >>
> >
> > Sane than before. Better to keep the sanity check using RTE_MAX_HEAPS.
>
> same as above :)
>
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 13:42 0% ` Alejandro Lucero
@ 2018-09-27 14:04 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-27 14:04 UTC (permalink / raw)
To: Alejandro Lucero
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On 27-Sep-18 2:42 PM, Alejandro Lucero wrote:
>
>
> On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>
> On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>>
> > wrote:
> >
> >> We will be assigning "invalid" socket ID's to external heap, and
> >> malloc will now be able to verify if a supplied socket ID is in
> >> fact a valid one, rendering parameter checks for sockets
> >> obsolete.
> >>
> >> This changes the semantics of what we understand by "socket ID",
> >> so document the change in the release notes.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com
> <mailto:anatoly.burakov@intel.com>>
> >> ---
> >> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
> >> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> >> lib/librte_eal/common/malloc_heap.c | 2 +-
> >> lib/librte_eal/common/rte_malloc.c | 4 ----
> >> 4 files changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/doc/guides/rel_notes/release_18_11.rst
> >> b/doc/guides/rel_notes/release_18_11.rst
> >> index 5fc71e208..6ee236302 100644
> >> --- a/doc/guides/rel_notes/release_18_11.rst
> >> +++ b/doc/guides/rel_notes/release_18_11.rst
> >> @@ -98,6 +98,13 @@ API Changes
> >> users of memseg-walk-related functions, as they will now
> have to skip
> >> externally allocated segments in most cases if the intent
> is to only
> >> iterate
> >> over internal DPDK memory.
> >> + - ``socket_id`` parameter across the entire DPDK has gained
> additional
> >> + meaning, as some socket ID's will now be representing
> externally
> >> allocated
> >> + memory. No changes will be required for existing code as
> backwards
> >> + compatibility will be kept, and those who do not use this
> feature
> >> will not
> >> + see these extra socket ID's. Any new API's must not check
> socket ID
> >> + parameters themselves, and must instead leave it to the memory
> >> subsystem to
> >> + decide whether socket ID is a valid one.
> >>
> >> ABI Changes
> >> -----------
> >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> >> b/lib/librte_eal/common/eal_common_memzone.c
> >> index 7300fe05d..b7081afbf 100644
> >> --- a/lib/librte_eal/common/eal_common_memzone.c
> >> +++ b/lib/librte_eal/common/eal_common_memzone.c
> >> @@ -120,13 +120,15 @@
> memzone_reserve_aligned_thread_unsafe(const char
> >> *name, size_t len,
> >> return NULL;
> >> }
> >>
> >> - if ((socket_id != SOCKET_ID_ANY) &&
> >> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> >> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
> >>
> >
> > Should not it be better to use RTE_MAX_HEAP instead of removing
> the check?
>
> First of all, maximum number of heaps should not concern the rest of
> the
> code - this is purely internal detail of rte_malloc.
>
>
> In a previous patch you say that:
>
> "Switch over all parts of EAL to use heap ID instead of NUMA node
> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> node's index within the detected NUMA node list. Heap ID for
> external heaps will be order of their creation."
>
> If I understand this right, heaps linked to physical sockets get a heap
> ID, and then external heaps will get IDs starting from the higher
> socket/heap ID + 1.
Yes and no.
Socket ID is an externally visible identification of "where to allocate
from" (a heap). Heap ID is used internally. Normally, there is a 1:1
correspondence of NUMA node to heap ID, but there may be cases where
e.g. only NUMA nodes 0 and 7 are detected, so you'll have socket 0 and 7
as valid socket ID's. However, these socket ID's will be internally
resolved into heap ID's 0 and 1, not 0 and 7.
So, in *most* cases, socket ID for an internal heap is equivalent to its
heap ID, but it is by accident. Heap ID is an internal identifier used
by the malloc heap, and it is not visible externally - it is only known
to malloc itself. Even memzone knows nothing about heap ID's - only
socket ID's.
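
The resolution described above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual rte_malloc
code: struct heap_slot, MAX_HEAPS, register_numa_nodes() and
socket_to_heap_id() are hypothetical names made up for this example.

```c
#include <stddef.h>

#define MAX_HEAPS 32 /* hypothetical stand-in for RTE_MAX_HEAPS */

/* Hypothetical, simplified heap record; not the real struct malloc_heap. */
struct heap_slot {
	int in_use;    /* non-zero if this array slot holds a heap */
	int socket_id; /* externally visible identifier */
};

static struct heap_slot heaps[MAX_HEAPS];

/*
 * Register detected NUMA nodes in detection order. The heap ID is simply
 * the position in the array, so nodes {0, 7} end up as heap IDs {0, 1}.
 */
static void
register_numa_nodes(const int *nodes, size_t n)
{
	size_t i;

	for (i = 0; i < n && i < MAX_HEAPS; i++) {
		heaps[i].in_use = 1;
		heaps[i].socket_id = nodes[i];
	}
}

/* Resolve a socket ID to a heap ID; return -1 if no heap uses that ID. */
static int
socket_to_heap_id(int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

With nodes 0 and 7 registered, socket ID 7 resolves to heap ID 1, while
socket ID 1 resolves to nothing, which is why callers outside malloc only
ever deal in socket ID's.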
> So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps
> (which does not seem so reading your next paragraph), it would be a good
> sanity check to use RTE_MAX_HEAPS for the socket id.
>
> More importantly, socket ID is completely independent from number of
> heaps. Socket ID is incremented each time a new heap is created, and
> they are not reused. If you create and destroy a heap 100 times -
> you'll
> get 100 different socket ID's, even though max number of heaps is less
> than that.
>
>
> I do not understand this. It is true there is no check regarding
> RTE_MAX_HEAPS when creating new heaps,
There is one :) RTE_MAX_HEAPS is the length of the malloc heaps array
(shared in memory). If we cannot find a vacant spot in the heaps array,
the heap will not be created.
However, *socket ID* is indeed limited only to INT_MAX. Socket ID is not
heap ID - socket ID is an externally visible identifier. Multiple socket
ID's can resolve to the same heap ID.
For example, if you create and destroy a heap 5 times one after the
other, you'll get 5 different socket ID's, but all of them would have
pointed to the same heap ID (but not at the same time).
So, semantically speaking, heap ID isn't really "an ID" as such, it's an
index into heap array. Unlike socket ID, it has no meaning.
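
A rough sketch of the create/destroy behaviour described above, again under
assumptions: all names here (ext_heaps, heap_create(), heap_destroy(),
heap_index_of()) are hypothetical, the array is tiny for illustration, and
starting external socket ID's at MAX_NUMA_NODES is an assumption of this
example rather than something stated in this thread.

```c
#include <limits.h>

#define MAX_HEAPS 4      /* hypothetical stand-in for RTE_MAX_HEAPS */
#define MAX_NUMA_NODES 8 /* hypothetical stand-in for RTE_MAX_NUMA_NODES */

/* Hypothetical, simplified heap record. */
struct ext_heap {
	int in_use;
	int socket_id;
};

static struct ext_heap ext_heaps[MAX_HEAPS];

/* Assumption: external socket ID's are handed out above the NUMA range. */
static int next_socket_id = MAX_NUMA_NODES;

/* Return the array index (heap ID) for a socket ID, or -1 if not found. */
static int
heap_index_of(int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (ext_heaps[i].in_use && ext_heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/*
 * Create a heap in the first vacant slot. Returns the new socket ID, or -1
 * if the array is full or the socket ID space (bounded by INT_MAX) is
 * exhausted. Socket ID's only ever grow; slots are reused.
 */
static int
heap_create(void)
{
	int i;

	if (next_socket_id == INT_MAX)
		return -1;
	for (i = 0; i < MAX_HEAPS; i++) {
		if (!ext_heaps[i].in_use) {
			ext_heaps[i].in_use = 1;
			ext_heaps[i].socket_id = next_socket_id++;
			return ext_heaps[i].socket_id;
		}
	}
	return -1;
}

/* Destroy the heap with the given socket ID; frees the slot, not the ID. */
static int
heap_destroy(int socket_id)
{
	int idx = heap_index_of(socket_id);

	if (idx < 0)
		return -1;
	ext_heaps[idx].in_use = 0;
	return 0;
}
```

Creating and destroying a heap five times in a row with this sketch yields
five distinct socket ID's, each of which resolved to index 0 while its heap
was alive.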
> then nor sure what the limit
> refers to. And then there is code like dumping heaps info or getting
> info from the heap based on socket id that will not work.
It is probably unclear because the ordering of this patchset is not
ideal (and I'm not sure how to make it any better).
The code for dumping or getting heap info accepts a socket ID, but it
translates it into a heap ID, because that's what malloc uses internally
to differentiate between the heaps. Heap ID is there to break the
dependency between NUMA node ID and position in the malloc heap array.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
@ 2018-09-28 1:00 3% ` Wang, Yipeng1
2018-09-28 8:26 4% ` Bruce Richardson
2018-09-30 23:05 0% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Wang, Yipeng1 @ 2018-09-28 1:00 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, gavin.hu, steve.capper, ola.liljedahl, nd
Reply inlined:
>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
>Sent: Thursday, September 6, 2018 10:12 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
>nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>Reader-writer concurrency issue, caused by moving the keys
>to their alternative locations during key insert, is solved
>by introducing a global counter(tbl_chng_cnt) indicating a
>change in table.
>
>@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> curr_bkt = curr_node->bkt;
> }
>
>+ /* Inform the previous move. The current move need
>+ * not be informed now as the current bucket entry
>+ * is present in both primary and secondary.
>+ * Since there is one writer, load acquires on
>+ * tbl_chng_cnt are not required.
>+ */
>+ __atomic_store_n(&h->tbl_chng_cnt,
>+ h->tbl_chng_cnt + 1,
>+ __ATOMIC_RELEASE);
>+ /* The stores to sig_alt and sig_current should not
>+ * move above the store to tbl_chng_cnt.
>+ */
>+ __atomic_thread_fence(__ATOMIC_RELEASE);
>+
[Wang, Yipeng] I believe for X86 this fence should not be compiled to any code, otherwise
we need macros for the compile time check.
>@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> uint32_t bucket_idx;
> hash_sig_t alt_hash;
> struct rte_hash_bucket *bkt;
>+ uint32_t cnt_b, cnt_a;
> int ret;
>
>- bucket_idx = sig & h->bucket_bitmask;
>- bkt = &h->buckets[bucket_idx];
>-
> __hash_rw_reader_lock(h);
>
>- /* Check if key is in primary location */
>- ret = search_one_bucket(h, key, sig, data, bkt);
>- if (ret != -1) {
>- __hash_rw_reader_unlock(h);
>- return ret;
>- }
>- /* Calculate secondary hash */
>- alt_hash = rte_hash_secondary_hash(sig);
>- bucket_idx = alt_hash & h->bucket_bitmask;
>- bkt = &h->buckets[bucket_idx];
>+ do {
[Wang, Yipeng] As far as I know, the MemC3 paper ("MemC3: Compact and Concurrent
MemCache with Dumber Caching and Smarter Hashing") and the OvS cmap both use a
similar version counter to implement read-write concurrency for the hash table.
One difference is that the reader checks whether the version counter is even or
odd to make sure there is no concurrent writer. Could you double-check and
confirm that this is not needed for your implementation?
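For reference, the even/odd scheme mentioned above can be sketched roughly
as follows. This is a generic seqlock-style illustration using C11 atomics,
assuming a single writer; it is not the tbl_chng_cnt logic from the patch,
and versioned_pair, pair_write and pair_try_read are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct versioned_pair {
	atomic_uint version;   /* even: stable, odd: a write is in progress */
	_Atomic uint32_t a;
	_Atomic uint32_t b;
};

/* Single writer: make the version odd, update the data, make it even again. */
static void pair_write(struct versioned_pair *p, uint32_t a, uint32_t b)
{
	unsigned int v = atomic_load_explicit(&p->version,
					      memory_order_relaxed);

	assert((v & 1u) == 0); /* with one writer the version is even here */
	atomic_store_explicit(&p->version, v + 1, memory_order_relaxed);
	/* The data stores below must not move above the odd version store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);
	/* Publish: the data stores are visible before the even version. */
	atomic_store_explicit(&p->version, v + 2, memory_order_release);
}

/* One read attempt; false means a writer interfered and we must retry. */
static bool pair_try_read(struct versioned_pair *p, uint32_t *a, uint32_t *b)
{
	unsigned int v1 = atomic_load_explicit(&p->version,
					       memory_order_acquire);

	if (v1 & 1u)
		return false;  /* odd: a writer is in the middle of an update */
	*a = atomic_load_explicit(&p->a, memory_order_relaxed);
	*b = atomic_load_explicit(&p->b, memory_order_relaxed);
	/* The data loads above must not move below the version re-check. */
	atomic_thread_fence(memory_order_acquire);
	return v1 == atomic_load_explicit(&p->version, memory_order_relaxed);
}

/* Reader loop: retry until a consistent snapshot is obtained. */
static void pair_read(struct versioned_pair *p, uint32_t *a, uint32_t *b)
{
	while (!pair_try_read(p, a, b))
		;
}
```

The odd check lets a reader bail out early while a write is in flight; the
final comparison catches a write that started and finished during the read.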
>--- a/lib/librte_hash/rte_hash.h
>+++ b/lib/librte_hash/rte_hash.h
>@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> * - -ENOSPC if there is no space in the hash for this key.
> */
> int
>-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
>+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
>
> /**
> * Add a key-value pair with a pre-computed hash value
>@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> * - -ENOSPC if there is no space in the hash for this key.
> */
> int32_t
>-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
>+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> hash_sig_t sig, void *data);
>
> /**
>@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> * array of user data. This value is unique for this key.
> */
> int32_t
>-rte_hash_add_key(const struct rte_hash *h, const void *key);
>+rte_hash_add_key(struct rte_hash *h, const void *key);
>
> /**
> * Add a key to an existing hash table.
>@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> * array of user data. This value is unique for this key.
> */
> int32_t
>-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
>
> /
I think the above changes will break ABI by changing the parameter type? Other people may know better on this.
* [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process
@ 2018-09-28 4:23 1% ` Qi Zhang
2018-09-28 4:23 2% ` [dpdk-dev] [PATCH v16 2/6] eal: " Qi Zhang
2018-10-16 0:16 1% ` [dpdk-dev] [PATCH v17 0/6] " Qi Zhang
1 sibling, 1 reply; 200+ results
From: Qi Zhang @ 2018-09-28 4:23 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
v16:
- rebase to patch "simplify parameters of hotplug functions"
http://patchwork.dpdk.org/patch/45463/ include:
* keep rte_eal_hotplug_add/rte_eal_hotplug_remove unchanged.
* the IPC sync logic is moved to rte_dev_probe/rte_dev_remove.
* simplify the IPC message by removing busname and devname from
eal_dev_mp_req, since the devargs string already encodes that
information.
- combined release notes with related code changes.
- replace the do_ prefix with local_ for the local-process-only probe/remove functions.
- improve comments
v15:
- fix missing return in rte_eth_dev_pci_release.
- minor fix and more detail comments for patch 5/7.
- update release notes for v18.11.
v14:
- rebase.
- All changes belong to patch 1/6.
1) rename rte_eth_dev_release_port_private to rte_eth_dev_release_port_secondary
since it is only used by the secondary process.
2) in rte_eth_dev_pci_generic_remove, even on the secondary process,
I think it's better to call rte_eth_dev_release_port_secondary after
dev_uninit, since the secondary process may need to release some local
resources in dev_uninit before releasing the port and returning.
Also, this does not break any existing user of rte_eth_dev_pci_generic_remove,
because no existing dev_uninit has special handling for the secondary
process.
3) add rte_eth_dev_release_port_secondary into rte_eth_dev_destroy as a
general step, so we don't need patches for i40e and ixgbe.
4) fix missing update on rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
when detaching a device. A device may be out of sync in some
situations, so attaching an existing device in the primary still needs to
sync with the secondary. Also, it's not necessary to roll back if we fail to
attach an existing device or detach a non-existent device on a secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.
v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
modify the sample code to use rte_eal_hotplug_add and
rte_eal_hotplug_remove to attach/detach devices.
v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error log in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove
any failure due to IPC error will return -ENOMSG, but not -1.
- remove unnecessary changes from previous rework.
v11:
- move out common code from pci_vfio_unmap_secondary and
pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
to patch 5/19 to improve code readability.
v10:
- Since hotplug add/remove of a vdev on a secondary process now syncs on
all processes, it is not necessary to support a private vdev for
a secondary process, which is identified by a non-NULL devargs in
"--vdev". So all vdev driver changes are reworked to simplify the device
probe scenario on a secondary process; devargs are now ignored on
a secondary process.
- fix license header in example/multi-process/hotplug_mp/Makefile.
v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
1) The device can't be detached cleanly in a secondary process, which
also means it can't be attached again, due to the error that
/dev/vfio/<group_fd> is still busy. (see patch 3/19 and 4/19)
2) repeatedly detaching/attaching a device causes "cannot find TAILQ entry
for PCI device" due to an incorrect PCI address comparison.
(see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue
v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.
NOTE:
Some issues that are not related to this patchset are expected when
playing with the hotplug_mp sample, as listed below.
- Attaching a PCI device twice may leave the device unable to be detached;
the fix below is required:
https://patches.dpdk.org/patch/42030/
- An ixgbe device can't be detached; the fix below is required:
https://patches.dpdk.org/patch/42031/
v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.
NOTE: current patchset depends on below IPC fix, or it may not be able
to attach a shared vdev.
https://patches.dpdk.org/patch/41647/
v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
secondary process, so we don't need to patch on PMD if PMD use the
default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a separate
thread.
v5:
- since we will keep mp thread separate from interrupt thread,
it is not necessary to use temporary thread, we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
way to prevent rte_eth_dev_release_port be called after
rte_eth_dev_release_port_private.
- fix the issue that lock does not take effect on secondary due to
previous re-work
- fix the issue when the first attached device is a private device from
secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
still open and under discussion, see below.
https://mails.dpdk.org/archives/dev/2018-June/105359.html
v4:
- since the mp thread will be merged into the interrupt thread, the fix in v3
for the sync IPC deadlock will not work. The new version enables a
mechanism to invoke an mp action callback in a temporary thread to
avoid the IPC deadlock. With this, the secondary-to-primary request
implementation is also simplified, since we can use a sync request
directly in a separate thread.
v3:
- enable mp init callback register to help non-eal module to initialize
mp channel during rte_eal_init
- fix when attach share device from secondary.
1) dead lock due to sync IPC be invoked in rte_malloc in primary
process when handle secondary request to attach device, the
solution is primary process to issue share device attach/detach
in interrupt thread.
2) return port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
1) rename sample to "hotplug_mp", and move to example/multi-process.
2) cleanup header include.
3) call rte_eal_cleanup before exit.
v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.
Background:
===========
Currently a secondary process syncs ethdev state from the primary
process only at the init stage; it is not aware of devices that are
attached or detached on the primary process at runtime.
However, some applications that use the primary-secondary process model
require this. The primary process works as a resource management
process that creates and destroys virtual devices at runtime, while the
secondary processes handle the network traffic on these devices.
Solution:
=========
The original intention is to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in a multi-process situation. It covers the
scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.
Any failure while attaching or detaching leaves the processes in an
inconsistent state, so a proper rollback action should be considered.
Scenario for Case 1, 2:
attach device from primary
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
detach device from primary
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed
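The attach-from-primary steps a) to i) above can be sketched as a small
single-process simulation. This is only an illustration: struct sim_proc and
the sim_* helpers are hypothetical stand-ins for a process and its local
probe/remove, and the loops stand in for the IPC requests and replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one DPDK process and its view of the device. */
struct sim_proc {
	bool attached;     /* device currently attached in this process */
	bool fail_attach;  /* test knob: force the local attach to fail */
};

static int sim_local_attach(struct sim_proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void sim_local_detach(struct sim_proc *p)
{
	p->attached = false;
}

/* Returns 0 on "attach success", -1 on "attach fail". */
static int sim_attach_from_primary(struct sim_proc *primary,
				   struct sim_proc *secondaries, size_t n)
{
	bool all_ok = true;

	assert(primary != NULL && (secondaries != NULL || n == 0));

	/* a) primary attaches the new device; on failure go to h). */
	if (sim_local_attach(primary) != 0)
		return -1;

	/* b), c) every secondary gets the request, attaches and replies. */
	for (size_t i = 0; i < n; i++)
		if (sim_local_attach(&secondaries[i]) != 0)
			all_ok = false;

	/* d) all replies report success: i) attach success. */
	if (all_ok)
		return 0;

	/* e), f) rollback request: every secondary detaches again. */
	for (size_t i = 0; i < n; i++)
		sim_local_detach(&secondaries[i]);

	/* g) primary detaches as its own rollback, then h) attach fail. */
	sim_local_detach(primary);
	return -1;
}
```

With one secondary forced to fail, the call returns -1 and no process is
left with the device attached, which is the consistency the rollback is for.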
Scenario for case 3, 4:
attach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and attach the new device if failed
goto i).
c) primary forward attach request to all secondary as async request
(because this in mp thread context, use sync request will deadlock,
same reason for all following async request.)
d) secondary receive request and attach device and send reply.
e) primary check the reply if all success go to j).
f) primary send attach rollback async request to all secondary.
g) secondary receive the request and detach device and send reply.
h) primary receive the reply and detach device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receive response and return.
detach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and perform pre-detach check, if device
is locked, goto j).
c) primary send pre-detach async request to all secondary.
d) secondary perform pre-detach check and send reply.
e) primary check the reply if any fail goto j).
f) primary send detach async request to all secondary
g) secondary detach the device and send reply
h) primary detach the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receive response and return.
API changes:
============
The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.
PMD Impact:
===========
Currently device removal is not handled well in the secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_secondary, which only resets the ethdev's state to
unused but does not touch shared data, so other processes are not impacted.
Since not every device driver is meant to support the primary-secondary
process model, the patch set only fixes this for PCI devices whose drivers use
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. It can be used as a reference by
other drivers when an equivalent fix is required.
Example:
========
The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.
/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto
Command Line Example:
>help
>list
/* attach a pci device */
> attach 0000:81:00.0
/* detach the pci device */
> detach 0000:81:00.0
/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0
/* detach the vdev af_packet device */
> detach net_af_packet
Qi Zhang (6):
ethdev: add function to release port in secondary process
eal: enable hotplug on multi-process
eal: support attach or detach share device from secondary
drivers/net: enable hotplug on secondary process
drivers/net: enable device detach on secondary
examples/multi_process: add hotplug sample
doc/guides/rel_notes/release_18_11.rst | 11 +
drivers/net/af_packet/rte_eth_af_packet.c | 6 +-
drivers/net/bnxt/bnxt_ethdev.c | 6 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 6 +-
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/kni/rte_eth_kni.c | 6 +-
drivers/net/liquidio/lio_ethdev.c | 2 +-
drivers/net/null/rte_eth_null.c | 6 +-
drivers/net/octeontx/octeontx_ethdev.c | 8 +
drivers/net/pcap/rte_eth_pcap.c | 6 +-
drivers/net/tap/rte_eth_tap.c | 8 +-
drivers/net/vhost/rte_eth_vhost.c | 6 +-
drivers/net/virtio/virtio_ethdev.c | 2 +-
examples/multi_process/Makefile | 1 +
examples/multi_process/hotplug_mp/Makefile | 23 ++
examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++
examples/multi_process/hotplug_mp/commands.h | 10 +
examples/multi_process/hotplug_mp/main.c | 41 +++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 225 +++++++++++++-
lib/librte_eal/common/eal_private.h | 30 ++
lib/librte_eal/common/hotplug_mp.c | 426 +++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 +++
lib/librte_eal/common/include/rte_dev.h | 9 +
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
lib/librte_ethdev/rte_ethdev.c | 17 +-
lib/librte_ethdev/rte_ethdev_driver.h | 16 +-
lib/librte_ethdev/rte_ethdev_pci.h | 10 +-
lib/librte_ethdev/rte_ethdev_version.map | 7 +
31 files changed, 1126 insertions(+), 33 deletions(-)
create mode 100644 examples/multi_process/hotplug_mp/Makefile
create mode 100644 examples/multi_process/hotplug_mp/commands.c
create mode 100644 examples/multi_process/hotplug_mp/commands.h
create mode 100644 examples/multi_process/hotplug_mp/main.c
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
--
2.13.6
* [dpdk-dev] [PATCH v16 2/6] eal: enable hotplug on multi-process
2018-09-28 4:23 1% ` [dpdk-dev] [PATCH v16 0/6] " Qi Zhang
@ 2018-09-28 4:23 2% ` Qi Zhang
0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-09-28 4:23 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
This introduces a solution to handle hotplug in multi-process.
It covers the scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume devices are shared
by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.
Any failure while attaching or detaching leaves the processes in an
inconsistent state, so a proper rollback action should be considered.
This patch covers the implementation of cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.
IPC scenario for Case 1, 2:
attach a device
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach the device and send a reply.
d) primary check the reply if all success goes to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach the device and send a reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
detach a device
a) primary send detach sync request to all secondary
b) secondary detach the device and send reply
c) primary check the reply if all success goes to f).
d) primary send detach rollback sync request to all secondary.
e) secondary receive the request and attach back device. goto g)
f) primary detach the device if success goto h), else goto d)
g) detach fail.
h) detach success.
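The detach steps a) to h) above can be sketched as a small single-process
simulation. As before this is only an illustration under assumptions:
struct mp_proc and the mp_proc_* helpers are hypothetical, and the loops
stand in for the sync requests sent over the mp channel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one DPDK process and its view of the device. */
struct mp_proc {
	bool attached;     /* device currently attached in this process */
	bool fail_detach;  /* test knob: force the local detach to fail */
};

static int mp_proc_detach(struct mp_proc *p)
{
	if (p->fail_detach)
		return -1;
	p->attached = false;
	return 0;
}

static void mp_proc_attach_back(struct mp_proc *p)
{
	p->attached = true;
}

/* Returns 0 on "detach success", -1 on "detach fail". */
static int mp_detach_from_primary(struct mp_proc *primary,
				  struct mp_proc *secondaries, size_t n)
{
	bool all_ok = true;

	assert(primary != NULL && (secondaries != NULL || n == 0));

	/* a), b) every secondary gets the request, detaches and replies. */
	for (size_t i = 0; i < n; i++)
		if (mp_proc_detach(&secondaries[i]) != 0)
			all_ok = false;

	/* c), f) only if all replies are good does the primary detach. */
	if (all_ok && mp_proc_detach(primary) == 0)
		return 0;          /* h) detach success */

	/* d), e) rollback request: secondaries attach the device back. */
	for (size_t i = 0; i < n; i++)
		mp_proc_attach_back(&secondaries[i]);

	return -1;                 /* g) detach fail */
}
```

Note that the secondaries detach before the primary does, so a failure in
the primary also triggers the rollback on the secondaries.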
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 11 ++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 225 ++++++++++++++++++++++++++++++--
lib/librte_eal/common/eal_private.h | 30 +++++
lib/librte_eal/common/hotplug_mp.c | 221 +++++++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 +++++++
lib/librte_eal/common/include/rte_dev.h | 9 ++
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
10 files changed, 542 insertions(+), 9 deletions(-)
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..f88910c7f 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,12 @@ New Features
SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
vdev_netvsc, tap, and failsafe drivers combination.
+* **Support device multi-process hotplug.**
+
+ Hotplug and hot-unplug for devices will now be supported in multiprocessing
+ scenario. Any ethdev devices created in the primary process will be regarded
+ as shared and will be available for all DPDK processes. Synchronization
+ between processes will be done using DPDK IPC.
API Changes
-----------
@@ -91,6 +97,11 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+ In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+ that device be attached on all processes, while ``rte_eal_hotplug_remove``
+ will guarantee device be detached on all processes.
ABI Changes
-----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..4351c6a20 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -62,6 +62,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_mp.c
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 85eb1569f..314266041 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -19,8 +19,10 @@
#include <rte_log.h>
#include <rte_spinlock.h>
#include <rte_malloc.h>
+#include <rte_string_fns.h>
#include "eal_private.h"
+#include "hotplug_mp.h"
/**
* The device event callback description.
@@ -127,9 +129,10 @@ int rte_eal_dev_detach(struct rte_device *dev)
return ret;
}
-int
-rte_eal_hotplug_add(const char *busname, const char *devname,
- const char *drvargs)
+/* helper function to build devargs, caller should free the memory */
+static char *
+build_devargs(const char *busname, const char *devname,
+ const char *drvargs)
{
char *devargs = NULL;
int size, length = -1;
@@ -140,19 +143,33 @@ rte_eal_hotplug_add(const char *busname, const char *devname,
if (length >= size)
devargs = malloc(length + 1);
if (devargs == NULL)
- return -ENOMEM;
+ break;
} while (size == 0);
+ return devargs;
+}
+
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+ const char *drvargs)
+{
+ char *devargs = build_devargs(busname, devname, drvargs);
+
+ if (devargs == NULL)
+ return -ENOMEM;
+
return rte_dev_probe(devargs);
}
-int __rte_experimental
-rte_dev_probe(const char *devargs)
+/* probe device at local process. */
+int
+local_dev_probe(const char *devargs, struct rte_device **new_dev)
{
struct rte_device *dev;
struct rte_devargs *da;
int ret;
+ *new_dev = NULL;
da = calloc(1, sizeof(*da));
if (da == NULL)
return -ENOMEM;
@@ -195,6 +212,8 @@ rte_dev_probe(const char *devargs)
dev->name);
goto err_devarg;
}
+
+ *new_dev = dev;
return 0;
err_devarg:
@@ -226,8 +245,9 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
return rte_dev_remove(dev);
}
-int __rte_experimental
-rte_dev_remove(struct rte_device *dev)
+/* remove device at local process. */
+int
+local_dev_remove(struct rte_device *dev)
{
struct rte_bus *bus;
int ret;
@@ -248,7 +268,194 @@ rte_dev_remove(struct rte_device *dev)
if (ret)
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
dev->name);
- rte_devargs_remove(dev->devargs);
+ else
+ rte_devargs_remove(dev->devargs);
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_probe(const char *devargs)
+{
+ struct eal_dev_mp_req req;
+ struct rte_device *dev;
+ int ret;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_ATTACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug add device\n");
+ return req.result;
+ }
+
+ /* attach a shared device from primary start from here: */
+
+ /* primary attach the new device itself. */
+ ret = local_dev_probe(devargs, &dev);
+
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on primary process\n");
+
+ /**
+ * it is possible that secondary process failed to attached a
+ * device that primary process have during initialization,
+ * so for -EEXIST case, we still need to sync with secondary
+ * process.
+ */
+ if (ret != -EEXIST)
+ return ret;
+ }
+
+ /* primary send attach sync request to secondary. */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /* if any commnunication error, we need to rollback. */
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug add request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to attach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on secondary process\n");
+ ret = req.result;
+
+ /* for -EEXIST, we don't need to rollback. */
+ if (ret == -EEXIST)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req))
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on secondary."
+ "Devices in secondary may not sync with primary\n");
+
+ /* primary rollback itself. */
+ if (local_dev_remove(dev))
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on primary."
+ "Devices in secondary may not sync with primary\n");
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_remove(struct rte_device *dev)
+{
+ struct eal_dev_mp_req req;
+ char *devargs;
+ int ret;
+
+ devargs = build_devargs(dev->devargs->bus->name, dev->name, "");
+ if (devargs == NULL)
+ return -ENOMEM;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_DETACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+ free(devargs);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug remove device\n");
+ return req.result;
+ }
+
+ /* detach a device from primary start from here: */
+
+ /* primary send detach sync request to secondary */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /**
+ * if communication error, we need to rollback, because it is possible
+ * part of the secondary processes still detached it successfully.
+ */
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send device detach request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to detach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on secondary process\n");
+ ret = req.result;
+ /**
+ * if -ENOENT, we don't need to rollback, since devices is
+ * already detached on secondary process.
+ */
+ if (ret != -ENOENT)
+ goto rollback;
+ }
+
+ /* primary detach the device itself. */
+ ret = local_dev_remove(dev);
+
+ /* if primary failed, still need to consider if rollback is necessary */
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on primary process\n");
+ /* if -ENOENT, we don't need to rollback */
+ if (ret == -ENOENT)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req))
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device detach on secondary."
+ "Devices in secondary may not sync with primary\n");
+
return ret;
}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a83c..83f10a9f8 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,34 @@ int
rte_devargs_layers_parse(struct rte_devargs *devargs,
const char *devstr);
+/*
+ * probe a device at local process.
+ *
+ * @param devargs
+ * Device arguments including bus, class and driver properties.
+ * @param new_dev
+ * new device be probed as output.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_probe(const char *devargs, struct rte_device **new_dev);
+
+/**
+ * Hotplug remove a given device from a specific bus at local process.
+ *
+ * @param dev
+ * Data structure of the device to remove.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_remove(struct rte_device *dev);
+
+/**
+ * Register all mp action callbacks for hotplug.
+ *
+ * @return
+ * 0 on success, negative on error.
+ */
+int rte_dev_hotplug_mp_init(void);
+
#endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
new file mode 100644
index 000000000..1c92e44cb
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include <string.h>
+
+#include <rte_eal.h>
+#include <rte_alarm.h>
+#include <rte_string_fns.h>
+#include <rte_devargs.h>
+
+#include "hotplug_mp.h"
+#include "eal_private.h"
+
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
+
+static int cmp_dev_name(const struct rte_device *dev, const void *_name)
+{
+ const char *name = _name;
+
+ return strcmp(dev->name, name);
+}
+
+struct mp_reply_bundle {
+ struct rte_mp_msg msg;
+ void *peer;
+};
+
+static int
+handle_secondary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ RTE_SET_USED(msg);
+ RTE_SET_USED(peer);
+ return -ENOTSUP;
+}
+
+static void __handle_primary_request(void *param)
+{
+ struct mp_reply_bundle *bundle = param;
+ struct rte_mp_msg *msg = &bundle->msg;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct rte_mp_msg mp_resp;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct rte_devargs *da;
+ struct rte_device *dev;
+ struct rte_bus *bus;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+
+ switch (req->t) {
+ case EAL_DEV_REQ_TYPE_ATTACH:
+ case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
+ ret = local_dev_probe(req->devargs, &dev);
+ break;
+ case EAL_DEV_REQ_TYPE_DETACH:
+ case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
+ da = calloc(1, sizeof(*da));
+ if (da == NULL) {
+ ret = -ENOMEM;
+ goto quit;
+ }
+
+ ret = rte_devargs_parse(da, req->devargs);
+ if (ret)
+ goto quit;
+
+ bus = rte_bus_find_by_name(da->bus->name);
+ if (bus == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n", da->bus->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ dev = bus->find_device(NULL, cmp_dev_name, da->name);
+ if (dev == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find plugged device (%s)\n", da->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ ret = local_dev_remove(dev);
+quit:
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+ resp->result = ret;
+ if (rte_mp_reply(&mp_resp, bundle->peer) < 0)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+
+ free(bundle->peer);
+ free(bundle);
+}
+
+static int
+handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ struct rte_mp_msg mp_resp;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct mp_reply_bundle *bundle;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+
+ bundle = calloc(1, sizeof(*bundle));
+ if (bundle == NULL) {
+ resp->result = -ENOMEM;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+
+ bundle->msg = *msg;
+ /**
+ * We need to send reply on interrupt thread, but peer can't be
+ * parsed directly, so this is a temporary hack, need to be fixed
+ * when it is ready.
+ */
+ bundle->peer = (void *)strdup(peer);
+
+ /**
+ * We are at IPC callback thread, sync IPC is not allowed due to
+ * deadlock, so we delegate the task to interrupt thread.
+ */
+ ret = rte_eal_alarm_set(1, __handle_primary_request, bundle);
+ if (ret) {
+ resp->result = ret;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+ }
+ return 0;
+}
+
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req)
+{
+ RTE_SET_USED(req);
+ return -ENOTSUP;
+}
+
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req)
+{
+ struct rte_mp_msg mp_req;
+ struct rte_mp_reply mp_reply;
+ struct timespec ts = {.tv_sec = MP_TIMEOUT_S, .tv_nsec = 0};
+ int ret;
+ int i;
+
+ memset(&mp_req, 0, sizeof(mp_req));
+ memcpy(mp_req.param, req, sizeof(*req));
+ mp_req.len_param = sizeof(*req);
+ strlcpy(mp_req.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_req.name));
+
+ ret = rte_mp_request_sync(&mp_req, &mp_reply, &ts);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "rte_mp_request_sync failed\n");
+ return ret;
+ }
+
+ if (mp_reply.nb_sent != mp_reply.nb_received) {
+ RTE_LOG(ERR, EAL, "not all secondary reply\n");
+ return -1;
+ }
+
+ req->result = 0;
+ for (i = 0; i < mp_reply.nb_received; i++) {
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_reply.msgs[i].param;
+ if (resp->result) {
+ req->result = resp->result;
+ if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
+ req->result != -EEXIST)
+ break;
+ if (req->t == EAL_DEV_REQ_TYPE_DETACH &&
+ req->result != -ENOENT)
+ break;
+ }
+ }
+
+ return 0;
+}
+
+int rte_dev_hotplug_mp_init(void)
+{
+ int ret;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_secondary_request);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ } else {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_primary_request);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ }
+
+ return 0;
+}
diff --git a/lib/librte_eal/common/hotplug_mp.h b/lib/librte_eal/common/hotplug_mp.h
new file mode 100644
index 000000000..c95c8f1fb
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _HOTPLUG_MP_H_
+#define _HOTPLUG_MP_H_
+
+#include <rte_dev.h>
+#include <rte_bus.h>
+
+#define EAL_DEV_MP_ACTION_REQUEST "eal_dev_mp_request"
+#define EAL_DEV_MP_ACTION_RESPONSE "eal_dev_mp_response"
+
+#define EAL_DEV_MP_DEV_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
+#define EAL_DEV_MP_BUS_NAME_MAX_LEN 32
+#define EAL_DEV_MP_DEV_ARGS_MAX_LEN 128
+
+enum eal_dev_req_type {
+ EAL_DEV_REQ_TYPE_ATTACH,
+ EAL_DEV_REQ_TYPE_DETACH,
+ EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK,
+ EAL_DEV_REQ_TYPE_DETACH_ROLLBACK,
+};
+
+struct eal_dev_mp_req {
+ enum eal_dev_req_type t;
+ char devargs[EAL_DEV_MP_DEV_ARGS_MAX_LEN];
+ int result;
+};
+
+/**
+ * this is a synchronous wrapper for secondary process send
+ * request to primary process, this is invoked when an attach
+ * or detach request issued from primary process.
+ */
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req);
+
+/**
+ * this is a synchronous wrapper for primary process send
+ * request to secondary process, this is invoked when an attach
+ * or detach request issued from secondary process.
+ */
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req);
+
+
+#endif /* _HOTPLUG_MP_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 7a30362c0..266331acd 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -190,6 +190,9 @@ int rte_eal_dev_detach(struct rte_device *dev);
/**
* Hotplug add a given device to a specific bus.
+ * In multi-process, this function will inform all other processes
+ * to hotplug add the same device. Any failure on other process will
+ * rollback the action.
*
* @param busname
* The bus name the device is added to.
@@ -219,6 +222,9 @@ int __rte_experimental rte_dev_probe(const char *devargs);
/**
* Hotplug remove a given device from a specific bus.
+ * In multi-process, this function will inform all other processes
+ * to hotplug remove the same device. Any failure on other process
+ * will rollback the action.
*
* @param busname
* The bus name the device is removed from.
@@ -234,6 +240,9 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
* @b EXPERIMENTAL: this API may change without prior notice
*
* Remove one device.
+ * In multi-process, this function will inform all other processes
+ * to hotplug remove the same device. Any failure on other process
+ * will rollback the action.
*
* @param dev
* Data structure of the device to remove.
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index b7fc98499..04c414356 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -28,6 +28,7 @@ common_sources = files(
'eal_common_thread.c',
'eal_common_timer.c',
'eal_common_uuid.c',
+ 'hotplug_mp.c',
'malloc_elem.c',
'malloc_heap.c',
'malloc_mp.c',
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..58455c1a6 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -70,6 +70,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_mp.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..f2c90c528 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -865,6 +865,12 @@ rte_eal_init(int argc, char **argv)
}
}
+ /* register mp action callbacks for hotplug */
+ if (rte_dev_hotplug_mp_init() < 0) {
+ rte_eal_init_alert("failed to register mp callback for hotplug\n");
+ return -1;
+ }
+
if (rte_bus_scan()) {
rte_eal_init_alert("Cannot scan the buses for devices\n");
rte_errno = ENODEV;
--
2.13.6
^ permalink raw reply [relevance 2%]
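The way eal_dev_hotplug_request_to_secondary() folds the per-process replies
into a single result can be sketched in isolation. This is a minimal sketch,
not the patch's code: req_type, aggregate_results() and the plain int array are
hypothetical stand-ins for eal_dev_mp_req and the rte_mp_reply messages, and
only the attach and detach cases are modelled. A reply of -EEXIST on attach or
-ENOENT on detach is recorded but does not stop the scan; any other error ends
it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced version of the request types in the patch. */
enum req_type { REQ_ATTACH, REQ_DETACH };

/*
 * Fold the results reported by each secondary process into one value.
 * The "benign" error for the request type (device already there on
 * attach, device already gone on detach) is remembered but scanning
 * continues, so a later, more serious error can still override it.
 */
static int
aggregate_results(enum req_type t, const int *results, size_t n)
{
	int benign = (t == REQ_ATTACH) ? -EEXIST : -ENOENT;
	int out = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (results[i] == 0)
			continue;	/* success never clears an earlier error */
		out = results[i];
		if (out != benign)
			break;		/* first serious error wins */
	}
	return out;
}
```

With replies { 0, -EEXIST, -ENOMEM, -EEXIST } for an attach, the scan stops at
-ENOMEM; with { -ENOENT, 0 } for a detach, the result stays -ENOENT because a
later success does not clear it.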
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 1:00 3% ` Wang, Yipeng1
@ 2018-09-28 8:26 4% ` Bruce Richardson
2018-09-28 8:55 4% ` Van Haaren, Harry
2018-09-30 23:05 0% ` Honnappa Nagarahalli
1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-28 8:26 UTC (permalink / raw)
To: Wang, Yipeng1
Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
steve.capper, ola.liljedahl, nd
On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> Reply inlined:
>
> >-----Original Message-----
> >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> >Sent: Thursday, September 6, 2018 10:12 AM
> >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
> >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
> >
> >Reader-writer concurrency issue, caused by moving the keys
> >to their alternative locations during key insert, is solved
> >by introducing a global counter(tbl_chng_cnt) indicating a
> >change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> > curr_bkt = curr_node->bkt;
> > }
> >
> >+ /* Inform the previous move. The current move need
> >+ * not be informed now as the current bucket entry
> >+ * is present in both primary and secondary.
> >+ * Since there is one writer, load acquires on
> >+ * tbl_chng_cnt are not required.
> >+ */
> >+ __atomic_store_n(&h->tbl_chng_cnt,
> >+ h->tbl_chng_cnt + 1,
> >+ __ATOMIC_RELEASE);
> >+ /* The stores to sig_alt and sig_current should not
> >+ * move above the store to tbl_chng_cnt.
> >+ */
> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe for X86 this fence should not be compiled to any code, otherwise
> we need macros for the compile time check.
>
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> > uint32_t bucket_idx;
> > hash_sig_t alt_hash;
> > struct rte_hash_bucket *bkt;
> >+ uint32_t cnt_b, cnt_a;
> > int ret;
> >
> >- bucket_idx = sig & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >-
> > __hash_rw_reader_lock(h);
> >
> >- /* Check if key is in primary location */
> >- ret = search_one_bucket(h, key, sig, data, bkt);
> >- if (ret != -1) {
> >- __hash_rw_reader_unlock(h);
> >- return ret;
> >- }
> >- /* Calculate secondary hash */
> >- alt_hash = rte_hash_secondary_hash(sig);
> >- bucket_idx = alt_hash & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >+ do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
> MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap uses similar version counter to implement read-write concurrency for hash table,
> but one difference is reader checks even/odd of the version counter to make sure there is no
> concurrent writer. Could you just double check and confirm that this is not needed for your implementation?
>
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
> >
> > /**
> > * Add a key-value pair with a pre-computed hash value
> >@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> > * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
> >
> > /
>
> I think the above changes will break ABI by changing the parameter type? Other people may know better on this.
Just removing a const should not change the ABI, I believe, since the const
is just an advisory hint to the compiler. The actual parameter size and count
remain unchanged, so I don't believe there is an issue.
[ABI experts, please correct me if I'm wrong on this]
/Bruce
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 8:26 4% ` Bruce Richardson
@ 2018-09-28 8:55 4% ` Van Haaren, Harry
2018-09-30 22:33 0% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-09-28 8:55 UTC (permalink / raw)
To: Richardson, Bruce, Wang, Yipeng1
Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
steve.capper, ola.liljedahl, nd
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Friday, September 28, 2018 9:26 AM
> To: Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
>
> On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> > Reply inlined:
> >
> > >-----Original Message-----
> > >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> > >Sent: Thursday, September 6, 2018 10:12 AM
> > >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> > >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com;
> > >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
> > >
> > >Reader-writer concurrency issue, caused by moving the keys
> > >to their alternative locations during key insert, is solved
> > >by introducing a global counter(tbl_chng_cnt) indicating a
> > >change in table.
<snip>
> > > /**
> > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash
> *h, const void *key,
> > > * array of user data. This value is unique for this key.
> > > */
> > > int32_t
> > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > >
> > > /**
> > > * Add a key to an existing hash table.
> > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void
> *key);
> > > * array of user data. This value is unique for this key.
> > > */
> > > int32_t
> > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >
> > > /
> >
> > I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
>
> Just removing a const should not change the ABI, I believe, since the const
> is just advisory hint to the compiler. Actual parameter size and count
> remains unchanged so I don't believe there is an issue.
> [ABI experts, please correct me if I'm wrong on this]
[Certainly no ABI expert, but...]
I think this is an API break, not an ABI break.
Given application code as follows, it will fail to compile - even though
running the new code as a .so wouldn't cause any issues (AFAIK).
    void do_hash_stuff(const struct rte_hash *h, ...)
    {
        /* parameter passed in is const, but updated function prototype is non-const */
        rte_hash_add_key_with_hash(h, ...);
    }
This means that we can't recompile apps against the latest patch without application
code changes, if the app was passing a const rte_hash struct as the first parameter.
-Harry
^ permalink raw reply [relevance 4%]
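The distinction drawn above between an API break and an ABI break can be
illustrated with a small stand-alone sketch. The names below (struct table,
table_add_key(), app_add()) are hypothetical and only mimic the shape of the
rte_hash prototypes. Passing a pointer-to-const where the new prototype expects
a plain pointer draws a diagnostic from a C compiler (an error under -Werror),
so the caller needs a source change such as the cast shown, even though the
pointer itself is passed the same way at the binary level on common platforms.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rte_hash. */
struct table {
	uint32_t chng_cnt;	/* bumped on every modification */
	int last_key;
};

/* New-style prototype: the table is modified, so 'const' was dropped. */
static int
table_add_key(struct table *t, int key)
{
	t->last_key = key;
	t->chng_cnt++;
	return 0;
}

/* Application code written against the old, const-qualified prototype. */
static int
app_add(const struct table *t, int key)
{
	/*
	 * table_add_key(t, key) would discard the 'const' qualifier and
	 * be diagnosed by the compiler. The explicit cast below compiles,
	 * but is only valid when the underlying object is not really const.
	 */
	return table_add_key((struct table *)t, key);
}
```

The cleaner fix on the application side is to drop the const from its own
function signatures rather than to cast, which is exactly the source change
that makes this an API break.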
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-27 11:03 0% ` Shreyansh Jain
@ 2018-09-29 0:09 0% ` Yongseok Koh
1 sibling, 0 replies; 200+ results
From: Yongseok Koh @ 2018-09-29 0:09 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, Thomas Monjalon, alejandro.lucero
On Thu, Sep 27, 2018 at 11:40:59AM +0100, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>
> Notes:
> v3:
> - Add comment to explain the process of picking up minimum
> page sizes for mempool
>
> v2:
> - Add documentation changes and ABI break
>
> v1:
> - Adjust all calls to memseg walk functions to ignore external
> segments where it made sense to do so
>
> doc/guides/rel_notes/deprecation.rst | 15 --------
> doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
> drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
> drivers/net/mlx4/mlx4_mr.c | 3 ++
> drivers/net/mlx5/mlx5.c | 5 ++-
> drivers/net/mlx5/mlx5_mr.c | 3 ++
> drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
> lib/librte_eal/bsdapp/eal/Makefile | 2 +-
> lib/librte_eal/bsdapp/eal/eal.c | 3 ++
> lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
> lib/librte_eal/common/eal_common_memory.c | 3 ++
> .../common/include/rte_eal_memconfig.h | 1 +
> lib/librte_eal/common/include/rte_memory.h | 9 +++++
> lib/librte_eal/common/malloc_elem.c | 10 ++++--
> lib/librte_eal/common/malloc_heap.c | 9 +++--
> lib/librte_eal/common/rte_malloc.c | 2 +-
> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
> lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
> lib/librte_eal/meson.build | 2 +-
> lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
> test/test/test_malloc.c | 3 ++
> test/test/test_memzone.c | 3 ++
> 24 files changed, 134 insertions(+), 44 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 138335dfb..d2aec64d1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
> Deprecation Notices
> -------------------
>
> -* eal: certain structures will change in EAL on account of upcoming external
> - memory support. Aside from internal changes leading to an ABI break, the
> - following externally visible changes will also be implemented:
> -
> - - ``rte_memseg_list`` will change to include a boolean flag indicating
> - whether a particular memseg list is externally allocated. This will have
> - implications for any users of memseg-walk-related functions, as they will
> - now have to skip externally allocated segments in most cases if the intent
> - is to only iterate over internal DPDK memory.
> - - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> - as some socket ID's will now be representing externally allocated memory. No
> - changes will be required for existing code as backwards compatibility will
> - be kept, and those who do not use this feature will not see these extra
> - socket ID's.
> -
> * eal: both declaring and identifying devices will be streamlined in v18.11.
> New functions will appear to query a specific port from buses, classes of
> device and device drivers. Device declaration will be made coherent with the
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index bc9b74ec4..5fc71e208 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -91,6 +91,13 @@ API Changes
> flag the MAC can be properly configured in any case. This is particularly
> important for bonding.
>
> +* eal: The following API changes were made in 18.11:
> +
> + - ``rte_memseg_list`` structure now has an additional flag indicating whether
> + the memseg list is externally allocated. This will have implications for any
> + users of memseg-walk-related functions, as they will now have to skip
> + externally allocated segments in most cases if the intent is to only iterate
> + over internal DPDK memory.
>
> ABI Changes
> -----------
> @@ -107,6 +114,10 @@ ABI Changes
> =========================================================
>
>
> +* eal: EAL library ABI version was changed due to previously announced work on
> + supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
> + a new flag indicating whether the memseg list refers to external memory.
> +
> Removed Items
> -------------
>
> @@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
> librte_compressdev.so.1
> librte_cryptodev.so.5
> librte_distributor.so.1
> - librte_eal.so.8
> + + librte_eal.so.9
> librte_ethdev.so.10
> librte_eventdev.so.4
> librte_flow_classify.so.1
> diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
> index 4c2cd2a87..2e9244fb7 100644
> --- a/drivers/bus/fslmc/fslmc_vfio.c
> +++ b/drivers/bus/fslmc/fslmc_vfio.c
> @@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
> }
>
> static int
> -fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
> - const struct rte_memseg *ms, void *arg)
> +fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> + void *arg)
> {
> int *n_segs = arg;
> int ret;
>
> + if (msl->external)
> + return 0;
> +
> ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
> if (ret)
> DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
> diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
> index d23d3c613..9f5d790b6 100644
> --- a/drivers/net/mlx4/mlx4_mr.c
> +++ b/drivers/net/mlx4/mlx4_mr.c
> @@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
> {
> struct mr_find_contig_memsegs_data *data = arg;
>
> + if (msl->external)
> + return 0;
> +
Because a memory free event is available for external memory, the current design
of mlx4/mlx5 memory management can accommodate the new external memory support.
So, please remove this check so that the PMD can traverse external memory as well.
> if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
> return 0;
> /* Found, save it and stop walking. */
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 30d4e70a7..c90e1d8ce 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
> static void *uar_base;
>
> static int
> -find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
> +find_lower_va_bound(const struct rte_memseg_list *msl,
> const struct rte_memseg *ms, void *arg)
> {
> void **addr = arg;
>
> + if (msl->external)
> + return 0;
> +
This one is fine.
But can you please remove the blank line?
That's a rule by former maintainers. :-)
> if (*addr == NULL)
> *addr = ms->addr;
> else
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 1d1bcb5fe..fd4345f9c 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
> {
> struct mr_find_contig_memsegs_data *data = arg;
>
> + if (msl->external)
> + return 0;
> +
Like I mentioned, please remove it.
If those two changes in mlx4/5_mr.c are removed, for the whole patch,
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Thanks
Yongseok
^ permalink raw reply [relevance 0%]
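The review above turns on whether a memseg walk callback should return early
for external segment lists. A minimal stand-alone sketch of the two behaviours
follows. Everything in it is a hypothetical mock: seg_list, seg, entry and
walk() only imitate the shape of rte_memseg_list, rte_memseg and the memseg
walk functions, and the rule "any non-zero return stops the walk" is an
assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical mocks of the list and segment descriptors. */
struct seg_list { int external; };
struct seg { size_t len; };
struct entry { struct seg_list list; struct seg seg; };

typedef int (*walk_cb)(const struct seg_list *msl,
		const struct seg *ms, void *arg);

/* Call cb for every segment; stop early on a non-zero return. */
static int
walk(const struct entry *e, size_t n, walk_cb cb, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = cb(&e[i].list, &e[i].seg, arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that ignores externally allocated memory. */
static int
sum_internal_cb(const struct seg_list *msl, const struct seg *ms, void *arg)
{
	size_t *total = arg;

	if (msl->external)
		return 0;	/* skip, but keep walking */
	*total += ms->len;
	return 0;
}

/* Callback that also accounts for external memory. */
static int
sum_all_cb(const struct seg_list *msl, const struct seg *ms, void *arg)
{
	size_t *total = arg;

	(void)msl;
	*total += ms->len;
	return 0;
}
```

Dropping the early return, as requested for the mlx4/mlx5 MR callbacks,
corresponds to moving from the first callback to the second.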
* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
2018-09-26 12:22 3% ` Burakov, Anatoly
@ 2018-09-29 6:15 3% ` Jeff Guo
2018-10-01 7:51 3% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Jeff Guo @ 2018-09-29 6:15 UTC (permalink / raw)
To: Burakov, Anatoly, stephen, bruce.richardson, ferruh.yigit,
konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
bernard.iremonger, arybchenko
Cc: jblunck, shreyansh.jain, dev, helin.zhang
On 9/26/2018 8:22 PM, Burakov, Anatoly wrote:
> On 17-Aug-18 11:51 AM, Jeff Guo wrote:
>> There are some extended interrupt types in vfio pci device except
>> from the
>> existing interrupts, such as err and req notifier, it could be useful
>> for
>> device error monitoring. And these corresponding interrupt handler is
>> different from the other interrupt handler that register in PMDs, so
>> a new
>> interrupt handler should be added. This patch will add specific req
>> handler
>> in generic pci device.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> drivers/bus/pci/rte_bus_pci.h | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/bus/pci/rte_bus_pci.h
>> b/drivers/bus/pci/rte_bus_pci.h
>> index 0d1955f..c45a820 100644
>> --- a/drivers/bus/pci/rte_bus_pci.h
>> +++ b/drivers/bus/pci/rte_bus_pci.h
>> @@ -66,6 +66,7 @@ struct rte_pci_device {
>> uint16_t max_vfs; /**< sriov enable if not
>> zero */
>> enum rte_kernel_driver kdrv; /**< Kernel driver
>> passthrough */
>> char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
>> + struct rte_intr_handle req_notifier_handler;/**< Req notifier
>> handle */
>> };
>> /**
>>
>
> Does this break ABI?
>
If adding a field to a struct breaks the ABI, then yes, it does.
^ permalink raw reply [relevance 3%]
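Why appending a field can matter for the ABI can be shown with a small sketch.
The two structs below are hypothetical, cut-down imitations of the tail of
rte_pci_device before and after the patch, and handle is a made-up stand-in for
rte_intr_handle. Offsets of the existing fields stay the same, but the size of
the struct grows, which affects any binary that allocates, embeds or copies the
struct using the old size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_intr_handle. */
struct handle {
	int fd;
	int type;
};

/* Layout before the patch (reduced to a few fields). */
struct dev_v1 {
	uint16_t max_vfs;
	int kdrv;
	char name[14];
};

/* Layout after the patch: one field appended at the end. */
struct dev_v2 {
	uint16_t max_vfs;
	int kdrv;
	char name[14];
	struct handle req_notifier_handler;
};

/* Bytes by which an object of the new layout exceeds the old one. */
static size_t
layout_growth(void)
{
	return sizeof(struct dev_v2) - sizeof(struct dev_v1);
}

/* Non-zero if the fields shared by both layouts keep their offsets. */
static int
existing_offsets_unchanged(void)
{
	return offsetof(struct dev_v1, max_vfs) ==
			offsetof(struct dev_v2, max_vfs) &&
		offsetof(struct dev_v1, kdrv) ==
			offsetof(struct dev_v2, kdrv) &&
		offsetof(struct dev_v1, name) ==
			offsetof(struct dev_v2, name);
}
```

Code that only reads the old fields through a pointer keeps working, whereas
code that relies on sizeof of the old layout does not, which is why such a
change is normally treated as an ABI change.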
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 8:55 4% ` Van Haaren, Harry
@ 2018-09-30 22:33 0% ` Honnappa Nagarahalli
2018-10-02 13:17 3% ` Van Haaren, Harry
0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-30 22:33 UTC (permalink / raw)
To: Van Haaren, Harry, Richardson, Bruce, Wang, Yipeng1
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd
> > > >
> > > >Reader-writer concurrency issue, caused by moving the keys to their
> > > >alternative locations during key insert, is solved by introducing a
> > > >global counter(tbl_chng_cnt) indicating a change in table.
>
> <snip>
>
> > > > /**
> > > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> > > >rte_hash
> > *h, const void *key,
> > > > * array of user data. This value is unique for this key.
> > > > */
> > > > int32_t
> > > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > > >
> > > > /**
> > > > * Add a key to an existing hash table.
> > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> > > >const void
> > *key);
> > > > * array of user data. This value is unique for this key.
> > > > */
> > > > int32_t
> > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
> > > >*key,
> > hash_sig_t sig);
> > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> > hash_sig_t sig);
> > > >
> > > > /
> > >
> > > I think the above changes will break ABI by changing the parameter type?
> > Other people may know better on this.
> >
> > Just removing a const should not change the ABI, I believe, since the
> > const is just advisory hint to the compiler. Actual parameter size and
> > count remains unchanged so I don't believe there is an issue.
> > [ABI experts, please correct me if I'm wrong on this]
>
>
> [Certainly no ABI expert, but...]
>
> I think this is an API break, not ABI break.
>
> Given application code as follows, it will fail to compile - even though running
> the new code as a .so wouldn't cause any issues (AFAIK).
>
> void do_hash_stuff(const struct rte_hash *h, ...) {
> /* parameter passed in is const, but updated function prototype is non-
> const */
> rte_hash_add_key_with_hash(h, ...);
> }
>
> This means that we can't recompile apps against latest patch without
> application code changes, if the app was passing a const rte_hash struct as
> the first parameter.
>
Agree. Do we need to do anything for this?
>
> -Harry
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 1:00 3% ` Wang, Yipeng1
2018-09-28 8:26 4% ` Bruce Richardson
@ 2018-09-30 23:05 0% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-09-30 23:05 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China), Steve Capper, Ola Liljedahl, nd
> >
> >Reader-writer concurrency issue, caused by moving the keys to their
> >alternative locations during key insert, is solved by introducing a
> >global counter(tbl_chng_cnt) indicating a change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
> > curr_bkt = curr_node->bkt;
> > }
> >
> >+ /* Inform the previous move. The current move need
> >+ * not be informed now as the current bucket entry
> >+ * is present in both primary and secondary.
> >+ * Since there is one writer, load acquires on
> >+ * tbl_chng_cnt are not required.
> >+ */
> >+ __atomic_store_n(&h->tbl_chng_cnt,
> >+ h->tbl_chng_cnt + 1,
> >+ __ATOMIC_RELEASE);
> >+ /* The stores to sig_alt and sig_current should not
> >+ * move above the store to tbl_chng_cnt.
> >+ */
> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe for X86 this fence should not be compiled to any
> code, otherwise we need macros for the compile time check.
'__atomic_thread_fence(__ATOMIC_RELEASE)' provides load-store and store-store ordering [1]. x86 already guarantees both, so the fence should not emit any barrier instruction there.
[1] https://preshing.com/20130922/acquire-and-release-fences/
>
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct
> rte_hash *h, const void *key,
> > uint32_t bucket_idx;
> > hash_sig_t alt_hash;
> > struct rte_hash_bucket *bkt;
> >+ uint32_t cnt_b, cnt_a;
> > int ret;
> >
> >- bucket_idx = sig & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >-
> > __hash_rw_reader_lock(h);
> >
> >- /* Check if key is in primary location */
> >- ret = search_one_bucket(h, key, sig, data, bkt);
> >- if (ret != -1) {
> >- __hash_rw_reader_unlock(h);
> >- return ret;
> >- }
> >- /* Calculate secondary hash */
> >- alt_hash = rte_hash_secondary_hash(sig);
> >- bucket_idx = alt_hash & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >+ do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and
> Concurrent MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap uses similar version counter to implement read-write
> concurrency for hash table, but one difference is reader checks even/odd of
> the version counter to make sure there is no concurrent writer. Could you just
> double check and confirm that this is not needed for your implementation?
>
I re-read the paper. My patch relies on the fact that, while a key is being moved, it is present in both the primary and the secondary bucket. The check for an odd version counter is not required because the full key comparison identifies any false signature matches.
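
For readers following along, the counter scheme can be sketched as below. This is a much simplified illustration under my own assumptions, not the code from the patch: demo_table, demo_move_key and demo_lookup are hypothetical names, there is one bucket of each kind, keys are non-zero integers, and there is a single writer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 2
#define DEMO_EMPTY 0u	/* keys are assumed to be non-zero */

/* Hypothetical table: one primary and one secondary bucket. */
struct demo_table {
	uint32_t tbl_chng_cnt;	/* bumped by the single writer on every move */
	uint32_t primary[DEMO_SLOTS];
	uint32_t secondary[DEMO_SLOTS];
};

static int
demo_search(uint32_t *bkt, uint32_t key)
{
	int i;

	for (i = 0; i < DEMO_SLOTS; i++)
		if (__atomic_load_n(&bkt[i], __ATOMIC_ACQUIRE) == key)
			return i;
	return -1;
}

/* Writer: move the key stored in *src to the free slot *dst. */
static void
demo_move_key(struct demo_table *t, uint32_t *src, uint32_t *dst)
{
	uint32_t key = *src;

	assert(t != NULL && *dst == DEMO_EMPTY);
	/* Copy first, so the key is present in both buckets. */
	__atomic_store_n(dst, key, __ATOMIC_RELEASE);
	/* Single writer: a plain read of the counter is enough here. */
	__atomic_store_n(&t->tbl_chng_cnt, t->tbl_chng_cnt + 1,
			__ATOMIC_RELEASE);
	/* The store freeing the old slot must not move above the counter. */
	__atomic_thread_fence(__ATOMIC_RELEASE);
	__atomic_store_n(src, DEMO_EMPTY, __ATOMIC_RELAXED);
}

/* Reader: returns a slot index (secondary slots offset by DEMO_SLOTS). */
static int
demo_lookup(struct demo_table *t, uint32_t key)
{
	uint32_t cnt_b, cnt_a;
	int ret;

	assert(t != NULL);
	do {
		cnt_b = __atomic_load_n(&t->tbl_chng_cnt, __ATOMIC_ACQUIRE);
		ret = demo_search(t->primary, key);
		if (ret != -1)
			return ret;
		ret = demo_search(t->secondary, key);
		if (ret != -1)
			return DEMO_SLOTS + ret;
		/* Bucket loads must not move below the second counter load. */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		cnt_a = __atomic_load_n(&t->tbl_chng_cnt, __ATOMIC_ACQUIRE);
	} while (cnt_b != cnt_a);	/* a move raced with us: retry */
	return -1;
}
```

A miss is only reported when the counter did not change across both bucket searches. A hit never needs the even/odd check, because the full key was compared.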
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void
> >*data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void
> >+*data);
> >
> > /**
> > * Add a key-value pair with a pre-computed hash value @@ -180,7
> >+180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void
> *key, void *data);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void
> >*key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> rte_hash *h, const void *key,
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> > * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const
> void *key);
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> >hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> >+hash_sig_t sig);
> >
> > /
>
> I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
2018-09-29 6:15 3% ` Jeff Guo
@ 2018-10-01 7:51 3% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-10-01 7:51 UTC (permalink / raw)
To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
bernard.iremonger, arybchenko
Cc: jblunck, shreyansh.jain, dev, helin.zhang
On 29-Sep-18 7:15 AM, Jeff Guo wrote:
>
> On 9/26/2018 8:22 PM, Burakov, Anatoly wrote:
>> On 17-Aug-18 11:51 AM, Jeff Guo wrote:
>>> Besides the existing interrupts, VFIO PCI devices have some extended
>>> interrupt types, such as the err and req notifiers, which can be
>>> useful for device error monitoring. The corresponding interrupt
>>> handlers differ from the interrupt handlers registered in PMDs, so
>>> a new interrupt handler should be added. This patch adds a specific
>>> req handler to the generic PCI device.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>> ---
>>> drivers/bus/pci/rte_bus_pci.h | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/bus/pci/rte_bus_pci.h
>>> b/drivers/bus/pci/rte_bus_pci.h
>>> index 0d1955f..c45a820 100644
>>> --- a/drivers/bus/pci/rte_bus_pci.h
>>> +++ b/drivers/bus/pci/rte_bus_pci.h
>>> @@ -66,6 +66,7 @@ struct rte_pci_device {
>>> uint16_t max_vfs; /**< sriov enable if not
>>> zero */
>>> enum rte_kernel_driver kdrv; /**< Kernel driver
>>> passthrough */
>>> char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
>>> + struct rte_intr_handle req_notifier_handler;/**< Req notifier
>>> handle */
>>> };
>>> /**
>>>
>>
>> Does this break ABI?
>>
>
> If adding a member to a struct breaks the ABI, then it does.
>
>
Then it probably does. So we should probably bump the PCI driver ABI version?
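
As a rough illustration of why appending a member matters, consider the sketch below. The struct layouts are hypothetical and heavily simplified; they are not the real rte_pci_device or rte_intr_handle definitions. Existing members keep their offsets, but the struct grows, so any code built against the old header that allocates, embeds or copies the struct by size no longer agrees with the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_intr_handle. */
struct demo_intr_handle {
	int fd;
	int type;
};

/* Layout as seen by an application built before the patch. */
struct demo_pci_device_old {
	uint16_t max_vfs;
	char name[14];
};

/* Layout as seen by the library after the patch. */
struct demo_pci_device_new {
	uint16_t max_vfs;
	char name[14];
	struct demo_intr_handle req_notifier_handler;	/* appended member */
};

/* Number of bytes by which an old allocation would be too small. */
static size_t
demo_abi_growth(void)
{
	assert(sizeof(struct demo_pci_device_new) >=
			sizeof(struct demo_pci_device_old));
	return sizeof(struct demo_pci_device_new) -
			sizeof(struct demo_pci_device_old);
}

/* Existing members keep their offsets when a member is appended. */
static int
demo_offsets_unchanged(void)
{
	return offsetof(struct demo_pci_device_old, name) ==
			offsetof(struct demo_pci_device_new, name);
}
```

Whether this counts as a break in practice depends on who allocates the struct, which is the part I am assuming here.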
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
@ 2018-10-01 9:46 4% ` Luca Boccassi
2018-10-01 10:01 0% ` Bruce Richardson
2018-10-01 10:42 0% ` Timothy Redaelli
0 siblings, 2 replies; 200+ results
From: Luca Boccassi @ 2018-10-01 9:46 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev, tredaelli, mvarlese, christian.ehrhardt
On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > Allow users and packagers to override the default dpdk/drivers
> > > subdirectory where the PMDs get installed under $lib.
> > >
> > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > ---
> >
> > I'm ok with this change, but what is the current location used by
> > distro's
> > right now? I mistakenly never checked what was done before I used
> > dpdk/drivers as a default value, and I'd like the default to match
> > the
> > common option if possible.
> >
> > /Bruce
> >
>
> Replying to my own question, I've just checked on CentOS and Debian,
> and it
> appears both are using directory "dpdk-pmds" as the subdir name.
> Therefore,
> let's just make that the default. [Does it need to be configurable in
> that
> case?]
>
> /Bruce
If the default is the one I expect then I'm fine without having an
option (actually happier - less things to configure).
But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
We changed because using a single directory creates problems when
multiple different ABI versions are installed, due to the EAL autoload
from that directory. So we need a different subdirectory per ABI
revision.
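
To show the naming scheme rather than any actual packaging code, here is a small sketch that builds such a per-version directory. The helper demo_driver_dir, the library directory and the version string are all hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<libdir>/dpdk-<majorver>-drivers" into buf.
 * Returns 0 on success, -1 if the result did not fit.
 */
static int
demo_driver_dir(char *buf, size_t len, const char *libdir,
		const char *majorver)
{
	int n;

	assert(buf != NULL && libdir != NULL && majorver != NULL);
	n = snprintf(buf, len, "%s/dpdk-%s-drivers", libdir, majorver);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two installed versions then resolve to two different directories, so the EAL autoload of one never picks up the PMDs built for the other.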
We were actually talking with Timothy a while ago to make this
consistent across our distros, and perhaps Marco can chip in as well.
Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
too fussy on $something, it can be drivers or pmds or something else.
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 9:46 4% ` Luca Boccassi
@ 2018-10-01 10:01 0% ` Bruce Richardson
2018-10-01 10:42 0% ` Timothy Redaelli
1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2018-10-01 10:01 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev, tredaelli, mvarlese, christian.ehrhardt
On Mon, Oct 01, 2018 at 10:46:02AM +0100, Luca Boccassi wrote:
> On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > > Allow users and packagers to override the default dpdk/drivers
> > > > subdirectory where the PMDs get installed under $lib.
> > > >
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > ---
> > >
> > > I'm ok with this change, but what is the current location used by
> > > distro's
> > > right now? I mistakenly never checked what was done before I used
> > > dpdk/drivers as a default value, and I'd like the default to match
> > > the
> > > common option if possible.
> > >
> > > /Bruce
> > >
> >
> > Replying to my own question, I've just checked on CentOS and Debian,
> > and it
> > appears both are using directory "dpdk-pmds" as the subdir name.
> > Therefore,
> > let's just make that the default. [Does it need to be configurable in
> > that
> > case?]
> >
> > /Bruce
>
> If the default is the one I expect then I'm fine without having an
> option (actually happier - less things to configure).
>
> But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
> We changed because using a single directory creates problems when
> multiple different ABI versions are installed, due to the EAL autoload
> from that directory. So we need a different subdirectory per ABI
> revision.
>
> We were actually talking with Timothy a while ago to make this
> consistent across our distros, and perhaps Marco can chip in as well.
>
> Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
> too fussy on $something, it can be drivers or pmds or something else.
>
Sounds like it needs to be configurable, just in case.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 9:46 4% ` Luca Boccassi
2018-10-01 10:01 0% ` Bruce Richardson
@ 2018-10-01 10:42 0% ` Timothy Redaelli
2018-10-01 11:06 0% ` Bruce Richardson
1 sibling, 1 reply; 200+ results
From: Timothy Redaelli @ 2018-10-01 10:42 UTC (permalink / raw)
To: Luca Boccassi; +Cc: Bruce Richardson, dev, mvarlese, christian.ehrhardt
On Mon, 01 Oct 2018 10:46:02 +0100
Luca Boccassi <bluca@debian.org> wrote:
> On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > > Allow users and packagers to override the default dpdk/drivers
> > > > subdirectory where the PMDs get installed under $lib.
> > > >
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > ---
> > >
> > > I'm ok with this change, but what is the current location used by
> > > distro's
> > > right now? I mistakenly never checked what was done before I used
> > > dpdk/drivers as a default value, and I'd like the default to match
> > > the
> > > common option if possible.
> > >
> > > /Bruce
> > >
> >
> > Replying to my own question, I've just checked on CentOS and Debian,
> > and it
> > appears both are using directory "dpdk-pmds" as the subdir name.
> > Therefore,
> > let's just make that the default. [Does it need to be configurable in
> > that
> > case?]
> >
> > /Bruce
>
> If the default is the one I expect then I'm fine without having an
> option (actually happier - less things to configure).
>
> But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
> We changed because using a single directory creates problems when
> multiple different ABI versions are installed, due to the EAL autoload
> from that directory. So we need a different subdirectory per ABI
> revision.
>
> We were actually talking with Timothy a while ago to make this
> consistent across our distros, and perhaps Marco can chip in as well.
>
> Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
> too fussy on $something, it can be drivers or pmds or something else.
>
LGTM.
If needed, we can just do a compatibility symlink using the current
dpdk-pmds path
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v7 00/21] Support externally allocated memory in DPDK
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-10-01 11:04 2% ` Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (5 more replies)
2018-10-01 11:04 16% ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
4 siblings, 6 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.
Using external memory in multiprocess is *hard*: not only does the
memory space need to be preallocated, it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 318 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 28 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 13 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
.../net/virtio/virtio_user/virtio_user_dev.c | 8 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 320 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
45 files changed, 1923 insertions(+), 138 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 11:04 16% ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-01 11:04 4% ` Anatoly Burakov
2018-10-01 11:04 9% ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 11:05 4% ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
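
The relaxed check can be sketched in isolation as follows. This is a standalone illustration, not the EAL code: demo_prepare_socket is a hypothetical helper, DEMO_MAX_NUMA_NODES is an arbitrary value standing in for RTE_MAX_NUMA_NODES, and the hugepage state is passed in as a parameter instead of being queried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SOCKET_ID_ANY (-1)		/* mirrors SOCKET_ID_ANY */
#define DEMO_MAX_NUMA_NODES 8		/* illustrative value only */

/*
 * Returns false if the socket ID is rejected up front. Otherwise the
 * ID is normalized in place and left for the allocator to validate.
 */
static bool
demo_prepare_socket(int *socket_id, bool has_hugepages)
{
	assert(socket_id != NULL);

	/* Only negative IDs other than "any" are rejected by the caller. */
	if (*socket_id != DEMO_SOCKET_ID_ANY && *socket_id < 0)
		return false;

	/*
	 * Without hugepages, internal sockets collapse to "any".
	 * IDs at or above the NUMA node limit may name external heaps,
	 * so they are passed through unchanged.
	 */
	if (!has_hugepages && *socket_id < DEMO_MAX_NUMA_NODES)
		*socket_id = DEMO_SOCKET_ID_ANY;

	return true;
}
```

Note that a large ID is no longer an error at this level; whether it refers to an existing heap is decided later by the memory subsystem.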
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index a9cfa423f..09b06061d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -651,7 +651,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v7 11/21] malloc: allow creating malloc heaps
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
` (3 preceding siblings ...)
2018-10-01 11:04 9% ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-01 11:05 4% ` Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
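
A standalone sketch of the socket ID assignment follows. It is an approximation under stated assumptions: demo_mem_config and the demo_* helpers are hypothetical, DEMO_MAX_NUMA_NODES is an arbitrary stand-in for RTE_MAX_NUMA_NODES, locking is omitted, and the exhaustion check is done before the increment, which differs slightly from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_MAX_NUMA_NODES 8	/* illustrative value only */
#define DEMO_CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
/* External socket IDs start well above any internal NUMA node index. */
#define DEMO_EXTERNAL_MIN_SOCKET_ID \
	DEMO_CONST_MAX((1 << 8), DEMO_MAX_NUMA_NODES)

/* Hypothetical stand-in for the relevant part of rte_mem_config. */
struct demo_mem_config {
	int next_socket_id;
};

static void
demo_mem_config_init(struct demo_mem_config *mcfg)
{
	assert(mcfg != NULL);
	mcfg->next_socket_id = DEMO_EXTERNAL_MIN_SOCKET_ID;
}

/* Returns a fresh socket ID, or -1 when the ID space is exhausted. */
static int
demo_assign_socket_id(struct demo_mem_config *mcfg)
{
	assert(mcfg != NULL);
	if (mcfg->next_socket_id == INT_MAX)
		return -1;
	return mcfg->next_socket_id++;
}

/* Any ID at or above the minimum cannot belong to an internal heap. */
static int
demo_socket_is_external(int socket_id)
{
	return socket_id >= DEMO_EXTERNAL_MIN_SOCKET_ID;
}
```

IDs are never reused in this sketch, so destroying and recreating a heap yields a new socket ID each time.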
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+ Structure ``rte_eal_memconfig`` has been extended to contain next socket
+ ID for externally allocated memory segments.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
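
For readers following the error handling in rte_malloc_heap_create() above, here is a minimal standalone sketch of the same control flow. It is not DPDK code: the model_* names and the table size of 4 are made up for illustration, the error is returned through a pointer rather than rte_errno, and no locking is modelled. memchr() stands in for the strnlen() checks so that the sketch builds with plain ISO C.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MODEL_MAX_HEAPS 4       /* hypothetical; stands in for RTE_MAX_HEAPS */
#define MODEL_NAME_MAX_LEN 32   /* same value as RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical stand-in for struct malloc_heap; only the name matters here. */
struct model_heap {
	char name[MODEL_NAME_MAX_LEN];
};

static struct model_heap model_heaps[MODEL_MAX_HEAPS];

/*
 * Returns 0 on success, or -1 with *err set to:
 *   EINVAL - name is NULL, empty, or does not fit with its terminator
 *   EEXIST - a heap with this name was found
 *   ENOSPC - no empty slot is left
 */
static int
model_heap_create(const char *heap_name, int *err)
{
	struct model_heap *slot = NULL;
	const char *end;
	int i;

	assert(err != NULL);

	if (heap_name == NULL) {
		*err = EINVAL;
		return -1;
	}
	/* look for the terminator within the first MODEL_NAME_MAX_LEN bytes */
	end = memchr(heap_name, '\0', MODEL_NAME_MAX_LEN);
	if (end == NULL || end == heap_name) {
		*err = EINVAL;
		return -1;
	}

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		struct model_heap *tmp = &model_heaps[i];

		/* existing heap */
		if (strncmp(heap_name, tmp->name, MODEL_NAME_MAX_LEN) == 0) {
			*err = EEXIST;
			return -1;
		}
		/* empty slot: take the first one, as the patch does */
		if (tmp->name[0] == '\0') {
			slot = tmp;
			break;
		}
	}
	if (slot == NULL) {
		*err = ENOSPC;
		return -1;
	}

	/* copy the name including its terminator */
	memcpy(slot->name, heap_name, (size_t)(end - heap_name) + 1);
	return 0;
}
```

Creating the same name twice in this model fails with EEXIST, and a fifth distinct name fails with ENOSPC, which matches the order of checks in the patch.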
* [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
` (2 preceding siblings ...)
2018-10-01 11:04 4% ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-10-01 11:04 9% ` Anatoly Burakov
2018-10-01 11:05 4% ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly. So, we will use a string to uniquely identify a
heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
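
As an illustration of the default naming scheme used in this patch ("socket_" followed by the socket ID), a small standalone sketch follows. The helper name model_default_heap_name() is hypothetical and not part of DPDK. The sketch passes the full buffer size to snprintf(), which always leaves room for the terminator within the size it is given, and reports truncation to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_NAME_MAX_LEN 32   /* same value as RTE_HEAP_NAME_MAX_LEN */

/*
 * Hypothetical helper: write "socket_<id>" into name.
 * Returns 0 on success, -1 if the result did not fit in size bytes.
 */
static int
model_default_heap_name(char *name, size_t size, int socket_id)
{
	int n;

	assert(name != NULL && size > 0);

	/* snprintf() writes at most size bytes, terminator included */
	n = snprintf(name, size, "socket_%i", socket_id);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With a MODEL_NAME_MAX_LEN buffer, socket 0 gives "socket_0"; a buffer that is too small makes the helper return -1 instead of silently using a truncated name.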
doc/guides/rel_notes/release_18_11.rst | 1 +
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
* eal: EAL library ABI version was changed due to previously announced work on
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
+ Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1024,6 +1023,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 10:42 0% ` Timothy Redaelli
@ 2018-10-01 11:06 0% ` Bruce Richardson
2018-10-01 11:24 0% ` Luca Boccassi
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-01 11:06 UTC (permalink / raw)
To: Timothy Redaelli; +Cc: Luca Boccassi, dev, mvarlese, christian.ehrhardt
On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> On Mon, 01 Oct 2018 10:46:02 +0100
> Luca Boccassi <bluca@debian.org> wrote:
>
> > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > > > Allow users and packagers to override the default dpdk/drivers
> > > > > subdirectory where the PMDs get installed under $lib.
> > > > >
> > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > ---
> > > >
> > > > I'm ok with this change, but what is the current location used by
> > > > distros
> > > > right now? I mistakenly never checked what was done before I used
> > > > dpdk/drivers as a default value, and I'd like the default to match
> > > > the
> > > > common option if possible.
> > > >
> > > > /Bruce
> > > >
> > >
> > > Replying to my own question, I've just checked on CentOS and Debian,
> > > and it
> > > appears both are using directory "dpdk-pmds" as the subdir name.
> > > Therefore,
> > > let's just make that the default. [Does it need to be configurable in
> > > that
> > > case?]
> > >
> > > /Bruce
> >
> > If the default is the one I expect then I'm fine without having an
> > option (actually happier - less things to configure).
> >
> > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
> > We changed because using a single directory creates problems when
> > multiple different ABI versions are installed, due to the EAL autoload
> > from that directory. So we need a different subdirectory per ABI
> > revision.
> >
> > We were actually talking with Timothy a while ago to make this
> > consistent across our distros, and perhaps Marco can chip in as well.
> >
> > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
> > too fussy on $something, it can be drivers or pmds or something else.
> >
>
> LGTM.
> If needed, we can just do a compatibility symlink using the current
> dpdk-pmds path
>
One suggestion/comment. Would using a unique directory per release not lead
to cluttering up the lib directory unnecessarily? How about having a single
"dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a subdir
under that?
E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
dpdk-pmds/18.11
[The former of the above would be my preference, since I don't like having
hyphenated names, and like having "dpdk" alone as a folder name :-)]
/Bruce
* [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
@ 2018-10-01 11:04 16% ` Anatoly Burakov
2018-10-01 11:04 4% ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
` (2 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
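
The matching rule described above can be sketched in isolation as follows. This is a simplified model and not the mempool code itself: struct model_msl and model_min_page_size() are hypothetical names, the memseg lists are a plain array rather than being visited through rte_memseg_list_walk(), and the fallback page size is passed in by the caller instead of coming from getpagesize().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SOCKET_ID_ANY (-1)   /* stands in for SOCKET_ID_ANY */

/* Hypothetical, reduced view of a memseg list. */
struct model_msl {
	int socket_id;
	size_t page_sz;
	unsigned int external;   /* 1 if the list refers to external memory */
};

/*
 * Smallest page size usable for socket_id:
 *  - an exact socket ID match counts for native and external lists alike;
 *  - MODEL_SOCKET_ID_ANY only considers native (non-external) lists.
 * Returns fallback if no list qualifies.
 */
static size_t
model_min_page_size(const struct model_msl *lists, size_t n_lists,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n_lists == 0);

	for (i = 0; i < n_lists; i++) {
		int valid = lists[i].socket_id == socket_id;

		valid |= socket_id == MODEL_SOCKET_ID_ANY &&
				lists[i].external == 0;

		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}

	return min == SIZE_MAX ? fallback : min;
}
```

For example, with a 2M native list on socket 0, a 1G native list on socket 1 and a 4K external list on a made-up socket ID 256, asking for MODEL_SOCKET_ID_ANY yields 2M (the external 4K list is ignored), while asking for socket 256 explicitly yields 4K.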
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
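
To show the pattern that the adjusted walk callbacks follow, here is a minimal standalone model. It does not use the real rte_memseg_list_walk(); model_list_walk() and the other model_* names are hypothetical, and the convention that a non-zero callback return stops the walk is an assumption of this sketch. The callback mirrors the shape of physmem_size() in the patch: skip external lists by returning 0, otherwise accumulate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a memseg list. */
struct model_msl {
	uint64_t page_sz;
	unsigned int count;      /* number of pages in use */
	unsigned int external;   /* 1 if the list refers to external memory */
};

typedef int (*model_walk_t)(const struct model_msl *msl, void *arg);

/* Call func for every list; stop early if it returns non-zero. */
static int
model_list_walk(const struct model_msl *lists, size_t n_lists,
		model_walk_t func, void *arg)
{
	size_t i;

	assert(func != NULL);

	for (i = 0; i < n_lists; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: sum up internal memory only. */
static int
model_physmem_size(const struct model_msl *msl, void *arg)
{
	uint64_t *total_len = arg;

	/* external lists are visited too; skipping them is up to us */
	if (msl->external)
		return 0;

	*total_len += msl->count * msl->page_sz;
	return 0;
}

/* Convenience wrapper around the walk. */
static uint64_t
model_internal_mem_size(const struct model_msl *lists, size_t n_lists)
{
	uint64_t total = 0;

	model_list_walk(lists, n_lists, model_physmem_size, &total);
	return total;
}
```

The point of the early "return 0" is that the walk itself stays unaware of the distinction: each callback decides whether external memory is relevant to it.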
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 6 +++-
drivers/net/mlx5/mlx5.c | 4 ++-
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
22 files changed, 127 insertions(+), 43 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,10 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+ a new flag indicating whether the memseg list refers to external memory.
+
Removed Items
-------------
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
static int
fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+ const struct rte_memseg *ms, void *arg)
{
int *n_segs = arg;
int ret;
+ /* if IOVA address is invalid, skip */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..fc3cb1b49 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 11:06 0% ` Bruce Richardson
@ 2018-10-01 11:24 0% ` Luca Boccassi
2018-10-02 11:02 0% ` Marco Varlese
0 siblings, 1 reply; 200+ results
From: Luca Boccassi @ 2018-10-01 11:24 UTC (permalink / raw)
To: Bruce Richardson, Timothy Redaelli; +Cc: dev, mvarlese, christian.ehrhardt
On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > On Mon, 01 Oct 2018 10:46:02 +0100
> > Luca Boccassi <bluca@debian.org> wrote:
> >
> > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > wrote:
> > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > wrote:
> > > > > > Allow users and packagers to override the default
> > > > > > dpdk/drivers
> > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > >
> > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > ---
> > > > >
> > > > > I'm ok with this change, but what is the current location
> > > > > used by
> > > > > > distros
> > > > > right now? I mistakenly never checked what was done before I
> > > > > used
> > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > match
> > > > > the
> > > > > common option if possible.
> > > > >
> > > > > /Bruce
> > > > >
> > > >
> > > > Replying to my own question, I've just checked on CentOS and
> > > > Debian,
> > > > and it
> > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > name.
> > > > Therefore,
> > > > let's just make that the default. [Does it need to be
> > > > configurable in
> > > > that
> > > > case?]
> > > >
> > > > /Bruce
> > >
> > > If the default is the one I expect then I'm fine without having
> > > an
> > > option (actually happier - less things to configure).
> > >
> > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > January :-)
> > > We changed because using a single directory creates problems when
> > > multiple different ABI versions are installed, due to the EAL
> > > autoload
> > > from that directory. So we need a different subdirectory per ABI
> > > revision.
> > >
> > > We were actually talking with Timothy a while ago to make this
> > > consistent across our distros, and perhaps Marco can chip in as
> > > well.
> > >
> > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > not
> > > too fussy on $something, it can be drivers or pmds or something
> > > else.
> > >
> >
> > LGTM.
> > If needed, we can just do a compatibility symlink using the current
> > dpdk-pmds path
> >
>
> One suggestion/comment. Would using a unique directory per release
> not lead
> to cluttering up the lib directory unnecessarily? How about having a
> single
> "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> subdir
> under that?
>
> E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> dpdk-pmds/18.11
>
> [The former of the above would be my preference, since I don't like
> having
> hyphenated names, and like having "dpdk" alone as a folder name :-)]
>
> /Bruce
dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
@ 2018-10-01 12:56 12% ` Anatoly Burakov
2018-10-01 17:01 3% ` Stephen Hemminger
2018-10-01 12:56 13% ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the
change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
doc/guides/rel_notes/release_18_11.rst | 8 +++++++-
drivers/bus/pci/linux/pci.c | 2 +-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 ++
lib/librte_eal/common/eal_common_memory.c | 5 ++---
lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++-
lib/librte_eal/meson.build | 2 +-
10 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9c00e33cc..9c17762a5 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -134,6 +134,12 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK:
+ - structure ``rte_memseg_list`` now has a new field indicating length
+ of memory addressed by the segment list
+
+
Removed Items
-------------
@@ -179,7 +185,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
static int
find_max_end_va(const struct rte_memseg_list *msl, void *arg)
{
- size_t sz = msl->memseg_arr.len * msl->page_sz;
+ size_t sz = msl->len;
void *end_va = RTE_PTR_ADD(msl->base_va, sz);
void **max_va = arg;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
}
msl->base_va = addr;
msl->page_sz = page_sz;
+ msl->len = internal_config.memory;
msl->socket_id = 0;
/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
/* a memseg list was specified, check if it's the right one */
start = msl->base_va;
- end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr < start || addr >= end)
return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
msl = &mcfg->memsegs[msl_idx];
start = msl->base_va;
- end = RTE_PTR_ADD(start,
- (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr >= start && addr < end)
break;
}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
uint64_t addr_64;
/**< Makes sure addr is always 64-bits */
};
+ size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
int msl_idx, seg_idx, ret, dir_fd = -1;
start_addr = (uintptr_t) msl->base_va;
- end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+ end_addr = start_addr + msl->len;
if ((uintptr_t)wa->ms->addr < start_addr ||
(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
return -1;
}
local_msl->base_va = primary_msl->base_va;
+ local_msl->len = primary_msl->len;
return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
msl->base_va = addr;
msl->page_sz = page_sz;
msl->socket_id = 0;
+ msl->len = internal_config.memory;
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
if (msl->memseg_arr.count > 0)
continue;
/* this is an unused list, deallocate it */
- mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+ mem_sz = msl->len;
munmap(msl->base_va, mem_sz);
msl->base_va = NULL;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
--
2.17.1
^ permalink raw reply [relevance 12%]
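The effect of the patch above can be illustrated with a minimal standalone sketch. This is not DPDK code: `mock_msl` is a hypothetical, reduced stand-in for `rte_memseg_list`, and `n_segs` stands in for the fbarray length (`memseg_arr.len`). It compares the old derived length with the stored `len` field and shows the address range check that the patched call sites perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_memseg_list. */
struct mock_msl {
	void *base_va;       /* start of the virtual address area */
	uint64_t page_sz;    /* page size of every segment in the list */
	unsigned int n_segs; /* stands in for memseg_arr.len */
	size_t len;          /* new field: total length of the area */
};

/* Old approach: derive the length from page size and segment count. */
static size_t
msl_len_old(const struct mock_msl *msl)
{
	assert(msl != NULL);
	return (size_t)msl->page_sz * msl->n_segs;
}

/* New approach: test addr against [base_va, base_va + len) directly. */
static bool
msl_contains(const struct mock_msl *msl, const void *addr)
{
	uintptr_t start, end, a;

	assert(msl != NULL);
	start = (uintptr_t)msl->base_va;
	end = start + msl->len;
	a = (uintptr_t)addr;

	return a >= start && a < end;
}
```

For a list whose length is stored consistently, both computations agree. The stored field simply removes the need for each caller to know how the list is backed.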
* [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (3 preceding siblings ...)
2018-10-01 12:56 5% ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-01 12:56 8% ` Anatoly Burakov
2018-10-01 12:56 7% ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly, so we will use a string to uniquely identify a heap.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 ++
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9583f3eda..a6bddaaf4 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -156,6 +156,8 @@ ABI Changes
ID the malloc heap belongs to
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
+ - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1024,6 +1023,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
^ permalink raw reply [relevance 8%]
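The default heap naming scheme from the patch above ("socket_" followed by the socket ID) can be sketched in isolation as follows. This is an illustration only: `format_default_heap_name` and `HEAP_NAME_MAX_LEN` are hypothetical names that mirror `RTE_HEAP_NAME_MAX_LEN` from the patch, and the sketch writes straight into the destination buffer rather than going through a temporary and `strlcpy` as the patch does.

```c
#include <assert.h>
#include <stdio.h>

#define HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN from the patch */

/*
 * Write "socket_<id>" into name.
 * Returns 0 on success, or -1 if formatting failed or the result was
 * truncated because it did not fit into HEAP_NAME_MAX_LEN bytes.
 */
static int
format_default_heap_name(char name[HEAP_NAME_MAX_LEN], int socket_id)
{
	int n;

	assert(name != NULL);
	/* snprintf always NUL-terminates when the size is non-zero */
	n = snprintf(name, HEAP_NAME_MAX_LEN, "socket_%i", socket_id);

	return (n < 0 || n >= HEAP_NAME_MAX_LEN) ? -1 : 0;
}
```

Checking the return value of `snprintf` is optional here, since a 32-byte buffer is comfortably large for any `int` socket ID, but it makes the truncation case explicit.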
* [dpdk-dev] [PATCH v8 11/21] malloc: allow creating malloc heaps
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (4 preceding siblings ...)
2018-10-01 12:56 8% ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-01 12:56 7% ` Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add an API to allow creating new malloc heaps. They will be created
with socket IDs starting at or above RTE_MAX_NUMA_NODES, to avoid
clashing with internal heaps.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a6bddaaf4..cb6308b1f 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -157,6 +157,8 @@ ABI Changes
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
- structure ``rte_malloc_heap`` now has a ``heap_name`` member
+ - structure ``rte_eal_memconfig`` has been extended to contain next
+ socket ID for externally allocated segments
Removed Items
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
^ permalink raw reply [relevance 7%]
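The control flow of `rte_malloc_heap_create()` in the patch above (validate the name, reject duplicates, take the first empty slot, assign the next external socket ID) can be sketched without any DPDK dependencies. Everything below is hypothetical scaffolding: `mock_heap`, `mock_config` and `mock_heap_create` are illustrative names, the array is shrunk to four slots, locking is omitted, and errors are returned as negative errno values instead of being reported through `rte_errno`.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_HEAPS 4                  /* small stand-in for RTE_MAX_HEAPS */
#define HEAP_NAME_MAX_LEN 32         /* mirrors RTE_HEAP_NAME_MAX_LEN */
#define FIRST_EXTERNAL_SOCKET_ID 256 /* assumed start value, i.e. 1 << 8 */

struct mock_heap {
	char name[HEAP_NAME_MAX_LEN]; /* empty string marks a free slot */
	int socket_id;
};

struct mock_config {
	struct mock_heap heaps[MAX_HEAPS];
	int next_socket_id;
};

/* Returns 0 on success or a negative errno value on failure. */
static int
mock_heap_create(struct mock_config *cfg, const char *heap_name)
{
	struct mock_heap *slot = NULL;
	const char *end;
	int i;

	assert(cfg != NULL);
	if (heap_name == NULL)
		return -EINVAL;

	/* look for the terminator within the allowed name length */
	end = memchr(heap_name, '\0', HEAP_NAME_MAX_LEN);
	if (end == NULL || end == heap_name)
		return -EINVAL; /* too long, or empty */

	for (i = 0; i < MAX_HEAPS; i++) {
		struct mock_heap *tmp = &cfg->heaps[i];

		if (strcmp(heap_name, tmp->name) == 0)
			return -EEXIST; /* name already taken */
		if (tmp->name[0] == '\0') {
			slot = tmp; /* first empty slot */
			break;
		}
	}
	if (slot == NULL)
		return -ENOSPC;

	/* copy the name including its terminator, then assign the ID */
	memcpy(slot->name, heap_name, (size_t)(end - heap_name) + 1);
	slot->socket_id = cfg->next_socket_id++;
	return 0;
}
```

Like the loop in the patch, this sketch stops scanning at the first empty slot, so the duplicate check only covers names stored before that slot. That is sufficient as long as slots are filled in order and never released.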
* [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (2 preceding siblings ...)
2018-10-01 12:56 13% ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-01 12:56 5% ` Anatoly Burakov
2018-10-01 12:56 8% ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 12:56 7% ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Bruce Richardson, John McNamara,
Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, shreyansh.jain, shahafs, arybchenko,
alejandro.lucero
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is the NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be the order of their creation.
This breaks the ABI, so document the changes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
config/common_base | 1 +
config/rte_config.h | 1 +
doc/guides/rel_notes/release_18_11.rst | 5 +-
.../common/include/rte_eal_memconfig.h | 4 +-
.../common/include/rte_malloc_heap.h | 1 +
lib/librte_eal/common/malloc_heap.c | 102 +++++++++++++-----
lib/librte_eal/common/malloc_heap.h | 3 +
lib/librte_eal/common/rte_malloc.c | 41 ++++---
8 files changed, 114 insertions(+), 44 deletions(-)
diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
CONFIG_RTE_LIBRTE_EAL=y
CONFIG_RTE_MAX_LCORE=128
CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
CONFIG_RTE_MAX_MEMSEG_LISTS=64
# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
# or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
#define RTE_BUILD_SHARED_LIB
/* EAL defines */
+#define RTE_MAX_HEAPS 32
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d55e12a27..c627c1e88 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -145,7 +145,10 @@ ABI Changes
of memory addressed by the segment list
- structure ``rte_memseg_list`` now has a new flag indicating whether
the memseg list refers to external memory
-
+ - structure ``rte_malloc_heap`` now has a new field indicating socket
+ ID the malloc heap belongs to
+ - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
+ resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
- /* Heaps of Malloc per socket */
- struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+ /* Heaps of Malloc */
+ struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
+ unsigned int socket_id;
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
return check_flag & flags;
}
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i;
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+ if (heap->socket_id == socket_id)
+ return i;
+ }
+ return -1;
+}
+
/*
* Expand the heap with a memory area.
*/
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *found_msl;
struct malloc_heap *heap;
- int msl_idx;
+ int msl_idx, heap_idx;
if (msl->external)
return 0;
- heap = &mcfg->malloc_heaps[msl->socket_id];
+ heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+ if (heap_idx < 0) {
+ RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+ return -1;
+ }
+ heap = &mcfg->malloc_heaps[heap_idx];
/* msl is const, so find it */
msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
+ heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
/* this will try lower page sizes first */
static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
- unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+ unsigned int heap_id, unsigned int flags, size_t align,
+ size_t bound, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+ int socket_id;
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
* we may still be able to allocate memory from appropriate page sizes,
* we just need to request more memory first.
*/
+
+ socket_id = rte_socket_id_by_idx(heap_id);
+ /*
+ * if socket ID is negative, we cannot find a socket ID for this heap -
+ * which means it's an external heap. those can have unexpected page
+ * sizes, so if the user asked to allocate from there - assume user
+ * knows what they're doing, and allow allocating from there with any
+ * page size flags.
+ */
+ if (socket_id < 0)
+ size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
if (ret != NULL)
goto alloc_unlock;
- if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
- contig)) {
+ /* if socket ID is invalid, this is an external heap */
+ if (socket_id < 0)
+ goto alloc_unlock;
+
+ if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+ bound, contig)) {
ret = heap_alloc(heap, type, size, flags, align, bound, contig);
/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
malloc_heap_alloc(const char *type, size_t size, int socket_arg,
unsigned int flags, size_t align, size_t bound, bool contig)
{
- int socket, i, cur_socket;
+ int socket, heap_id, i;
void *ret;
/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
- contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+ bound, contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
- /* try other heaps */
+ /* try other heaps. we are only iterating through native DPDK sockets,
+ * so external heaps won't be included.
+ */
for (i = 0; i < (int) rte_socket_count(); i++) {
- cur_socket = rte_socket_id_by_idx(i);
- if (cur_socket == socket)
+ if (i == heap_id)
continue;
- ret = heap_alloc_on_socket(type, size, cur_socket, flags,
- align, bound, contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+ bound, contig);
if (ret != NULL)
return ret;
}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
}
static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
- size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+ unsigned int flags, size_t align, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
size_t align, bool contig)
{
- int socket, i, cur_socket;
+ int socket, i, cur_socket, heap_id;
void *ret;
/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+ ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
cur_socket = rte_socket_id_by_idx(i);
if (cur_socket == socket)
continue;
- ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
- align, contig);
+ ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+ contig);
if (ret != NULL)
return ret;
}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
/* ...of which we can't avail if we are in legacy mode, or if this is an
* externally allocated segment.
*/
- if (internal_config.legacy_mem || msl->external)
+ if (internal_config.legacy_mem || (msl->external > 0))
goto free_unlock;
/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
int
malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f);
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
int
rte_eal_malloc_heap_init(void);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int heap_idx, ret = -1;
- if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
- return -1;
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
- return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+ heap_idx = malloc_socket_to_heap_id(socket);
+ if (heap_idx < 0)
+ goto unlock;
+
+ ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+ socket_stats);
+unlock:
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
}
/*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
unsigned int idx;
- for (idx = 0; idx < rte_socket_count(); idx++) {
- unsigned int socket = rte_socket_id_by_idx(idx);
- fprintf(f, "Heap on socket %i:\n", socket);
- malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+ for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+ fprintf(f, "Heap id: %u\n", idx);
+ malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
}
/*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
void
rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
{
- unsigned int socket;
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int heap_id;
struct rte_malloc_socket_stats sock_stats;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
/* Iterate through all initialised heaps */
- for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
- if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
- continue;
+ for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
- fprintf(f, "Socket:%u\n", socket);
+ malloc_heap_get_stats(heap, &sock_stats);
+
+ fprintf(f, "Heap id:%u\n", heap_id);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
return;
}
--
2.17.1
* [dpdk-dev] [PATCH v8 00/21] Support externally allocated memory in DPDK
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
@ 2018-10-01 12:56 3% ` Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
` (5 more replies)
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
` (4 subsequent siblings)
5 siblings, 6 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
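
As a rough illustration of the first and third points above, the lookup
from a socket ID to a heap index could be modelled as below. This is only
a sketch under assumptions: struct heap_model, MAX_HEAPS and
socket_to_heap_id() are hypothetical stand-ins rather than the code in
this series, and the socket ID 256 used for an external heap in the usage
note is an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HEAPS 8            /* hypothetical stand-in for RTE_MAX_HEAPS */

/* Simplified, hypothetical model of a malloc heap: only the fields used here. */
struct heap_model {
	int in_use;            /* non-zero once the heap has been created */
	int socket_id;         /* NUMA node ID, or a synthetic ID for an external heap */
};

/*
 * Translate a socket ID into an index into the heap array.
 * Returns -1 when no heap carries that socket ID, which is how a caller
 * would detect an invalid socket.
 */
static int
socket_to_heap_id(const struct heap_model *heaps, size_t n_heaps, int socket_id)
{
	size_t i;

	assert(heaps != NULL);
	for (i = 0; i < n_heaps; i++) {
		/* unused slots are skipped so a zeroed entry never matches socket 0 */
		if (heaps[i].in_use && heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

With heaps { {1, 0}, {1, 1}, {1, 256} }, socket 1 maps to heap 1, the
made-up external socket 256 maps to heap 2, and an unknown socket yields
-1, so a separate range check against the number of NUMA nodes is not
needed in this model.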
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create an external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.
Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v8 -> v7 changes:
- Rebase on latest master
- More documentation on ABI changes
v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 320 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 37 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 13 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 +
.../net/virtio/virtio_user/virtio_user_dev.c | 6 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 320 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
45 files changed, 1930 insertions(+), 138 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
* [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 12:56 13% ` Anatoly Burakov
2018-10-01 12:56 5% ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
` (2 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
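
For illustration only, the pattern applied to those walk callbacks can be
sketched with a self-contained model. The names msl_model, msl_walk() and
internal_mem_size() are hypothetical; the real walk functions also take
the memory hotplug lock and have their own return-value conventions,
which this sketch does not reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct rte_memseg_list. */
struct msl_model {
	uint64_t page_sz;      /* page size of every segment in the list */
	unsigned int count;    /* number of segments currently in use */
	unsigned int external; /* 1 if the list describes external memory */
};

typedef int (*msl_walk_fn)(const struct msl_model *msl, void *arg);

/* Call func on each list; stop early when the callback returns non-zero. */
static int
msl_walk(const struct msl_model *lists, size_t n, msl_walk_fn func, void *arg)
{
	size_t i;
	int ret;

	assert(lists != NULL && func != NULL);
	for (i = 0; i < n; i++) {
		ret = func(&lists[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: accumulate the size of internal memory only. */
static int
internal_mem_size(const struct msl_model *msl, void *arg)
{
	uint64_t *total = arg;

	if (msl->external)
		return 0;      /* skip this list, but keep walking */

	*total += (uint64_t)msl->count * msl->page_sz;
	return 0;
}
```

Returning 0 for an external list skips it without aborting the walk,
which is the same shape as the early return added to each callback in
this patch.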
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
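
The matching rule described above can be sketched as follows. This is a
simplified model, not the mempool code itself: msl_model, ANY_SOCKET and
min_page_size() are hypothetical names, the lists are passed as a plain
array instead of being walked, and the fallback page size is a parameter
rather than the system page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_SOCKET (-1)        /* stand-in for SOCKET_ID_ANY */

/* Hypothetical, cut-down stand-in for struct rte_memseg_list. */
struct msl_model {
	int socket_id;         /* socket ID shared by all segments in the list */
	uint64_t page_sz;      /* page size shared by all segments in the list */
	unsigned int external; /* 1 if the list describes external memory */
};

/*
 * Smallest page size usable for a pool on socket_id.
 * A list counts if its socket ID matches exactly (internal or external),
 * or if any socket is acceptable and the list is internal memory.
 * Returns fallback when no list qualifies.
 */
static size_t
min_page_size(const struct msl_model *lists, size_t n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL);
	for (i = 0; i < n; i++) {
		int valid = lists[i].socket_id == socket_id;

		valid |= socket_id == ANY_SOCKET && lists[i].external == 0;
		if (valid && lists[i].page_sz < min)
			min = (size_t)lists[i].page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

In this model an external list with a small page size only affects the
result when its own socket ID is requested explicitly; a request for any
socket sees internal page sizes only.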
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 9 ++++-
drivers/bus/fslmc/fslmc_vfio.c | 6 +++-
drivers/net/mlx5/mlx5.c | 4 ++-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 ++
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
19 files changed, 119 insertions(+), 39 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9c17762a5..d55e12a27 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -102,6 +102,12 @@ API Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+ whether the memseg list is externally allocated. This will have implications
+ for any users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
+
* mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
functions were deprecated since 17.05 and are replaced by
``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -118,7 +124,6 @@ API Changes
To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
offload.
-
ABI Changes
-----------
@@ -138,6 +143,8 @@ ABI Changes
supporting external memory in DPDK:
- structure ``rte_memseg_list`` now has a new field indicating length
of memory addressed by the segment list
+ - structure ``rte_memseg_list`` now has a new flag indicating whether
+ the memseg list refers to external memory
Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
static int
fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+ const struct rte_memseg *ms, void *arg)
{
int *n_segs = arg;
int ret;
+ /* if IOVA address is invalid, skip */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
void *start_addr;
uint64_t len;
+ if (msl->external)
+ return 0;
+
if (vm->nregions >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 17:01 3% ` Stephen Hemminger
2018-10-02 9:03 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-10-01 17:01 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
On Mon, 1 Oct 2018 13:56:09 +0100
Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index aff0688dd..1d8b0a6fe 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -30,6 +30,7 @@ struct rte_memseg_list {
> uint64_t addr_64;
> /**< Makes sure addr is always 64-bits */
> };
> + size_t len; /**< Length of memory area covered by this memseg list. */
> int socket_id; /**< Socket ID for all memsegs in this list. */
> uint64_t page_sz; /**< Page size for all memsegs in this list. */
> volatile uint32_t version; /**< version number for multiprocess sync. */
If you are going to break ABI, why not try and rearrange to eliminate holes:
Output of pahole (on x86 64 bit):
struct rte_memseg_list {
union {
void * base_va; /* 0 8 */
uint64_t addr_64; /* 0 8 */
}; /* 0 8 */
size_t len; /* 8 8 */
int socket_id; /* 16 4 */
/* XXX 4 bytes hole, try to pack */
uint64_t page_sz; /* 24 8 */
volatile uint32_t version; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct rte_fbarray memseg_arr; /* 40 96 */
/* XXX last struct has 4 bytes of padding */
/* size: 136, cachelines: 3, members: 6 */
/* sum members: 128, holes: 2, sum holes: 8 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 8 bytes */
};
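
To make the suggestion concrete, here is a small sketch comparing the
posted field order with one possible reordering. It rests on assumptions:
fbarray_model is an opaque 96-byte stand-in for struct rte_fbarray, the
union is given a name to keep the code portable, and the reordering shown
is just one option, not a proposal for the final layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque 96-byte stand-in for struct rte_fbarray (hypothetical). */
struct fbarray_model {
	uint64_t opaque[12];
};

/* Field order as posted in the patch. */
struct msl_as_posted {
	union {
		void *base_va;
		uint64_t addr_64;
	} addr;
	size_t len;
	int socket_id;
	uint64_t page_sz;
	volatile uint32_t version;
	struct fbarray_model memseg_arr;
};

/* One possible reordering: the two 4-byte fields sit next to each other. */
struct msl_reordered {
	union {
		void *base_va;
		uint64_t addr_64;
	} addr;
	size_t len;
	uint64_t page_sz;
	int socket_id;
	volatile uint32_t version;
	struct fbarray_model memseg_arr;
};

/* Bytes recovered by the reordering on the current target. */
static size_t
bytes_saved(void)
{
	assert(sizeof(struct msl_reordered) <= sizeof(struct msl_as_posted));
	return sizeof(struct msl_as_posted) - sizeof(struct msl_reordered);
}
```

On a typical x86-64 build the two 4-byte holes reported above would be
expected to disappear, since socket_id and version then share one 8-byte
slot; on targets with different alignment rules the saving may be zero.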
* Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
2018-10-01 17:01 3% ` Stephen Hemminger
@ 2018-10-02 9:03 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-10-02 9:03 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
On 01-Oct-18 6:01 PM, Stephen Hemminger wrote:
> On Mon, 1 Oct 2018 13:56:09 +0100
> Anatoly Burakov <anatoly.burakov@intel.com> wrote:
>
>> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> index aff0688dd..1d8b0a6fe 100644
>> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
>> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> @@ -30,6 +30,7 @@ struct rte_memseg_list {
>> uint64_t addr_64;
>> /**< Makes sure addr is always 64-bits */
>> };
>> + size_t len; /**< Length of memory area covered by this memseg list. */
>> int socket_id; /**< Socket ID for all memsegs in this list. */
>> uint64_t page_sz; /**< Page size for all memsegs in this list. */
>> volatile uint32_t version; /**< version number for multiprocess sync. */
>
> If you are going to break ABI, why not try and rearrange to eliminate holes:
>
> Output of pahole (on x86 64 bit):
>
> struct rte_memseg_list {
> union {
> void * base_va; /* 0 8 */
> uint64_t addr_64; /* 0 8 */
> }; /* 0 8 */
> size_t len; /* 8 8 */
> int socket_id; /* 16 4 */
>
> /* XXX 4 bytes hole, try to pack */
>
> uint64_t page_sz; /* 24 8 */
> volatile uint32_t version; /* 32 4 */
>
> /* XXX 4 bytes hole, try to pack */
>
> struct rte_fbarray memseg_arr; /* 40 96 */
>
> /* XXX last struct has 4 bytes of padding */
>
> /* size: 136, cachelines: 3, members: 6 */
> /* sum members: 128, holes: 2, sum holes: 8 */
> /* paddings: 1, sum paddings: 4 */
> /* last cacheline: 8 bytes */
> };
>
Hi Stephen,
This data structure isn't performance-critical in any remote sense, but
sure, I can do that.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 11:24 0% ` Luca Boccassi
@ 2018-10-02 11:02 0% ` Marco Varlese
2018-10-02 12:23 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Marco Varlese @ 2018-10-02 11:02 UTC (permalink / raw)
To: Luca Boccassi, Bruce Richardson, Timothy Redaelli; +Cc: dev, christian.ehrhardt
On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > Luca Boccassi <bluca@debian.org> wrote:
> > >
> > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > > wrote:
> > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > wrote:
> > > > > > > Allow users and packagers to override the default
> > > > > > > dpdk/drivers
> > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > >
> > > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > > ---
> > > > > >
> > > > > > I'm ok with this change, but what is the current location
> > > > > > used by
> > > > > > distro's
> > > > > > right now? I mistakenly never checked what was done before I
> > > > > > used
> > > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > > match
> > > > > > the
> > > > > > common option if possible.
> > > > > >
> > > > > > /Bruce
> > > > > >
> > > > >
> > > > > Replying to my own question, I've just checked on CentOS and
> > > > > Debian,
> > > > > and it
> > > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > > name.
> > > > > Therefore,
> > > > > let's just make that the default. [Does it need to be
> > > > > configurable in
> > > > > that
> > > > > case?]
> > > > >
> > > > > /Bruce
> > > >
> > > > If the default is the one I expect then I'm fine without having
> > > > an
> > > > option (actually happier - less things to configure).
> > > >
> > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > January :-)
> > > > We changed because using a single directory creates problems when
> > > > multiple different ABI versions are installed, due to the EAL
> > > > autoload
> > > > from that directory. So we need a different subdirectory per ABI
> > > > revision.
> > > >
> > > > We were actually talking with Timothy a while ago to make this
> > > > consistent across our distros, and perhaps Marco can chip in as
> > > > well.
> > > >
> > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > > not
> > > > too fussy on $something, it can be drivers or pmds or something
> > > > else.
> > > >
> > >
> > > LGTM.
> > > If needed, we can just do a compatibility symlink using the current
> > > dpdk-pmds path
> > >
> >
> > One suggestion/comment. Would using a unique directory per release
> > not lead
> > to clobbering up the lib directory unnecessarily? How about having a
> > single
> > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> > subdir
> > under that?
> >
> > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > dpdk-pmds/18.11
> >
> > [The former of the above would be my preference, since I don't like
> > having
> > hyphenated names, and like having "dpdk" alone as a folder name :-)]
> >
> > /Bruce
>
> dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
That would work for us.
However, I would suggest keeping the path configurable (a feature that could be
dropped, maybe in the next release), just to make sure the transition can happen
without pain in the unlikely event that something goes wrong with packaging.
>
--
Marco V
SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-02 11:02 0% ` Marco Varlese
@ 2018-10-02 12:23 0% ` Bruce Richardson
2018-10-02 13:07 0% ` Luca Boccassi
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-02 12:23 UTC (permalink / raw)
To: Marco Varlese; +Cc: Luca Boccassi, Timothy Redaelli, dev, christian.ehrhardt
On Tue, Oct 02, 2018 at 01:02:26PM +0200, Marco Varlese wrote:
> On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > > Luca Boccassi <bluca@debian.org> wrote:
> > > >
> > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > > > wrote:
> > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > > wrote:
> > > > > > > > Allow users and packagers to override the default
> > > > > > > > dpdk/drivers
> > > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > > >
> > > > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > > > ---
> > > > > > >
> > > > > > > I'm ok with this change, but what is the current location
> > > > > > > used by
> > > > > > > distro's
> > > > > > > right now? I mistakenly never checked what was done before I
> > > > > > > used
> > > > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > > > match
> > > > > > > the
> > > > > > > common option if possible.
> > > > > > >
> > > > > > > /Bruce
> > > > > > >
> > > > > >
> > > > > > Replying to my own question, I've just checked on CentOS and
> > > > > > Debian,
> > > > > > and it
> > > > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > > > name.
> > > > > > Therefore,
> > > > > > let's just make that the default. [Does it need to be
> > > > > > configurable in
> > > > > > that
> > > > > > case?]
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > If the default is the one I expect then I'm fine without having
> > > > > an
> > > > > option (actually happier - less things to configure).
> > > > >
> > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > > January :-)
> > > > > We changed because using a single directory creates problems when
> > > > > multiple different ABI versions are installed, due to the EAL
> > > > > autoload
> > > > > from that directory. So we need a different subdirectory per ABI
> > > > > revision.
> > > > >
> > > > > We were actually talking with Timothy a while ago to make this
> > > > > consistent across our distros, and perhaps Marco can chip in as
> > > > > well.
> > > > >
> > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > > > not
> > > > > too fussy on $something, it can be drivers or pmds or something
> > > > > else.
> > > > >
> > > >
> > > > LGTM.
> > > > If needed, we can just do a compatibility symlink using the current
> > > > dpdk-pmds path
> > > >
> > >
> > > One suggestion/comment. Would using a unique directory per release
> > > not lead
> > > to clobbering up the lib directory unnecessarily? How about having a
> > > single
> > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> > > subdir
> > > under that?
> > >
> > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > > dpdk-pmds/18.11
> > >
> > > [The former of the above would be my preference, since I don't like
> > > having
> > > hyphenated names, and like having "dpdk" alone as a folder name :-)]
> > >
> > > /Bruce
> >
> > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
> That would work for us.
> However, I would suggest to have the path to be configurable (feature to be
> dropped in maybe next release). Just to make sure the transition can happen
> without pain in the remote circumstance that something goes wrong with
> packaging...
> >
> --
> Marco V
>
Yes, I think it needs to be configurable for the foreseeable future. If the
DPDK version is to be put in the path then we either need to always use a
configurable version, since we can't hardcode a version number in the
default, or else we need to put logic in the meson.build file to always
insert a version number.
/Bruce
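
As a rough illustration of the second option (logic that always inserts a version number), the string handling could look like the C sketch below. The function name pmd_install_path and its signature are hypothetical and exist only for this example; the real logic would live in meson.build, not in C.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<libdir>/dpdk/pmds-<major>.<minor>" from a full version string
 * such as "18.11.0". Returns 0 on success, -1 if the version has no
 * minor component or the result does not fit in the output buffer.
 */
static int
pmd_install_path(const char *libdir, const char *version,
		char *out, size_t outlen)
{
	const char *first_dot = strchr(version, '.');
	const char *second_dot;
	size_t major_minor_len;
	int n;

	if (first_dot == NULL)
		return -1;

	/* Keep everything up to, but not including, the second dot. */
	second_dot = strchr(first_dot + 1, '.');
	major_minor_len = second_dot != NULL ?
		(size_t)(second_dot - version) : strlen(version);

	n = snprintf(out, outlen, "%s/dpdk/pmds-%.*s", libdir,
			(int)major_minor_len, version);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Deriving the suffix from the project version keeps the default free of any hardcoded release number, while an override option can still replace the whole path.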
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v2 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
@ 2018-10-02 13:06 3% ` Luca Boccassi
2018-10-02 15:25 3% ` [dpdk-dev] [PATCH v3 " Luca Boccassi
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
3 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 13:06 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, tredaelli, christian.ehrhardt, mvarlese, Luca Boccassi
As part of the effort of consolidating the DPDK installation bits and
pieces across distros, set the default directory of lib/ where PMDs get
installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
subdirectory as multiple ABI revisions might be installed at the same
time, so having a fixed name will cause trouble with the autoload
feature.
Small refactor with parsing and saving the major version to a variable,
since it's now used in 3 different places.
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
drivers/meson.build | 6 ++----
lib/meson.build | 6 ++----
meson.build | 8 +++++++-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/meson.build b/drivers/meson.build
index 47b4215a30..3a6c4bf656 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -98,10 +98,8 @@ foreach class:driver_classes
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- pver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(pver.get(0),
- pver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# now build the static driver
diff --git a/lib/meson.build b/lib/meson.build
index 3acc67e6ed..bed492a4ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -88,10 +88,8 @@ foreach l:libraries
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- prj_ver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(
- prj_ver.get(0), prj_ver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# first build static lib
diff --git a/meson.build b/meson.build
index c9af33532d..4bd04b9de3 100644
--- a/meson.build
+++ b/meson.build
@@ -15,7 +15,13 @@ dpdk_libraries = []
dpdk_drivers = []
dpdk_extra_ldflags = []
-driver_install_path = join_paths(get_option('libdir'), 'dpdk/drivers')
+# set the major version, which might be used by drivers and libraries
+# depending on the configuration options
+pver = meson.project_version().split('.')
+major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
+
+driver_install_path = join_paths(get_option('libdir'), 'dpdk',
+ 'pmds-' + major_version)
eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
# configure the build, and make sure configs here and in config folder are
--
2.19.0
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-02 12:23 0% ` Bruce Richardson
@ 2018-10-02 13:07 0% ` Luca Boccassi
0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 13:07 UTC (permalink / raw)
To: Bruce Richardson, Marco Varlese; +Cc: Timothy Redaelli, dev, christian.ehrhardt
On Tue, 2018-10-02 at 13:23 +0100, Bruce Richardson wrote:
> On Tue, Oct 02, 2018 at 01:02:26PM +0200, Marco Varlese wrote:
> > On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> > > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli
> > > > wrote:
> > > > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > > > Luca Boccassi <bluca@debian.org> wrote:
> > > > >
> > > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce
> > > > > > > Richardson
> > > > > > > wrote:
> > > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > > > wrote:
> > > > > > > > > Allow users and packagers to override the default
> > > > > > > > > dpdk/drivers
> > > > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > I'm ok with this change, but what is the current
> > > > > > > > location
> > > > > > > > used by
> > > > > > > > distro's
> > > > > > > > right now? I mistakenly never checked what was done
> > > > > > > > before I
> > > > > > > > used
> > > > > > > > dpdk/drivers as a default value, and I'd like the
> > > > > > > > default to
> > > > > > > > match
> > > > > > > > the
> > > > > > > > common option if possible.
> > > > > > > >
> > > > > > > > /Bruce
> > > > > > > >
> > > > > > >
> > > > > > > Replying to my own question, I've just checked on CentOS
> > > > > > > and
> > > > > > > Debian,
> > > > > > > and it
> > > > > > > appears both are using directory "dpdk-pmds" as the
> > > > > > > subdir
> > > > > > > name.
> > > > > > > Therefore,
> > > > > > > let's just make that the default. [Does it need to be
> > > > > > > configurable in
> > > > > > > that
> > > > > > > case?]
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > If the default is the one I expect then I'm fine without
> > > > > > having
> > > > > > an
> > > > > > option (actually happier - less things to configure).
> > > > > >
> > > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > > > January :-)
> > > > > > We changed because using a single directory creates
> > > > > > problems when
> > > > > > multiple different ABI versions are installed, due to the
> > > > > > EAL
> > > > > > autoload
> > > > > > from that directory. So we need a different subdirectory
> > > > > > per ABI
> > > > > > revision.
> > > > > >
> > > > > > We were actually talking with Timothy a while ago to make
> > > > > > this
> > > > > > consistent across our distros, and perhaps Marco can chip
> > > > > > in as
> > > > > > well.
> > > > > >
> > > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for
> > > > > > you? I'm
> > > > > > not
> > > > > > too fussy on $something, it can be drivers or pmds or
> > > > > > something
> > > > > > else.
> > > > > >
> > > > >
> > > > > LGTM.
> > > > > If needed, we can just do a compatibility symlink using the
> > > > > current
> > > > > dpdk-pmds path
> > > > >
> > > >
> > > > One suggestion/comment. Would using a unique directory per
> > > > release
> > > > not lead
> > > > to clobbering up the lib directory unnecessarily? How about
> > > > having a
> > > > single
> > > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as
> > > > a
> > > > subdir
> > > > under that?
> > > >
> > > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > > > dpdk-pmds/18.11
> > > >
> > > > [The former of the above would be my preference, since I don't
> > > > like
> > > > having
> > > > hyphenated names, and like having "dpdk" alone as a folder name
> > > > :-)]
> > > >
> > > > /Bruce
> > >
> > > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
> >
> > That would work for us.
> > However, I would suggest to have the path to be configurable
> > (feature to be
> > dropped in maybe next release). Just to make sure the transition
> > can happen
> > without pain in the remote circumstance that something goes wrong
> > with
> > packaging...
> > >
> >
> > --
> > Marco V
> >
>
> Yes, I think it needs to be configurable for the foreseeable future.
> If the
> DPDK version is to be put in the path then we either need to always
> use a
> configurable version, since we can't hardcode a version number in the
> default, or else we need to put logic in the meson.build file to
> always
> insert a version number.
>
> /Bruce
Ok, in v2 I added a small bit of logic to set the default to the major
version number (and also the override option).
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-30 22:33 0% ` Honnappa Nagarahalli
@ 2018-10-02 13:17 3% ` Van Haaren, Harry
2018-10-02 23:58 0% ` Wang, Yipeng1
0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-10-02 13:17 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, Wang, Yipeng1
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd
> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Sunday, September 30, 2018 11:33 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org;
> Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd
> <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
>
>
> > > > >
> > > > >Reader-writer concurrency issue, caused by moving the keys to their
> > > > >alternative locations during key insert, is solved by introducing a
> > > > >global counter(tbl_chng_cnt) indicating a change in table.
> >
> > <snip>
> >
> > > > > /**
> > > > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> > > > >rte_hash
> > > *h, const void *key,
> > > > > * array of user data. This value is unique for this key.
> > > > > */
> > > > > int32_t
> > > > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > > > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > > > >
> > > > > /**
> > > > > * Add a key to an existing hash table.
> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> > > > >const void
> > > *key);
> > > > > * array of user data. This value is unique for this key.
> > > > > */
> > > > > int32_t
> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
> > > > >*key,
> > > hash_sig_t sig);
> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> > > hash_sig_t sig);
> > > > >
> > > > > /
> > > >
> > > > I think the above changes will break ABI by changing the parameter
> type?
> > > Other people may know better on this.
> > >
> > > Just removing a const should not change the ABI, I believe, since the
> > > const is just advisory hint to the compiler. Actual parameter size and
> > > count remains unchanged so I don't believe there is an issue.
> > > [ABI experts, please correct me if I'm wrong on this]
> >
> >
> > [Certainly no ABI expert, but...]
> >
> > I think this is an API break, not ABI break.
> >
> > Given application code as follows, it will fail to compile - even though
> running
> > the new code as a .so wouldn't cause any issues (AFAIK).
> >
> > void do_hash_stuff(const struct rte_hash *h, ...) {
> > /* parameter passed in is const, but updated function prototype is
> non-
> > const */
> > rte_hash_add_key_with_hash(h, ...);
> > }
> >
> > This means that we can't recompile apps against latest patch without
> > application code changes, if the app was passing a const rte_hash struct
> as
> > the first parameter.
> >
> Agree. Do we need to do anything for this?
I think we should try to avoid breaking API wherever possible.
If we must, then I suppose we could follow the ABI process of
a deprecation notice.
From my reading, the versioning docs don't cover this case:
https://doc.dpdk.org/guides/contributing/versioning.html
I don't recall a similar situation in DPDK previously - so I suggest
you ask Tech board for input here.
Hope that helps! -Harry
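
To make the API-versus-ABI distinction concrete, here is a minimal sketch. The mini_hash type and both functions are hypothetical stand-ins, not the real rte_hash API; they only mirror the shape of the prototype change.

```c
#include <stddef.h>

/* Hypothetical miniature table; not the real struct rte_hash. */
struct mini_hash {
	int keys[8];
	size_t used;
};

/* New-style prototype: the table pointer is no longer const. */
static int
mini_hash_add_key(struct mini_hash *h, int key)
{
	if (h->used == sizeof(h->keys) / sizeof(h->keys[0]))
		return -1;
	h->keys[h->used] = key;
	return (int)h->used++;
}

/* Application code written against the old const prototype. */
static int
app_add(const struct mini_hash *h, int key)
{
	/*
	 * Calling mini_hash_add_key(h, key) directly would discard the
	 * const qualifier. GCC and Clang diagnose that by default, and it
	 * becomes a build failure under -Werror. The cast is the source
	 * change the application needs; it is only safe here because the
	 * underlying object was not itself defined const.
	 */
	return mini_hash_add_key((struct mini_hash *)h, key);
}
```

The calling convention is the same either way (one pointer and one integer), which is why an already-built binary keeps working while the source no longer builds cleanly.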
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
2018-10-02 13:34 13% ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-02 13:34 10% ` Anatoly Burakov
2018-10-02 13:34 5% ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
` (2 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 8 +++++
drivers/bus/fslmc/fslmc_vfio.c | 6 +++-
drivers/net/mlx5/mlx5.c | 4 ++-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 ++
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
19 files changed, 119 insertions(+), 38 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 58bb79022..bc1d56130 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -118,6 +118,12 @@ API Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+ whether the memseg list is externally allocated. This will have implications
+ for any users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
+
* mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
functions were deprecated since 17.05 and are replaced by
``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -157,6 +163,8 @@ ABI Changes
supporting external memory in DPDK:
- structure ``rte_memseg_list`` now has a new field indicating length
of memory addressed by the segment list
+ - structure ``rte_memseg_list`` now has a new flag indicating whether
+ the memseg list refers to external memory
Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
static int
fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+ const struct rte_memseg *ms, void *arg)
{
int *n_segs = arg;
int ret;
+ /* if IOVA address is invalid, skip */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
void *start_addr;
uint64_t len;
+ if (msl->external)
+ return 0;
+
if (vm->nregions >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d2362985..645288b02 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -34,6 +34,7 @@ struct rte_memseg_list {
int socket_id; /**< Socket ID for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
size_t len; /**< Length of memory area covered by this memseg list. */
+ unsigned int external; /**< 1 if this list points to external memory */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (2 preceding siblings ...)
2018-10-02 13:34 10% ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-02 13:34 5% ` Anatoly Burakov
2018-10-02 13:34 8% ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-02 13:34 7% ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Bruce Richardson, John McNamara,
Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, shreyansh.jain, shahafs, arybchenko,
alejandro.lucero
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. The heap ID of a DPDK-internal heap is its NUMA
node's index within the detected NUMA node list. External heaps are
assigned heap IDs in order of their creation.
This breaks the ABI, so document the changes.
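For illustration only, the socket-to-heap lookup described above can be
modeled outside of EAL as a linear scan over the heap array. This is a
standalone sketch, not the patch code: model_heap,
model_socket_to_heap_id and MAX_HEAPS are hypothetical names that only
mirror the shape of malloc_socket_to_heap_id() in the diff below, and
the heap count is passed in explicitly so the model can be exercised
with a short array.

```c
#define MAX_HEAPS 32 /* mirrors RTE_MAX_HEAPS from the patch (assumption) */

/* Hypothetical stand-in for struct malloc_heap: only the field the lookup needs. */
struct model_heap {
	unsigned int socket_id;
};

/*
 * Linear scan over the heap array. Returns the index (heap ID) of the
 * first heap whose socket_id matches, or -1 if no heap matches.
 */
static int
model_socket_to_heap_id(const struct model_heap *heaps, int n_heaps,
		unsigned int socket_id)
{
	int i;

	for (i = 0; i < n_heaps && i < MAX_HEAPS; i++) {
		if (heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

With heaps for sockets 0 and 2 followed by one externally created heap,
socket 2 maps to heap ID 1 even though it is not NUMA node 1, which is
the point of decoupling the two identifiers.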
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
config/common_base | 1 +
config/rte_config.h | 1 +
doc/guides/rel_notes/release_18_11.rst | 5 +-
.../common/include/rte_eal_memconfig.h | 4 +-
.../common/include/rte_malloc_heap.h | 1 +
lib/librte_eal/common/malloc_heap.c | 102 +++++++++++++-----
lib/librte_eal/common/malloc_heap.h | 3 +
lib/librte_eal/common/rte_malloc.c | 41 ++++---
8 files changed, 114 insertions(+), 44 deletions(-)
diff --git a/config/common_base b/config/common_base
index acc5211bc..83350e0b1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
CONFIG_RTE_LIBRTE_EAL=y
CONFIG_RTE_MAX_LCORE=128
CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
CONFIG_RTE_MAX_MEMSEG_LISTS=64
# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
# or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 20c58dff1..816e6f879 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
#define RTE_BUILD_SHARED_LIB
/* EAL defines */
+#define RTE_MAX_HEAPS 32
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc1d56130..0607a3980 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -165,7 +165,10 @@ ABI Changes
of memory addressed by the segment list
- structure ``rte_memseg_list`` now has a new flag indicating whether
the memseg list refers to external memory
-
+ - structure ``rte_malloc_heap`` now has a new field indicating socket
+ ID the malloc heap belongs to
+ - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
+ resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 645288b02..7634bff5d 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
- /* Heaps of Malloc per socket */
- struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+ /* Heaps of Malloc */
+ struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..d432cef88 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -26,6 +26,7 @@ struct malloc_heap {
struct malloc_elem *volatile last;
unsigned alloc_count;
+ unsigned int socket_id;
size_t total_size;
} __rte_cache_aligned;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
return check_flag & flags;
}
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i;
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+ if (heap->socket_id == socket_id)
+ return i;
+ }
+ return -1;
+}
+
/*
* Expand the heap with a memory area.
*/
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *found_msl;
struct malloc_heap *heap;
- int msl_idx;
+ int msl_idx, heap_idx;
if (msl->external)
return 0;
- heap = &mcfg->malloc_heaps[msl->socket_id];
+ heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+ if (heap_idx < 0) {
+ RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+ return -1;
+ }
+ heap = &mcfg->malloc_heaps[heap_idx];
/* msl is const, so find it */
msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
+ heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
/* this will try lower page sizes first */
static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
- unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+ unsigned int heap_id, unsigned int flags, size_t align,
+ size_t bound, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+ int socket_id;
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
* we may still be able to allocate memory from appropriate page sizes,
* we just need to request more memory first.
*/
+
+ socket_id = rte_socket_id_by_idx(heap_id);
+ /*
+ * if socket ID is negative, we cannot find a socket ID for this heap -
+ * which means it's an external heap. those can have unexpected page
+ * sizes, so if the user asked to allocate from there - assume user
+ * knows what they're doing, and allow allocating from there with any
+ * page size flags.
+ */
+ if (socket_id < 0)
+ size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
if (ret != NULL)
goto alloc_unlock;
- if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
- contig)) {
+ /* if socket ID is invalid, this is an external heap */
+ if (socket_id < 0)
+ goto alloc_unlock;
+
+ if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+ bound, contig)) {
ret = heap_alloc(heap, type, size, flags, align, bound, contig);
/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
malloc_heap_alloc(const char *type, size_t size, int socket_arg,
unsigned int flags, size_t align, size_t bound, bool contig)
{
- int socket, i, cur_socket;
+ int socket, heap_id, i;
void *ret;
/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
- contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+ bound, contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
- /* try other heaps */
+ /* try other heaps. we are only iterating through native DPDK sockets,
+ * so external heaps won't be included.
+ */
for (i = 0; i < (int) rte_socket_count(); i++) {
- cur_socket = rte_socket_id_by_idx(i);
- if (cur_socket == socket)
+ if (i == heap_id)
continue;
- ret = heap_alloc_on_socket(type, size, cur_socket, flags,
- align, bound, contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+ bound, contig);
if (ret != NULL)
return ret;
}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
}
static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
- size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+ unsigned int flags, size_t align, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
size_t align, bool contig)
{
- int socket, i, cur_socket;
+ int socket, i, cur_socket, heap_id;
void *ret;
/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+ ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
cur_socket = rte_socket_id_by_idx(i);
if (cur_socket == socket)
continue;
- ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
- align, contig);
+ ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+ contig);
if (ret != NULL)
return ret;
}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
/* ...of which we can't avail if we are in legacy mode, or if this is an
* externally allocated segment.
*/
- if (internal_config.legacy_mem || msl->external)
+ if (internal_config.legacy_mem || (msl->external > 0))
goto free_unlock;
/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
int
malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f);
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
int
rte_eal_malloc_heap_init(void);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int heap_idx, ret = -1;
- if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
- return -1;
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
- return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+ heap_idx = malloc_socket_to_heap_id(socket);
+ if (heap_idx < 0)
+ goto unlock;
+
+ ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+ socket_stats);
+unlock:
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
}
/*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
unsigned int idx;
- for (idx = 0; idx < rte_socket_count(); idx++) {
- unsigned int socket = rte_socket_id_by_idx(idx);
- fprintf(f, "Heap on socket %i:\n", socket);
- malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+ for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+ fprintf(f, "Heap id: %u\n", idx);
+ malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
}
/*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
void
rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
{
- unsigned int socket;
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int heap_id;
struct rte_malloc_socket_stats sock_stats;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
/* Iterate through all initialised heaps */
- for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
- if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
- continue;
+ for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
- fprintf(f, "Socket:%u\n", socket);
+ malloc_heap_get_stats(heap, &sock_stats);
+
+ fprintf(f, "Heap id:%u\n", heap_id);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
return;
}
--
2.17.1
* [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (3 preceding siblings ...)
2018-10-02 13:34 5% ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-02 13:34 8% ` Anatoly Burakov
2018-10-02 13:34 7% ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap IDs internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
This breaks the ABI, so document the change.
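As a rough sketch of the default naming scheme used for native heaps
in the diff below ("socket_" followed by the socket ID), the formatting
step can be modeled in isolation. The helper name
model_default_heap_name and the constant HEAP_NAME_MAX_LEN are
hypothetical and exist only for this example. Unlike the patch, the
sketch reports truncation to the caller, which is a design choice of
the example rather than something the patch does.

```c
#include <stdio.h>

#define HEAP_NAME_MAX_LEN 32 /* mirrors RTE_HEAP_NAME_MAX_LEN (assumption) */

/*
 * Write the default name for a native heap into dst.
 * Returns 0 on success, -1 if the name did not fit in dst_len bytes
 * (including the terminating NUL) or formatting failed.
 */
static int
model_default_heap_name(char *dst, size_t dst_len, int socket_id)
{
	int n = snprintf(dst, dst_len, "socket_%i", socket_id);

	if (n < 0 || (size_t)n >= dst_len)
		return -1;
	return 0;
}
```

A caller would pass a buffer of HEAP_NAME_MAX_LEN bytes and treat a
negative return as a configuration error instead of silently using a
truncated, possibly non-unique name.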
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 ++
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 172c42f71..754c41755 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -176,6 +176,8 @@ ABI Changes
ID the malloc heap belongs to
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
+ - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d432cef88..4a7e0eb1d 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
unsigned int socket_id;
size_t total_size;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1024,6 +1023,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
* [dpdk-dev] [PATCH v9 00/21] Support externally allocated memory in DPDK
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
@ 2018-10-02 13:34 3% ` Anatoly Burakov
2018-10-02 13:34 13% ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
` (4 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create an external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.
Using external memory in multiprocess is *hard*, because not only
does memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v9 -> v8 changes:
- Rebase on latest master
- Minor cosmetic testpmd changes as per Bernard's feedback
- Pack structures better (Stephen's suggestion)
- Touch pages before finding their IOVA address
v8 -> v7 changes:
- Rebase on latest master
- More documentation on ABI changes
v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 325 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 36 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 13 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 +
.../net/virtio/virtio_user/virtio_user_dev.c | 6 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 11 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 320 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
45 files changed, 1936 insertions(+), 138 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
@ 2018-10-02 13:34 13% ` Anatoly Burakov
2018-10-02 13:34 10% ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
doc/guides/rel_notes/release_18_11.rst | 8 +++++++-
drivers/bus/pci/linux/pci.c | 2 +-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 ++
lib/librte_eal/common/eal_common_memory.c | 5 ++---
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++-
lib/librte_eal/meson.build | 2 +-
10 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a8327ea77..58bb79022 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -153,6 +153,12 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK:
+ - structure ``rte_memseg_list`` now has a new field indicating length
+ of memory addressed by the segment list
+
+
Removed Items
-------------
@@ -198,7 +204,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
+ librte_eventdev.so.6
librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
static int
find_max_end_va(const struct rte_memseg_list *msl, void *arg)
{
- size_t sz = msl->memseg_arr.len * msl->page_sz;
+ size_t sz = msl->len;
void *end_va = RTE_PTR_ADD(msl->base_va, sz);
void **max_va = arg;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
}
msl->base_va = addr;
msl->page_sz = page_sz;
+ msl->len = internal_config.memory;
msl->socket_id = 0;
/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
/* a memseg list was specified, check if it's the right one */
start = msl->base_va;
- end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr < start || addr >= end)
return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
msl = &mcfg->memsegs[msl_idx];
start = msl->base_va;
- end = RTE_PTR_ADD(start,
- (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr >= start && addr < end)
break;
}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d2362985 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,9 +30,10 @@ struct rte_memseg_list {
uint64_t addr_64;
/**< Makes sure addr is always 64-bits */
};
- int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ int socket_id; /**< Socket ID for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
+ size_t len; /**< Length of memory area covered by this memseg list. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
int msl_idx, seg_idx, ret, dir_fd = -1;
start_addr = (uintptr_t) msl->base_va;
- end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+ end_addr = start_addr + msl->len;
if ((uintptr_t)wa->ms->addr < start_addr ||
(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
return -1;
}
local_msl->base_va = primary_msl->base_va;
+ local_msl->len = primary_msl->len;
return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
msl->base_va = addr;
msl->page_sz = page_sz;
msl->socket_id = 0;
+ msl->len = internal_config.memory;
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
if (msl->memseg_arr.count > 0)
continue;
/* this is an unused list, deallocate it */
- mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+ mem_sz = msl->len;
munmap(msl->base_va, mem_sz);
msl->base_va = NULL;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
--
2.17.1
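As a reading aid for the commit message of this patch, here is a minimal, self-contained sketch of the two ideas it describes: storing the covered length in the list instead of deriving it, and reordering members so they pack better. It is not the EAL code. The names `msl_model`, `fields_unpacked`, `fields_packed`, `end_derived` and `end_stored` are hypothetical, and `n_segs` merely stands in for `memseg_arr.len`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of a memseg list (not struct rte_memseg_list). */
struct msl_model {
	uint64_t base_va;     /* start of the covered area */
	uint64_t page_sz;     /* page size of every segment in the list */
	size_t len;           /* length of the covered area, in bytes */
	unsigned int n_segs;  /* stands in for memseg_arr.len */
};

/* Before: every caller multiplies page size by the backing array length. */
uint64_t end_derived(const struct msl_model *msl)
{
	return msl->base_va + msl->page_sz * msl->n_segs;
}

/* After: the length is read straight from the list. */
uint64_t end_stored(const struct msl_model *msl)
{
	return msl->base_va + msl->len;
}

/* Same members, original order: the int is followed by an 8-byte member. */
struct fields_unpacked {
	uint64_t addr_64;
	int socket_id;
	uint64_t page_sz;
	volatile uint32_t version;
};

/* Same members, reordered: the two 4-byte members sit next to each other. */
struct fields_packed {
	uint64_t addr_64;
	uint64_t page_sz;
	int socket_id;
	volatile uint32_t version;
};
```

On a typical LP64 target the reordered struct is smaller because the two 4-byte members share one 8-byte slot; on targets where 64-bit members only need 4-byte alignment the two layouts may come out the same size.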
^ permalink raw reply [relevance 13%]
* [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (4 preceding siblings ...)
2018-10-02 13:34 8% ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-02 13:34 7% ` Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 754c41755..e7674adb9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -177,6 +177,8 @@ ABI Changes
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
- structure ``rte_malloc_heap`` now has a ``heap_name`` member
+ - structure ``rte_eal_memconfig`` has been extended to contain next
+ socket ID for externally allocated segments
Removed Items
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 7634bff5d..fc44c4e5f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
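The heap creation logic in this patch can be modelled in a few lines without any DPDK dependency. The sketch below is an illustration under assumptions, not the library code: `heap_table`, `heap_model`, `heap_table_init` and `heap_table_create` are hypothetical names, the three size constants are made-up stand-ins for `RTE_MAX_HEAPS`, `RTE_HEAP_NAME_MAX_LEN` and `RTE_MAX_NUMA_NODES`, and errors are returned as negative errno values rather than through `rte_errno`. Unlike the loop in the patch, it scans every slot before picking a free one, and no locking is shown.

```c
#include <errno.h>
#include <string.h>

#define MAX_HEAPS 4           /* hypothetical stand-in for RTE_MAX_HEAPS */
#define HEAP_NAME_MAX_LEN 32  /* hypothetical stand-in for RTE_HEAP_NAME_MAX_LEN */
#define MAX_NUMA_NODES 8      /* hypothetical stand-in for RTE_MAX_NUMA_NODES */

/* External socket IDs start above every possible NUMA node ID. */
#define CONST_MAX(a, b) ((a) > (b) ? (a) : (b))
#define EXTERNAL_HEAP_MIN_SOCKET_ID CONST_MAX(1 << 8, MAX_NUMA_NODES)

struct heap_model {
	char name[HEAP_NAME_MAX_LEN];  /* empty name marks a free slot */
	int socket_id;
};

struct heap_table {
	struct heap_model heaps[MAX_HEAPS];
	int next_socket_id;
};

void heap_table_init(struct heap_table *t)
{
	memset(t, 0, sizeof(*t));
	t->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
}

/* Returns the socket ID assigned to the new heap, or a negative errno. */
int heap_table_create(struct heap_table *t, const char *name)
{
	struct heap_model *slot = NULL;
	const char *end;
	int i;

	if (name == NULL)
		return -EINVAL;
	/* reject empty names and names that do not fit, terminator included */
	end = memchr(name, '\0', HEAP_NAME_MAX_LEN);
	if (end == NULL || end == name)
		return -EINVAL;

	for (i = 0; i < MAX_HEAPS; i++) {
		struct heap_model *tmp = &t->heaps[i];

		if (strcmp(name, tmp->name) == 0)
			return -EEXIST;
		if (tmp->name[0] == '\0' && slot == NULL)
			slot = tmp;  /* remember the first free slot */
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, (size_t)(end - name) + 1);
	slot->socket_id = t->next_socket_id++;
	return slot->socket_id;
}
```

With these made-up constants the first external heap gets socket ID 256, which cannot clash with a NUMA node ID below `MAX_NUMA_NODES`.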
^ permalink raw reply [relevance 7%]
* [dpdk-dev] [PATCH v3 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 13:06 3% ` [dpdk-dev] [PATCH v2 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY Luca Boccassi
@ 2018-10-02 15:25 3% ` Luca Boccassi
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
3 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 15:25 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, tredaelli, christian.ehrhardt, mvarlese, Luca Boccassi
As part of the effort of consolidating the DPDK installation bits and
pieces across distros, set the default directory of lib/ where PMDs get
installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
subdirectory as multiple ABI revisions might be installed at the same
time, so having a fixed name will cause trouble with the autoload
feature.
Small refactor with parsing and saving the major version to a variable,
since it's now used in 3 different places.
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
drivers/meson.build | 6 ++----
lib/meson.build | 6 ++----
meson.build | 8 +++++++-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/meson.build b/drivers/meson.build
index 47b4215a30..3a6c4bf656 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -98,10 +98,8 @@ foreach class:driver_classes
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- pver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(pver.get(0),
- pver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# now build the static driver
diff --git a/lib/meson.build b/lib/meson.build
index 3acc67e6ed..bed492a4ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -88,10 +88,8 @@ foreach l:libraries
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- prj_ver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(
- prj_ver.get(0), prj_ver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# first build static lib
diff --git a/meson.build b/meson.build
index c9af33532d..4bd04b9de3 100644
--- a/meson.build
+++ b/meson.build
@@ -15,7 +15,13 @@ dpdk_libraries = []
dpdk_drivers = []
dpdk_extra_ldflags = []
-driver_install_path = join_paths(get_option('libdir'), 'dpdk/drivers')
+# set the major version, which might be used by drivers and libraries
+# depending on the configuration options
+pver = meson.project_version().split('.')
+major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
+
+driver_install_path = join_paths(get_option('libdir'), 'dpdk',
+ 'pmds-' + major_version)
eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
# configure the build, and make sure configs here and in config folder are
--
2.19.0
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
` (2 preceding siblings ...)
2018-10-02 15:25 3% ` [dpdk-dev] [PATCH v3 " Luca Boccassi
@ 2018-10-02 16:20 3% ` Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
` (2 more replies)
3 siblings, 3 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 16:20 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, tredaelli, christian.ehrhardt, mvarlese, Luca Boccassi
As part of the effort of consolidating the DPDK installation bits and
pieces across distros, set the default directory of lib/ where PMDs get
installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
subdirectory as multiple ABI revisions might be installed at the same
time, so having a fixed name will cause trouble with the autoload
feature.
Small refactor with parsing and saving the major version to a variable,
since it's now used in 3 different places.
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
drivers/meson.build | 6 ++----
lib/meson.build | 6 ++----
meson.build | 8 +++++++-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/meson.build b/drivers/meson.build
index 47b4215a30..3a6c4bf656 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -98,10 +98,8 @@ foreach class:driver_classes
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- pver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(pver.get(0),
- pver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# now build the static driver
diff --git a/lib/meson.build b/lib/meson.build
index 3acc67e6ed..bed492a4ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -88,10 +88,8 @@ foreach l:libraries
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- prj_ver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(
- prj_ver.get(0), prj_ver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# first build static lib
diff --git a/meson.build b/meson.build
index c9af33532d..4bd04b9de3 100644
--- a/meson.build
+++ b/meson.build
@@ -15,7 +15,13 @@ dpdk_libraries = []
dpdk_drivers = []
dpdk_extra_ldflags = []
-driver_install_path = join_paths(get_option('libdir'), 'dpdk/drivers')
+# set the major version, which might be used by drivers and libraries
+# depending on the configuration options
+pver = meson.project_version().split('.')
+major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
+
+driver_install_path = join_paths(get_option('libdir'), 'dpdk',
+ 'pmds-' + major_version)
eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
# configure the build, and make sure configs here and in config folder are
--
2.19.0
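For readers less familiar with meson, the path computation in this patch can be restated in C. This is only an illustration of the naming scheme: `pmd_install_path` is a hypothetical helper, not something DPDK provides, and the build itself does this work in meson as shown in the diff.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "<libdir>/dpdk/pmds-XX.YY" into out, where XX.YY are the first two
 * dot-separated components of version. Returns 0 on success, -1 if the
 * version has no dot or the result does not fit in out.
 */
int pmd_install_path(const char *libdir, const char *version,
		char *out, size_t out_len)
{
	const char *first_dot = strchr(version, '.');
	const char *second_dot;
	int major_len, n;

	if (first_dot == NULL)
		return -1;
	second_dot = strchr(first_dot + 1, '.');
	major_len = second_dot != NULL ?
		(int)(second_dot - version) : (int)strlen(version);

	n = snprintf(out, out_len, "%s/dpdk/pmds-%.*s",
			libdir, major_len, version);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Because the XX.YY part is in the directory name, two releases installed side by side get separate driver directories, which is the property the commit message asks for.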
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
@ 2018-10-02 16:28 0% ` Bruce Richardson
2018-10-05 16:00 0% ` Timothy Redaelli
2018-10-27 21:19 0% ` Thomas Monjalon
2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-10-02 16:28 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev, tredaelli, christian.ehrhardt, mvarlese
On Tue, Oct 02, 2018 at 05:20:45PM +0100, Luca Boccassi wrote:
> As part of the effort of consolidating the DPDK installation bits and
> pieces across distros, set the default directory of lib/ where PMDs get
> installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
> subdirectory as multiple ABI revisions might be installed at the same
> time, so having a fixed name will cause trouble with the autoload
> feature.
> Small refactor with parsing and saving the major version to a variable,
> since it's now used in 3 different places.
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-02 13:17 3% ` Van Haaren, Harry
@ 2018-10-02 23:58 0% ` Wang, Yipeng1
2018-10-03 17:32 0% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Wang, Yipeng1 @ 2018-10-02 23:58 UTC (permalink / raw)
To: Van Haaren, Harry, Honnappa Nagarahalli, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Van Haaren, Harry
>> > > > > /**
>> > > > > * Add a key to an existing hash table.
>> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> > > > >const void
>> > > *key);
>> > > > > * array of user data. This value is unique for this key.
>> > > > > */
>> > > > > int32_t
>> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
>> > > > >*key,
>> > > hash_sig_t sig);
>> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
>> > > hash_sig_t sig);
>> > > > >
>> > > > > /
>> > > >
>> > > > I think the above changes will break ABI by changing the parameter
>> type?
>> > > Other people may know better on this.
>> > >
>> > > Just removing a const should not change the ABI, I believe, since the
>> > > const is just advisory hint to the compiler. Actual parameter size and
>> > > count remains unchanged so I don't believe there is an issue.
>> > > [ABI experts, please correct me if I'm wrong on this]
>> >
>> >
>> > [Certainly no ABI expert, but...]
>> >
>> > I think this is an API break, not ABI break.
>> >
>> > Given application code as follows, it will fail to compile - even though
>> running
>> > the new code as a .so wouldn't cause any issues (AFAIK).
>> >
>> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> > /* parameter passed in is const, but updated function prototype is
>> non-
>> > const */
>> > rte_hash_add_key_with_hash(h, ...);
>> > }
>> >
>> > This means that we can't recompile apps against latest patch without
>> > application code changes, if the app was passing a const rte_hash struct
>> as
>> > the first parameter.
>> >
>> Agree. Do we need to do anything for this?
>
>I think we should try to avoid breaking API wherever possible.
>If we must, then I suppose we could follow the ABI process of
>a deprecation notice.
>
>From my reading of the versioning docs, it doesn't document this case:
>https://doc.dpdk.org/guides/contributing/versioning.html
>
>I don't recall a similar situation in DPDK previously - so I suggest
>you ask Tech board for input here.
>
>Hope that helps! -Harry
[Wang, Yipeng]
Honnappa, how about using a pointer to the counter in the rte_hash struct instead of the counter itself? Would this avoid the
API change?
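A minimal sketch of the idea, under assumptions: `struct table`, `change_cnt`, `table_note_change` and `caller` are hypothetical names, not the rte_hash layout or API. It only shows that C allows writing through a pointer member of a const struct, so the prototype could keep its const qualifier.

```c
#include <stdint.h>

/* Hypothetical, reduced model of a hash table handle (not struct rte_hash). */
struct table {
	uint32_t *change_cnt;  /* points at a counter stored outside the handle */
	uint32_t n_keys;
};

/*
 * The handle stays const in the prototype. Through a const struct the
 * pointer member itself is read-only, but the counter it points to is not.
 */
int table_note_change(const struct table *t)
{
	*t->change_cnt += 1;
	return 0;
}

/* A caller that only holds a const handle keeps compiling unchanged. */
int caller(const struct table *t)
{
	return table_note_change(t);
}
```

The trade-off is an extra indirection on every update, plus a const prototype that no longer tells the reader the call has a side effect.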
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 1/5] mem: add function for checking memsegs IOVAs addresses
@ 2018-10-03 12:43 3% ` Burakov, Anatoly
[not found] ` <CAD+H991m6qauwX+P=muKe6bAjNLUrcBaGbxFXkMV60OVNvRgPg@mail.gmail.com>
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-10-03 12:43 UTC (permalink / raw)
To: Alejandro Lucero, dev; +Cc: stable
On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> A device can suffer addressing limitations. This function checks
> memsegs have iovas within the supported range based on dma mask.
>
> PMD should use this during initialization if supported devices
> suffer addressing limitations, returning an error if this function
> returns memsegs out of range.
>
> Another potential usage is for emulated IOMMU hardware with addressing
> limitations.
>
> It is necessary to save the most restricted dma mask for checking
> memory allocated dynamically after initialization.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> ---
> lib/librte_eal/common/eal_common_memory.c | 56 +++++++++++++++++++++++
> lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> lib/librte_eal/common/include/rte_memory.h | 3 ++
> lib/librte_eal/common/malloc_heap.c | 12 +++++
> lib/librte_eal/linuxapp/eal/eal.c | 2 +
> lib/librte_eal/rte_eal_version.map | 1 +
> 6 files changed, 77 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index fbfb1b0..bdd8f44 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -383,6 +383,62 @@ struct virtiova {
> rte_memseg_walk(dump_memseg, f);
> }
>
> +static int
> +check_iova(const struct rte_memseg_list *msl __rte_unused,
> + const struct rte_memseg *ms, void *arg)
> +{
> + uint64_t *mask = arg;
> + rte_iova_t iova;
> +
> + /* higher address within segment */
> + iova = (ms->iova + ms->len) - 1;
> + if (!(iova & *mask))
> + return 0;
> +
> + RTE_LOG(INFO, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
> + ms->iova, ms->len);
> +
> + RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
IMO putting these as INFO is overkill. I'd prefer not to spam the output
unless it's really important. Can this go under DEBUG?
Also, the message is misleading. You stop before you have a chance to
check other masks, which may restrict them even further. You're
outputting the message about using DMA mask XXX but this may not be the
final DMA mask.
> + /* Stop the walk and change mask */
> + *mask = 0;
> + return 1;
> +}
> +
> +#if defined(RTE_ARCH_64)
> +#define MAX_DMA_MASK_BITS 63
> +#else
> +#define MAX_DMA_MASK_BITS 31
> +#endif
> +
> +/* check memseg iovas are within the required range based on dma mask */
> +int __rte_experimental
> +rte_eal_check_dma_mask(uint8_t maskbits)
> +{
> + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> + uint64_t mask;
> +
> + /* sanity check */
> + if (maskbits > MAX_DMA_MASK_BITS) {
> + RTE_LOG(INFO, EAL, "wrong dma mask size %u (Max: %u)\n",
> + maskbits, MAX_DMA_MASK_BITS);
Should be ERR, not INFO.
> + return -1;
> + }
> +
> + /* keep the more restricted maskbit */
> + if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
> + mcfg->dma_maskbits = maskbits;
Do we need to modify mcfg->dma_maskbits before we know whether we're going
to fail? I'd suggest using a local variable instead.
Also, I think this is a good case for a ternary:
	bits = mcfg->dma_maskbits == 0 ?
			maskbits :
			RTE_MIN(maskbits, mcfg->dma_maskbits);
IMO the intention looks much clearer.
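To make the suggestion concrete, here is a minimal standalone sketch, not the EAL code. `struct seg`, `check_dma_mask()` and the `saved_bits` pointer are hypothetical stand-ins for the memseg walk and `mcfg->dma_maskbits`. The candidate width is kept in a local variable and stored only after every segment has passed the check.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DMA_MASK_BITS 63

/* Hypothetical stand-in for a memory segment: start IOVA and length. */
struct seg {
	uint64_t iova;
	uint64_t len;
};

/*
 * Return 0 if every segment fits within maskbits address bits, -1 otherwise.
 * *saved_bits stands in for mcfg->dma_maskbits (0 meaning "unset") and is
 * written only on success.
 */
static int
check_dma_mask(const struct seg *segs, size_t n, uint8_t maskbits,
	       uint8_t *saved_bits)
{
	uint8_t bits;
	uint64_t mask;
	size_t i;

	if (maskbits > MAX_DMA_MASK_BITS)
		return -1;

	/* the more restrictive width, held in a local until the check passes */
	bits = *saved_bits == 0 ? maskbits :
		(maskbits < *saved_bits ? maskbits : *saved_bits);

	/* bits set in mask are address bits the device cannot drive */
	mask = ~((UINT64_C(1) << maskbits) - 1);

	for (i = 0; i < n; i++) {
		/* highest address within the segment */
		uint64_t last = segs[i].iova + segs[i].len - 1;

		if (last & mask)
			return -1;	/* out of range: *saved_bits untouched */
	}

	*saved_bits = bits;
	return 0;
}
```

With this shape a failed check leaves the previously saved width as it was, which is the behaviour the comment above asks about.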
> +
> + /* create dma mask */
> + mask = ~((1ULL << maskbits) - 1);
> +
> + rte_memseg_walk(check_iova, &mask);
> +
> + if (!mask)
> + return -1;
> +
> + return 0;
> +}
> +
> /* return the number of memory channels */
> unsigned rte_memory_get_nchannel(void)
> {
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index aff0688..aea44cb 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -77,6 +77,9 @@ struct rte_mem_config {
> * exact same address the primary process maps it.
> */
> uint64_t mem_cfg_addr;
> +
> + /* keeps the more restricted dma mask */
> + uint8_t dma_maskbits;
This needs to be documented as an ABI break in the 18.11 release notes.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
2018-09-25 14:34 0% ` Ananyev, Konstantin
@ 2018-10-03 16:18 0% ` Luca Boccassi
0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-03 16:18 UTC (permalink / raw)
To: Ananyev, Konstantin, Thomas Monjalon; +Cc: dev
On Tue, 2018-09-25 at 14:34 +0000, Ananyev, Konstantin wrote:
> Hi Luca,
>
> >
> > On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > > 24/08/2018 18:47, Konstantin Ananyev:
> > > > If user specifies priority=0 for some of ACL rules
> > > > that can cause rte_acl_classify to return wrong results.
> > > > The reason is that priority zero is used internally for no-
> > > > match
> > > > nodes.
> > > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > > The simplest way to overcome the issue is just not allow zero
> > > > to be a valid priority for the rule.
> > > >
> > > > Fixes: dc276b5780c2 ("acl: new library")
> > > >
> > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com
> > > > >
> > >
> > > Cc: stable@dpdk.org
> > >
> > > Applied with below title, thanks
> > > acl: forbid rule with priority zero
> >
> > Hi,
> >
> > This patch is marked for stable, but it changes an enum in a public
> > header
>
> Yes it does.
>
> > so it looks like an ABI breakage? Have I got it wrong?
>
> Strictly speaking - yes, but priority=0 is invalid value with current
> implementation.
> I don't think someone uses it - as in that case acl library simply
> wouldn't work
> correctly.
> Konstantin
Ok, I'll include this patch in 16.11.9 then, thanks for clarifying.
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-02 23:58 0% ` Wang, Yipeng1
@ 2018-10-03 17:32 0% ` Honnappa Nagarahalli
2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-04 3:54 0% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-03 17:32 UTC (permalink / raw)
To: Wang, Yipeng1, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
> >-----Original Message-----
> >From: Van Haaren, Harry
> >> > > > > /**
> >> > > > > * Add a key to an existing hash table.
> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> >> > > > >const void
> >> > > *key);
> >> > > > > * array of user data. This value is unique for this key.
> >> > > > > */
> >> > > > > int32_t
> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> >> > > > >void *key,
> >> > > hash_sig_t sig);
> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> >> > > > >+*key,
> >> > > hash_sig_t sig);
> >> > > > >
> >> > > > > /
> >> > > >
> >> > > > I think the above changes will break ABI by changing the
> >> > > > parameter
> >> type?
> >> > > Other people may know better on this.
> >> > >
> >> > > Just removing a const should not change the ABI, I believe, since
> >> > > the const is just advisory hint to the compiler. Actual parameter
> >> > > size and count remains unchanged so I don't believe there is an issue.
> >> > > [ABI experts, please correct me if I'm wrong on this]
> >> >
> >> >
> >> > [Certainly no ABI expert, but...]
> >> >
> >> > I think this is an API break, not ABI break.
> >> >
> >> > Given application code as follows, it will fail to compile - even
> >> > though
> >> running
> >> > the new code as a .so wouldn't cause any issues (AFAIK).
> >> >
> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >> > /* parameter passed in is const, but updated function prototype
> >> > is
> >> non-
> >> > const */
> >> > rte_hash_add_key_with_hash(h, ...); }
> >> >
> >> > This means that we can't recompile apps against latest patch
> >> > without application code changes, if the app was passing a const
> >> > rte_hash struct
> >> as
> >> > the first parameter.
> >> >
> >> Agree. Do we need to do anything for this?
> >
> >I think we should try to avoid breaking API wherever possible.
> >If we must, then I suppose we could follow the ABI process of a
> >deprecation notice.
> >
> >From my reading of the versioning docs, it doesn't document this case:
> >https://doc.dpdk.org/guides/contributing/versioning.html
> >
> >I don't recall a similar situation in DPDK previously - so I suggest
> >you ask Tech board for input here.
> >
> >Hope that helps! -Harry
> [Wang, Yipeng]
> Honnappa, how about use a pointer to the counter in the rte_hash struct
> instead of the counter? Will this avoid API change?
I think it defeats the purpose of the 'const' parameter to the API and gives incorrect information to the user.
IMO, DPDK should have guidelines on how to handle API compatibility breaks. I will send an email to the tech board about this.
We can also solve this by having counters on the bucket. I was planning to do this a little later. I will look at the effort involved and maybe do it now.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:32 0% ` Honnappa Nagarahalli
@ 2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-03 23:05 5% ` Ola Liljedahl
2018-10-04 3:32 0% ` Honnappa Nagarahalli
2018-10-04 3:54 0% ` Honnappa Nagarahalli
1 sibling, 2 replies; 200+ results
From: Wang, Yipeng1 @ 2018-10-03 17:56 UTC (permalink / raw)
To: Honnappa Nagarahalli, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 10:33 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >-----Original Message-----
>> >From: Van Haaren, Harry
>> >> > > > > /**
>> >> > > > > * Add a key to an existing hash table.
>> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> >> > > > >const void
>> >> > > *key);
>> >> > > > > * array of user data. This value is unique for this key.
>> >> > > > > */
>> >> > > > > int32_t
>> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> >> > > > >void *key,
>> >> > > hash_sig_t sig);
>> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> >> > > > >+*key,
>> >> > > hash_sig_t sig);
>> >> > > > >
>> >> > > > > /
>> >> > > >
>> >> > > > I think the above changes will break ABI by changing the
>> >> > > > parameter
>> >> type?
>> >> > > Other people may know better on this.
>> >> > >
>> >> > > Just removing a const should not change the ABI, I believe, since
>> >> > > the const is just advisory hint to the compiler. Actual parameter
>> >> > > size and count remains unchanged so I don't believe there is an issue.
>> >> > > [ABI experts, please correct me if I'm wrong on this]
>> >> >
>> >> >
>> >> > [Certainly no ABI expert, but...]
>> >> >
>> >> > I think this is an API break, not ABI break.
>> >> >
>> >> > Given application code as follows, it will fail to compile - even
>> >> > though
>> >> running
>> >> > the new code as a .so wouldn't cause any issues (AFAIK).
>> >> >
>> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >> > /* parameter passed in is const, but updated function prototype
>> >> > is
>> >> non-
>> >> > const */
>> >> > rte_hash_add_key_with_hash(h, ...); }
>> >> >
>> >> > This means that we can't recompile apps against latest patch
>> >> > without application code changes, if the app was passing a const
>> >> > rte_hash struct
>> >> as
>> >> > the first parameter.
>> >> >
>> >> Agree. Do we need to do anything for this?
>> >
>> >I think we should try to avoid breaking API wherever possible.
>> >If we must, then I suppose we could follow the ABI process of a
>> >deprecation notice.
>> >
>> >From my reading of the versioning docs, it doesn't document this case:
>> >https://doc.dpdk.org/guides/contributing/versioning.html
>> >
>> >I don't recall a similar situation in DPDK previously - so I suggest
>> >you ask Tech board for input here.
>> >
>> >Hope that helps! -Harry
>> [Wang, Yipeng]
>> Honnappa, how about use a pointer to the counter in the rte_hash struct
>> instead of the counter? Will this avoid API change?
>I think it defeats the purpose of 'const' parameter to the API and provides incorrect information to the user.
>IMO, DPDK should have guidelines on how to handle the API compatibility breaks. I will send an email to tech board on this.
>We can also solve this by having counters on the bucket. I was planning to do this little bit later. I will look at the effort involved and
>may be do it now.
[Wang, Yipeng]
I think with an ABI/API change, you might need to announce it one release cycle ahead.
The cuckoo switch paper (Scalable, High Performance Ethernet Forwarding with
CUCKOOSWITCH)
separates the version counter array from the hash table. You can strike a balance
between the granularity of the version counter and the cache/memory requirement.
Would that be a better way?
Another consideration is that the current bucket is exactly 64 bytes with partial-key hashing.
To add another counter, we need to think about changing certain variables so that the bucket
still aligns to a cache line.
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
@ 2018-10-03 22:05 0% ` Thomas Monjalon
2018-10-04 9:17 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-03 22:05 UTC (permalink / raw)
To: Anatoly Burakov; +Cc: dev, John McNamara, Marko Kovacevic
20/09/2018 17:41, Anatoly Burakov:
> Currently, command-line switches for legacy mem mode or single-file
> segments mode are only stored in internal config. This leads to a
> situation where these flags have to always match between primary
> and secondary, which is bad for usability.
>
> Fix this by storing these flags in the shared config as well, so
> that secondary process can know if the primary was launched in
> single-file segments or legacy mem mode.
>
> This bumps the EAL ABI, however there's an EAL deprecation notice
> already in place[1] for a different feature, so that's OK.
>
> [1] http://patches.dpdk.org/patch/43502/
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
> v2:
> - Added documentation on ABI break
>
> doc/guides/rel_notes/rel_description.rst | 5 +++++
Removed change in this file (dup of release note).
> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
> .../common/include/rte_eal_memconfig.h | 4 ++++
> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
> lib/librte_eal/meson.build | 2 +-
> 6 files changed, 36 insertions(+), 3 deletions(-)
Applied (without extra note), thanks.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:56 3% ` Wang, Yipeng1
@ 2018-10-03 23:05 5% ` Ola Liljedahl
2018-10-04 3:32 0% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Ola Liljedahl @ 2018-10-03 23:05 UTC (permalink / raw)
To: Wang, Yipeng1, Honnappa Nagarahalli, Van Haaren, Harry,
Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, nd, Gobriel, Sameh
On 03/10/2018, 20:00, "Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 10:33 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >-----Original Message-----
>> >From: Van Haaren, Harry
>> >> > > > > /**
>> >> > > > > * Add a key to an existing hash table.
>> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> >> > > > >const void
>> >> > > *key);
>> >> > > > > * array of user data. This value is unique for this key.
>> >> > > > > */
>> >> > > > > int32_t
>> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> >> > > > >void *key,
>> >> > > hash_sig_t sig);
>> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> >> > > > >+*key,
>> >> > > hash_sig_t sig);
>> >> > > > >
>> >> > > > > /
>> >> > > >
>> >> > > > I think the above changes will break ABI by changing the
>> >> > > > parameter
>> >> type?
>> >> > > Other people may know better on this.
>> >> > >
>> >> > > Just removing a const should not change the ABI, I believe, since
>> >> > > the const is just advisory hint to the compiler. Actual parameter
>> >> > > size and count remains unchanged so I don't believe there is an issue.
>> >> > > [ABI experts, please correct me if I'm wrong on this]
>> >> >
>> >> >
>> >> > [Certainly no ABI expert, but...]
>> >> >
>> >> > I think this is an API break, not ABI break.
>> >> >
>> >> > Given application code as follows, it will fail to compile - even
>> >> > though
>> >> running
>> >> > the new code as a .so wouldn't cause any issues (AFAIK).
>> >> >
>> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >> > /* parameter passed in is const, but updated function prototype
>> >> > is
>> >> non-
>> >> > const */
>> >> > rte_hash_add_key_with_hash(h, ...); }
>> >> >
>> >> > This means that we can't recompile apps against latest patch
>> >> > without application code changes, if the app was passing a const
>> >> > rte_hash struct
>> >> as
>> >> > the first parameter.
>> >> >
>> >> Agree. Do we need to do anything for this?
>> >
>> >I think we should try to avoid breaking API wherever possible.
>> >If we must, then I suppose we could follow the ABI process of a
>> >deprecation notice.
>> >
>> >From my reading of the versioning docs, it doesn't document this case:
>> >https://doc.dpdk.org/guides/contributing/versioning.html
>> >
>> >I don't recall a similar situation in DPDK previously - so I suggest
>> >you ask Tech board for input here.
>> >
>> >Hope that helps! -Harry
>> [Wang, Yipeng]
>> Honnappa, how about use a pointer to the counter in the rte_hash struct
>> instead of the counter? Will this avoid API change?
>I think it defeats the purpose of 'const' parameter to the API and provides incorrect information to the user.
>IMO, DPDK should have guidelines on how to handle the API compatibility breaks. I will send an email to tech board on this.
>We can also solve this by having counters on the bucket. I was planning to do this little bit later. I will look at the effort involved and
>may be do it now.
[Wang, Yipeng]
I think with ABI/API change, you might need to announce it one release cycle ahead.
In the cuckoo switch paper: Scalable, High Performance Ethernet Forwarding with
CUCKOOSWITCH
it separates the version counter array and the hash table. You can strike a balance
between granularity of the version counter and the cache/memory requirement.
Is it a better way?
[Ola] Having only a single generation counter can easily become a scalability bottleneck due to write contention on the cache line.
Ideally you want each gen counter to be located in its own cache line (multiple gen counters in the same cache line will experience the same write contention). But that seems a bit wasteful of memory.
Ideally (in order to avoid accessing more cache lines), the gen counter should be located in the hash bucket. But as you write below, this would create problems for the layout of the hash bucket, since there isn't any room for another field.
So I propose an array of gen counters, indexed by a hash (of some kind) of the primary and alternate hashes (signatures) of the element, modulo the array size. In effect, another hash table.
Another consideration is current bucket is 64-byte exactly with the partial-key-hashing.
To add another counter, we need to think about changing certain variables to still align
cache line.
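A rough sketch of the counter-array proposal in this message, under stated assumptions: `GEN_CNT_SLOTS`, `gen_cnt_index()`, `gen_cnt_bump()`, `gen_cnt_read()` and the mixing constant are hypothetical and not part of rte_hash. Each counter is padded to its own 64-byte cache line, and the slot is derived from both signatures with a symmetric mix, so an element maps to the same counter whichever of its two buckets it currently occupies.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CACHE_LINE	64
#define GEN_CNT_SLOTS	256	/* assumed to be a power of two */

/* One generation counter per cache line, to avoid false sharing. */
struct gen_cnt {
	_Alignas(CACHE_LINE) _Atomic uint32_t cnt;
};

static struct gen_cnt gen_cnts[GEN_CNT_SLOTS];

/* Map the primary and alternate signatures of an element to a slot. */
static inline uint32_t
gen_cnt_index(uint32_t prim_sig, uint32_t alt_sig)
{
	/* XOR is symmetric, so swapping the two signatures gives the same slot */
	uint32_t h = (prim_sig ^ alt_sig) * UINT32_C(0x9E3779B1);

	return (h >> 16) & (GEN_CNT_SLOTS - 1);
}

/* Writer side: bump the counter covering this element. */
static inline void
gen_cnt_bump(uint32_t prim_sig, uint32_t alt_sig)
{
	atomic_fetch_add_explicit(&gen_cnts[gen_cnt_index(prim_sig, alt_sig)].cnt,
				  1, memory_order_release);
}

/* Reader side: sample the counter covering this element. */
static inline uint32_t
gen_cnt_read(uint32_t prim_sig, uint32_t alt_sig)
{
	return atomic_load_explicit(&gen_cnts[gen_cnt_index(prim_sig, alt_sig)].cnt,
				    memory_order_acquire);
}
```

The number of slots is the tuning knob: more slots mean less write contention per counter at the cost of one cache line of memory each. The sketch only shows counter placement and indexing; the full reader/writer retry protocol is out of scope here.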
^ permalink raw reply [relevance 5%]
* [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure
@ 2018-10-03 23:10 4% ` Thomas Monjalon
2018-10-04 9:31 4% ` Gaëtan Rivet
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-03 23:10 UTC (permalink / raw)
To: dev; +Cc: gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
When a device is added with devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in the case of a standard scan.
A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.
This makes it possible to remove an rte_device
using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless,
and may be removed later.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
doc/guides/rel_notes/release_18_11.rst | 2 ++
drivers/bus/dpaa/dpaa_bus.c | 2 ++
drivers/bus/fslmc/fslmc_bus.c | 2 ++
drivers/bus/ifpga/ifpga_bus.c | 1 +
drivers/bus/pci/bsd/pci.c | 2 ++
drivers/bus/pci/linux/pci.c | 1 +
drivers/bus/pci/private.h | 2 ++
drivers/bus/vdev/vdev.c | 1 +
drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
drivers/bus/vmbus/private.h | 3 +++
lib/librte_eal/common/include/rte_dev.h | 1 +
11 files changed, 18 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d534bb71c..2c6791e5e 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -164,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: The structure ``rte_device`` got a new field to reference a ``rte_bus``.
+
Removed Items
-------------
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index 49cd04dbb..138e0f98d 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -165,6 +165,8 @@ dpaa_create_device_list(void)
goto cleanup;
}
+ dev->device.bus = &rte_dpaa_bus.bus;
+
cfg = &dpaa_netcfg->port_cfg[i];
fman_intf = cfg->fman_if;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index bfe81e236..960f55071 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -161,6 +161,8 @@ scan_one_fslmc_device(char *dev_name)
return -ENOMEM;
}
+ dev->device.bus = &rte_fslmc_bus.bus;
+
/* Parse the device name and ID */
t_ptr = strtok(dup_dev_name, ".");
if (!t_ptr) {
diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 3ef035b7e..80663328a 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -142,6 +142,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
if (!afu_dev)
goto end;
+ afu_dev->device.bus = &rte_ifpga_bus;
afu_dev->device.devargs = devargs;
afu_dev->device.numa_node = SOCKET_ID_ANY;
afu_dev->device.name = devargs->name;
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b7e..40641cad4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -223,6 +223,8 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
+
dev->addr.domain = conf->pc_sel.pc_domain;
dev->addr.bus = conf->pc_sel.pc_bus;
dev->addr.devid = conf->pc_sel.pc_dev;
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..e31bbb370 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -228,6 +228,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
return -1;
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
dev->addr = *addr;
/* get vendor id */
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 0e689fa74..04bffa6e7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -15,6 +15,8 @@ extern struct rte_pci_bus rte_pci_bus;
struct rte_pci_driver;
struct rte_pci_device;
+extern struct rte_pci_bus rte_pci_bus;
+
/**
* Probe the PCI bus
*
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index efca962f7..0142fb2c8 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -456,6 +456,7 @@ vdev_scan(void)
continue;
}
+ dev->device.bus = &rte_vdev_bus;
dev->device.devargs = devargs;
dev->device.numa_node = SOCKET_ID_ANY;
dev->device.name = devargs->name;
diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c
index 527a6a39f..a4755a387 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -229,6 +229,7 @@ vmbus_scan_one(const char *name)
if (dev == NULL)
return -1;
+ dev->device.bus = &rte_vmbus_bus.bus;
dev->device.name = strdup(name);
if (!dev->device.name)
goto error;
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
index f2022a68c..211127dd8 100644
--- a/drivers/bus/vmbus/private.h
+++ b/drivers/bus/vmbus/private.h
@@ -10,11 +10,14 @@
#include <sys/uio.h>
#include <rte_log.h>
#include <rte_vmbus_reg.h>
+#include <rte_bus_vmbus.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif
+extern struct rte_vmbus_bus rte_vmbus_bus;
+
extern int vmbus_logtype_bus;
#define VMBUS_LOG(level, fmt, args...) \
rte_log(RTE_LOG_ ## level, vmbus_logtype_bus, "%s(): " fmt "\n", \
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a80598..d82cba847 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -157,6 +157,7 @@ struct rte_device {
TAILQ_ENTRY(rte_device) next; /**< Next device */
const char *name; /**< Device name */
const struct rte_driver *driver;/**< Associated driver */
+ const struct rte_bus *bus; /**< Bus handle assigned on scan */
int numa_node; /**< NUMA node connection */
struct rte_devargs *devargs; /**< Device user arguments */
};
--
2.19.0
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-03 23:05 5% ` Ola Liljedahl
@ 2018-10-04 3:32 0% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-04 3:32 UTC (permalink / raw)
To: Wang, Yipeng1, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh,
Honnappa Nagarahalli
> >
> >> >-----Original Message-----
> >> >From: Van Haaren, Harry
> >> >> > > > > /**
> >> >> > > > > * Add a key to an existing hash table.
> >> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
> >> >> > > > >*h, const void
> >> >> > > *key);
> >> >> > > > > * array of user data. This value is unique for this key.
> >> >> > > > > */
> >> >> > > > > int32_t
> >> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> >> >> > > > >void *key,
> >> >> > > hash_sig_t sig);
> >> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> >> >> > > > >+*key,
> >> >> > > hash_sig_t sig);
> >> >> > > > >
> >> >> > > > > /
> >> >> > > >
> >> >> > > > I think the above changes will break ABI by changing the
> >> >> > > > parameter
> >> >> type?
> >> >> > > Other people may know better on this.
> >> >> > >
> >> >> > > Just removing a const should not change the ABI, I believe,
> >> >> > > since the const is just advisory hint to the compiler. Actual
> >> >> > > parameter size and count remains unchanged so I don't believe
> there is an issue.
> >> >> > > [ABI experts, please correct me if I'm wrong on this]
> >> >> >
> >> >> >
> >> >> > [Certainly no ABI expert, but...]
> >> >> >
> >> >> > I think this is an API break, not ABI break.
> >> >> >
> >> >> > Given application code as follows, it will fail to compile -
> >> >> > even though
> >> >> running
> >> >> > the new code as a .so wouldn't cause any issues (AFAIK).
> >> >> >
> >> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >> >> > /* parameter passed in is const, but updated function
> >> >> > prototype is
> >> >> non-
> >> >> > const */
> >> >> > rte_hash_add_key_with_hash(h, ...); }
> >> >> >
> >> >> > This means that we can't recompile apps against latest patch
> >> >> > without application code changes, if the app was passing a const
> >> >> > rte_hash struct
> >> >> as
> >> >> > the first parameter.
> >> >> >
> >> >> Agree. Do we need to do anything for this?
> >> >
> >> >I think we should try to avoid breaking API wherever possible.
> >> >If we must, then I suppose we could follow the ABI process of a
> >> >deprecation notice.
> >> >
> >> >From my reading of the versioning docs, it doesn't document this case:
> >> >https://doc.dpdk.org/guides/contributing/versioning.html
> >> >
> >> >I don't recall a similar situation in DPDK previously - so I suggest
> >> >you ask Tech board for input here.
> >> >
> >> >Hope that helps! -Harry
> >> [Wang, Yipeng]
> >> Honnappa, how about use a pointer to the counter in the rte_hash
> >> struct instead of the counter? Will this avoid API change?
> >I think it defeats the purpose of 'const' parameter to the API and provides
> incorrect information to the user.
> >IMO, DPDK should have guidelines on how to handle the API compatibility
> breaks. I will send an email to tech board on this.
> >We can also solve this by having counters on the bucket. I was planning
> >to do this little bit later. I will look at the effort involved and may be do it
> now.
> [Wang, Yipeng]
> I think with ABI/API change, you might need to announce it one release cycle
> ahead.
>
> In the cuckoo switch paper: Scalable, High Performance Ethernet Forwarding
> with CUCKOOSWITCH it separates the version counter array and the hash
> table. You can strike a balance between granularity of the version counter and
> the cache/memory requirement.
> Is it a better way?
This will introduce another cache line access. It would be good to stay within a single cache line.
>
> Another consideration is current bucket is 64-byte exactly with the partial-
> key-hashing.
> To add another counter, we need to think about changing certain variables to
> still align cache line.
The 'flags' structure member is not being used, so I plan to remove it. That will give us 8B; I will use 4B of it for the counter.
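To illustrate the size budget being discussed, here is a hypothetical bucket layout. The field names and widths are assumptions chosen to add up to 64 bytes, not a copy of the rte_hash definition. Replacing an 8-byte flag array with a 4-byte counter plus 4 spare bytes keeps the bucket within one cache line, and a compile-time assertion can guard that.

```c
#include <assert.h>
#include <stdint.h>

#define BUCKET_ENTRIES	8
#define CACHE_LINE	64

/* Hypothetical bucket: an 8-byte flag array replaced by counter + spare. */
struct bucket_sketch {
	uint16_t sig_current[BUCKET_ENTRIES];	/* 16 B: partial-key signatures */
	uint32_t key_idx[BUCKET_ENTRIES];	/* 32 B: indexes into the key store */
	uint32_t change_cnt;			/*  4 B: per-bucket change counter */
	uint32_t reserved;			/*  4 B: left over from the old flags */
	uint64_t next;				/*  8 B: stands in for a 64-bit pointer */
};

/* Fails the build if a later change pushes the bucket past one cache line. */
static_assert(sizeof(struct bucket_sketch) == CACHE_LINE,
	      "bucket must stay exactly one cache line");
```

The pointer is modelled as `uint64_t` only so that the arithmetic holds on any target; a real layout would need the same check per architecture.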
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:32 0% ` Honnappa Nagarahalli
2018-10-03 17:56 3% ` Wang, Yipeng1
@ 2018-10-04 3:54 0% ` Honnappa Nagarahalli
2018-10-04 19:16 0% ` Wang, Yipeng1
1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-10-04 3:54 UTC (permalink / raw)
To: Honnappa Nagarahalli, Wang, Yipeng1, Van Haaren, Harry,
Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>
> > >-----Original Message-----
> > >From: Van Haaren, Harry
> > >> > > > > /**
> > >> > > > > * Add a key to an existing hash table.
> > >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
> > >> > > > >*h, const void
> > >> > > *key);
> > >> > > > > * array of user data. This value is unique for this key.
> > >> > > > > */
> > >> > > > > int32_t
> > >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> > >> > > > >void *key,
> > >> > > hash_sig_t sig);
> > >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> > >> > > > >+*key,
> > >> > > hash_sig_t sig);
> > >> > > > >
> > >> > > > > /
> > >> > > >
> > >> > > > I think the above changes will break ABI by changing the
> > >> > > > parameter
> > >> type?
> > >> > > Other people may know better on this.
> > >> > >
> > >> > > Just removing a const should not change the ABI, I believe,
> > >> > > since the const is just advisory hint to the compiler. Actual
> > >> > > parameter size and count remains unchanged so I don't believe there
> is an issue.
> > >> > > [ABI experts, please correct me if I'm wrong on this]
> > >> >
> > >> >
> > >> > [Certainly no ABI expert, but...]
> > >> >
> > >> > I think this is an API break, not ABI break.
> > >> >
> > >> > Given application code as follows, it will fail to compile - even
> > >> > though
> > >> running
> > >> > the new code as a .so wouldn't cause any issues (AFAIK).
> > >> >
> > >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> > >> > /* parameter passed in is const, but updated function
> > >> > prototype is
> > >> non-
> > >> > const */
> > >> > rte_hash_add_key_with_hash(h, ...); }
> > >> >
> > >> > This means that we can't recompile apps against latest patch
> > >> > without application code changes, if the app was passing a const
> > >> > rte_hash struct
> > >> as
> > >> > the first parameter.
> > >> >
> > >> Agree. Do we need to do anything for this?
> > >
> > >I think we should try to avoid breaking API wherever possible.
> > >If we must, then I suppose we could follow the ABI process of a
> > >deprecation notice.
> > >
> > >From my reading of the versioning docs, it doesn't document this case:
> > >https://doc.dpdk.org/guides/contributing/versioning.html
> > >
> > >I don't recall a similar situation in DPDK previously - so I suggest
> > >you ask Tech board for input here.
> > >
> > >Hope that helps! -Harry
> > [Wang, Yipeng]
> > Honnappa, how about use a pointer to the counter in the rte_hash
> > struct instead of the counter? Will this avoid API change?
> I think it defeats the purpose of 'const' parameter to the API and provides
> incorrect information to the user.
Yipeng, I think I misunderstood your comment. I believe you meant that we could allocate memory for the counter and store the pointer in the structure. Please correct me if I am wrong.
This could be a solution, though it adds another cache line access. It might be OK given that it is a single cache line for the entire hash table.
> IMO, DPDK should have guidelines on how to handle the API compatibility
> breaks. I will send an email to tech board on this.
> We can also solve this by having counters on the bucket. I was planning to do
> this little bit later. I will look at the effort involved and may be do it now.
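A minimal sketch of the pointer-to-counter idea discussed above, assuming a simplified stand-in structure. The names toy_hash, change_counter and toy_hash_note_move are hypothetical and are not the real rte_hash layout or API. Because the counter lives in separately allocated memory, a function that takes a const pointer to the table can still update it, so the prototype keeps its 'const'. The trade-off raised above still applies: the 'const' no longer tells the caller that nothing observable changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_hash; not the real DPDK layout. */
struct toy_hash {
	uint32_t entries;
	uint32_t *change_counter; /* separately allocated, one cache line at most */
};

/* Allocate the table and its counter; returns NULL on allocation failure. */
static struct toy_hash *
toy_hash_create(void)
{
	struct toy_hash *h = calloc(1, sizeof(*h));

	if (h == NULL)
		return NULL;
	h->change_counter = calloc(1, sizeof(*h->change_counter));
	if (h->change_counter == NULL) {
		free(h);
		return NULL;
	}
	return h;
}

/*
 * The parameter stays 'const': only the memory the pointer refers to is
 * written, never a member of the structure itself.
 */
static uint32_t
toy_hash_note_move(const struct toy_hash *h)
{
	assert(h != NULL && h->change_counter != NULL);
	return ++(*h->change_counter);
}

/* Release the counter and the table. */
static void
toy_hash_free(struct toy_hash *h)
{
	if (h == NULL)
		return;
	free(h->change_counter);
	free(h);
}
```

An application that holds a 'const struct toy_hash *' can call toy_hash_note_move() without source changes, which is the property the thread is trying to preserve.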
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-03 22:05 0% ` Thomas Monjalon
@ 2018-10-04 9:17 0% ` Burakov, Anatoly
2018-10-04 9:18 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-10-04 9:17 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, John McNamara, Marko Kovacevic
On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
> 20/09/2018 17:41, Anatoly Burakov:
>> Currently, command-line switches for legacy mem mode or single-file
>> segments mode are only stored in internal config. This leads to a
>> situation where these flags have to always match between primary
>> and secondary, which is bad for usability.
>>
>> Fix this by storing these flags in the shared config as well, so
>> that secondary process can know if the primary was launched in
>> single-file segments or legacy mem mode.
>>
>> This bumps the EAL ABI, however there's an EAL deprecation notice
>> already in place[1] for a different feature, so that's OK.
>>
>> [1] http://patches.dpdk.org/patch/43502/
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>> v2:
>> - Added documentation on ABI break
>>
>> doc/guides/rel_notes/rel_description.rst | 5 +++++
>
> Removed change in this file (dup of release note).
>
>> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
>> .../common/include/rte_eal_memconfig.h | 4 ++++
>> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
>> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
>> lib/librte_eal/meson.build | 2 +-
>> 6 files changed, 36 insertions(+), 3 deletions(-)
>
> Applied (without extra note), thanks.
>
This will probably break external mem patches due to a conflict in the
release notes. Should I respin?
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-04 9:17 0% ` Burakov, Anatoly
@ 2018-10-04 9:18 0% ` Thomas Monjalon
2018-10-04 10:46 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-04 9:18 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: dev, John McNamara, Marko Kovacevic
04/10/2018 11:17, Burakov, Anatoly:
> On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
> > 20/09/2018 17:41, Anatoly Burakov:
> >> Currently, command-line switches for legacy mem mode or single-file
> >> segments mode are only stored in internal config. This leads to a
> >> situation where these flags have to always match between primary
> >> and secondary, which is bad for usability.
> >>
> >> Fix this by storing these flags in the shared config as well, so
> >> that secondary process can know if the primary was launched in
> >> single-file segments or legacy mem mode.
> >>
> >> This bumps the EAL ABI, however there's an EAL deprecation notice
> >> already in place[1] for a different feature, so that's OK.
> >>
> >> [1] http://patches.dpdk.org/patch/43502/
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>
> >> Notes:
> >> v2:
> >> - Added documentation on ABI break
> >>
> >> doc/guides/rel_notes/rel_description.rst | 5 +++++
> >
> > Removed change in this file (dup of release note).
> >
> >> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
> >> .../common/include/rte_eal_memconfig.h | 4 ++++
> >> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
> >> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
> >> lib/librte_eal/meson.build | 2 +-
> >> 6 files changed, 36 insertions(+), 3 deletions(-)
> >
> > Applied (without extra note), thanks.
> >
>
> This will probably break external mem patches due to conflict in release
> notes. Should i respin?
No, conflicts in release notes are common. I resolve such conflicts myself.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure
2018-10-03 23:10 4% ` [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure Thomas Monjalon
@ 2018-10-04 9:31 4% ` Gaëtan Rivet
2018-10-04 9:48 3% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Gaëtan Rivet @ 2018-10-04 9:31 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
On Thu, Oct 04, 2018 at 01:10:44AM +0200, Thomas Monjalon wrote:
> When a device is added with a devargs (hotplug or whitelist),
> the bus pointer can be retrieved via its devargs.
> But there is no such devargs.bus in case of standard scan.
>
> A pointer to the rte_bus handle is added to rte_device.
> When a device is allocated (during a scan),
> the pointer to its bus is assigned.
>
> It will make possible to remove a rte_device,
> using the function pointer from its bus.
>
> The function rte_bus_find_by_device() becomes useless,
> and may be removed later.
>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
I agree with this change, but I think this can break ABI of
buses defining their structure by composition with rte_device (e.g. PCI
bus). Have you checked ABI?
Personally I don't care, I prefer a clean framework to a littered lib.
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
--
Gaëtan Rivet
6WIND
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure
2018-10-04 9:31 4% ` Gaëtan Rivet
@ 2018-10-04 9:48 3% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-04 9:48 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: dev, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor, Rosen Xu,
Hemant Agrawal, Shreyansh Jain, Stephen Hemminger
04/10/2018 11:31, Gaëtan Rivet:
> On Thu, Oct 04, 2018 at 01:10:44AM +0200, Thomas Monjalon wrote:
> > When a device is added with a devargs (hotplug or whitelist),
> > the bus pointer can be retrieved via its devargs.
> > But there is no such devargs.bus in case of standard scan.
> >
> > A pointer to the rte_bus handle is added to rte_device.
> > When a device is allocated (during a scan),
> > the pointer to its bus is assigned.
> >
> > It will make possible to remove a rte_device,
> > using the function pointer from its bus.
> >
> > The function rte_bus_find_by_device() becomes useless,
> > and may be removed later.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>
> I agree with this change, but I think this can break ABI of
> buses defining their structure by composition with rte_device (e.g. PCI
> bus). Have you checked ABI?
Yes, I forgot that it changes the size of the bus structures.
I can spin a v6 with a bump of ABI version of the bus drivers,
and an additional note in release notes.
> Personally I don't care, I prefer a clean framework to a littered lib.
>
> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Adding bus driver maintainers to get more opinions.
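To illustrate why composition makes this an ABI change, here is a minimal sketch with hypothetical, simplified structures (toy_device_* and toy_pci_device_* are not the real rte_device or rte_pci_device layouts, and the position of the new member is an assumption). When a bus-specific structure embeds the generic device by value, adding a member to the generic device shifts every member that follows it in the outer structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic device before the change. */
struct toy_device_old {
	const char *name;
	int numa_node;
};

/* Same device with an added bus pointer (position chosen for illustration). */
struct toy_device_new {
	const char *name;
	const void *bus;
	int numa_node;
};

/* Bus-specific device structures embedding the generic one by value. */
struct toy_pci_device_old {
	struct toy_device_old device;
	unsigned int vendor_id;
};

struct toy_pci_device_new {
	struct toy_device_new device;
	unsigned int vendor_id;
};

/* Number of bytes by which vendor_id moved inside the outer structure. */
static size_t
vendor_offset_shift(void)
{
	size_t before = offsetof(struct toy_pci_device_old, vendor_id);
	size_t after = offsetof(struct toy_pci_device_new, vendor_id);

	/* growing the embedded struct never moves later members backwards */
	assert(after >= before);
	return after - before;
}
```

A binary built against the old layout would read vendor_id at the old offset, which is why the bus drivers need an ABI version bump as well.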
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-04 9:18 0% ` Thomas Monjalon
@ 2018-10-04 10:46 0% ` Ferruh Yigit
2018-10-05 9:04 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-04 10:46 UTC (permalink / raw)
To: Thomas Monjalon, Burakov, Anatoly; +Cc: dev, John McNamara, Marko Kovacevic
On 10/4/2018 10:18 AM, Thomas Monjalon wrote:
> 04/10/2018 11:17, Burakov, Anatoly:
>> On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
>>> 20/09/2018 17:41, Anatoly Burakov:
>>>> Currently, command-line switches for legacy mem mode or single-file
>>>> segments mode are only stored in internal config. This leads to a
>>>> situation where these flags have to always match between primary
>>>> and secondary, which is bad for usability.
>>>>
>>>> Fix this by storing these flags in the shared config as well, so
>>>> that secondary process can know if the primary was launched in
>>>> single-file segments or legacy mem mode.
>>>>
>>>> This bumps the EAL ABI, however there's an EAL deprecation notice
>>>> already in place[1] for a different feature, so that's OK.
>>>>
>>>> [1] http://patches.dpdk.org/patch/43502/
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> ---
>>>>
>>>> Notes:
>>>> v2:
>>>> - Added documentation on ABI break
>>>>
>>>> doc/guides/rel_notes/rel_description.rst | 5 +++++
>>>
>>> Removed change in this file (dup of release note).
>>>
>>>> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
>>>> .../common/include/rte_eal_memconfig.h | 4 ++++
>>>> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
>>>> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
>>>> lib/librte_eal/meson.build | 2 +-
>>>> 6 files changed, 36 insertions(+), 3 deletions(-)
>>>
>>> Applied (without extra note), thanks.
>>>
>>
>> This will probably break external mem patches due to conflict in release
>> notes. Should i respin?
>
> No, conflicts in release notes are usual. I manage such conflict myself.
Conflicts in release notes are common and, as Thomas said, we resolve them
manually, but they are now causing problems in the automated per-patch tests
because the patch can't be applied.
We should think about a way to prevent these conflicts.
^ permalink raw reply [relevance 0%]
* [dpdk-dev] Fwd: [PATCH v2 1/5] mem: add function for checking memsegs IOVAs addresses
[not found] ` <CAD+H991m6qauwX+P=muKe6bAjNLUrcBaGbxFXkMV60OVNvRgPg@mail.gmail.com>
@ 2018-10-04 12:59 0% ` Alejandro Lucero
0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-10-04 12:59 UTC (permalink / raw)
To: dev
I sent this email only to Anatoly. Sending it again to the mailing list.
On Wed, Oct 3, 2018 at 1:43 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> > A device can suffer addressing limitations. This functions checks
> > memsegs have iovas within the supported range based on dma mask.
> >
> > PMD should use this during initialization if supported devices
> > suffer addressing limitations, returning an error if this function
> > returns memsegs out of range.
> >
> > Another potential usage is for emulated IOMMU hardware with addressing
> > limitations.
> >
> > It is necessary to save the most restricted dma mask for checking
> > memory allocated dynamically after initialization.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > ---
> > lib/librte_eal/common/eal_common_memory.c | 56
> +++++++++++++++++++++++
> > lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> > lib/librte_eal/common/include/rte_memory.h | 3 ++
> > lib/librte_eal/common/malloc_heap.c | 12 +++++
> > lib/librte_eal/linuxapp/eal/eal.c | 2 +
> > lib/librte_eal/rte_eal_version.map | 1 +
> > 6 files changed, 77 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> > index fbfb1b0..bdd8f44 100644
> > --- a/lib/librte_eal/common/eal_common_memory.c
> > +++ b/lib/librte_eal/common/eal_common_memory.c
> > @@ -383,6 +383,62 @@ struct virtiova {
> > rte_memseg_walk(dump_memseg, f);
> > }
> >
> > +static int
> > +check_iova(const struct rte_memseg_list *msl __rte_unused,
> > + const struct rte_memseg *ms, void *arg)
> > +{
> > + uint64_t *mask = arg;
> > + rte_iova_t iova;
> > +
> > + /* higher address within segment */
> > + iova = (ms->iova + ms->len) - 1;
> > + if (!(iova & *mask))
> > + return 0;
> > +
> > + RTE_LOG(INFO, EAL, "memseg iova %"PRIx64", len %zx, out of
> range\n",
> > + ms->iova, ms->len);
> > +
> > + RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
>
> IMO putting these as INFO is overkill. I'd prefer not to spam the output
> unless it's really important. Can this go under DEBUG?
>
>
This check comes from a device or from alloc_pages_on_heap when
expanding memory. If the check discovers an address outside the mask, the
device cannot be used or the new memory cannot be allocated. I think having
this info will help to understand why the device initialization or the memory
allocation is failing.
> Also, the message is misleading. You stop before you have a chance to
> check other masks, which may restrict them even further. You're
> outputting the message about using DMA mask XXX but this may not be the
> final DMA mask.
>
Well, this is the first segment that triggers the check, and that is enough to
report the problem and to prevent the device or the new memory from being used.
Note that the mask is per device, and for the memory allocation case, it is
the most restrictive dma mask. So there are no other masks to try.
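For reference, a standalone sketch of the per-segment test that the quoted check_iova() performs. The helper name iova_fits_mask is hypothetical; the mask construction and the "highest address within segment" computation follow the quoted patch, and the sanity assertion is an added assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the segment [iova, iova + len) is addressable with
 * 'maskbits' address bits, 0 otherwise.
 */
static int
iova_fits_mask(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t mask, last;

	assert(maskbits > 0 && maskbits < 64 && len > 0);

	/* bits above the addressable range, as built in the quoted patch */
	mask = ~((UINT64_C(1) << maskbits) - 1);

	/* highest address within the segment */
	last = (iova + len) - 1;

	return (last & mask) == 0;
}
```

With a 40-bit mask, a segment ending exactly at 2^40 - 1 passes, while one that starts at or crosses 2^40 fails, which is the condition that stops the walk.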
>
> > + /* Stop the walk and change mask */
> > + *mask = 0;
> > + return 1;
> > +}
> > +
> > +#if defined(RTE_ARCH_64)
> > +#define MAX_DMA_MASK_BITS 63
> > +#else
> > +#define MAX_DMA_MASK_BITS 31
> > +#endif
> > +
> > +/* check memseg iovas are within the required range based on dma mask */
> > +int __rte_experimental
> > +rte_eal_check_dma_mask(uint8_t maskbits)
> > +{
> > + struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> > + uint64_t mask;
> > +
> > + /* sanity check */
> > + if (maskbits > MAX_DMA_MASK_BITS) {
> > + RTE_LOG(INFO, EAL, "wrong dma mask size %u (Max: %u)\n",
> > + maskbits, MAX_DMA_MASK_BITS);
>
> Should be ERR, not INFO.
>
>
Right. I will change it.
> > + return -1;
> > + }
> > +
> > + /* keep the more restricted maskbit */
> > + if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
> > + mcfg->dma_maskbits = maskbits;
>
> Do we need to modify mcfg->dma_maskbits before we know if we're going to
> fail? Suggest using a local variable maybe?
>
>
Yes, that's true. If the check fails, the device will not be used, so
we do not need to keep that dma mask at all.
I will change the order here.
Thanks!
> Also, i think it's a good case for ternary:
>
> bits = mcfg->dma_maskbits == 0 ?
> maskbits :
> RTE_MIN(maskbits, mcfg->dma_maskbits);
>
> IMO the intention looks much clearer.
>
>
Agree.
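A minimal sketch combining the two suggestions above: pick the most restrictive value with a ternary into a local variable, and only store it once the check has passed. The names toy_mem_config, most_restrictive_bits and toy_check_dma_mask are hypothetical, and the check_ok argument stands in for the result of the memseg walk; this is not the code of the next patch version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the dma_maskbits field of struct rte_mem_config. */
struct toy_mem_config {
	uint8_t dma_maskbits; /* 0 means no mask has been recorded yet */
};

/* Keep the smaller (more restrictive) of the stored and requested widths. */
static uint8_t
most_restrictive_bits(uint8_t stored, uint8_t requested)
{
	assert(requested > 0);
	return stored == 0 ? requested :
		(requested < stored ? requested : stored);
}

/*
 * check_ok stands in for the outcome of walking the memsegs.
 * The shared value is only updated when the check succeeds.
 */
static int
toy_check_dma_mask(struct toy_mem_config *cfg, uint8_t maskbits, int check_ok)
{
	uint8_t bits = most_restrictive_bits(cfg->dma_maskbits, maskbits);

	if (!check_ok)
		return -1; /* stored value left untouched on failure */

	cfg->dma_maskbits = bits;
	return 0;
}
```

A failed check for one device then leaves the previously recorded mask in place for later dynamic allocations.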
> > +
> > + /* create dma mask */
> > + mask = ~((1ULL << maskbits) - 1);
> > +
> > + rte_memseg_walk(check_iova, &mask);
> > +
> > + if (!mask)
> > + return -1;
> > +
> > + return 0;
> > +}
> > +
> > /* return the number of memory channels */
> > unsigned rte_memory_get_nchannel(void)
> > {
> > diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > index aff0688..aea44cb 100644
> > --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> > +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > @@ -77,6 +77,9 @@ struct rte_mem_config {
> > * exact same address the primary process maps it.
> > */
> > uint64_t mem_cfg_addr;
> > +
> > + /* keeps the more restricted dma mask */
> > + uint8_t dma_maskbits;
>
> This needs to be documented as an ABI break in the 18.11 release notes.
>
>
Ok. I'll add that in the next version.
Thanks
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default
@ 2018-10-04 15:43 9% ` Ferruh Yigit
2018-10-04 14:49 0% ` Luca Boccassi
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2018-10-04 15:43 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ferruh Yigit, Neil Horman, Luca Boccassi, Christian Ehrhardt
Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
the current release and these APIs are targetted for further release.
RTE_NEXT_ABI shouldn't be enabled by default.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
config/common_base | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/config/common_base b/config/common_base
index 2e888d13b..dbd0c9ae9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -43,7 +43,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
#
# Use newest code breaking previous ABI
#
-CONFIG_RTE_NEXT_ABI=y
+CONFIG_RTE_NEXT_ABI=n
#
# Major ABI to overwrite library specific LIBABIVER
--
2.17.1
^ permalink raw reply [relevance 9%]
* [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:43 9% ` [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default Ferruh Yigit
2018-10-04 14:49 0% ` Luca Boccassi
@ 2018-10-04 15:48 9% ` Ferruh Yigit
2018-10-04 15:10 0% ` Thomas Monjalon
1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-04 15:48 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ferruh Yigit, Neil Horman, Luca Boccassi, Christian Ehrhardt
Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
the current release and these APIs are targeted for further release.
RTE_NEXT_ABI shouldn't be enabled by default.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
v2:
* fix typo in commit log
---
config/common_base | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/config/common_base b/config/common_base
index 2e888d13b..dbd0c9ae9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -43,7 +43,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
#
# Use newest code breaking previous ABI
#
-CONFIG_RTE_NEXT_ABI=y
+CONFIG_RTE_NEXT_ABI=n
#
# Major ABI to overwrite library specific LIBABIVER
--
2.17.1
^ permalink raw reply [relevance 9%]
* Re: [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default
2018-10-04 15:43 9% ` [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default Ferruh Yigit
@ 2018-10-04 14:49 0% ` Luca Boccassi
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
1 sibling, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-04 14:49 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, Neil Horman, Christian Ehrhardt
On Thu, 2018-10-04 at 16:43 +0100, Ferruh Yigit wrote:
> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> the current release and these APIs are targetted for further release.
>
> RTE_NEXT_ABI shouldn't be enabled by default.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> ---
> config/common_base | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/config/common_base b/config/common_base
> index 2e888d13b..dbd0c9ae9 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -43,7 +43,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
> #
> # Use newest code breaking previous ABI
> #
> -CONFIG_RTE_NEXT_ABI=y
> +CONFIG_RTE_NEXT_ABI=n
>
> #
> # Major ABI to overwrite library specific LIBABIVER
Acked-by: Luca Boccassi <bluca@debian.org>
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
@ 2018-10-04 15:10 0% ` Thomas Monjalon
2018-10-04 15:28 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-04 15:10 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
04/10/2018 17:48, Ferruh Yigit:
> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> the current release and these APIs are targeted for further release.
It seems nobody has been using it in the last releases.
> RTE_NEXT_ABI shouldn't be enabled by default.
The reason for having it enabled by default is that when you build DPDK
yourself, you probably want the latest features.
If packaged properly for stability, it is easy to disable it in
the package recipe.
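As a rough illustration of what the option controls, here is a sketch that assumes the macro is tested with #ifdef in library code. The structure toy_stats and the helper toy_stats_has_bytes are hypothetical. With RTE_NEXT_ABI defined the structure grows, so binaries built with and without it do not agree on the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure whose layout depends on the build option. */
struct toy_stats {
	uint64_t packets;
#ifdef RTE_NEXT_ABI
	uint64_t bytes; /* only present when the next ABI is enabled */
#endif
};

/* Report which layout this translation unit was compiled with. */
static int
toy_stats_has_bytes(void)
{
#ifdef RTE_NEXT_ABI
	assert(sizeof(struct toy_stats) == 2 * sizeof(uint64_t));
	return 1;
#else
	assert(sizeof(struct toy_stats) == sizeof(uint64_t));
	return 0;
#endif
}
```

Building once with -DRTE_NEXT_ABI and once without gives two different structure sizes, which is why a distribution package would pin the option one way.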
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:10 0% ` Thomas Monjalon
@ 2018-10-04 15:28 0% ` Ferruh Yigit
2018-10-04 15:55 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-04 15:28 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> 04/10/2018 17:48, Ferruh Yigit:
>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
>> the current release and these APIs are targeted for further release.
>
> It seems nobody is using it in last releases.
>
>> RTE_NEXT_ABI shouldn't be enabled by default.
>
> The reason for having it enabled by default is that when you build DPDK
> yourself, you probably want the latest features.
> If packaged properly for stability, it is easy to disable it in
> the package recipe.
My concern was that (if this has been used) users may get unstable APIs
without being explicitly aware of it.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:28 0% ` Ferruh Yigit
@ 2018-10-04 15:55 0% ` Thomas Monjalon
2018-10-05 9:13 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-04 15:55 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
04/10/2018 17:28, Ferruh Yigit:
> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> > 04/10/2018 17:48, Ferruh Yigit:
> >> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> >> the current release and these APIs are targeted for further release.
> >
> > It seems nobody is using it in last releases.
> >
> >> RTE_NEXT_ABI shouldn't be enabled by default.
> >
> > The reason for having it enabled by default is that when you build DPDK
> > yourself, you probably want the latest features.
> > If packaged properly for stability, it is easy to disable it in
> > the package recipe.
>
> My concern was (if this has been used), user may get unstable APIs and without
> explicitly being aware of it.
I am OK with both defaults (enabled or disabled).
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-04 3:54 0% ` Honnappa Nagarahalli
@ 2018-10-04 19:16 0% ` Wang, Yipeng1
0 siblings, 0 replies; 200+ results
From: Wang, Yipeng1 @ 2018-10-04 19:16 UTC (permalink / raw)
To: Honnappa Nagarahalli, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 8:54 PM
>To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>>
>> > >-----Original Message-----
>> > >From: Van Haaren, Harry
>> > >> > > > > /**
>> > >> > > > > * Add a key to an existing hash table.
>> > >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
>> > >> > > > >*h, const void
>> > >> > > *key);
>> > >> > > > > * array of user data. This value is unique for this key.
>> > >> > > > > */
>> > >> > > > > int32_t
>> > >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> > >> > > > >void *key,
>> > >> > > hash_sig_t sig);
>> > >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> > >> > > > >+*key,
>> > >> > > hash_sig_t sig);
>> > >> > > > >
>> > >> > > > > /
>> > >> > > >
>> > >> > > > I think the above changes will break ABI by changing the
>> > >> > > > parameter
>> > >> type?
>> > >> > > Other people may know better on this.
>> > >> > >
>> > >> > > Just removing a const should not change the ABI, I believe,
>> > >> > > since the const is just advisory hint to the compiler. Actual
>> > >> > > parameter size and count remains unchanged so I don't believe there
>> is an issue.
>> > >> > > [ABI experts, please correct me if I'm wrong on this]
>> > >> >
>> > >> >
>> > >> > [Certainly no ABI expert, but...]
>> > >> >
>> > >> > I think this is an API break, not ABI break.
>> > >> >
>> > >> > Given application code as follows, it will fail to compile - even
>> > >> > though
>> > >> running
>> > >> > the new code as a .so wouldn't cause any issues (AFAIK).
>> > >> >
>> > >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> > >> > /* parameter passed in is const, but updated function
>> > >> > prototype is
>> > >> non-
>> > >> > const */
>> > >> > rte_hash_add_key_with_hash(h, ...); }
>> > >> >
>> > >> > This means that we can't recompile apps against latest patch
>> > >> > without application code changes, if the app was passing a const
>> > >> > rte_hash struct
>> > >> as
>> > >> > the first parameter.
>> > >> >
>> > >> Agree. Do we need to do anything for this?
>> > >
>> > >I think we should try to avoid breaking API wherever possible.
>> > >If we must, then I suppose we could follow the ABI process of a
>> > >deprecation notice.
>> > >
>> > >From my reading of the versioning docs, it doesn't document this case:
>> > >https://doc.dpdk.org/guides/contributing/versioning.html
>> > >
>> > >I don't recall a similar situation in DPDK previously - so I suggest
>> > >you ask Tech board for input here.
>> > >
>> > >Hope that helps! -Harry
>> > [Wang, Yipeng]
>> > Honnappa, how about use a pointer to the counter in the rte_hash
>> > struct instead of the counter? Will this avoid API change?
>> I think it defeats the purpose of 'const' parameter to the API and provides
>> incorrect information to the user.
>Yipeng, I think I have misunderstood your comment. I believe you meant; we could allocate memory to the counter and store the
>pointer in the structure. Please correct me if I am wrong.
>This could be a solution, though it will be another cache line access. It might be ok given that it is a single cache line for entire hash
>table.
[Wang, Yipeng] Yeah, that is what I meant. It is an additional memory access, but it will probably be in the local cache.
Since time is tight, it could be a simple workaround for this version, and in the future you can extend this pointed-to counter to a counter array, as Ola suggested and as the
Cuckoo switch paper did to address the scaling issue.
>
>> IMO, DPDK should have guidelines on how to handle the API compatibility
>> breaks. I will send an email to tech board on this.
>> We can also solve this by having counters on the bucket. I was planning to do
>> this little bit later. I will look at the effort involved and may be do it now.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-04 10:46 0% ` Ferruh Yigit
@ 2018-10-05 9:04 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-10-05 9:04 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, John McNamara, Marko Kovacevic
On 04-Oct-18 11:46 AM, Ferruh Yigit wrote:
> On 10/4/2018 10:18 AM, Thomas Monjalon wrote:
>> 04/10/2018 11:17, Burakov, Anatoly:
>>> On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
>>>> 20/09/2018 17:41, Anatoly Burakov:
>>>>> Currently, command-line switches for legacy mem mode or single-file
>>>>> segments mode are only stored in internal config. This leads to a
>>>>> situation where these flags have to always match between primary
>>>>> and secondary, which is bad for usability.
>>>>>
>>>>> Fix this by storing these flags in the shared config as well, so
>>>>> that secondary process can know if the primary was launched in
>>>>> single-file segments or legacy mem mode.
>>>>>
>>>>> This bumps the EAL ABI, however there's an EAL deprecation notice
>>>>> already in place[1] for a different feature, so that's OK.
>>>>>
>>>>> [1] http://patches.dpdk.org/patch/43502/
>>>>>
>>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>> ---
>>>>>
>>>>> Notes:
>>>>> v2:
>>>>> - Added documentation on ABI break
>>>>>
>>>>> doc/guides/rel_notes/rel_description.rst | 5 +++++
>>>>
>>>> Removed change in this file (dup of release note).
>>>>
>>>>> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
>>>>> .../common/include/rte_eal_memconfig.h | 4 ++++
>>>>> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
>>>>> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
>>>>> lib/librte_eal/meson.build | 2 +-
>>>>> 6 files changed, 36 insertions(+), 3 deletions(-)
>>>>
>>>> Applied (without extra note), thanks.
>>>>
>>>
>>> This will probably break external mem patches due to conflict in release
>>> notes. Should i respin?
>>
>> No, conflicts in release notes are usual. I manage such conflict myself.
>
> It is common to have conflict in release notes and as Thomas said we resolve it
> manually but now this is causing problem in automated per patch tests because
> patch can't be applied.
>
> We should think about a way to prevent these conflicts.
>
How about just ignoring them? 'git status' will show you which particular
files cause conflicts. If it's anything in the doc/ directory, it's safe
to 'git add' those files and proceed with rebase/apply, no?
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:55 0% ` Thomas Monjalon
@ 2018-10-05 9:13 0% ` Bruce Richardson
2018-10-05 10:17 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-05 9:13 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Ferruh Yigit, dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
> 04/10/2018 17:28, Ferruh Yigit:
> > On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> > > 04/10/2018 17:48, Ferruh Yigit:
> > >> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> > >> the current release and these APIs are targeted for further release.
> > >
> > > It seems nobody is using it in last releases.
> > >
> > >> RTE_NEXT_ABI shouldn't be enabled by default.
> > >
> > > The reason for having it enabled by default is that when you build DPDK
> > > yourself, you probably want the latest features.
> > > If packaged properly for stability, it is easy to disable it in
> > > the package recipe.
> >
> > My concern was (if this has been used), user may get unstable APIs and without
> > explicitly being aware of it.
>
> I am OK with both defaults (enabled or disabled).
>
I'd keep it as is. As said, I'm not sure it's being used right now anyway.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-05 9:13 0% ` Bruce Richardson
@ 2018-10-05 10:17 0% ` Ferruh Yigit
2018-10-05 11:30 3% ` Neil Horman
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-05 10:17 UTC (permalink / raw)
To: Bruce Richardson, Thomas Monjalon
Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
On 10/5/2018 10:13 AM, Bruce Richardson wrote:
> On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
>> 04/10/2018 17:28, Ferruh Yigit:
>>> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
>>>> 04/10/2018 17:48, Ferruh Yigit:
>>>>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
>>>>> the current release and these APIs are targeted for further release.
>>>>
>>>> It seems nobody is using it in last releases.
>>>>
>>>>> RTE_NEXT_ABI shouldn't be enabled by default.
>>>>
>>>> The reason for having it enabled by default is that when you build DPDK
>>>> yourself, you probably want the latest features.
>>>> If packaged properly for stability, it is easy to disable it in
>>>> the package recipe.
>>>
>>> My concern was (if this has been used), user may get unstable APIs and without
>>> explicitly being aware of it.
>>
>> I am OK with both defaults (enabled or disabled).
>>
> I'd keep it as is. As said, I'm not sure it's being used right now anyway.
No, not used right now.
But I think we can use it. Were you able to find a chance to check:
https://mails.dpdk.org/archives/dev/2018-October/114372.html
Option D.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
@ 2018-10-05 11:05 0% ` Ananyev, Konstantin
2018-11-12 21:01 0% ` Trahe, Fiona
1 sibling, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-10-05 11:05 UTC (permalink / raw)
To: dev
Cc: De Lara Guarch, Pablo, Akhil Goyal, Doherty, Declan, Ravi Kumar,
Jerin Jacob, Zhang, Roy Fan, Trahe, Fiona, Tomasz Duszynski,
Hemant Agrawal, Natalie Samsonov, Dmitri Epshtein, Jay Zhou
Hi everyone,
> This RFC proposes several changes inside rte_cryptodev_sym_session.
> Note that this is just an RFC, not a complete patch, so for now
> I modified only the librte_cryptodev itself,
> some cryptodev PMDs, test-crypto-perf and the ipsec-secgw example.
> The proposed changes mean ABI/API breakage inside cryptodev,
> so I am looking for feedback from cryptodev lib and crypto-PMD maintainers.
> Below are details and reasoning for the proposed changes.
>
> 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> operate based on cryptodev device id, though inside
> rte_cryptodev_sym_session device specific data is addressed
> by driver id (not device id).
> That creates a problem with the current implementation when we have
> two or more devices with the same driver used by the same session.
> Consider the following example:
>
> struct rte_cryptodev_sym_session *sess;
> rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> rte_cryptodev_sym_session_clear(dev_id=X, sess);
>
> After that point, if X and Y use the same driver,
> then sess can't be used by device Y any more.
> The reason is that the per-session data is driver specific
> (not device specific), and there is no information about
> how many device instances use that data.
> Probably the simplest way to deal with that issue is to
> add a reference counter for each driver's data.
>
> 2.rte_cryptodev_sym_session_set_user_data() and
> rte_cryptodev_sym_session_get_user_data() -
> with the current implementation there is no defined way for the user to
> determine the max allowed size of the private data.
> Even within rte_cryptodev_sym_session_set_user_data() we just blindly
> copy user-provided data without checking for memory boundary violations.
> To overcome that issue I added 'uint16_t priv_size' into the
> rte_cryptodev_sym_session structure.
>
> 3.rte_cryptodev_sym_session contains an array of variable size for
> driver specific data.
> Though number of elements in that array is determined by static
> variable nb_drivers, that could be modified by
> rte_cryptodev_allocate_driver().
> That construction seems to work ok so far, as right now users register
> all their PMDs at startup, though it doesn't mean that it would always
> remain like that.
> To make it less error prone I added 'uint16_t nb_drivers' into the
> rte_cryptodev_sym_session structure.
> At least that allows related functions to check that the provided
> driver id wouldn't overrun the variable array boundaries,
> and it also allows determining the size of an already allocated session
> without accessing the global variable.
>
> 4.#2 and #3 above imply that each struct rte_cryptodev_sym_session
> would now have a sort of read-only data (initialized once at allocation
> time and kept unmodified through the session lifetime).
> That requires more changes in the current cryptodev implementation:
> Right now inside the cryptodev framework, rte_cryptodev_sym_session
> and driver specific session data are two completely different structures
> (e.g. struct rte_cryptodev_sym_session and struct null_crypto_session).
> Though the current cryptodev implementation implicitly assumes that the
> driver will allocate both of them from within the same mempool.
> Plus this is done in a manner that they overwrite each other's fields
> (reuse the same space - sort of an implicit C union).
> That's probably not the best programming practice,
> plus it makes it impossible to have read-only fields inside both of them.
> So to overcome that situation I changed the API a bit, to allow
> using two different mempools for these two distinct data structures.
>
> 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> I suppose that is self-explanatory, and it might be used in a lot of places
> (it would be quite useful for the ipsec library we are developing).
>
> So the new proposed layout for rte_cryptodev_sym_session:
> struct rte_cryptodev_sym_session {
>         uint64_t userdata;
>         /**< Can be used for external metadata */
>         uint16_t nb_drivers;
>         /**< number of elements in sess_data array */
>         uint16_t priv_size;
>         /**< session private data will be placed after sess_data */
>         __extension__ struct {
>                 void *data;
>                 uint16_t refcnt;
>         } sess_data[0];
>         /**< Driver specific session material, variable size */
> };
>
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
OK, I haven't heard any objections so far,
so I suppose everyone is OK in general with the proposed changes.
I will go ahead with a deprecation notice for 18.11 then.
Konstantin
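As an illustration of the reference counter suggested in point 1 of the
RFC, here is a minimal sketch in plain C. It is not the cryptodev
implementation: the sketch_* names, the fixed SKETCH_MAX_DRIVERS bound and
the simplified init/clear signatures are assumptions made up for the
example. The only idea shown is that clearing the session for device X
must not drop driver data that device Y (same driver) still uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DRIVERS 4 /* hypothetical fixed bound, sketch only */

/* Hypothetical stand-in for the per-driver slot proposed in the RFC. */
struct sketch_sess_data {
	void *data;      /* driver specific session material */
	uint16_t refcnt; /* devices of this driver currently using 'data' */
};

struct sketch_session {
	uint16_t nb_drivers; /* number of valid entries in sess_data */
	struct sketch_sess_data sess_data[SKETCH_MAX_DRIVERS];
};

/* Attach one more device of 'driver_id'; only the first user sets data. */
static int
sketch_session_init(struct sketch_session *s, uint16_t driver_id, void *data)
{
	if (s == NULL || driver_id >= s->nb_drivers)
		return -1;
	if (s->sess_data[driver_id].refcnt == 0)
		s->sess_data[driver_id].data = data;
	s->sess_data[driver_id].refcnt++;
	return 0;
}

/* Detach one device; data is dropped only when the last user clears. */
static int
sketch_session_clear(struct sketch_session *s, uint16_t driver_id)
{
	if (s == NULL || driver_id >= s->nb_drivers)
		return -1;
	assert(s->sess_data[driver_id].refcnt > 0);
	s->sess_data[driver_id].refcnt--;
	if (s->sess_data[driver_id].refcnt == 0)
		s->sess_data[driver_id].data = NULL;
	return 0;
}
```

With two inits for the same driver id followed by one clear, the data
pointer stays set; only the second clear resets it. That is the X/Y case
from the example in the RFC. The bounds check against nb_drivers mirrors
point 3.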
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-05 10:17 0% ` Ferruh Yigit
@ 2018-10-05 11:30 3% ` Neil Horman
2018-10-05 12:35 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-10-05 11:30 UTC (permalink / raw)
To: Ferruh Yigit
Cc: Bruce Richardson, Thomas Monjalon, dev, Luca Boccassi,
Christian Ehrhardt
On Fri, Oct 05, 2018 at 11:17:30AM +0100, Ferruh Yigit wrote:
> On 10/5/2018 10:13 AM, Bruce Richardson wrote:
> > On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
> >> 04/10/2018 17:28, Ferruh Yigit:
> >>> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> >>>> 04/10/2018 17:48, Ferruh Yigit:
> >>>>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> >>>>> the current release and these APIs are targeted for further release.
> >>>>
> >>>> It seems nobody is using it in last releases.
> >>>>
> >>>>> RTE_NEXT_ABI shouldn't be enabled by default.
> >>>>
> >>>> The reason for having it enabled by default is that when you build DPDK
> >>>> yourself, you probably want the latest features.
> >>>> If packaged properly for stability, it is easy to disable it in
> >>>> the package recipe.
> >>>
> >>> My concern was (if this has been used), user may get unstable APIs and without
> >>> explicitly being aware of it.
> >>
> >> I am OK with both defaults (enabled or disabled).
> >>
> > I'd keep it as is. As said, I'm not sure it's being used right now anyway.
>
> No, not used right now.
> But I think we can use it. Were you able to find a chance to check:
>
> https://mails.dpdk.org/archives/dev/2018-October/114372.html
>
> Option D.
>
Just to propose something else: we also have the ALLOW_EXPERIMENTAL_API flag,
which IIRC we default to on. Would it be worth consolidating these two
mechanisms into one? Currently ALLOW_EXPERIMENTAL_API lets us flag symbols that
are not yet stable, and it seems to work well. It does not however let us
simply define out structures/variables that might adversely affect the ABI.
Would it be worth considering adding a macro (something like
__rte_experimental_symbol()), that allows a variable/struct to be defined if
ALLOW_EXPERIMENTAL_API is set, and squashed otherwise?
Neil
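To make the idea concrete, a rough sketch of what such a macro could look
like follows. Nothing here exists in DPDK: sketch_experimental_symbol()
and struct sketch_stats are hypothetical names used only for illustration,
and the build flag is assumed to be passed as -DALLOW_EXPERIMENTAL_API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro in the spirit of the proposal: keep the wrapped
 * declaration only when ALLOW_EXPERIMENTAL_API is defined at build time.
 */
#ifdef ALLOW_EXPERIMENTAL_API
#define sketch_experimental_symbol(decl) decl
#else
#define sketch_experimental_symbol(decl)
#endif

struct sketch_stats {
	unsigned long rx_packets;
	unsigned long tx_packets;
	/* Present only in experimental builds, so the layout differs. */
	sketch_experimental_symbol(unsigned long rx_drops;)
};

/* Zero the counters that exist in this build. */
static void
sketch_stats_reset(struct sketch_stats *st)
{
	assert(st != NULL);
	st->rx_packets = 0;
	st->tx_packets = 0;
#ifdef ALLOW_EXPERIMENTAL_API
	st->rx_drops = 0;
#endif
}
```

Note that sizeof(struct sketch_stats) differs between the two builds,
which is exactly the kind of ABI difference the flag would be gating, so
an application and the library would have to agree on the setting.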
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v2 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-05 12:06 3% [dpdk-dev] [PATCH v2 0/6] use IOVAs check based on DMA mask Alejandro Lucero
@ 2018-10-05 12:06 4% ` Alejandro Lucero
0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:06 UTC (permalink / raw)
To: dev
A device can suffer addressing limitations. This function checks
memsegs have iovas within the supported range based on dma mask.
PMDs should use this function during initialization if device
suffers addressing limitations, returning an error if this function
returns memsegs out of range.
Another usage is for emulated IOMMU hardware with addressing
limitations.
It is necessary to save the most restricted dma mask for checking out
memory allocated dynamically after initialization.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 64 +++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 95 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call in order
to reconfigure queues.
+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated memory is
+ not out of the device range. Because now memory can be dynamically allocated
+ after initialization, a dma mask is kept and any new allocated memory will be
+ checked out against that dma mask and rejected if out of range. If more than
+ one device has addressing limitations, the dma mask is the more restricted one.
API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.
Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..7555e76 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,70 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg)
+{
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+ RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+ ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* keep the more restricted maskbit */
+ if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
+ mcfg->dma_maskbits = maskbits;
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
+
+ /*
+ * we need to keep the more restricted maskbit for checking
+ * potential dynamic memory allocation in the future.
+ */
+ mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
+ RTE_MIN(mcfg->dma_maskbits, maskbits);
+
+ return 0;
+}
+
/* return the number of memory channels */
unsigned rte_memory_get_nchannel(void)
{
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 62a21c2..b5dff70 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -81,6 +81,9 @@ struct rte_mem_config {
/* legacy mem and single file segments options are shared */
uint32_t legacy_mem;
uint32_t single_file_segments;
+
+ /* keeps the more restricted dma mask */
+ uint8_t dma_maskbits;
} __attribute__((__packed__));
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277..c349d6c 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
*/
unsigned rte_memory_get_nrank(void);
+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
+
/**
* Drivers based on uio will not load unless physical
* addresses are obtainable. It is only possible to get
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3..3b5b2b6 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -259,11 +259,13 @@ struct malloc_elem *
int socket, unsigned int flags, size_t align, size_t bound,
bool contig, struct rte_memseg **ms, int n_segs)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *msl;
struct malloc_elem *elem = NULL;
size_t alloc_sz;
int allocd_pages;
void *ret, *map_addr;
+ uint64_t mask;
alloc_sz = (size_t)pg_sz * n_segs;
@@ -291,6 +293,16 @@ struct malloc_elem *
goto fail;
}
+ if (mcfg->dma_maskbits) {
+ mask = ~((1ULL << mcfg->dma_maskbits) - 1);
+ if (rte_eal_check_dma_mask(mask)) {
+ RTE_LOG(ERR, EAL,
+ "%s(): couldn't allocate memory due to DMA mask\n",
+ __func__);
+ goto fail;
+ }
+ }
+
/* add newly minted memsegs to malloc heap */
elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 4a55d3b..dfe1b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -263,6 +263,8 @@ enum rte_iova_mode
* processes could later map the config into this exact location */
rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+ rte_config.mem_config->dma_maskbits = 0;
+
}
/* attach to an existing shared memory config */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..2baefce 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -291,6 +291,7 @@ EXPERIMENTAL {
rte_devargs_parsef;
rte_devargs_remove;
rte_devargs_type_count;
+ rte_eal_check_dma_mask;
rte_eal_cleanup;
rte_eal_hotplug_add;
rte_eal_hotplug_remove;
--
1.9.1
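
For readers who want to try the range test outside of DPDK, the core of
check_iova() can be sketched as a standalone function. This is only an
illustration under assumptions: sketch_iova_out_of_range() is a made-up
name, and it takes plain integers instead of a struct rte_memseg.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if the segment [iova, iova + len) needs an address bit at or
 * above 'maskbits', 0 if the whole segment is addressable by the device.
 */
static int
sketch_iova_out_of_range(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t mask, last;

	assert(maskbits > 0 && maskbits <= 63 && len > 0);
	mask = ~((1ULL << maskbits) - 1); /* bits the device cannot drive */
	last = iova + len - 1;            /* highest address in the segment */
	return (last & mask) != 0;
}
```

Only the highest address of the segment is tested, which is enough because
any lower address in the same segment needs no more bits than that one.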
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v2 0/6] use IOVAs check based on DMA mask
@ 2018-10-05 12:06 3% Alejandro Lucero
2018-10-05 12:06 4% ` [dpdk-dev] [PATCH v2 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:06 UTC (permalink / raw)
To: dev
I sent a patchset about this to be applied on 17.11 stable. The memory
code has had major changes since that version, so here is the patchset
adjusted to the current master repo.
This patchset mainly adds a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.
With this check, IOVAs out of range are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, which avoids forbidding IOVA VA
by default.
For the known cases of addressing limitations, there are just 40 (NFP)
or 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those
limitations imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is
likely enough for most systems. On machines with more memory, the added
check will ensure IOVAs are within the range.
With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, the addresses returned by default do not fit in 39
or 40 bits. It is possible to give an address hint with a lower starting
address than the default one used by the kernel, and then ensure the
mmap uses that hint or the hint plus some offset. On 64-bit systems, the
process virtual address space is large enough for mmapping the hugepages
within the supported range when those addressing limitations exist. This
patchset also adds a change for using such a hint, making the use of
IOVA VA a more than likely possibility when those limitations exist.
The check is not done by default but only when it is required. This
patchset adds the check for NFP initialization and for setting the IOVA
mode if an emulated VT-d is detected. Also, because of the recent
patchset adding dynamic memory allocation, the check is also invoked to
ensure the new memsegs are within the required range.
This patchset could be applied to stable 18.05.
v2:
- change logs from INFO to DEBUG
- only keeps dma mask if device capable of addressing allocated memory
- add ABI changes
- change hint address increment to page size
- split pci/bus commit in two
- fix commits
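
As a quick sanity check on the upper limits that follow from 40 and 39
address bits, the arithmetic can be written down as below. The helper
name sketch_addressable_bytes() and the SKETCH_GIB constant are assumptions
for this example only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GIB (1ULL << 30) /* bytes in one GiB */

/* Size in bytes of the address range reachable with 'maskbits' bits. */
static uint64_t
sketch_addressable_bytes(uint8_t maskbits)
{
	assert(maskbits < 64);
	return 1ULL << maskbits;
}
```

sketch_addressable_bytes(40) is 1024 GiB (1 TiB) and
sketch_addressable_bytes(39) is 512 GiB.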
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-05 11:30 3% ` Neil Horman
@ 2018-10-05 12:35 0% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-10-05 12:35 UTC (permalink / raw)
To: Neil Horman
Cc: Bruce Richardson, Thomas Monjalon, dev, Luca Boccassi,
Christian Ehrhardt
On 10/5/2018 12:30 PM, Neil Horman wrote:
> On Fri, Oct 05, 2018 at 11:17:30AM +0100, Ferruh Yigit wrote:
>> On 10/5/2018 10:13 AM, Bruce Richardson wrote:
>>> On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
>>>> 04/10/2018 17:28, Ferruh Yigit:
>>>>> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
>>>>>> 04/10/2018 17:48, Ferruh Yigit:
>>>>>>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
>>>>>>> the current release and these APIs are targeted for further release.
>>>>>>
>>>>>> It seems nobody is using it in last releases.
>>>>>>
>>>>>>> RTE_NEXT_ABI shouldn't be enabled by default.
>>>>>>
>>>>>> The reason for having it enabled by default is that when you build DPDK
>>>>>> yourself, you probably want the latest features.
>>>>>> If packaged properly for stability, it is easy to disable it in
>>>>>> the package recipe.
>>>>>
>>>>> My concern was (if this has been used), user may get unstable APIs and without
>>>>> explicitly being aware of it.
>>>>
>>>> I am OK with both defaults (enabled or disabled).
>>>>
>>> I'd keep it as is. As said, I'm not sure it's being used right now anyway.
>>
>> No, not used right now.
>> But I think we can use it. Were you able to find a chance to check:
>>
>> https://mails.dpdk.org/archives/dev/2018-October/114372.html
>>
>> Option D.
>>
>
> Just to propose something else, We also have the ALLOW_EXPERIMENTAL_API flag
> that we IIRC default to on. Would it be worth consolidating these two
> mechanisms into one? Currently ALLOW_EXPERIMENTAL_API lets us flag symbols that
> are not yet stable, and it seems to work well. It does not however let us
> simply define out structures/variables that might adversely affect the ABI.
> Would it be worth considering adding a macro (something like
> __rte_experimental_symbol()), that allows a variable/struct to be defined if
> ALLOW_EXPERIMENTAL_API is set, and squashed otherwise?
RTE_NEXT_ABI is not just for symbols.
If there is a new API foo(), __rte_experimental works fine to mark it experimental.
But if there is an _existing API_
"bar(char)",
and we plan to change it to
"bar(int, int)",
then to publish the change early in this release we need an RTE_NEXT_ABI ifdef,
since both can't exist together. It will be used as:
Release N:
#ifdef RTE_NEXT_ABI
bar(int, int);
#else
bar(char);
#endif
Release N + 1:
bar(int, int);
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-05 12:45 3% [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
@ 2018-10-05 12:45 4% ` Alejandro Lucero
2018-10-10 8:56 0% ` Tu, Lijuan
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:45 UTC (permalink / raw)
To: dev
A device can suffer addressing limitations. This function checks
memsegs have iovas within the supported range based on dma mask.
PMDs should use this function during initialization if device
suffers addressing limitations, returning an error if this function
returns memsegs out of range.
Another usage is for emulated IOMMU hardware with addressing
limitations.
It is necessary to save the most restricted dma mask for checking out
memory allocated dynamically after initialization.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 60 +++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 91 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call in order
to reconfigure queues.
+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated memory is
+ not out of the device range. Because now memory can be dynamically allocated
+ after initialization, a dma mask is kept and any new allocated memory will be
+ checked out against that dma mask and rejected if out of range. If more than
+ one device has addressing limitations, the dma mask is the more restricted one.
API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.
Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..c482f0d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,66 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg)
+{
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+ RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+ ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
+
+ /*
+ * we need to keep the more restricted maskbit for checking
+ * potential dynamic memory allocation in the future.
+ */
+ mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
+ RTE_MIN(mcfg->dma_maskbits, maskbits);
+
+ return 0;
+}
+
/* return the number of memory channels */
unsigned rte_memory_get_nchannel(void)
{
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 62a21c2..b5dff70 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -81,6 +81,9 @@ struct rte_mem_config {
/* legacy mem and single file segments options are shared */
uint32_t legacy_mem;
uint32_t single_file_segments;
+
+ /* keeps the more restricted dma mask */
+ uint8_t dma_maskbits;
} __attribute__((__packed__));
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277..c349d6c 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
*/
unsigned rte_memory_get_nrank(void);
+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
+
/**
* Drivers based on uio will not load unless physical
* addresses are obtainable. It is only possible to get
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3..3b5b2b6 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -259,11 +259,13 @@ struct malloc_elem *
int socket, unsigned int flags, size_t align, size_t bound,
bool contig, struct rte_memseg **ms, int n_segs)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *msl;
struct malloc_elem *elem = NULL;
size_t alloc_sz;
int allocd_pages;
void *ret, *map_addr;
+ uint64_t mask;
alloc_sz = (size_t)pg_sz * n_segs;
@@ -291,6 +293,16 @@ struct malloc_elem *
goto fail;
}
+ if (mcfg->dma_maskbits) {
+ mask = ~((1ULL << mcfg->dma_maskbits) - 1);
+ if (rte_eal_check_dma_mask(mask)) {
+ RTE_LOG(ERR, EAL,
+ "%s(): couldn't allocate memory due to DMA mask\n",
+ __func__);
+ goto fail;
+ }
+ }
+
/* add newly minted memsegs to malloc heap */
elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 4a55d3b..dfe1b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -263,6 +263,8 @@ enum rte_iova_mode
* processes could later map the config into this exact location */
rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+ rte_config.mem_config->dma_maskbits = 0;
+
}
/* attach to an existing shared memory config */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..2baefce 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -291,6 +291,7 @@ EXPERIMENTAL {
rte_devargs_parsef;
rte_devargs_remove;
rte_devargs_type_count;
+ rte_eal_check_dma_mask;
rte_eal_cleanup;
rte_eal_hotplug_add;
rte_eal_hotplug_remove;
--
1.9.1
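
One detail in the malloc_heap.c hunk above may deserve a second look:
rte_eal_check_dma_mask() is declared as taking the number of mask bits as
a uint8_t, while the hunk passes the already expanded 64-bit mask. The
sketch below shows what the implicit conversion would do for a 40-bit
mask. The helper names are hypothetical, and this only illustrates the C
conversion rule; it is not a test of the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: expands a bit count into a mask of the bits a
 * device cannot address, using the same expression as the patch.
 */
static uint64_t
sketch_mask_from_bits(uint8_t maskbits)
{
	assert(maskbits <= 63);
	return ~((1ULL << maskbits) - 1);
}

/* What a uint8_t parameter receives when handed a 64-bit mask. */
static uint8_t
sketch_truncate_to_param(uint64_t mask)
{
	return (uint8_t)mask; /* only the low 8 bits survive */
}
```

For 40 bits the mask is 0xffffff0000000000, so the truncated argument is
0, and a bit count of 0 expands to an all-ones mask that flags every
non-zero address. If that reading is right, passing mcfg->dma_maskbits
directly would avoid the conversion.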
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
@ 2018-10-05 12:45 3% Alejandro Lucero
2018-10-05 12:45 4% ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:45 UTC (permalink / raw)
To: dev
I sent a patchset about this to be applied on 17.11 stable. The memory
code has had major changes since that version, so here is the patchset
adjusted to the current master repo.
This patchset mainly adds a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.
With this check, IOVAs out of range are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, which avoids forbidding IOVA VA
by default.
For the known cases of addressing limitations, there are just 40 (NFP)
or 39 (VT-d) bits for handling IOVAs. When using IOVA PA, those
limitations imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is
likely enough for most systems. On machines with more memory, the added
check will ensure IOVAs are within the range.
With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, the addresses returned by default do not fit in 39
or 40 bits. It is possible to give an address hint with a lower starting
address than the default one used by the kernel, and then ensure the
mmap uses that hint or the hint plus some offset. On 64-bit systems, the
process virtual address space is large enough for mmapping the hugepages
within the supported range when those addressing limitations exist. This
patchset also adds a change for using such a hint, making the use of
IOVA VA a more than likely possibility when those limitations exist.
The check is not done by default but only when it is required. This
patchset adds the check for NFP initialization and for setting the IOVA
mode if an emulated VT-d is detected. Also, because of the recent
patchset adding dynamic memory allocation, the check is also invoked to
ensure the new memsegs are within the required range.
This patchset could be applied to stable 18.05.
v2:
- change logs from INFO to DEBUG
- only keeps dma mask if device capable of addressing allocated memory
- add ABI changes
- change hint address increment to page size
- split pci/bus commit in two
- fix commits
v3:
- remove previous code about keeping dma mask
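
The address-hint approach described in the cover letter can be sketched
roughly as follows. This is not the code from the patchset: the function
name sketch_map_below(), its parameters and the retry policy are
assumptions for illustration, and it uses a small anonymous mapping
instead of hugepages. It assumes a POSIX system that provides
MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to map 'len' bytes so that the whole mapping sits below
 * 2^maskbits. Each attempt passes 'hint' to mmap and moves the hint up
 * by one page on failure. Returns the mapping, or NULL if no attempt
 * landed in range.
 */
static void *
sketch_map_below(size_t len, uint8_t maskbits, uintptr_t hint, int max_tries)
{
	const uint64_t limit = 1ULL << maskbits;
	const size_t pgsz = (size_t)sysconf(_SC_PAGESIZE);
	int i;

	assert(maskbits < 64 && len > 0);
	for (i = 0; i < max_tries; i++, hint += pgsz) {
		void *va = mmap((void *)hint, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (va == MAP_FAILED)
			continue;
		if ((uint64_t)(uintptr_t)va + len <= limit)
			return va;   /* in range: keep this mapping */
		munmap(va, len);     /* out of range: retry with next hint */
	}
	return NULL;
}
```

The kernel treats the first mmap argument only as a hint (no MAP_FIXED is
used), so the result still has to be checked against the limit, which is
the role the DMA mask check plays in the patchset.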
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] ethdev: add field for device data per process
@ 2018-10-05 13:26 4% ` Ferruh Yigit
2018-10-05 14:47 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-05 13:26 UTC (permalink / raw)
To: Alejandro Lucero; +Cc: Thomas Monjalon, Andrew Rybchenko, dev, rasland
On 10/5/2018 2:17 PM, Alejandro Lucero wrote:
> On Fri, Oct 5, 2018 at 2:01 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
>> On 10/3/2018 9:44 PM, Thomas Monjalon wrote:
>>> + Cc more people
>>>
>>> 27/09/2018 13:26, Alejandro Lucero:
>>>> Primary and secondary processes share a per-device private data. With
>>>> current design it is not possible to have data per-device per-process.
>>>> This is required for handling properly the CPP interface inside the NFP
>>>> PMD with multiprocess support.
>>>>
>>>> There is also at least another PMD driver, tap, with similar
>>>> requirements for per-process device data.
>>>
>>> Yes, it is required to fix tap PMD for multi-process usage.
>>>
>>> I am in favor of accepting this change in 18.11.
>>>
>>> [...]
>>>> @@ -539,7 +539,13 @@ struct rte_eth_dev {
>>>> eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function.
>> */
>>>> eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
>> function. */
>>>> eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare
>> function. */
>>>> - struct rte_eth_dev_data *data; /**< Pointer to device data */
>>>> + /**
>>>> + * Next two fields are per-device data but *data is shared between
>>>
>>> All fields in rte_eth_dev are per-device.
>>>
>>>> + * primary and secondary processes and *process_private is
>> per-process
>>>> + * private.
>>>> + */
>>>> + struct rte_eth_dev_data *data; /**< Pointer to device data. */
>>>> + void *process_private; /**< Pointer to per-process device data. */
>>>
>>> We could explain here that this memory is allocated by the PMD.
>>
>> Will there be a new version?
>>
>> Do we agree on the name?
>>
>> Should the LIBABIVER increase be done in this patch, or will another
>> patch already do it?
>>
>
> I'm not familiar with LIBABIVER but just tell me to send it again with that
> change if you consider that is the right thing to do.
ABI breakage process:
- Increase LIBABIVER in library Makefile/meson.build
- Update lib in release notes "Shared Library Versions" section, with a "+"
to indicate the change
- Remove deprecation notice (does not seem to apply to this one)
Thomas mentioned there is another patch breaking the ABI for ethdev, I wonder
which patch will do the above process.
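As a rough illustration of why a new field in the middle of a public
structure counts as an ABI break, the sketch below compares two layouts.
Both structures are simplified stand-ins, not the real rte_eth_dev, and
'later_field' is a made-up name for whatever follows the new pointer.

```c
#include <stddef.h>

/* Layout before the change (simplified, hypothetical). */
struct eth_dev_v1 {
	void *rx_pkt_burst;
	void *tx_pkt_burst;
	void *data;            /* shared between processes */
	void *later_field;     /* any field that follows 'data' */
};

/* Layout after the change: one pointer added after 'data'. */
struct eth_dev_v2 {
	void *rx_pkt_burst;
	void *tx_pkt_burst;
	void *data;            /* shared between processes */
	void *process_private; /* new per-process pointer */
	void *later_field;     /* same field, now at a new offset */
};

/*
 * Number of bytes by which 'later_field' moved. A binary built against
 * v1 would read the wrong memory if it ran with a v2 library.
 */
size_t
later_field_shift(void)
{
	return offsetof(struct eth_dev_v2, later_field) -
		offsetof(struct eth_dev_v1, later_field);
}
```

Both the offset of the following field and the size of the structure
change, so an application compiled against the old layout cannot safely
load the new library. That is what the LIBABIVER bump signals.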
> About the name, I will let other to tell.
>
> Thanks
>
* Re: [dpdk-dev] [PATCH] ethdev: add field for device data per process
2018-10-05 13:26 4% ` Ferruh Yigit
@ 2018-10-05 14:47 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-05 14:47 UTC (permalink / raw)
To: Ferruh Yigit, Alejandro Lucero; +Cc: Andrew Rybchenko, dev, rasland
05/10/2018 15:26, Ferruh Yigit:
> On 10/5/2018 2:17 PM, Alejandro Lucero wrote:
> > On Fri, Oct 5, 2018 at 2:01 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >> Will there be a new version?
> >>
> >> Do we agree on the name?
> >>
> >> Should the LIBABIVER increase be done in this patch, or will another
> >> patch already do it?
> >>
> >
> > I'm not familiar with LIBABIVER but just tell me to send it again with that
> > change if you consider that is the right thing to do.
>
> ABI breakage process:
> - Increase LIBABIVER in library Makefile/meson.build
> - Update lib in release notes "Shared Library Versions" section, with a "+"
> to indicate the change
> - Remove deprecation notice (does not seem to apply to this one)
>
> Thomas mentioned there is another patch breaking the ABI for ethdev, I wonder
> which patch will do the above process.
There will be a patch to remove the attach/detach function.
But the patch for data per process will probably be applied first.
Please do the LIBABIVER bump as described by Ferruh.
* Re: [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
@ 2018-10-05 16:00 0% ` Timothy Redaelli
2018-10-27 21:19 0% ` Thomas Monjalon
2 siblings, 0 replies; 200+ results
From: Timothy Redaelli @ 2018-10-05 16:00 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev, bruce.richardson, christian.ehrhardt, mvarlese
On Tue, 2 Oct 2018 17:20:45 +0100
Luca Boccassi <bluca@debian.org> wrote:
> As part of the effort of consolidating the DPDK installation bits and
> pieces across distros, set the default directory of lib/ where PMDs get
> installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
> subdirectory as multiple ABI revisions might be installed at the same
> time, so having a fixed name will cause trouble with the autoload
> feature.
> Small refactor with parsing and saving the major version to a variable,
> since it's now used in 3 different places.
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
> ---
Acked-by: Timothy Redaelli <tredaelli@redhat.com>
* [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions
@ 2018-10-07 9:32 3% ` Thomas Monjalon
2018-10-07 9:32 4% ` [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure Thomas Monjalon
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
1 sibling, 2 replies; 200+ results
From: Thomas Monjalon @ 2018-10-07 9:32 UTC (permalink / raw)
To: dev; +Cc: gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
This is a follow-up of an idea presented at Dublin
during the "hotplug talk".
Instead of changing the existing hotplug functions, as in the RFC,
some new experimental functions are added.
The old functions lose their experimental status in order to provide
a non-experimental replacement for deprecated attach/detach functions.
It has been discussed briefly in the latest technical board meeting.
Changes in v6 - after Gaetan's review:
- bump ABI version of all buses (because of rte_device change)
- unroll snprintf loop in rte_eal_hotplug_add
Changes in v5:
- rte_devargs_remove is fixed in case of null devargs (patch 2)
- a pointer to the bus is added in rte_device (patch 3)
- rte_dev_remove is fixed in case of no devargs (patch 5)
Changes in v4 - after Andrew's review:
- add API changes in release notes (patches 1 & 2)
- fix memory leak in rte_eal_hotplug_add (patch 4)
Change in v3:
- fix null dereferencing in error path (patch 2)
Thomas Monjalon (5):
devargs: remove deprecated functions
devargs: simplify parameters of removal function
eal: add bus pointer in device structure
eal: remove experimental flag of hotplug functions
eal: simplify parameters of hotplug functions
doc/guides/rel_notes/release_18_11.rst | 23 ++++--
drivers/bus/dpaa/Makefile | 2 +-
drivers/bus/dpaa/dpaa_bus.c | 2 +
drivers/bus/dpaa/meson.build | 2 +
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/fslmc_bus.c | 2 +
drivers/bus/fslmc/meson.build | 2 +
drivers/bus/ifpga/Makefile | 2 +-
drivers/bus/ifpga/ifpga_bus.c | 6 +-
drivers/bus/ifpga/meson.build | 2 +
drivers/bus/pci/Makefile | 2 +-
drivers/bus/pci/bsd/pci.c | 2 +
drivers/bus/pci/linux/pci.c | 1 +
drivers/bus/pci/meson.build | 2 +
drivers/bus/pci/private.h | 2 +
drivers/bus/vdev/Makefile | 2 +-
drivers/bus/vdev/meson.build | 2 +
drivers/bus/vdev/vdev.c | 9 +--
drivers/bus/vmbus/Makefile | 2 +-
drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
drivers/bus/vmbus/meson.build | 2 +
drivers/bus/vmbus/private.h | 3 +
drivers/net/failsafe/failsafe_eal.c | 3 +-
drivers/net/failsafe/failsafe_ether.c | 3 +-
lib/librte_eal/common/eal_common_dev.c | 90 +++++++++++++--------
lib/librte_eal/common/eal_common_devargs.c | 41 ++--------
lib/librte_eal/common/include/rte_dev.h | 36 +++++++--
lib/librte_eal/common/include/rte_devargs.h | 81 +------------------
lib/librte_eal/rte_eal_version.map | 10 +--
29 files changed, 155 insertions(+), 184 deletions(-)
--
2.19.0
* [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure
2018-10-07 9:32 3% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Thomas Monjalon
@ 2018-10-07 9:32 4% ` Thomas Monjalon
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-07 9:32 UTC (permalink / raw)
To: dev; +Cc: gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
When a device is added with a devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of standard scan.
A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.
This makes it possible to remove an rte_device
using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless,
and may be removed later.
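The idea described in this commit message can be sketched roughly as
follows. All types and names here (struct bus, struct device,
device_remove, demo_*) are simplified stand-ins invented for the
illustration and are not the DPDK definitions.

```c
#include <stddef.h>

struct device;

/* Simplified stand-in for a bus exposing an unplug callback. */
struct bus {
	const char *name;
	int (*unplug)(struct device *dev);
};

/* Simplified stand-in for a device carrying a pointer to its bus. */
struct device {
	const char *name;
	const struct bus *bus; /* assigned when the device is scanned */
};

/*
 * Hypothetical removal helper: no lookup of the bus is needed, the
 * callback is reached directly through the device.
 */
int
device_remove(struct device *dev)
{
	if (dev == NULL || dev->bus == NULL || dev->bus->unplug == NULL)
		return -1;
	return dev->bus->unplug(dev);
}

/* Counts how many times the demo bus was asked to unplug a device. */
int unplug_calls;

int
demo_unplug(struct device *dev)
{
	(void)dev;
	unplug_calls++;
	return 0;
}

struct bus demo_bus = { .name = "demo", .unplug = demo_unplug };

/* Mimics a scan: fill in a device and assign the pointer to its bus. */
void
demo_scan_one(struct device *dev, const char *name)
{
	dev->name = name;
	dev->bus = &demo_bus;
}
```

With the pointer stored at scan time, removal works the same way whether
or not the device was created from a devargs.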
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
doc/guides/rel_notes/release_18_11.rst | 15 ++++++++++-----
drivers/bus/dpaa/Makefile | 2 +-
drivers/bus/dpaa/dpaa_bus.c | 2 ++
drivers/bus/dpaa/meson.build | 2 ++
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/fslmc_bus.c | 2 ++
drivers/bus/fslmc/meson.build | 2 ++
drivers/bus/ifpga/Makefile | 2 +-
drivers/bus/ifpga/ifpga_bus.c | 1 +
drivers/bus/ifpga/meson.build | 2 ++
drivers/bus/pci/Makefile | 2 +-
drivers/bus/pci/bsd/pci.c | 2 ++
drivers/bus/pci/linux/pci.c | 1 +
drivers/bus/pci/meson.build | 2 ++
drivers/bus/pci/private.h | 2 ++
drivers/bus/vdev/Makefile | 2 +-
drivers/bus/vdev/meson.build | 2 ++
drivers/bus/vdev/vdev.c | 1 +
drivers/bus/vmbus/Makefile | 2 +-
drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
drivers/bus/vmbus/meson.build | 2 ++
drivers/bus/vmbus/private.h | 3 +++
lib/librte_eal/common/include/rte_dev.h | 1 +
23 files changed, 44 insertions(+), 11 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d534bb71c..c87522f27 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -164,6 +164,10 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: The structure ``rte_device`` got a new field to reference a ``rte_bus``.
+ It is changing the size of the ``struct rte_device`` and the inherited
+ device structures of all buses.
+
Removed Items
-------------
@@ -199,11 +203,12 @@ The libraries prepended with a plus sign were incremented in this version.
librte_bbdev.so.1
librte_bitratestats.so.2
librte_bpf.so.1
- librte_bus_dpaa.so.1
- librte_bus_fslmc.so.1
- librte_bus_pci.so.1
- librte_bus_vdev.so.1
- + librte_bus_vmbus.so.1
+ + librte_bus_dpaa.so.2
+ + librte_bus_fslmc.so.2
+ + librte_bus_ifpga.so.2
+ + librte_bus_pci.so.2
+ + librte_bus_vdev.so.2
+ + librte_bus_vmbus.so.2
librte_cfgfile.so.2
librte_cmdline.so.2
librte_common_octeontx.so.1
diff --git a/drivers/bus/dpaa/Makefile b/drivers/bus/dpaa/Makefile
index bffaa9d92..9337b5f92 100644
--- a/drivers/bus/dpaa/Makefile
+++ b/drivers/bus/dpaa/Makefile
@@ -24,7 +24,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
# versioning export map
EXPORT_MAP := rte_bus_dpaa_version.map
-LIBABIVER := 1
+LIBABIVER := 2
# all source are stored in SRCS-y
#
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index 49cd04dbb..138e0f98d 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -165,6 +165,8 @@ dpaa_create_device_list(void)
goto cleanup;
}
+ dev->device.bus = &rte_dpaa_bus.bus;
+
cfg = &dpaa_netcfg->port_cfg[i];
fman_intf = cfg->fman_if;
diff --git a/drivers/bus/dpaa/meson.build b/drivers/bus/dpaa/meson.build
index d10b62c03..5e7705571 100644
--- a/drivers/bus/dpaa/meson.build
+++ b/drivers/bus/dpaa/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/bus/fslmc/Makefile b/drivers/bus/fslmc/Makefile
index 515d0f534..e95551980 100644
--- a/drivers/bus/fslmc/Makefile
+++ b/drivers/bus/fslmc/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_ethdev
EXPORT_MAP := rte_bus_fslmc_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += \
qbman/qbman_portal.c \
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index bfe81e236..960f55071 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -161,6 +161,8 @@ scan_one_fslmc_device(char *dev_name)
return -ENOMEM;
}
+ dev->device.bus = &rte_fslmc_bus.bus;
+
/* Parse the device name and ID */
t_ptr = strtok(dup_dev_name, ".");
if (!t_ptr) {
diff --git a/drivers/bus/fslmc/meson.build b/drivers/bus/fslmc/meson.build
index 22a56a6fc..54ca92d0c 100644
--- a/drivers/bus/fslmc/meson.build
+++ b/drivers/bus/fslmc/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/bus/ifpga/Makefile b/drivers/bus/ifpga/Makefile
index 3ff3bdb81..514452b39 100644
--- a/drivers/bus/ifpga/Makefile
+++ b/drivers/bus/ifpga/Makefile
@@ -19,7 +19,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := rte_bus_ifpga_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-$(CONFIG_RTE_LIBRTE_IFPGA_BUS) += ifpga_bus.c
SRCS-$(CONFIG_RTE_LIBRTE_IFPGA_BUS) += ifpga_common.c
diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 3ef035b7e..80663328a 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -142,6 +142,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
if (!afu_dev)
goto end;
+ afu_dev->device.bus = &rte_ifpga_bus;
afu_dev->device.devargs = devargs;
afu_dev->device.numa_node = SOCKET_ID_ANY;
afu_dev->device.name = devargs->name;
diff --git a/drivers/bus/ifpga/meson.build b/drivers/bus/ifpga/meson.build
index c9b08c862..0b5c38d54 100644
--- a/drivers/bus/ifpga/meson.build
+++ b/drivers/bus/ifpga/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2010-2018 Intel Corporation
+version = 2
+
deps += ['pci', 'kvargs', 'rawdev']
install_headers('rte_bus_ifpga.h')
sources = files('ifpga_common.c', 'ifpga_bus.c')
diff --git a/drivers/bus/pci/Makefile b/drivers/bus/pci/Makefile
index 4de953f8f..f33e0120f 100644
--- a/drivers/bus/pci/Makefile
+++ b/drivers/bus/pci/Makefile
@@ -4,7 +4,7 @@
include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_bus_pci.a
-LIBABIVER := 1
+LIBABIVER := 2
EXPORT_MAP := rte_bus_pci_version.map
CFLAGS := -I$(SRCDIR) $(CFLAGS)
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b7e..40641cad4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -223,6 +223,8 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
+
dev->addr.domain = conf->pc_sel.pc_domain;
dev->addr.bus = conf->pc_sel.pc_bus;
dev->addr.devid = conf->pc_sel.pc_dev;
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..e31bbb370 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -228,6 +228,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
return -1;
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
dev->addr = *addr;
/* get vendor id */
diff --git a/drivers/bus/pci/meson.build b/drivers/bus/pci/meson.build
index 23d6a5fec..ef9492bb8 100644
--- a/drivers/bus/pci/meson.build
+++ b/drivers/bus/pci/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
+version = 2
+
deps += ['pci']
install_headers('rte_bus_pci.h')
sources = files('pci_common.c',
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 0e689fa74..04bffa6e7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -15,6 +15,8 @@ extern struct rte_pci_bus rte_pci_bus;
struct rte_pci_driver;
struct rte_pci_device;
+extern struct rte_pci_bus rte_pci_bus;
+
/**
* Probe the PCI bus
*
diff --git a/drivers/bus/vdev/Makefile b/drivers/bus/vdev/Makefile
index 1f9cd7ebe..803b8ea7b 100644
--- a/drivers/bus/vdev/Makefile
+++ b/drivers/bus/vdev/Makefile
@@ -16,7 +16,7 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
EXPORT_MAP := rte_bus_vdev_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-y += vdev.c
SRCS-y += vdev_params.c
diff --git a/drivers/bus/vdev/meson.build b/drivers/bus/vdev/meson.build
index 12605e5c7..803785f10 100644
--- a/drivers/bus/vdev/meson.build
+++ b/drivers/bus/vdev/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
+version = 2
+
sources = files('vdev.c',
'vdev_params.c')
install_headers('rte_bus_vdev.h')
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index efca962f7..0142fb2c8 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -456,6 +456,7 @@ vdev_scan(void)
continue;
}
+ dev->device.bus = &rte_vdev_bus;
dev->device.devargs = devargs;
dev->device.numa_node = SOCKET_ID_ANY;
dev->device.name = devargs->name;
diff --git a/drivers/bus/vmbus/Makefile b/drivers/bus/vmbus/Makefile
index deee9dd10..e54c557c6 100644
--- a/drivers/bus/vmbus/Makefile
+++ b/drivers/bus/vmbus/Makefile
@@ -3,7 +3,7 @@
include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_bus_vmbus.a
-LIBABIVER := 1
+LIBABIVER := 2
EXPORT_MAP := rte_bus_vmbus_version.map
CFLAGS += -I$(SRCDIR)
diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c
index 527a6a39f..a4755a387 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -229,6 +229,7 @@ vmbus_scan_one(const char *name)
if (dev == NULL)
return -1;
+ dev->device.bus = &rte_vmbus_bus.bus;
dev->device.name = strdup(name);
if (!dev->device.name)
goto error;
diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build
index 18daabecc..0e4d058ee 100644
--- a/drivers/bus/vmbus/meson.build
+++ b/drivers/bus/vmbus/meson.build
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
+version = 2
+
allow_experimental_apis = true
install_headers('rte_bus_vmbus.h','rte_vmbus_reg.h')
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
index f2022a68c..211127dd8 100644
--- a/drivers/bus/vmbus/private.h
+++ b/drivers/bus/vmbus/private.h
@@ -10,11 +10,14 @@
#include <sys/uio.h>
#include <rte_log.h>
#include <rte_vmbus_reg.h>
+#include <rte_bus_vmbus.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif
+extern struct rte_vmbus_bus rte_vmbus_bus;
+
extern int vmbus_logtype_bus;
#define VMBUS_LOG(level, fmt, args...) \
rte_log(RTE_LOG_ ## level, vmbus_logtype_bus, "%s(): " fmt "\n", \
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a80598..d82cba847 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -157,6 +157,7 @@ struct rte_device {
TAILQ_ENTRY(rte_device) next; /**< Next device */
const char *name; /**< Device name */
const struct rte_driver *driver;/**< Associated driver */
+ const struct rte_bus *bus; /**< Bus handle assigned on scan */
int numa_node; /**< NUMA node connection */
struct rte_devargs *devargs; /**< Device user arguments */
};
--
2.19.0
* Re: [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions
2018-10-07 9:32 3% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Thomas Monjalon
2018-10-07 9:32 4% ` [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure Thomas Monjalon
@ 2018-10-08 21:45 0% ` Stephen Hemminger
2018-10-11 12:10 0% ` Thomas Monjalon
1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-10-08 21:45 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
On Sun, 7 Oct 2018 11:32:39 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> This is a follow-up of an idea presented at Dublin
> during the "hotplug talk".
>
> Instead of changing the existing hotplug functions, as in the RFC,
> some new experimental functions are added.
> The old functions lose their experimental status in order to provide
> a non-experimental replacement for deprecated attach/detach functions.
>
> It has been discussed briefly in the latest technical board meeting.
>
>
> Changes in v6 - after Gaetan's review:
> - bump ABI version of all buses (because of rte_device change)
> - unroll snprintf loop in rte_eal_hotplug_add
>
> Changes in v5:
> - rte_devargs_remove is fixed in case of null devargs (patch 2)
> - a pointer to the bus is added in rte_device (patch 3)
> - rte_dev_remove is fixed in case of no devargs (patch 5)
>
> Changes in v4 - after Andrew's review:
> - add API changes in release notes (patches 1 & 2)
> - fix memory leak in rte_eal_hotplug_add (patch 4)
>
> Change in v3:
> - fix null dereferencing in error path (patch 2)
>
>
> Thomas Monjalon (5):
> devargs: remove deprecated functions
> devargs: simplify parameters of removal function
> eal: add bus pointer in device structure
> eal: remove experimental flag of hotplug functions
> eal: simplify parameters of hotplug functions
>
> doc/guides/rel_notes/release_18_11.rst | 23 ++++--
> drivers/bus/dpaa/Makefile | 2 +-
> drivers/bus/dpaa/dpaa_bus.c | 2 +
> drivers/bus/dpaa/meson.build | 2 +
> drivers/bus/fslmc/Makefile | 2 +-
> drivers/bus/fslmc/fslmc_bus.c | 2 +
> drivers/bus/fslmc/meson.build | 2 +
> drivers/bus/ifpga/Makefile | 2 +-
> drivers/bus/ifpga/ifpga_bus.c | 6 +-
> drivers/bus/ifpga/meson.build | 2 +
> drivers/bus/pci/Makefile | 2 +-
> drivers/bus/pci/bsd/pci.c | 2 +
> drivers/bus/pci/linux/pci.c | 1 +
> drivers/bus/pci/meson.build | 2 +
> drivers/bus/pci/private.h | 2 +
> drivers/bus/vdev/Makefile | 2 +-
> drivers/bus/vdev/meson.build | 2 +
> drivers/bus/vdev/vdev.c | 9 +--
> drivers/bus/vmbus/Makefile | 2 +-
> drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
> drivers/bus/vmbus/meson.build | 2 +
> drivers/bus/vmbus/private.h | 3 +
> drivers/net/failsafe/failsafe_eal.c | 3 +-
> drivers/net/failsafe/failsafe_ether.c | 3 +-
> lib/librte_eal/common/eal_common_dev.c | 90 +++++++++++++--------
> lib/librte_eal/common/eal_common_devargs.c | 41 ++--------
> lib/librte_eal/common/include/rte_dev.h | 36 +++++++--
> lib/librte_eal/common/include/rte_devargs.h | 81 +------------------
> lib/librte_eal/rte_eal_version.map | 10 +--
> 29 files changed, 155 insertions(+), 184 deletions(-)
>
I like these changes.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
I noticed there are only a few places where devargs appears in the documentation.
The relationship between whitelist and devargs is not obvious for new users.
The one place is in the documentation of the documentation! So you will want to pull
rte_eth_dev_attach from documentation.rst.
* Re: [dpdk-dev] [PATCH v2 00/10] introduce telemetry library
@ 2018-10-09 10:33 3% ` Van Haaren, Harry
2018-10-09 11:41 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-10-09 10:33 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Laatz, Kevin, dev, stephen, gaetan.rivet, shreyansh.jain,
Richardson, Bruce
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, October 4, 2018 4:54 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: Laatz, Kevin <kevin.laatz@intel.com>; dev@dpdk.org;
> stephen@networkplumber.org; gaetan.rivet@6wind.com; shreyansh.jain@nxp.com;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 00/10] introduce telemetry library
>
> 04/10/2018 15:25, Van Haaren, Harry:
> > From: Van Haaren, Harry
> > > From: Laatz, Kevin
> > > >
> > > > This patchset introduces a Telemetry library for DPDK Service
> Assurance.
> > > > This library provides an easy way to query DPDK Ethdev metrics.
> > >
> > > <snip>
> > >
> > > > Note: We are aware that the --telemetry flag is not working for meson
> > > > builds, we are working on it for a future patch. Despite opterr being
> set
> > > > to 0, --telemetry said to be 'unrecognized' as a startup print. This
> is a
> > > > cosmetic issue and will also be addressed.
> > > >
> > > > ---
> > > > v2:
> > > > - Reworked telemetry as part of EAL instead of using vdev (Gaetan)
> > > > - Refactored rte_telemetry_command (Gaetan)
> > > > - Added MAINTAINERS file entry (Stephen)
> > > > - Updated docs to reflect vdev to eal rework
> > > > - Removed collectd patch from patchset (Thomas)
> > > > - General code clean up from v1 feedback
> > >
> > >
> > > Hi Gaetan, Thomas, Stephen and Shreyansh!
> > >
> > >
> > > goto TL_DR; // if time is short :)
> > >
> > >
> > > In this v2 patchset, we've reworked the Telemetry to no longer use the
> vdev
> > > infrastructure, instead having EAL enable it directly. This was
> requested as
> > > feedback to the v1 patchset. I'll detail the approach below, and
> highlight
> > > some issues we identified while implementing it.
> > >
> > > Currently, EAL does not depend on any "DPDK" libraries (ignore kvargs
> etc
> > > for a minute).
> > > Telemetry is a DPDK library, so it depends on EAL. In order to have EAL
> > > initialize
> > > Telemetry, it must depend on it - this causes a circular dependency.
> > >
> > > This patchset resolves that circular dependency by using a "weak" symbol
> for
> > > telemetry init, and then the "strong" version of telemetry init will
> replace
> > > it when the library is compiled in.
> >
> > Correction: we attempted this approach - but ended up adding a TAILQ of
> library
> > initializers functions to EAL, which was then iterated during
> rte_eal_init().
> > This also resolved the circular-dependency, but the same linking issue as
> > described below still exists.
> >
> > So - the same question still stands - what is the best solution for 18.11?
> >
> >
> > > Although this *technically* works, it
> > > requires
> > > that applications *LINK* against Telemetry library explicitly - as EAL
> won't
> > > pull
> > > in the Telemetry .so automatically... This means application-level
> build-
> > > system
> > > changes to enable --telemetry on the DPDK EAL command line.
> > >
> > > Given the complexity in enabling EAL to handle the Telemetry init() and
> its
> > > dependencies, I'd like to ask you folks for input on how to better solve
> > > this?
>
> First, the telemetry feature must be enabled via a public function (API).
> The application can decide to enable the feature at any time, right?
> If the application wants to enable the feature at initialization
> (and considers user input from the command line),
> then the init function has a dependency on telemetry.
> Your dependency concern is that the init function (which is high level)
> is in EAL (which is the lowest layer in DPDK).
Yes, and this has been resolved by allowing components to register
with EAL to have their _init() function called later. V3 is coming up
with this approach; it seems to cover the required use-cases.
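For readers following along, the registration approach could look roughly
like the sketch below: a low-level core keeps a TAILQ of initializers,
and a higher-level library adds itself from a constructor, so the core
never references the library by name. All identifiers are assumptions
made for this illustration and are not taken from the v3 patchset. The
constructor attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <sys/queue.h>

typedef int (*lib_init_fn)(void);

/* One registered initializer. */
struct lib_init_entry {
	TAILQ_ENTRY(lib_init_entry) next;
	const char *name;
	lib_init_fn init;
};

TAILQ_HEAD(lib_init_list, lib_init_entry);

/* Statically initialized, so it is valid before any constructor runs. */
static struct lib_init_list lib_inits = TAILQ_HEAD_INITIALIZER(lib_inits);

/* "Core" side: called by libraries to register their init function. */
void
lib_init_register(struct lib_init_entry *e)
{
	TAILQ_INSERT_TAIL(&lib_inits, e, next);
}

/*
 * "Core" side: run every registered initializer in registration order.
 * Returns the number of initializers run, or -1 on the first failure.
 */
int
core_init(void)
{
	struct lib_init_entry *e;
	int count = 0;

	TAILQ_FOREACH(e, &lib_inits, next) {
		if (e->init() != 0)
			return -1;
		count++;
	}
	return count;
}

/* "Library" side: a stand-in for telemetry. */
int telemetry_ready;

static int
telemetry_init(void)
{
	telemetry_ready = 1;
	return 0;
}

static struct lib_init_entry telemetry_entry = {
	.name = "telemetry",
	.init = telemetry_init,
};

/* Runs at load time, before main(); it only registers, it does not init. */
static void __attribute__((constructor))
telemetry_register(void)
{
	lib_init_register(&telemetry_entry);
}
```

Note that the constructor only runs if the library is actually linked
into (or loaded by) the application, which is the linking issue mentioned
earlier in the thread.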
> I think the command line should not be managed directly by EAL.
> My suggestion in last summit was to move this code in a different library.
> We should also move the init function(s) to a new high level library.
>
> This is my proposal to solve cyclic dependency: move rte_eal_init in a lib
> which depends on everything.
I have prototyped this approach, and it is not really clean. It means
splitting EAL into two halves, and due to meson library naming we have
to move all eal files to eal_impl or something, and then eal.so keeps rte_eal_init().
Removing functions from the .map files is also technically an ABI break,
at which point I didn't think it was the right solution.
> About the linking issue, I don't understand the problem.
> If you use the DPDK makefiles, rte.app.mk should manage it.
> If you use the DPDK meson, all libs are linked.
> If you use your own system, of course you need to add telemetry lib.
Yes agreed, in practice it should be exactly like this. In reality
it can be harder to achieve the exact dependencies correctly with
both Static/Shared builds and constructors etc.
I believe the current approach of registering an _init() function
will be acceptable, let's wait for v3 to hit the mailing list.
* Re: [dpdk-dev] [PATCH v2 00/10] introduce telemetry library
2018-10-09 10:33 3% ` Van Haaren, Harry
@ 2018-10-09 11:41 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-09 11:41 UTC (permalink / raw)
To: Van Haaren, Harry
Cc: Laatz, Kevin, dev, stephen, gaetan.rivet, shreyansh.jain,
Richardson, Bruce
09/10/2018 12:33, Van Haaren, Harry:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 04/10/2018 15:25, Van Haaren, Harry:
> > > From: Van Haaren, Harry
> > > > From: Laatz, Kevin
> > > > >
> > > > > This patchset introduces a Telemetry library for DPDK Service
> > Assurance.
> > > > > This library provides an easy way to query DPDK Ethdev metrics.
> > > >
> > > > <snip>
> > > >
> > > > > Note: We are aware that the --telemetry flag is not working for meson
> > > > > builds, we are working on it for a future patch. Despite opterr being
> > set
> > > > > to 0, --telemetry said to be 'unrecognized' as a startup print. This
> > is a
> > > > > cosmetic issue and will also be addressed.
> > > > >
> > > > > ---
> > > > > v2:
> > > > > - Reworked telemetry as part of EAL instead of using vdev (Gaetan)
> > > > > - Refactored rte_telemetry_command (Gaetan)
> > > > > - Added MAINTAINERS file entry (Stephen)
> > > > > - Updated docs to reflect vdev to eal rework
> > > > > - Removed collectd patch from patchset (Thomas)
> > > > > - General code clean up from v1 feedback
> > > >
> > > >
> > > > Hi Gaetan, Thomas, Stephen and Shreyansh!
> > > >
> > > >
> > > > goto TL_DR; // if time is short :)
> > > >
> > > >
> > > > In this v2 patchset, we've reworked the Telemetry to no longer use the
> > vdev
> > > > infrastructure, instead having EAL enable it directly. This was
> > requested as
> > > > feedback to the v1 patchset. I'll detail the approach below, and
> > highlight
> > > > some issues we identified while implementing it.
> > > >
> > > > Currently, EAL does not depend on any "DPDK" libraries (ignore kvargs
> > etc
> > > > for a minute).
> > > > Telemetry is a DPDK library, so it depends on EAL. In order to have EAL
> > > > initialize
> > > > Telemetry, it must depend on it - this causes a circular dependency.
> > > >
> > > > This patchset resolves that circular dependency by using a "weak" symbol
> > for
> > > > telemetry init, and then the "strong" version of telemetry init will
> > replace
> > > > it when the library is compiled in.
> > >
> > > Correction: we attempted this approach - but ended up adding a TAILQ of
> > library
> > > initializers functions to EAL, which was then iterated during
> > rte_eal_init().
> > > This also resolved the circular-dependency, but the same linking issue as
> > > described below still exists.
> > >
> > > So - the same question still stands - what is the best solution for 18.11?
> > >
> > >
> > > > Although this *technically* works, it
> > > > requires
> > > > that applications *LINK* against Telemetry library explicitly - as EAL
> > won't
> > > > pull
> > > > in the Telemetry .so automatically... This means application-level
> > build-
> > > > system
> > > > changes to enable --telemetry on the DPDK EAL command line.
> > > >
> > > > Given the complexity in enabling EAL to handle the Telemetry init() and
> > its
> > > > dependencies, I'd like to ask you folks for input on how to better solve
> > > > this?
> >
> > First, the telemetry feature must be enabled via a public function (API).
> > The application can decide to enable the feature at any time, right?
> > If the application wants to enable the feature at initialization
> > (and considers user input from the command line),
> > then the init function has a dependency on telemetry.
> > Your dependency concern is that the init function (which is high level)
> > is in EAL (which is the lowest layer in DPDK).
>
> Yes, and this has been resolved by allowing components to register
> with EAL to have their _init() function called later. V3 is coming up
> with this approach; it seems to cover the required use-cases.
>
>
> > I think the command line should not be managed directly by EAL.
> > My suggestion in last summit was to move this code in a different library.
> > We should also move the init function(s) to a new high level library.
> >
> > This is my proposal to solve cyclic dependency: move rte_eal_init in a lib
> > which depends on everything.
>
> I have prototyped this approach, and it is not really clean. It means
> splitting EAL into two halves, and due to meson library naming we have
> to move all eal files to eal_impl or something, and then eal.so keeps rte_eal_init().
>
> Removing functions from the .map files is also technically an ABI break,
> at which point I didn't think it was the right solution.
>
>
> > About the linking issue, I don't understand the problem.
> > If you use the DPDK makefiles, rte.app.mk should manage it.
> > If you use the DPDK meson, all libs are linked.
> > If you use your own system, of course you need to add telemetry lib.
>
> Yes agreed, in practice it should be exactly like this. In reality
> it can be harder to achieve the exact dependencies correctly with
> both Static/Shared builds and constructors etc.
>
> I believe the current approach of registering an _init() function
> will be acceptable, let's wait for v3 to hit the mailing list.
I think it is not clean.
We should really split EAL in two parts:
- low level routines
- high level init.
About telemetry, you can find any workaround, but it must be temporary.
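For reference, the registration approach mentioned in the quoted text (components register an _init() callback that EAL calls later) might look roughly like the following. This is only a minimal sketch under my own assumptions: every toy_* name is hypothetical and is not part of EAL or the Telemetry library, and the real v3 may be structured quite differently.

```c
#include <stddef.h>

#define TOY_MAX_INIT_FNS 8

/* Hypothetical callback type: returns 0 on success, non-zero on failure. */
typedef int (*toy_init_fn)(void *arg);

struct toy_init_entry {
	toy_init_fn fn;
	void *arg;
};

/* Table filled by higher-level libraries, drained by the low-level init. */
static struct toy_init_entry toy_init_tab[TOY_MAX_INIT_FNS];
static size_t toy_init_count;

/* Called by a higher-level library (e.g. from a constructor). */
static int
toy_init_register(toy_init_fn fn, void *arg)
{
	if (fn == NULL || toy_init_count == TOY_MAX_INIT_FNS)
		return -1;
	toy_init_tab[toy_init_count].fn = fn;
	toy_init_tab[toy_init_count].arg = arg;
	toy_init_count++;
	return 0;
}

/* Called once by the low-level init; stops at the first failing callback. */
static int
toy_init_run_all(void)
{
	size_t i;

	for (i = 0; i < toy_init_count; i++) {
		int ret = toy_init_tab[i].fn(toy_init_tab[i].arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Example callback standing in for a library's init routine. */
static int
toy_telemetry_init(void *arg)
{
	int *calls = arg;

	(*calls)++;
	return 0;
}
```

The low layer only sees function pointers, so it has no link-time dependency on the registering library; whether that counts as clean is exactly the point under discussion.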
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [RFC v2 0/9] ipsec: new library for IPsec data-path processing
@ 2018-10-09 18:23 2% ` Konstantin Ananyev
2018-11-15 23:53 2% ` [dpdk-dev] [PATCH " Konstantin Ananyev
1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-10-09 18:23 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
This RFC targets 19.02 release.
This RFC introduces a new library within DPDK: librte_ipsec.
The aim is to provide a DPDK-native, high-performance library for IPsec
data-path processing.
The library is supposed to utilize existing DPDK crypto-dev and
security API to provide application with transparent IPsec processing API.
The library is concentrated on data-path protocols processing (ESP and AH),
IKE protocol(s) implementation is out of scope for that library.
Hook/callback mechanisms might be defined in the future, though, to allow
integrating it with existing IKE implementations.
Because of the complexity of the IPsec protocol suite and the variety of user
requirements and usage scenarios, a few API levels will be provided:
1) Security Association (SA-level) API
Operates at SA level, provides functions to:
- initialize/teardown SA object
- process inbound/outbound ESP/AH packets associated with the given SA
(decrypt/encrypt, authenticate, check integrity,
add/remove ESP/AH related headers and data, etc.).
2) Security Association Database (SAD) API
API to create/manage/destroy IPsec SAD.
While DPDK IPsec library plans to have its own implementation,
the intention is to keep it as independent from the other parts
of IPsec library as possible.
That is supposed to give users the ability to provide their own
implementation of the SAD compatible with the other parts of the
IPsec library.
3) IPsec Context (CTX) API
This is supposed to be a high-level API, where each IPsec CTX is an
abstraction of 'independent copy of the IPsec stack'.
A CTX owns a set of SAs, SADs, the crypto-dev queues assigned to it, etc.
and provides:
- de-multiplexing stream of inbound packets to particular SAs and
further IPsec related processing.
- IPsec related processing for the outbound packets.
- SA add/delete/update functionality
Current RFC concentrates on SA-level API only (1),
detailed discussion for 2) and 3) will be subjects for separate RFC(s).
SA (low) level API
==================
API described below operates on SA level.
It provides functionality that allows the user to process inbound and
outbound IPsec packets for a given SA.
To be more specific:
- for inbound ESP/AH packets perform decryption, authentication,
integrity checking, remove ESP/AH related headers
- for outbound packets perform payload encryption, attach ICV,
update/add IP headers, add ESP/AH headers/trailers,
setup related mbuf fields (ol_flags, tx_offloads, etc.).
- initialize/un-initialize given SA based on user provided parameters.
The following functionality:
- match inbound/outbound packets to particular SA
- manage crypto/security devices
- provide SAD/SPD related functionality
- determine what crypto/security device has to be used
for given packet(s)
is out of scope for SA-level API.
SA-level API is based on top of crypto-dev/security API and relies on them
to perform actual cipher and integrity checking.
To make it easy to map crypto/security sessions to the related IPsec SA,
an opaque userdata field was added to the
rte_cryptodev_sym_session and rte_security_session structures.
That implies an ABI change for both librte_cryptodev and librte_security.
Due to the nature of the crypto-dev API (enqueue/dequeue model) we use
an asynchronous API for IPsec packets destined to be processed
by a crypto-device.
Expected API call sequence would be:
/* enqueue for processing by crypto-device */
rte_ipsec_crypto_prepare(...);
rte_cryptodev_enqueue_burst(...);
/* dequeue from crypto-device and do final processing (if any) */
rte_cryptodev_dequeue_burst(...);
rte_ipsec_crypto_group(...); /* optional */
rte_ipsec_process(...);
Though for packets destined for inline processing no extra overhead
is required and synchronous API call: rte_ipsec_process()
is sufficient for that case.
The current implementation supports all four currently defined rte_security types.
To accommodate future custom implementations, a function-pointer model is
used for both rte_ipsec_crypto_prepare() and rte_ipsec_process().
Implemented:
------------
- ESP tunnel mode support (both IPv4/IPv6)
- Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL
- Anti-Replay window and ESN support
- Unit Test (few basic cases for now)
TODO list:
----------
- ESP transport mode support (both IPv4/IPv6)
- update examples/ipsec-secgw to use librte_ipsec
- extend Unit Test
Konstantin Ananyev (9):
cryptodev: add opaque userdata pointer into crypto sym session
security: add opaque userdata pointer into security session
net: add ESP trailer structure definition
lib: introduce ipsec library
ipsec: add SA data-path API
ipsec: implement SA data-path API
ipsec: rework SA replay window/SQN for MT environment
ipsec: helper functions to group completed crypto-ops
test/ipsec: introduce functional test
config/common_base | 5 +
lib/Makefile | 2 +
lib/librte_cryptodev/rte_cryptodev.h | 2 +
lib/librte_ipsec/Makefile | 27 +
lib/librte_ipsec/crypto.h | 74 ++
lib/librte_ipsec/ipsec_sqn.h | 315 ++++++++
lib/librte_ipsec/meson.build | 10 +
lib/librte_ipsec/pad.h | 45 ++
lib/librte_ipsec/rte_ipsec.h | 156 ++++
lib/librte_ipsec/rte_ipsec_group.h | 151 ++++
lib/librte_ipsec/rte_ipsec_sa.h | 166 ++++
lib/librte_ipsec/rte_ipsec_version.map | 15 +
lib/librte_ipsec/rwl.h | 68 ++
lib/librte_ipsec/sa.c | 1005 ++++++++++++++++++++++++
lib/librte_ipsec/sa.h | 92 +++
lib/librte_ipsec/ses.c | 45 ++
lib/librte_net/rte_esp.h | 10 +-
lib/librte_security/rte_security.h | 2 +
lib/meson.build | 2 +
mk/rte.app.mk | 2 +
test/test/Makefile | 3 +
test/test/meson.build | 3 +
test/test/test_ipsec.c | 1329 ++++++++++++++++++++++++++++++++
23 files changed, 3528 insertions(+), 1 deletion(-)
create mode 100644 lib/librte_ipsec/Makefile
create mode 100644 lib/librte_ipsec/crypto.h
create mode 100644 lib/librte_ipsec/ipsec_sqn.h
create mode 100644 lib/librte_ipsec/meson.build
create mode 100644 lib/librte_ipsec/pad.h
create mode 100644 lib/librte_ipsec/rte_ipsec.h
create mode 100644 lib/librte_ipsec/rte_ipsec_group.h
create mode 100644 lib/librte_ipsec/rte_ipsec_sa.h
create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
create mode 100644 lib/librte_ipsec/rwl.h
create mode 100644 lib/librte_ipsec/sa.c
create mode 100644 lib/librte_ipsec/sa.h
create mode 100644 lib/librte_ipsec/ses.c
create mode 100644 test/test/test_ipsec.c
--
2.13.6
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v4 2/2] hash table: add an iterator over conflicting entries
@ 2018-10-09 19:29 2% ` Qiaobin Fu
0 siblings, 0 replies; 200+ results
From: Qiaobin Fu @ 2018-10-09 19:29 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, honnappa.nagarahalli, yipeng1.wang, michel, qiaobinf
Function rte_hash_iterate_conflict_entries_with_hash() iterates
over the entries that conflict with an incoming entry.
Iterating over conflicting entries enables one to decide
if the incoming entry is more valuable than the entries already
in the hash table. This is particularly useful after
an insertion failure.
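The idea can be sketched outside of DPDK as follows. This is a simplified toy model, not the code in this patch: the table layout, toy_secondary_hash() and all toy_* names are assumptions made for illustration, and locking is left out. A zeroed state starts the iteration; the first call derives the two candidate buckets from the signature, and each later call resumes at the next slot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUCKET_ENTRIES 4u	/* must be a power of two */
#define TOY_NUM_BUCKETS    8u	/* must be a power of two */
#define TOY_EMPTY_SLOT     0u

/* Toy table: each slot holds a key-store position, 0 meaning empty. */
struct toy_table {
	uint32_t key_idx[TOY_NUM_BUCKETS][TOY_BUCKET_ENTRIES];
};

/* Iterator state; it must be zeroed before the first call. */
struct toy_iter {
	uint32_t vnext;		/* virtual slot: 0 .. 2 * entries - 1 */
	uint32_t primary_bidx;
	uint32_t secondary_bidx;
};

static void
toy_iter_reset(struct toy_iter *it)
{
	memset(it, 0, sizeof(*it));
}

/* Hypothetical alternative hash; any mixing function would do here. */
static uint32_t
toy_secondary_hash(uint32_t sig)
{
	return (sig >> 16) ^ (sig * 0x5bd1e995u);
}

/*
 * Store the next non-empty position among the two candidate buckets of
 * 'sig' in *position and return 0, or return -1 when both are exhausted.
 */
static int
toy_iterate_conflicts(const struct toy_table *t, uint32_t sig,
	struct toy_iter *it, uint32_t *position)
{
	assert(t != NULL && it != NULL && position != NULL);

	if (it->vnext == 0) {
		it->primary_bidx = sig & (TOY_NUM_BUCKETS - 1);
		it->secondary_bidx =
			toy_secondary_hash(sig) & (TOY_NUM_BUCKETS - 1);
	}

	while (it->vnext < 2 * TOY_BUCKET_ENTRIES) {
		/* First half walks the primary bucket, second half the other. */
		uint32_t bidx = it->vnext < TOY_BUCKET_ENTRIES ?
			it->primary_bidx : it->secondary_bidx;
		uint32_t slot = it->vnext & (TOY_BUCKET_ENTRIES - 1);
		uint32_t pos = t->key_idx[bidx][slot];

		it->vnext++;
		if (pos == TOY_EMPTY_SLOT)
			continue;
		*position = pos;
		return 0;
	}
	return -1;
}
```

Note that in this sketch, if both hashes select the same bucket, its entries are reported twice; a caller that cares would have to filter duplicates.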
v4:
* Fix the style issue
* Follow the ABI updates
v3:
* Make the rte_hash_iterate() API similar to
rte_hash_iterate_conflict_entries()
v2:
* Fix the style issue
* Make the API more universal
Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gaëtan Rivet <gaetan.rivet@6wind.com>
---
MAINTAINERS | 2 +-
lib/librte_hash/rte_cuckoo_hash.c | 134 ++++++++++++++++++++++++++-
lib/librte_hash/rte_cuckoo_hash.h | 11 +++
lib/librte_hash/rte_hash.h | 71 +++++++++++++-
lib/librte_hash/rte_hash_version.map | 14 +++
test/test/test_hash.c | 6 +-
test/test/test_hash_multiwriter.c | 8 +-
test/test/test_hash_readwrite.c | 14 ++-
8 files changed, 246 insertions(+), 14 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 9fd258fad..e8c81656f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1055,7 +1055,7 @@ F: test/test/test_efd*
F: examples/server_node_efd/
F: doc/guides/sample_app_ug/server_node_efd.rst
-Hashes
+Hashes - EXPERIMENTAL
M: Bruce Richardson <bruce.richardson@intel.com>
M: Pablo de Lara <pablo.de.lara.guarch@intel.com>
F: lib/librte_hash/
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index a3e76684d..439251a7f 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1301,7 +1301,10 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
}
int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
+rte_hash_iterate_v1808(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ uint32_t *next)
{
uint32_t bucket_idx, idx, position;
struct rte_hash_key *next_key;
@@ -1344,3 +1347,132 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
return position - 1;
}
+VERSION_SYMBOL(rte_hash_iterate, _v1808, 18.08);
+
+int32_t
+rte_hash_iterate_v1811(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ struct rte_hash_iterator_state *state)
+{
+ struct rte_hash_iterator_priv *it;
+ uint32_t bucket_idx, idx, position;
+ struct rte_hash_key *next_key;
+
+ RETURN_IF_TRUE(((h == NULL) || (key == NULL) ||
+ (data == NULL) || (state == NULL)), -EINVAL);
+
+ RTE_BUILD_BUG_ON(sizeof(struct rte_hash_iterator_priv) >
+ sizeof(struct rte_hash_iterator_state));
+
+ it = (struct rte_hash_iterator_priv *)state;
+ if (it->next == 0)
+ it->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+ /* Out of bounds */
+ if (it->next >= it->total_entries)
+ return -ENOENT;
+
+ /* Calculate bucket and index of current iterator */
+ bucket_idx = it->next / RTE_HASH_BUCKET_ENTRIES;
+ idx = it->next % RTE_HASH_BUCKET_ENTRIES;
+
+ __hash_rw_reader_lock(h);
+ /* If current position is empty, go to the next one */
+ while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+ it->next++;
+ /* End of table */
+ if (it->next == it->total_entries) {
+ __hash_rw_reader_unlock(h);
+ return -ENOENT;
+ }
+ bucket_idx = it->next / RTE_HASH_BUCKET_ENTRIES;
+ idx = it->next % RTE_HASH_BUCKET_ENTRIES;
+ }
+ /* Get position of entry in key table */
+ position = h->buckets[bucket_idx].key_idx[idx];
+ next_key = (struct rte_hash_key *) ((char *)h->key_store +
+ position * h->key_entry_size);
+ /* Return key and data */
+ *key = next_key->key;
+ *data = next_key->pdata;
+
+ __hash_rw_reader_unlock(h);
+
+ /* Increment iterator */
+ it->next++;
+
+ return position - 1;
+}
+MAP_STATIC_SYMBOL(int32_t rte_hash_iterate(const struct rte_hash *h,
+ const void **key, void **data, struct rte_hash_iterator_state *state),
+ rte_hash_iterate_v1811);
+
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries_with_hash(struct rte_hash *h,
+ const void **key,
+ void **data,
+ hash_sig_t sig,
+ struct rte_hash_iterator_state *state)
+{
+ struct rte_hash_iterator_conflict_entries_priv *it;
+
+ RETURN_IF_TRUE(((h == NULL) || (key == NULL) ||
+ (data == NULL) || (state == NULL)), -EINVAL);
+
+ RTE_BUILD_BUG_ON(sizeof(
+ struct rte_hash_iterator_conflict_entries_priv) >
+ sizeof(struct rte_hash_iterator_state));
+
+ it = (struct rte_hash_iterator_conflict_entries_priv *)state;
+ if (it->vnext == 0) {
+ /*
+ * Get the primary bucket index given
+ * the precomputed hash value.
+ */
+ it->primary_bidx = sig & h->bucket_bitmask;
+ /*
+ * Get the secondary bucket index given
+ * the precomputed hash value.
+ */
+ it->secondary_bidx =
+ rte_hash_secondary_hash(sig) & h->bucket_bitmask;
+ }
+
+ while (it->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
+ uint32_t bidx = it->vnext < RTE_HASH_BUCKET_ENTRIES ?
+ it->primary_bidx : it->secondary_bidx;
+ uint32_t next = it->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
+ uint32_t position;
+ struct rte_hash_key *next_key;
+
+ RTE_BUILD_BUG_ON(!RTE_IS_POWER_OF_2(RTE_HASH_BUCKET_ENTRIES));
+ __hash_rw_reader_lock(h);
+ position = h->buckets[bidx].key_idx[next];
+
+ /* Increment iterator. */
+ it->vnext++;
+
+ /*
+ * The test below is unlikely because this iterator is meant
+ * to be used after a failed insert.
+ */
+ if (unlikely(position == EMPTY_SLOT)) {
+ __hash_rw_reader_unlock(h);
+ continue;
+ }
+
+ /* Get the entry in key table. */
+ next_key = (struct rte_hash_key *) ((char *)h->key_store +
+ position * h->key_entry_size);
+ /* Return key and data. */
+ *key = next_key->key;
+ *data = next_key->pdata;
+
+ __hash_rw_reader_unlock(h);
+
+ return position - 1;
+ }
+
+ return -ENOENT;
+}
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467d5..70297b16d 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -195,4 +195,15 @@ struct queue_node {
int prev_slot; /* Parent(slot) in search path */
};
+struct rte_hash_iterator_priv {
+ uint32_t next;
+ uint32_t total_entries;
+};
+
+struct rte_hash_iterator_conflict_entries_priv {
+ uint32_t vnext;
+ uint32_t primary_bidx;
+ uint32_t secondary_bidx;
+};
+
#endif
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d9315f..43f6d8b88 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -14,6 +14,8 @@
#include <stdint.h>
#include <stddef.h>
+#include <rte_compat.h>
+
#ifdef __cplusplus
extern "C" {
#endif
@@ -37,6 +39,9 @@ extern "C" {
/** Flag to support reader writer concurrency */
#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
+/** Size of the hash table iterator state structure */
+#define RTE_HASH_ITERATOR_STATE_SIZE 64
+
/** Signature of key that is stored internally. */
typedef uint32_t hash_sig_t;
@@ -64,6 +69,16 @@ struct rte_hash_parameters {
/** @internal A hash table structure. */
struct rte_hash;
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal A hash table iterator state structure.
+ */
+struct rte_hash_iterator_state {
+ uint8_t space[RTE_HASH_ITERATOR_STATE_SIZE];
+} __rte_cache_aligned;
+
/**
* Create a new hash table.
*
@@ -443,6 +458,9 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
uint32_t num_keys, int32_t *positions);
/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
* Iterate through the hash table, returning key-value pairs.
*
* @param h
@@ -453,16 +471,61 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
* @param data
* Output containing the data associated with key.
* Returns NULL if data was not stored.
- * @param next
- * Pointer to iterator. Should be 0 to start iterating the hash table.
- * Iterator is incremented after each call of this function.
+ * @param state
+ * Pointer to the iterator state.
* @return
* Position where key was stored, if successful.
* - -EINVAL if the parameters are invalid.
* - -ENOENT if end of the hash table.
*/
int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
+rte_hash_iterate(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ struct rte_hash_iterator_state *state);
+
+int32_t
+rte_hash_iterate_v1808(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ uint32_t *next);
+
+int32_t
+rte_hash_iterate_v1811(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ struct rte_hash_iterator_state *state);
+BIND_DEFAULT_SYMBOL(rte_hash_iterate, _v1811, 18.11);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Iterate over entries that conflict with a given hash.
+ *
+ * @param h
+ * Hash table to iterate.
+ * @param key
+ * Output containing the key at where the iterator is currently pointing.
+ * @param data
+ * Output containing the data associated with key.
+ * Returns NULL if data was not stored.
+ * @param sig
+ * Precomputed hash value for the conflict entry.
+ * @param state
+ * Pointer to the iterator state.
+ * @return
+ * Position where key was stored, if successful.
+ * - -EINVAL if the parameters are invalid.
+ * - -ENOENT if there is no more conflicting entries.
+ */
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries_with_hash(struct rte_hash *h,
+ const void **key,
+ void **data,
+ hash_sig_t sig,
+ struct rte_hash_iterator_state *state);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index e216ac8e2..b1bb5cb02 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -53,3 +53,17 @@ DPDK_18.08 {
rte_hash_count;
} DPDK_16.07;
+
+DPDK_18.11 {
+ global:
+
+ rte_hash_iterate;
+
+} DPDK_18.08;
+
+EXPERIMENTAL {
+ global:
+
+ rte_hash_iterate_conflict_entries_with_hash;
+
+};
diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd10..a9691e5d5 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -1170,8 +1170,8 @@ static int test_hash_iteration(void)
void *next_data;
void *data[NUM_ENTRIES];
unsigned added_keys;
- uint32_t iter = 0;
int ret = 0;
+ struct rte_hash_iterator_state state;
ut_params.entries = NUM_ENTRIES;
ut_params.name = "test_hash_iteration";
@@ -1190,8 +1190,10 @@ static int test_hash_iteration(void)
break;
}
+ memset(&state, 0, sizeof(state));
+
/* Iterate through the hash table */
- while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+ while (rte_hash_iterate(handle, &next_key, &next_data, &state) >= 0) {
/* Search for the key in the list of keys added */
for (i = 0; i < NUM_ENTRIES; i++) {
if (memcmp(next_key, keys[i], ut_params.key_len) == 0) {
diff --git a/test/test/test_hash_multiwriter.c b/test/test/test_hash_multiwriter.c
index 6a3eb10bd..63c0cd8d0 100644
--- a/test/test/test_hash_multiwriter.c
+++ b/test/test/test_hash_multiwriter.c
@@ -4,6 +4,7 @@
#include <inttypes.h>
#include <locale.h>
+#include <string.h>
#include <rte_cycles.h>
#include <rte_hash.h>
@@ -125,12 +126,15 @@ test_hash_multiwriter(void)
const void *next_key;
void *next_data;
- uint32_t iter = 0;
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
uint32_t count;
+ struct rte_hash_iterator_state state;
+
+ memset(&state, 0, sizeof(state));
+
snprintf(name, 32, "test%u", calledCount++);
hash_params.name = name;
@@ -203,7 +207,7 @@ test_hash_multiwriter(void)
goto err3;
}
- while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+ while (rte_hash_iterate(handle, &next_key, &next_data, &state) >= 0) {
/* Search for the key in the list of keys added .*/
i = *(const uint32_t *)next_key;
tbl_multiwriter_test_params.found[i]++;
diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d80..f9279e21e 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -166,18 +166,21 @@ test_hash_readwrite_functional(int use_htm)
unsigned int i;
const void *next_key;
void *next_data;
- uint32_t iter = 0;
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
int use_jhash = 1;
+ struct rte_hash_iterator_state state;
+
rte_atomic64_init(&gcycles);
rte_atomic64_clear(&gcycles);
rte_atomic64_init(&ginsertions);
rte_atomic64_clear(&ginsertions);
+ memset(&state, 0, sizeof(state));
+
if (init_params(use_htm, use_jhash) != 0)
goto err;
@@ -196,7 +199,7 @@ test_hash_readwrite_functional(int use_htm)
rte_eal_mp_wait_lcore();
while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
- &next_data, &iter) >= 0) {
+ &next_data, &state) >= 0) {
/* Search for the key in the list of keys added .*/
i = *(const uint32_t *)next_key;
tbl_rw_test_param.found[i]++;
@@ -315,9 +318,10 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
const void *next_key;
void *next_data;
- uint32_t iter = 0;
int use_jhash = 0;
+ struct rte_hash_iterator_state state;
+
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
@@ -333,6 +337,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
rte_atomic64_init(&gwrite_cycles);
rte_atomic64_clear(&gwrite_cycles);
+ memset(&state, 0, sizeof(state));
+
if (init_params(use_htm, use_jhash) != 0)
goto err;
@@ -485,7 +491,7 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
rte_eal_mp_wait_lcore();
while (rte_hash_iterate(tbl_rw_test_param.h,
- &next_key, &next_data, &iter) >= 0) {
+ &next_key, &next_data, &state) >= 0) {
/* Search for the key in the list of keys added .*/
i = *(const uint32_t *)next_key;
tbl_rw_test_param.found[i]++;
--
2.17.1
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-05 12:45 4% ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
@ 2018-10-10 8:56 0% ` Tu, Lijuan
2018-10-11 9:26 0% ` Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Tu, Lijuan @ 2018-10-10 8:56 UTC (permalink / raw)
To: Alejandro Lucero, dev
Hi
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alejandro Lucero
> Sent: Friday, October 5, 2018 8:45 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking
> memsegs IOVAs addresses
>
> A device can suffer addressing limitations. This function checks memsegs
> have iovas within the supported range based on dma mask.
>
> PMDs should use this function during initialization if device suffers
> addressing limitations, returning an error if this function returns memsegs
> out of range.
>
> Another usage is for emulated IOMMU hardware with addressing limitations.
>
> It is necessary to save the most restricted dma mask for checking out
> memory allocated dynamically after initialization.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/release_18_11.rst | 10 ++++
> lib/librte_eal/common/eal_common_memory.c | 60
> +++++++++++++++++++++++
> lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> lib/librte_eal/common/include/rte_memory.h | 3 ++
> lib/librte_eal/common/malloc_heap.c | 12 +++++
> lib/librte_eal/linuxapp/eal/eal.c | 2 +
> lib/librte_eal/rte_eal_version.map | 1 +
> 7 files changed, 91 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 2133a5b..c806dc6 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -104,6 +104,14 @@ New Features
> the specified port. The port must be stopped before the command call in
> order
> to reconfigure queues.
>
> +* **Added check for ensuring allocated memory addressable by devices.**
> +
> + Some devices can have addressing limitations so a new function,
> + ``rte_eal_check_dma_mask``, has been added for checking allocated
> + memory is not out of the device range. Because now memory can be
> + dynamically allocated after initialization, a dma mask is kept and
> + any new allocated memory will be checked out against that dma mask
> + and rejected if out of range. If more than one device has addressing
> limitations, the dma mask is the more restricted one.
>
> API Changes
> -----------
> @@ -156,6 +164,8 @@ ABI Changes
> ``rte_config`` structure on account of improving DPDK usability
> when
> using either ``--legacy-mem`` or ``--single-file-segments`` flags.
>
> +* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more
> restricted
> + dma mask based on devices addressing limitations.
>
> Removed Items
> -------------
> diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> index 0b69804..c482f0d 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -385,6 +385,66 @@ struct virtiova {
> rte_memseg_walk(dump_memseg, f);
> }
>
> +static int
> +check_iova(const struct rte_memseg_list *msl __rte_unused,
> + const struct rte_memseg *ms, void *arg) {
> + uint64_t *mask = arg;
> + rte_iova_t iova;
> +
> + /* higher address within segment */
> + iova = (ms->iova + ms->len) - 1;
> + if (!(iova & *mask))
> + return 0;
> +
> + RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of
> range\n",
> + ms->iova, ms->len);
> +
> + RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
> + return 1;
> +}
> +
> +#if defined(RTE_ARCH_64)
> +#define MAX_DMA_MASK_BITS 63
> +#else
> +#define MAX_DMA_MASK_BITS 31
> +#endif
> +
> +/* check memseg iovas are within the required range based on dma mask
> +*/ int __rte_experimental rte_eal_check_dma_mask(uint8_t maskbits) {
> + struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> + uint64_t mask;
> +
> + /* sanity check */
> + if (maskbits > MAX_DMA_MASK_BITS) {
> + RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
> + maskbits, MAX_DMA_MASK_BITS);
> + return -1;
> + }
> +
> + /* create dma mask */
> + mask = ~((1ULL << maskbits) - 1);
> +
> + if (rte_memseg_walk(check_iova, &mask))
[Lijuan]In my environment, testpmd halts at rte_memseg_walk() when maskbits is 0.
> + /*
> + * Dma mask precludes hugepage usage.
> + * This device can not be used and we do not need to keep
> + * the dma mask.
> + */
> + return 1;
> +
> + /*
> + * we need to keep the more restricted maskbit for checking
> + * potential dynamic memory allocation in the future.
> + */
> + mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
> + RTE_MIN(mcfg->dma_maskbits, maskbits);
> +
> + return 0;
> +}
> +
> /* return the number of memory channels */ unsigned
> rte_memory_get_nchannel(void) { diff --git
> a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index 62a21c2..b5dff70 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -81,6 +81,9 @@ struct rte_mem_config {
> /* legacy mem and single file segments options are shared */
> uint32_t legacy_mem;
> uint32_t single_file_segments;
> +
> + /* keeps the more restricted dma mask */
> + uint8_t dma_maskbits;
> } __attribute__((__packed__));
>
>
> diff --git a/lib/librte_eal/common/include/rte_memory.h
> b/lib/librte_eal/common/include/rte_memory.h
> index 14bd277..c349d6c 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct
> rte_memseg_list *msl,
> */
> unsigned rte_memory_get_nrank(void);
>
> +/* check memsegs iovas are within a range based on dma mask */ int
> +rte_eal_check_dma_mask(uint8_t maskbits);
> +
> /**
> * Drivers based on uio will not load unless physical
> * addresses are obtainable. It is only possible to get diff --git
> a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index ac7bbb3..3b5b2b6 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -259,11 +259,13 @@ struct malloc_elem *
> int socket, unsigned int flags, size_t align, size_t bound,
> bool contig, struct rte_memseg **ms, int n_segs) {
> + struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> struct rte_memseg_list *msl;
> struct malloc_elem *elem = NULL;
> size_t alloc_sz;
> int allocd_pages;
> void *ret, *map_addr;
> + uint64_t mask;
>
> alloc_sz = (size_t)pg_sz * n_segs;
>
> @@ -291,6 +293,16 @@ struct malloc_elem *
> goto fail;
> }
>
> + if (mcfg->dma_maskbits) {
> + mask = ~((1ULL << mcfg->dma_maskbits) - 1);
> + if (rte_eal_check_dma_mask(mask)) {
> + RTE_LOG(ERR, EAL,
> + "%s(): couldn't allocate memory due to DMA mask\n",
> + __func__);
> + goto fail;
> + }
> + }
> +
> /* add newly minted memsegs to malloc heap */
> elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 4a55d3b..dfe1b8c 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -263,6 +263,8 @@ enum rte_iova_mode
> * processes could later map the config into this exact location */
> rte_config.mem_config->mem_cfg_addr = (uintptr_t)
> rte_mem_cfg_addr;
>
> + rte_config.mem_config->dma_maskbits = 0;
> +
> }
>
> /* attach to an existing shared memory config */ diff --git
> a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 73282bb..2baefce 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -291,6 +291,7 @@ EXPERIMENTAL {
> rte_devargs_parsef;
> rte_devargs_remove;
> rte_devargs_type_count;
> + rte_eal_check_dma_mask;
> rte_eal_cleanup;
> rte_eal_hotplug_add;
> rte_eal_hotplug_remove;
> --
> 1.9.1
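To make the maskbits == 0 case concrete, here is a stand-alone sketch of the range test used by check_iova() above. It is not the EAL code: toy_check_dma_range() and its return convention are my own assumptions. With maskbits == 0 the mask is all ones, so every segment with a non-zero last address is reported as out of range. Also note that the malloc_heap.c hunk passes the computed 64-bit mask where a uint8_t maskbits is expected; for widths of 8 bits or more that value truncates to 0. This may be related to what I see, but I have not verified it.

```c
#include <stdint.h>

#define TOY_MAX_DMA_MASK_BITS 63

/*
 * Return 0 if [iova, iova + len) is addressable with 'maskbits' address
 * bits, 1 if it is out of range, -1 if the arguments are invalid.
 */
static int
toy_check_dma_range(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t mask, last;

	if (maskbits > TOY_MAX_DMA_MASK_BITS || len == 0)
		return -1;

	/* Bits the device cannot drive: everything from 'maskbits' upwards. */
	mask = ~((UINT64_C(1) << maskbits) - 1);

	/* Highest address within the segment. */
	last = iova + len - 1;

	return (last & mask) ? 1 : 0;
}
```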
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
@ 2018-10-10 12:02 2% ` Adrien Mazarguil
2018-10-10 13:17 0% ` Ori Kam
0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-10-10 12:02 UTC (permalink / raw)
To: Ori Kam
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler
Sorry if I'm a bit late to the discussion, please see below.
On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
<snip>
> > On 10/7/2018 1:57 PM, Ori Kam wrote:
> > This series implement the generic L2/L3 tunnel encapsulation actions
> > and is based on rfc [1] "add generic L2/L3 tunnel encapsulation actions"
> >
> > Currenlty the encap/decap actions only support encapsulation
> > of VXLAN and NVGRE L2 packets (L2 encapsulation is where
> > the inner packet has a valid Ethernet header, while L3 encapsulation
> > is where the inner packet doesn't have the Ethernet header).
> > In addtion the parameter to to the encap action is a list of rte items,
> > this results in 2 extra translation, between the application to the action
> > and from the action to the NIC. This results in negetive impact on the
> > insertion performance.
Not sure it's a valid concern since in this proposal, PMD is still expected
to interpret the opaque buffer contents regardless for validation and to
convert it to its internal format.
Worse, it will require a packet parser to iterate over enclosed headers
instead of a list of convenient rte_flow_whatever objects. It won't be
faster without the convenience of pointers to properly aligned structures
that only contain relevant data fields.
> > Looking forward there is going to be a need to support many more tunnel
> > encapsulations. For example MPLSoGRE, MPLSoUDP.
> > Adding the new encapsulation will result in duplication of code.
> > For example the code for handling NVGRE and VXLAN are exactly the same,
> > and each new tunnel will have the same exact structure.
> >
> > This series introduces a generic encapsulation for L2 tunnel types, and
> > generic encapsulation for L3 tunnel types. In addition the new
> > encapsulations commands are using raw buffer in order to save the
> > conversion time, both for the application and the PMD.
From a usability standpoint I'm not a fan of the current interface to
perform NVGRE/VXLAN encap, however this proposal adds another layer of
opaqueness in the name of making things more generic than rte_flow already
is.
Assuming they are not to be interpreted by PMDs, maybe there's a case for
prepending arbitrary buffers to outgoing traffic and removing them from
incoming traffic. However this feature should not be named "generic tunnel
encap/decap" as it's misleading.
Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be more
appropriate. I think on the "pop" side, only the size would matter.
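To make the push/pop idea concrete, here is a minimal sketch of what such raw actions would amount to on a packet buffer. Everything in it is hypothetical: neither the two configuration structures nor the helpers exist in rte_flow. It only illustrates that "push" needs data plus a size, that "pop" needs a size alone, and that neither side interprets the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical configuration for a raw "header push" action. */
struct header_push_conf {
	const uint8_t *data; /* Bytes to prepend, never interpreted. */
	size_t size;         /* Number of bytes in data. */
};

/* Hypothetical configuration for a raw "header pop" action. */
struct header_pop_conf {
	size_t size; /* Number of leading bytes to drop. */
};

/*
 * Prepend conf->data to the packet held in buf (*pkt_len bytes used out of
 * buf_cap). Returns 0 and updates *pkt_len, or -1 if the result cannot fit.
 */
int
header_push(uint8_t *buf, size_t buf_cap, size_t *pkt_len,
	    const struct header_push_conf *conf)
{
	assert(buf != NULL && pkt_len != NULL && conf != NULL);
	assert(conf->data != NULL || conf->size == 0);
	if (*pkt_len > buf_cap || conf->size > buf_cap - *pkt_len)
		return -1;
	memmove(buf + conf->size, buf, *pkt_len); /* Make room in front. */
	if (conf->size != 0)
		memcpy(buf, conf->data, conf->size);
	*pkt_len += conf->size;
	return 0;
}

/*
 * Drop conf->size leading bytes from the packet. Returns 0 and updates
 * *pkt_len, or -1 if the packet is shorter than the requested size.
 */
int
header_pop(uint8_t *buf, size_t *pkt_len, const struct header_pop_conf *conf)
{
	assert(buf != NULL && pkt_len != NULL && conf != NULL);
	if (conf->size > *pkt_len)
		return -1;
	memmove(buf, buf + conf->size, *pkt_len - conf->size);
	*pkt_len -= conf->size;
	return 0;
}
```

Popping more bytes than the packet holds is reported as an error here; an actual API could just as well document that case as undefined behavior.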
Another problem is that you must not require actions to rely on specific
pattern content:
[...]
*
* Decapsulate outer most tunnel from matched flow.
*
* The flow pattern must have a valid tunnel header
*/
RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP,
For maximum flexibility, all actions should be usable on their own on empty
pattern. On the other hand, you can document undefined behavior when
performing some action on traffic that doesn't contain something.
Reason is that invalid traffic may have already been removed by other flow
rules (or whatever happens) before such a rule is reached; it's a user's
responsibility to provide such guarantee.
When parsing an action, a PMD is not supposed to look at the pattern. Action
list should contain all the needed info, otherwise it means the API is badly
defined.
I'm aware the above makes it tough to implement something like
RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP as defined in this series, but that's
unfortunately why I think it must not be defined like that.
My opinion is that the best generic approach to perform encap/decap with
rte_flow would use one dedicated action per protocol header to
add/remove/modify. This is the suggestion I originally made for
VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
matters [3].
Remember that whatever is provided, be it an opaque buffer (like you did), a
separate list of items (like VXLAN/NVGRE) or the rte_flow action list itself
(what I'm suggesting to do), PMDs have to process it. There will be a CPU
cost. Keep in mind odd use cases that involve QinQinQinQinQ.
> > I like the idea to generalize encap/decap actions. It makes a bit harder
> > for reader to find which encap/decap actions are supported in fact,
> > but it changes nothing for automated usage in the code - just try it
> > (as a generic way used in rte_flow).
> >
>
> Even now the user doesn't know which encapsulations are supported since
> it is PMD and sometimes kernel related. On the other hand, it simplifies adding
> an encapsulation for specific customers, sometimes with just a FW update.
Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
explicitly supported by rte_flow shouldn't be possible. Who will expect
something that isn't defined by the API to work and rely on it in their
application? I don't see it happening.
Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
that the only alternative is a generic API to work around me :)
> > Arguments about a way of encap/decap headers specification (flow items
> > vs raw) sound sensible, but I'm not sure about it.
> > It would be simpler if the tunnel header is added appended or removed
> > as is, but as I understand it is not true. For example, IPv4 ID will be
> > different in incoming packets to be decapsulated and different values
> > should be used on encapsulation. Checksums will be different (but
> > offloaded in any case).
> >
>
> I'm not sure I understand your comment.
> Decapsulation is independent of encapsulation, for example if we decap
> L2 tunnel type then there is no parameter at all the NIC just removes
> the outer layers.
According to the pattern? As described above, you can't rely on that.
Pattern does not necessarily match the full stack of outer layers.
Decap action must be able to determine what to do on its own, possibly in
conjunction with other actions in the list but that's all.
> > Current way allows to specify which fields do not matter and which one
> > must match. It allows to say that, for example, VNI match is sufficient
> > to decapsulate.
> >
>
> The encapsulation according to definition, is a list of headers that should
> encapsulate the packet. So I don't understand your comment about matching
> fields. The matching is based on the flow and the encapsulation is just data
> that should be added on top of the packet.
>
> > Also arguments assume that action input is accepted as is by the HW.
> > It could be true, but could be obviously false and HW interface may
> > require parsed input (i.e. driver must parse the input buffer and extract
> > required fields of packet headers).
> >
>
> You are correct, some PMDs, even Mellanox (for the E-Switch), require parsed input.
> There is no driver that knows the rte_flow structures, so in any case there should be
> some translation between the encapsulation data and the NIC data.
> I agree that writing the code for translation can be harder in this approach,
> but the code is only written once and the insertion speed is much higher this way.
Avoiding code duplication is enough of a reason to do something. Yes NVGRE and
VXLAN encap/decap should be redefined because of that. But IMO, they should
prepend a single VXLAN or NVGRE header and be followed by other actions that
in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN headers
and finally an Ethernet header.
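As a rough illustration of the one-action-per-header idea, the sketch below walks an ordered action list in which each entry prepends a single header in front of the previous one. The enum and helper are hypothetical and are not part of rte_flow; header sizes assume no IPv4 options or IPv6 extension headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-header "push" actions; none of these exist in rte_flow. */
enum push_action {
	PUSH_END = 0, /* Terminates the list. */
	PUSH_VXLAN,
	PUSH_UDP,
	PUSH_IPV4,
	PUSH_IPV6,
	PUSH_VLAN,
	PUSH_ETH,
};

/* Size in bytes of each header, without options or extension headers. */
static const size_t push_size[] = {
	[PUSH_VXLAN] = 8,
	[PUSH_UDP] = 8,
	[PUSH_IPV4] = 20,
	[PUSH_IPV6] = 40,
	[PUSH_VLAN] = 4,
	[PUSH_ETH] = 14,
};

/*
 * Walk an ordered action list. Because every action prepends its header, the
 * first entry ends up innermost and the last one outermost.
 * Returns the number of bytes added; stores the header count in *count.
 */
size_t
encap_overhead(const enum push_action *actions, size_t *count)
{
	size_t total = 0;
	size_t n = 0;

	assert(actions != NULL);
	for (; actions[n] != PUSH_END; ++n)
		total += push_size[actions[n]];
	if (count != NULL)
		*count = n;
	return total;
}
```

A VXLAN encapsulation carried over two VLAN tags would then be the list VXLAN, UDP, IPV4, VLAN, VLAN, ETH, END, with no dedicated structure needed for deeper VLAN stacking.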
> Also, like I said, some virtual switches already store this data in a raw buffer
> (they update only specific fields), so this will also save time for the application when
> creating a rule.
>
> > So, I'd say no. It should be better motivated if we change existing
> > approach (even advertised as experimental).
>
> I think the reasons I gave are very good motivation to change the approach
> please also consider that there is no implementation yet that supports the
> old approach.
Well, although the existing API made this painful, I did submit one [4] and
there's an updated version from Slava [5] for mlx5.
> while we do have code that uses the new approach.
If you need the ability to prepend a raw buffer, please consider a different
name for the related actions, redefine them without reliance on specific
pattern items and leave NVGRE/VXLAN encap/decap as is for the time
being. They can be deprecated anytime without ABI impact.
On the other hand if that raw buffer is to be interpreted by the PMD for
more intelligent tunnel encap/decap handling, I do not agree with the
proposed approach for usability reasons.
[2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
https://mails.dpdk.org/archives/dev/2018-April/096418.html
[3] ethdev: alter behavior of flow API actions
https://git.dpdk.org/dpdk/commit/?id=cc17feb90413
[4] net/mlx5: add VXLAN encap support to switch flow rules
https://mails.dpdk.org/archives/dev/2018-August/110598.html
[5] net/mlx5: e-switch VXLAN flow validation routine
https://mails.dpdk.org/archives/dev/2018-October/113782.html
--
Adrien Mazarguil
6WIND
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
2018-10-10 12:02 2% ` Adrien Mazarguil
@ 2018-10-10 13:17 0% ` Ori Kam
2018-10-10 16:10 0% ` Adrien Mazarguil
0 siblings, 1 reply; 200+ results
From: Ori Kam @ 2018-10-10 13:17 UTC (permalink / raw)
To: Adrien Mazarguil
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler
Hi
PSB.
> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Wednesday, October 10, 2018 3:02 PM
> To: Ori Kam <orika@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; stephen@networkplumber.org; Declan Doherty
> <declan.doherty@intel.com>; dev@dpdk.org; Dekel Peled
> <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh
> <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
>
> Sorry if I'm a bit late to the discussion, please see below.
>
> On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> <snip>
> > > On 10/7/2018 1:57 PM, Ori Kam wrote:
> > > This series implements the generic L2/L3 tunnel encapsulation actions
> > > and is based on RFC [1] "add generic L2/L3 tunnel encapsulation actions".
> > >
> > > Currently the encap/decap actions only support encapsulation
> > > of VXLAN and NVGRE L2 packets (L2 encapsulation is where
> > > the inner packet has a valid Ethernet header, while L3 encapsulation
> > > is where the inner packet doesn't have the Ethernet header).
> > > In addition, the parameter to the encap action is a list of rte items,
> > > which results in 2 extra translations: from the application to the action
> > > and from the action to the NIC. This has a negative impact on
> > > insertion performance.
>
> Not sure it's a valid concern since in this proposal, PMD is still expected
> to interpret the opaque buffer contents regardless for validation and to
> convert it to its internal format.
>
This is the action to take; we should assume
that the pattern is valid and not parse it at all.
Another issue: we get a lot of complaints about the time we take
for validation. I know that currently we must validate the rule when creating it,
but this can change. Why should a rule be validated again when the only change
is the destination IP of the encap data?
Virtual switches, after creating the first flow, are just modifying it, so why force
them into revalidating it? (But this issue is a different topic.)
> Worse, it will require a packet parser to iterate over enclosed headers
> instead of a list of convenient rte_flow_whatever objects. It won't be
> faster without the convenience of pointers to properly aligned structures
> that only contain relevant data fields.
>
Also, in the rte_item we are not aligned, so there is no difference in performance
between the two approaches. In the rte_item we actually have unused pointers, which
are just a waste.
We also need to consider how applications are using it. They already have it in a raw buffer,
so it saves conversion time for the application.
> > > Looking forward, there is going to be a need to support many more tunnel
> > > encapsulations, for example MPLSoGRE and MPLSoUDP.
> > > Adding each new encapsulation will result in duplication of code.
> > > For example, the code for handling NVGRE and VXLAN is exactly the same,
> > > and each new tunnel will have the same exact structure.
> > >
> > > This series introduces a generic encapsulation for L2 tunnel types, and
> > > a generic encapsulation for L3 tunnel types. In addition, the new
> > > encapsulation commands use a raw buffer in order to save
> > > conversion time, both for the application and the PMD.
>
> From a usability standpoint I'm not a fan of the current interface to
> perform NVGRE/VXLAN encap, however this proposal adds another layer of
> opaqueness in the name of making things more generic than rte_flow already
> is.
>
I'm sorry but I don't understand why it is more opaque. As I see it, it is very simple:
just give the encapsulation data and that's it. For example, on a system that supports a number of
encapsulations, they don't need to call a different function just to change the buffer.
> Assuming they are not to be interpreted by PMDs, maybe there's a case for
> prepending arbitrary buffers to outgoing traffic and removing them from
> incoming traffic. However this feature should not be named "generic tunnel
> encap/decap" as it's misleading.
>
> Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be more
> appropriate. I think on the "pop" side, only the size would matter.
>
Maybe the name can be changed, but again, the application does encapsulation, so it will
be more intuitive for it.
> Another problem is that you must not require actions to rely on specific
> pattern content:
>
I don't think this can be true anymore. For example, what do you expect
to happen when you place an action such as modify IP on a packet with no IP?
This may raise issues in the NIC.
The same goes for decap: once the flow is in the NIC, it must assume that it can remove
the outer headers, otherwise really unexpected behavior can occur.
> [...]
> *
> * Decapsulate outer most tunnel from matched flow.
> *
> * The flow pattern must have a valid tunnel header
> */
> RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP,
>
> For maximum flexibility, all actions should be usable on their own on empty
> pattern. On the other hand, you can document undefined behavior when
> performing some action on traffic that doesn't contain something.
>
Like I said, and like it is already defined for VXLAN_encap, we must know
the pattern, otherwise the rule can be declined in the kernel or crash when trying to decap
a packet without an outer tunnel.
> Reason is that invalid traffic may have already been removed by other flow
> rules (or whatever happens) before such a rule is reached; it's a user's
> responsibility to provide such guarantee.
>
> When parsing an action, a PMD is not supposed to look at the pattern. Action
> list should contain all the needed info, otherwise it means the API is badly
> defined.
>
> I'm aware the above makes it tough to implement something like
> RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP as defined in this series, but that's
> unfortunately why I think it must not be defined like that.
>
> My opinion is that the best generic approach to perform encap/decap with
> rte_flow would use one dedicated action per protocol header to
> add/remove/modify. This is the suggestion I originally made for
> VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> matters [3].
I agree that your approach makes a lot of sense, but there are a number of issues with it:
* it is harder and takes more time from the application point of view.
* it is slower when compared to the raw buffer.
>
> Remember that whatever is provided, be it an opaque buffer (like you did), a
> separate list of items (like VXLAN/NVGRE) or the rte_flow action list itself
> (what I'm suggesting to do), PMDs have to process it. There will be a CPU
> cost. Keep in mind odd use cases that involve QinQinQinQinQ.
>
> > > I like the idea to generalize encap/decap actions. It makes a bit harder
> > > for reader to find which encap/decap actions are supported in fact,
> > > but it changes nothing for automated usage in the code - just try it
> > > (as a generic way used in rte_flow).
> > >
> >
> > Even now the user doesn't know which encapsulations are supported since
> > it is PMD and sometimes kernel related. On the other hand, it simplifies adding
> > an encapsulation for specific customers, sometimes with just a FW update.
>
> Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
> explicitly supported by rte_flow shouldn't be possible. Who will expect
> something that isn't defined by the API to work and rely on it in their
> application? I don't see it happening.
>
Some of our customers are working with private tunnel types, and they can configure them using the kernel
or just a new FW; this is a real use case.
> Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> that the only alternative is a generic API to work around me :)
>
Yes, but like I said, when a customer asks for an encap and I can give it to him, why wait for the next DPDK release?
> > > Arguments about a way of encap/decap headers specification (flow items
> > > vs raw) sound sensible, but I'm not sure about it.
> > > It would be simpler if the tunnel header is added appended or removed
> > > as is, but as I understand it is not true. For example, IPv4 ID will be
> > > different in incoming packets to be decapsulated and different values
> > > should be used on encapsulation. Checksums will be different (but
> > > offloaded in any case).
> > >
> >
> > I'm not sure I understand your comment.
> > Decapsulation is independent of encapsulation, for example if we decap
> > L2 tunnel type then there is no parameter at all the NIC just removes
> > the outer layers.
>
> According to the pattern? As described above, you can't rely on that.
> Pattern does not necessarily match the full stack of outer layers.
>
> Decap action must be able to determine what to do on its own, possibly in
> conjunction with other actions in the list but that's all.
>
Decap removes the outer headers.
Some tunnels don't have an inner L2 header, and it must be added after the decap;
this is what L3 decap means, and the user must supply a valid L2 header.
> > > Current way allows to specify which fields do not matter and which one
> > > must match. It allows to say that, for example, VNI match is sufficient
> > > to decapsulate.
> > >
> >
> > The encapsulation according to definition, is a list of headers that should
> > encapsulate the packet. So I don't understand your comment about matching
> > fields. The matching is based on the flow and the encapsulation is just data
> > that should be added on top of the packet.
> >
> > > Also arguments assume that action input is accepted as is by the HW.
> > > It could be true, but could be obviously false and HW interface may
> > > require parsed input (i.e. driver must parse the input buffer and extract
> > > required fields of packet headers).
> > >
> >
> > You are correct, some PMDs, even Mellanox (for the E-Switch), require parsed
> > input. There is no driver that knows the rte_flow structures, so in any case
> > there should be some translation between the encapsulation data and the NIC
> > data.
> > I agree that writing the code for translation can be harder in this approach,
> > but the code is only written once and the insertion speed is much higher this
> > way.
>
> Avoiding code duplication is enough of a reason to do something. Yes NVGRE and
> VXLAN encap/decap should be redefined because of that. But IMO, they should
> prepend a single VXLAN or NVGRE header and be followed by other actions that
> in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN headers
> and finally an Ethernet header.
>
> > Also, like I said, some virtual switches already store this data in a raw buffer
> > (they update only specific fields), so this will also save time for the application
> > when creating a rule.
> >
> > > So, I'd say no. It should be better motivated if we change existing
> > > approach (even advertised as experimental).
> >
> > I think the reasons I gave are very good motivation to change the approach
> > please also consider that there is no implementation yet that supports the
> > old approach.
>
> Well, although the existing API made this painful, I did submit one [4] and
> there's an updated version from Slava [5] for mlx5.
>
> > while we do have code that uses the new approach.
>
> If you need the ability to prepend a raw buffer, please consider a different
> name for the related actions, redefine them without reliance on specific
> pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> being. They can be deprecated anytime without ABI impact.
>
> On the other hand if that raw buffer is to be interpreted by the PMD for
> more intelligent tunnel encap/decap handling, I do not agree with the
> proposed approach for usability reasons.
>
> [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> April%2F096418.html&data=02%7C01%7Corika%40mellanox.com%7C7b9
> 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b%
> 7C0%7C0%7C636747697489048905&sdata=prABlYixGAkdnyZ2cetpgz5%2F
> vkMmiC66T3ZNE%2FewkQ4%3D&reserved=0
>
> [3] ethdev: alter behavior of flow API actions
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dpdk
> .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=02%7C01%7C
> orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971
> c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata
> =VavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=0
>
> [4] net/mlx5: add VXLAN encap support to switch flow rules
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> August%2F110598.html&data=02%7C01%7Corika%40mellanox.com%7C7b
> 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b
> %7C0%7C0%7C636747697489058915&sdata=lpfDWp9oBN8AFNYZ6VL5BjI
> 38SDFt91iuU7pvhbC%2F0E%3D&reserved=0
>
> [5] net/mlx5: e-switch VXLAN flow validation routine
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> October%2F113782.html&data=02%7C01%7Corika%40mellanox.com%7C7
> b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461
> b%7C0%7C0%7C636747697489058915&sdata=8GCbYk6uB2ahZaHaqWX4
> OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=0
>
> --
> Adrien Mazarguil
> 6WIND
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
2018-10-10 13:17 0% ` Ori Kam
@ 2018-10-10 16:10 0% ` Adrien Mazarguil
2018-10-11 8:48 0% ` Ori Kam
0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-10-10 16:10 UTC (permalink / raw)
To: Ori Kam
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler
On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote:
<snip>
> > -----Original Message-----
> > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
<snip>
> > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> > <snip>
> > > > On 10/7/2018 1:57 PM, Ori Kam wrote:
<snip>
> > > > In addition, the parameter to the encap action is a list of rte items,
> > > > which results in 2 extra translations: from the application to the action
> > > > and from the action to the NIC. This has a negative impact on
> > > > insertion performance.
> >
> > Not sure it's a valid concern since in this proposal, PMD is still expected
> > to interpret the opaque buffer contents regardless for validation and to
> > convert it to its internal format.
> >
> This is the action to take; we should assume
> that the pattern is valid and not parse it at all.
> Another issue: we get a lot of complaints about the time we take
> for validation. I know that currently we must validate the rule when creating it,
> but this can change. Why should a rule be validated again when the only change
> is the destination IP of the encap data?
> Virtual switches, after creating the first flow, are just modifying it, so why force
> them into revalidating it? (But this issue is a different topic.)
Did you measure what proportion of time is spent on validation when creating
a flow rule?
Based on past experience with mlx4/mlx5, creation used to involve a number
of expensive system calls while validation was basically a single logic loop
checking individual items/actions while performing conversion to HW
format (mandatory for creation). Context switches related to kernel
involvement are the true performance killers.
I'm not sure this is a valid argument in favor of this approach since flow
rule validation still needs to happen regardless.
By the way, applications are not supposed to call rte_flow_validate() before
rte_flow_create(). The former can be helpful in some cases (e.g. to get a
rough idea of PMD capabilities during initialization) but they should in
practice only rely on rte_flow_create(), then fall back to software
processing if that fails.
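The create-then-fall-back pattern described above could look roughly like the following sketch. It is deliberately self-contained: the handle type, the creation hook and both stubs are hypothetical stand-ins so that it compiles without DPDK headers. A real application would call rte_flow_create() at that point and treat a NULL return as the signal to fall back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the opaque handle returned on success. */
struct flow_handle {
	int id;
};

/* Hypothetical creation hook; a DPDK application would call rte_flow_create(). */
typedef struct flow_handle *(*flow_create_fn)(const void *rule);

enum datapath {
	DATAPATH_HW_OFFLOAD,
	DATAPATH_SOFTWARE,
};

/* Stub that accepts every rule, standing in for a capable PMD. */
struct flow_handle *
stub_create_ok(const void *rule)
{
	static struct flow_handle handle = { .id = 1 };

	(void)rule;
	return &handle;
}

/* Stub that rejects every rule, standing in for an unsupported request. */
struct flow_handle *
stub_create_unsupported(const void *rule)
{
	(void)rule;
	return NULL;
}

/*
 * Attempt creation directly, without a separate validation call, and report
 * which data path the caller should use for the matching traffic.
 */
enum datapath
install_rule(flow_create_fn create, const void *rule,
	     struct flow_handle **handle)
{
	assert(create != NULL && handle != NULL);
	*handle = create(rule);
	return *handle != NULL ? DATAPATH_HW_OFFLOAD : DATAPATH_SOFTWARE;
}
```

Validation then happens exactly once, as part of creation, and the software path is only taken for rules the device actually refuses.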
> > Worse, it will require a packet parser to iterate over enclosed headers
> > instead of a list of convenient rte_flow_whatever objects. It won't be
> > faster without the convenience of pointers to properly aligned structures
> > that only contain relevant data fields.
> >
> Also, in the rte_item we are not aligned, so there is no difference in performance
> between the two approaches. In the rte_item we actually have unused pointers, which
> are just a waste.
Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have
relied on _pattern item_ structures, the room for their "last" pointer is
arguably wasted. On the other hand, the "mask" pointer allows masking
relevant fields that matter to the application (e.g. source/destination
addresses as opposed to IPv4 length, version and other irrelevant fields for
encap).
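One possible reading of how a mask could be used for encap is sketched below: bits selected by the mask come from the application's spec, everything else is left for the PMD or device to fill in. The helper name and the exact merge rule are assumptions made for illustration; rte_flow only defines the spec/last/mask pointers of pattern items, not this function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build n output bytes: bits set in mask are taken from spec (what the
 * application cares about), all other bits come from fill (what the PMD or
 * device would supply on its own, e.g. lengths and checksums).
 */
void
masked_merge(uint8_t *out, const uint8_t *spec, const uint8_t *mask,
	     const uint8_t *fill, size_t n)
{
	size_t i;

	assert(out != NULL && spec != NULL && mask != NULL && fill != NULL);
	for (i = 0; i != n; ++i)
		out[i] = (uint8_t)((spec[i] & mask[i]) | (fill[i] & ~mask[i]));
}
```

With a fully set mask over an address field and a cleared mask over a TTL byte, the address is taken from the application while the TTL keeps whatever default the other side provides.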
Not sure why you think it's not aligned. We're comparing an array of
rte_flow_item objects with raw packet data. The latter requires
interpretation of each protocol header to jump to the next offset. This is
more complex on both sides: to build such a buffer for the application, then
to have it processed by the PMD.
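To give an idea of the interpretation work a raw buffer implies, here is a minimal sketch of a parser that walks Ethernet, any number of VLAN tags, IPv4 and UDP just to find where the headers end. The function name is made up for this example and only this one header stack is handled; a PMD would need the equivalent for every supported encapsulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value from an unaligned location. */
static uint16_t
read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Walk Ethernet / [VLAN...] / IPv4 / UDP in a raw buffer and store the offset
 * of the first byte following the UDP header. Returns 0 on success, -1 if the
 * buffer is truncated or contains anything else.
 */
int
encap_buffer_payload_offset(const uint8_t *buf, size_t len, size_t *offset)
{
	size_t off = 12; /* Skip destination and source MAC addresses. */
	size_t ihl;
	uint16_t ether_type;

	assert(buf != NULL && offset != NULL);
	if (len < off + 2)
		return -1;
	ether_type = read_be16(buf + off);
	off += 2;
	/* Each VLAN tag is a 2-byte TCI followed by the next EtherType. */
	while (ether_type == 0x8100 || ether_type == 0x88a8) {
		if (len < off + 4)
			return -1;
		ether_type = read_be16(buf + off + 2);
		off += 4;
	}
	if (ether_type != 0x0800 || len < off + 20)
		return -1;
	if ((buf[off] >> 4) != 4)
		return -1;
	ihl = (size_t)(buf[off] & 0x0f) * 4; /* IPv4 header length in bytes. */
	if (ihl < 20 || len < off + ihl)
		return -1;
	if (buf[off + 9] != 17) /* IP protocol 17 is UDP. */
		return -1;
	off += ihl;
	if (len < off + 8)
		return -1;
	*offset = off + 8;
	return 0;
}
```

None of these offsets need to be computed when each header arrives as its own structure in a list.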
> We also need to consider how applications are using it. They already have it in a raw buffer,
> so it saves conversion time for the application.
I don't think so. Applications typically know where some traffic is supposed
to go and what VNI it should use. They don't have a prefabricated packet
handy to prepend to outgoing traffic. If that was the case they'd most
likely do so themselves through an extra packet segment and not bother with
PMD offloads.
<snip>
> > From a usability standpoint I'm not a fan of the current interface to
> > perform NVGRE/VXLAN encap, however this proposal adds another layer of
> > opaqueness in the name of making things more generic than rte_flow already
> > is.
> >
> I'm sorry but I don't understand why it is more opaque. As I see it, it is very simple:
> just give the encapsulation data and that's it. For example, on a system that supports a number of
> encapsulations, they don't need to call a different function just to change the buffer.
I'm saying it's opaque from an API standpoint if you expect the PMD to
interpret that buffer's contents in order to prepend it in a smart way.
Since this generic encap does not support masks, there is no way for an
application to at least tell a PMD what data matters and what doesn't in the
provided buffer. This means invalid checksums, lengths and so on must be
sent as is to the wire. What's the use case for such a behavior?
> > Assuming they are not to be interpreted by PMDs, maybe there's a case for
> > prepending arbitrary buffers to outgoing traffic and removing them from
> > incoming traffic. However this feature should not be named "generic tunnel
> > encap/decap" as it's misleading.
> >
> > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be more
> > appropriate. I think on the "pop" side, only the size would matter.
> >
> Maybe the name can be changed, but again, the application does encapsulation, so it will
> be more intuitive for it.
>
> > Another problem is that you must not require actions to rely on specific
> > pattern content:
> >
> I don't think this can be true anymore. For example, what do you expect
> to happen when you place an action such as modify IP on a packet with no IP?
>
> This may raise issues in the NIC.
> The same goes for decap: once the flow is in the NIC, it must assume that it can remove
> the outer headers, otherwise really unexpected behavior can occur.
Right, that's why it must be documented as undefined behavior. The API is
not supposed to enforce the relationship. A PMD may require the presence of
some pattern item in order to perform some action, but this is a PMD
limitation, not a limitation of the API itself.
<snip>
> > For maximum flexibility, all actions should be usable on their own on empty
> > pattern. On the other hand, you can document undefined behavior when
> > performing some action on traffic that doesn't contain something.
> >
>
> Like I said, and like it is already defined for VXLAN_encap, we must know
> the pattern, otherwise the rule can be declined in the kernel or crash when trying to decap
> a packet without an outer tunnel.
Right, PMD limitation then. You are free to document it in the PMD.
<snip>
> > My opinion is that the best generic approach to perform encap/decap with
> > rte_flow would use one dedicated action per protocol header to
> > add/remove/modify. This is the suggestion I originally made for
> > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> > matters [3].
>
> I agree that your approach makes a lot of sense, but there are a number of issues with it:
> * it is harder and takes more time from the application point of view.
> * it is slower when compared to the raw buffer.
I'm convinced of the opposite :) We could try to implement your raw buffer
approach as well as mine in testpmd (one action per layer, not the current
VXLAN/NVGRE encap mess mind you) in order to determine which is the most
convenient on the application side.
<snip>
> > Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
> > explicitly supported by rte_flow shouldn't be possible. Who will expect
> > something that isn't defined by the API to work and rely on it in their
> > application? I don't see it happening.
> >
> Some of our customers are working with private tunnel types, and they can configure them using the kernel
> or just a new FW; this is a real use case.
You can already use negative types to quickly address HW and
customer-specific needs by the way. Could this [6] perhaps address the
issue?
PMDs can expose public APIs. You could devise one that spits new negative
item/action types based on some data, to be subsequently used by flow
rules with that PMD only.
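A PMD-specific API of that kind might be shaped like the sketch below, which hands out negative type identifiers at run time and refuses any identifier it did not generate itself. The registry structure and both functions are purely hypothetical; only the convention that negative identifiers are left to PMDs comes from the rte_flow documentation referenced as [6].

```c
#include <assert.h>
#include <stddef.h>

#define PRIVATE_TYPES_MAX 8

/* Hypothetical per-PMD registry of run-time generated action types. */
struct private_types {
	const char *name[PRIVATE_TYPES_MAX];
	int count;
};

/*
 * Register a private type. On success, store its negative identifier
 * (-1, -2, ...) in *type and return 0; return -1 when the registry is full.
 */
int
private_type_register(struct private_types *reg, const char *name, int *type)
{
	assert(reg != NULL && name != NULL && type != NULL);
	if (reg->count >= PRIVATE_TYPES_MAX)
		return -1;
	reg->name[reg->count] = name;
	reg->count++;
	*type = -reg->count;
	return 0;
}

/*
 * Return the name registered for a negative identifier, or NULL when the
 * identifier is unknown so that the caller can reject the flow rule.
 */
const char *
private_type_lookup(const struct private_types *reg, int type)
{
	assert(reg != NULL);
	if (type >= 0 || type < -reg->count)
		return NULL;
	return reg->name[-type - 1];
}
```

An application would obtain the identifier once from the PMD and then place it in the action list of rules created on ports driven by that same PMD.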
> > Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> > that the only alternative is a generic API to work around me :)
> >
>
> Yes, but like I said, when a customer asks for an encap and I can give it to him, why wait for the next DPDK release?
I don't know, is rte_flow held to a special standard compared to other DPDK
features in this regard? Engineering patches can always be provided,
backported and whatnot.
Customer applications will have to be modified and recompiled to benefit
from any new FW capabilities regardless, it's extremely unlikely to be just
a matter of installing a new FW image.
<snip>
> > Pattern does not necessarily match the full stack of outer layers.
> >
> > Decap action must be able to determine what to do on its own, possibly in
> > conjunction with other actions in the list but that's all.
> >
> Decap removes the outer headers.
> Some tunnels don't have an inner L2 header, and it must be added after the decap;
> this is what L3 decap means, and the user must supply a valid L2 header.
My point is that any data required to perform decap must be provided by the
decap action itself, not through a pattern item, whose only purpose is to
filter traffic and may not be present. Precisely what you did for L3 decap.
<snip>
> > > I think the reasons I gave are very good motivation to change the approach
> > > please also consider that there is no implementation yet that supports the
> > > old approach.
> >
> > Well, although the existing API made this painful, I did submit one [4] and
> > there's an updated version from Slava [5] for mlx5.
> >
> > > while we do have code that uses the new approach.
> >
> > If you need the ability to prepend a raw buffer, please consider a different
> > name for the related actions, redefine them without reliance on specific
> > pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> > being. They can be deprecated anytime without ABI impact.
> >
> > On the other hand if that raw buffer is to be interpreted by the PMD for
> > more intelligent tunnel encap/decap handling, I do not agree with the
> > proposed approach for usability reasons.
I'm still not convinced by your approach. If these new actions *must* be
included unmodified right now to prevent some customer cataclysm, then fine
as an experiment but please leave VXLAN/NVGRE encaps alone for the time
being.
> > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > pdk.org%2Farchives%2Fdev%2F2018-
> > April%2F096418.html&data=02%7C01%7Corika%40mellanox.com%7C7b9
> > 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b%
> > 7C0%7C0%7C636747697489048905&sdata=prABlYixGAkdnyZ2cetpgz5%2F
> > vkMmiC66T3ZNE%2FewkQ4%3D&reserved=0
> >
> > [3] ethdev: alter behavior of flow API actions
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dpdk
> > .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=02%7C01%7C
> > orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971
> > c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata
> > =VavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=0
> >
> > [4] net/mlx5: add VXLAN encap support to switch flow rules
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > pdk.org%2Farchives%2Fdev%2F2018-
> > August%2F110598.html&data=02%7C01%7Corika%40mellanox.com%7C7b
> > 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b
> > %7C0%7C0%7C636747697489058915&sdata=lpfDWp9oBN8AFNYZ6VL5BjI
> > 38SDFt91iuU7pvhbC%2F0E%3D&reserved=0
> >
> > [5] net/mlx5: e-switch VXLAN flow validation routine
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > pdk.org%2Farchives%2Fdev%2F2018-
> > October%2F113782.html&data=02%7C01%7Corika%40mellanox.com%7C7
> > b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461
> > b%7C0%7C0%7C636747697489058915&sdata=8GCbYk6uB2ahZaHaqWX4
> > OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=0
[6] "9.2.9. Negative types"
http://doc.dpdk.org/guides-18.08/prog_guide/rte_flow.html#negative-types
On an unrelated note, is there a way to prevent Outlook from mangling URLs
on your side? (those emea01.safelinks things)
--
Adrien Mazarguil
6WIND
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
2018-10-10 16:10 0% ` Adrien Mazarguil
@ 2018-10-11 8:48 0% ` Ori Kam
0 siblings, 0 replies; 200+ results
From: Ori Kam @ 2018-10-11 8:48 UTC (permalink / raw)
To: Adrien Mazarguil
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler, Ori Kam
Hi Adrien,
Thanks for your comments, please see my answers below and inline.
Due to a very short time limit, and the fact that we have more than
4 patches based on this, we need to close it fast.
As I see it, there are a number of options:
* The old approach, which neither of us likes, and which means that for
every tunnel we create a new command.
* My proposed suggestion as is, which is easier for at least a number of
applications to implement and faster in most cases.
* My suggestion with a different name, but then we also need to find a name
for the decap and a name for decap_l3. This approach is also problematic
since we would have 2 APIs that do the same thing. For example, in test-pmd
encap vxlan, which API shall we use?
* Combine my suggestion with the current one by replacing the raw
buffer with a list of items. Less code duplication and easier validation
(though I don't think we need to validate the encap data), but we lose
insertion rate.
* Your suggestion of a list of actions where each action is one item. The main
problems are speed, complexity on the application side, and time to implement.
> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Wednesday, October 10, 2018 7:10 PM
> To: Ori Kam <orika@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; stephen@networkplumber.org; Declan Doherty
> <declan.doherty@intel.com>; dev@dpdk.org; Dekel Peled
> <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh
> <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
>
> On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote:
> <snip>
> > > -----Original Message-----
> > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> <snip>
> > > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> > > <snip>
> > > > > On 10/7/2018 1:57 PM, Ori Kam wrote:
> <snip>
> > > > > In addition the parameter to the encap action is a list of rte items,
> > > > > this results in 2 extra translations, between the application to the action
> > > > > and from the action to the NIC. This results in a negative impact on the
> > > > > insertion performance.
> > >
> > > Not sure it's a valid concern since in this proposal, PMD is still expected
> > > to interpret the opaque buffer contents regardless for validation and to
> > > convert it to its internal format.
> > >
> > This is the action to take, we should assume
> > that the pattern is valid and not parse it at all.
> > Another issue, we have a lot of complains about the time we take
> > for validation, I know that currently we must validate the rule when creating
> it,
> > but this can change, why should a rule that was validate and the only change
> > is the IP dest of the encap data?
> > virtual switch after creating the first flow are just modifying it so why force
> > them into revalidating it? (but this issue is a different topic)
>
> Did you measure what proportion of time is spent on validation when creating
> a flow rule?
>
> Based on past experience with mlx4/mlx5, creation used to involve a number
> of expensive system calls while validation was basically a single logic loop
> checking individual items/actions while performing conversion to HW
> format (mandatory for creation). Context switches related to kernel
> involvement are the true performance killers.
>
I'm sorry to say I don't have the numbers, but I can tell you
that with the new API, in most cases, there will be just one system call.
In addition, any extra time is wasted time; again, this is a request we got
from a number of customers.
> I'm not sure this is a valid argument in favor of this approach since flow
> rule validation still needs to happen regardless.
>
> By the way, applications are not supposed to call rte_flow_validate() before
> rte_flow_create(). The former can be helpful in some cases (e.g. to get a
> rough idea of PMD capabilities during initialization) but they should in
> practice only rely on rte_flow_create(), then fall back to software
> processing if that fails.
>
First, I don't think we need to validate the encapsulation data: if the data
is wrong, the packet will be dropped. Just as you say about the restriction
of the flow items, it is the responsibility of the application.
Also, as I said, there is demand from customers and there is no reason not to
do it, but in any case this is not relevant to the current patch.
> > > Worse, it will require a packet parser to iterate over enclosed headers
> > > instead of a list of convenient rte_flow_whatever objects. It won't be
> > > faster without the convenience of pointers to properly aligned structures
> > > that only contain relevant data fields.
> > >
> > Also in the rte_item we are not aligned so there is no difference in
> performance,
> > between the two approaches, In the rte_item actually we have unused
> pointer which
> > are just a waste.
>
> Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have
> relied on _pattern item_ structures, the room for their "last" pointer is
> arguably wasted. On the other hand, the "mask" pointer allows masking
> relevant fields that matter to the application (e.g. source/destination
> addresses as opposed to IPv4 length, version and other irrelevant fields for
> encap).
>
At least according to my testing, the NIC can't use masks; it works based
on the offloads configured for any packet (like checksum).
> Not sure why you think it's not aligned. We're comparing an array of
> rte_flow_item objects with raw packet data. The latter requires
> interpretation of each protocol header to jump to the next offset. This is
> more complex on both sides: to build such a buffer for the application, then
> to have it processed by the PMD.
>
Maybe I'm missing something, but in the buffer approach all the data will likely be in the
cache and, if allocated, will also be aligned. On the other hand, the rte_items
are not guaranteed to be in the same cache line, so each access to an item may result
in a cache miss. Also, accessing individual members is just like accessing them in
a raw buffer.
> > Also needs to consider how application are using it. They are already have it
> in raw buffer
> > so it saves the conversion time for the application.
>
> I don't think so. Applications typically know where some traffic is supposed
> to go and what VNI it should use. They don't have a prefabricated packet
> handy to prepend to outgoing traffic. If that was the case they'd most
> likely do so themselves through a extra packet segment and not bother with
> PMD offloads.
>
Contrail V-Router has such a buffer and just changes the specific fields.
This is one of the things we want to offload; from my last check, OVS also uses
such a buffer.
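To make the "prefabricated buffer" idea concrete, here is a minimal sketch in plain C
(no DPDK headers). The names encap_template, encap_template_init and encap_template_set
are hypothetical and are not taken from Contrail V-Router or OVS; only the standard
Ethernet/IPv4/UDP/VXLAN field offsets are assumed. The constant bytes are written once
and only the per-flow fields are patched afterwards. MAC addresses, lengths and
checksums are left zero in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outer header template: Ethernet + IPv4 + UDP + VXLAN. */
#define ENCAP_ETH_LEN   14
#define ENCAP_IPV4_LEN  20
#define ENCAP_UDP_LEN    8
#define ENCAP_VXLAN_LEN  8
#define ENCAP_LEN (ENCAP_ETH_LEN + ENCAP_IPV4_LEN + ENCAP_UDP_LEN + ENCAP_VXLAN_LEN)

/* Byte offsets of the fields that change per flow. */
#define ENCAP_OFF_IPV4_DST (ENCAP_ETH_LEN + 16)
#define ENCAP_OFF_VNI (ENCAP_ETH_LEN + ENCAP_IPV4_LEN + ENCAP_UDP_LEN + 4)

struct encap_template {
	uint8_t buf[ENCAP_LEN];
};

/* Fill in the constant parts once. */
static void
encap_template_init(struct encap_template *t)
{
	uint8_t *udp = &t->buf[ENCAP_ETH_LEN + ENCAP_IPV4_LEN];

	memset(t->buf, 0, sizeof(t->buf));
	t->buf[12] = 0x08;              /* EtherType 0x0800 (IPv4), big endian */
	t->buf[13] = 0x00;
	t->buf[ENCAP_ETH_LEN] = 0x45;   /* IPv4: version 4, IHL 5 */
	t->buf[ENCAP_ETH_LEN + 8] = 64; /* IPv4: TTL */
	t->buf[ENCAP_ETH_LEN + 9] = 17; /* IPv4: protocol UDP */
	udp[2] = (uint8_t)(4789 >> 8);  /* UDP destination port 4789 (VXLAN) */
	udp[3] = (uint8_t)(4789 & 0xff);
	udp[ENCAP_UDP_LEN] = 0x08;      /* VXLAN flags: VNI present */
}

/* Patch only the per-flow fields: IPv4 destination and 24-bit VNI. */
static void
encap_template_set(struct encap_template *t, uint32_t ipv4_dst, uint32_t vni)
{
	uint8_t *ip = &t->buf[ENCAP_OFF_IPV4_DST];
	uint8_t *v = &t->buf[ENCAP_OFF_VNI];

	ip[0] = (uint8_t)(ipv4_dst >> 24);
	ip[1] = (uint8_t)(ipv4_dst >> 16);
	ip[2] = (uint8_t)(ipv4_dst >> 8);
	ip[3] = (uint8_t)ipv4_dst;
	v[0] = (uint8_t)(vni >> 16);
	v[1] = (uint8_t)(vni >> 8);
	v[2] = (uint8_t)vni;
}
```

An application in this style would call encap_template_init() once and
encap_template_set() per flow, then hand buf and ENCAP_LEN to the encap action.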
> <snip>
> > > From a usability standpoint I'm not a fan of the current interface to
> > > perform NVGRE/VXLAN encap, however this proposal adds another layer of
> > > opaqueness in the name of making things more generic than rte_flow
> already
> > > is.
> > >
> > I'm sorry but I don't understand why it is more opaqueness, as I see it is very
> simple
> > just give the encapsulation data and that's it. For example on system that
> support number of
> > encapsulations they don't need to call to a different function just to change
> the buffer.
>
> I'm saying it's opaque from an API standpoint if you expect the PMD to
> interpret that buffer's contents in order to prepend it in a smart way.
>
> Since this generic encap does not support masks, there is no way for an
> application to at least tell a PMD what data matters and what doesn't in the
> provided buffer. This means invalid checksums, lengths and so on must be
> sent as is to the wire. What's the use case for such a behavior?
>
The NIC treats the packet as a normal packet that goes through all normal offloading.
> > > Assuming they are not to be interpreted by PMDs, maybe there's a case for
> > > prepending arbitrary buffers to outgoing traffic and removing them from
> > > incoming traffic. However this feature should not be named "generic tunnel
> > > encap/decap" as it's misleading.
> > >
> > > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be
> > > more
> > > appropriate. I think on the "pop" side, only the size would matter.
> > >
> > Maybe the name can be change but again the application does encapsulation
> so it will
> > be more intuitive for it.
> >
> > > Another problem is that you must not require actions to rely on specific
> > > pattern content:
> > >
> > I don't think this can be true anymore since for example what do you expect
> > to happen when you place an action for example modify ip to packet with no
> ip?
> >
> > This may raise issues in the NIC.
> > Same goes for decap after the flow is in the NIC he must assume that he can
> remove otherwise
> > really unexpected behavior can occur.
>
> Right, that's why it must be documented as undefined behavior. The API is
> not supposed to enforce the relationship. A PMD may require the presence of
> some pattern item in order to perform some action, but this is a PMD
> limitation, not a limitation of the API itself.
>
Agree
> <snip>
> > For maximum flexibility, all actions should be usable on their own on empty
> > > pattern. On the other hand, you can document undefined behavior when
> > > performing some action on traffic that doesn't contain something.
> > >
> >
> > Like I said and like it is already defined for VXLAN_enacp we must know
> > the pattern otherwise the rule can be declined in Kernel / crash when trying
> to decap
> > packet without outer tunnel.
>
> Right, PMD limitation then. You are free to document it in the PMD.
>
Agree
> <snip>
> > > My opinion is that the best generic approach to perform encap/decap with
> > > rte_flow would use one dedicated action per protocol header to
> > > add/remove/modify. This is the suggestion I originally made for
> > > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> > > matters [3].
> >
> > I agree that your approach make a lot of sense, but there are number of
> issues with it
> > * it is harder and takes more time from the application point of view.
> > * it is slower when compared to the raw buffer.
>
> I'm convinced of the opposite :) We could try to implement your raw buffer
> approach as well as mine in testpmd (one action per layer, not the current
> VXLAN/NVGRE encap mess mind you) in order to determine which is the most
> convenient on the application side.
>
There are 2 different implementations: one for test-pmd and one for a normal application.
Writing the code in test-pmd with a raw buffer is simpler but less flexible.
Writing the code in a real application is, I think, simpler with the buffer approach,
since they already have a buffer.
> <snip>
> > > Except for raw push/pop of uninterpreted headers, tunnel encapsulations
> not
> > > explicitly supported by rte_flow shouldn't be possible. Who will expect
> > > something that isn't defined by the API to work and rely on it in their
> > > application? I don't see it happening.
> > >
> > Some of our customers are working with private tunnel type, and they can
> configure it using kernel
> > or just new FW this is a real use case.
>
> You can already use negative types to quickly address HW and
> customer-specific needs by the way. Could this [6] perhaps address the
> issue?
>
> PMDs can expose public APIs. You could devise one that spits new negative
> item/action types based on some data, to be subsequently used by flow
> rules with that PMD only.
>
> > > Come on, adding new encap/decap actions to DPDK shouldn't be such a
> pain
> > > that the only alternative is a generic API to work around me :)
> > >
> >
> > Yes but like I said when a customer asks for an encap and I can give it to him
> why wait for the DPDK next release?
>
> I don't know, is rte_flow held to a special standard compared to other DPDK
> features in this regard? Engineering patches can always be provided,
> backported and whatnot.
>
> Customer applications will have to be modified and recompiled to benefit
> from any new FW capabilities regardless, it's extremely unlikely to be just
> a matter of installing a new FW image.
>
In some cases this is what happens 😊
> <snip>
> > > Pattern does not necessarily match the full stack of outer layers.
> > >
> > > Decap action must be able to determine what to do on its own, possibly in
> > > conjunction with other actions in the list but that's all.
> > >
> > Decap removes the outer headers.
> > Some tunnels don't have inner L2 and it must be added after the decap
> > this is what L3 decap means, and the user must supply the valid L2 header.
>
> My point is that any data required to perform decap must be provided by the
> decap action itself, not through a pattern item, whose only purpose is to
> filter traffic and may not be present. Precisely what you did for L3 decap.
>
Agreed, we will remove the limitation and just say unpredictable results may occur.
> <snip>
> > > > I think the reasons I gave are very good motivation to change the
> approach
> > > > please also consider that there is no implementation yet that supports the
> > > > old approach.
> > >
> > > Well, although the existing API made this painful, I did submit one [4] and
> > > there's an updated version from Slava [5] for mlx5.
> > >
> > > > while we do have code that uses the new approach.
> > >
> > > If you need the ability to prepend a raw buffer, please consider a different
> > > name for the related actions, redefine them without reliance on specific
> > > pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> > > being. They can be deprecated anytime without ABI impact.
> > >
> > > On the other hand if that raw buffer is to be interpreted by the PMD for
> > > more intelligent tunnel encap/decap handling, I do not agree with the
> > > proposed approach for usability reasons.
>
> I'm still not convinced by your approach. If these new actions *must* be
> included unmodified right now to prevent some customer cataclysm, then fine
> as an experiment but please leave VXLAN/NVGRE encaps alone for the time
> being.
>
> > > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
> > >
> > >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > > pdk.org%2Farchives%2Fdev%2F2018-
> > >
> April%2F096418.html&data=02%7C01%7Corika%40mellanox.com%7C7b9
> > >
> 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b%
> > >
> 7C0%7C0%7C636747697489048905&sdata=prABlYixGAkdnyZ2cetpgz5%2F
> > > vkMmiC66T3ZNE%2FewkQ4%3D&reserved=0
> > >
> > > [3] ethdev: alter behavior of flow API actions
> > >
> > >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dpdk
> > >
> .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=02%7C01%7C
> > >
> orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971
> > >
> c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata
> > >
> =VavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=0
> > >
> > > [4] net/mlx5: add VXLAN encap support to switch flow rules
> > >
> > >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > > pdk.org%2Farchives%2Fdev%2F2018-
> > >
> August%2F110598.html&data=02%7C01%7Corika%40mellanox.com%7C7b
> > >
> 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b
> > >
> %7C0%7C0%7C636747697489058915&sdata=lpfDWp9oBN8AFNYZ6VL5BjI
> > > 38SDFt91iuU7pvhbC%2F0E%3D&reserved=0
> > >
> > > [5] net/mlx5: e-switch VXLAN flow validation routine
> > >
> > >
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > > pdk.org%2Farchives%2Fdev%2F2018-
> > >
> October%2F113782.html&data=02%7C01%7Corika%40mellanox.com%7C7
> > >
> b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461
> > >
> b%7C0%7C0%7C636747697489058915&sdata=8GCbYk6uB2ahZaHaqWX4
> > > OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=0
>
> [6] "9.2.9. Negative types"
>
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fdoc.dpdk
> .org%2Fguides-18.08%2Fprog_guide%2Frte_flow.html%23negative-
> types&data=02%7C01%7Corika%40mellanox.com%7C52a7b66d888f47a02
> fa308d62ecae971%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C63
> 6747846398519627&sdata=Rn1s5FgQB8pSgjLvs3K2M4rX%2BVbK5Txi59iy
> Q%2FbsUqQ%3D&reserved=0
>
> On an unrelated note, is there a way to prevent Outlook from mangling URLs
> on your side? (those emea01.safelinks things)
>
I will try to find a solution. I didn't find one so far.
> --
> Adrien Mazarguil
> 6WIND
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-10 8:56 0% ` Tu, Lijuan
@ 2018-10-11 9:26 0% ` Alejandro Lucero
0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-10-11 9:26 UTC (permalink / raw)
To: lijuan.tu; +Cc: dev
On Wed, Oct 10, 2018 at 10:00 AM Tu, Lijuan <lijuan.tu@intel.com> wrote:
> Hi
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alejandro Lucero
> > Sent: Friday, October 5, 2018 8:45 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking
> > memsegs IOVAs addresses
> >
> > A device can suffer addressing limitations. This function checks memsegs
> > have iovas within the supported range based on dma mask.
> >
> > PMDs should use this function during initialization if device suffers
> > addressing limitations, returning an error if this function returns
> memsegs
> > out of range.
> >
> > Another usage is for emulated IOMMU hardware with addressing limitations.
> >
> > It is necessary to save the most restricted dma mask for checking out
> > memory allocated dynamically after initialization.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> > doc/guides/rel_notes/release_18_11.rst | 10 ++++
> > lib/librte_eal/common/eal_common_memory.c | 60
> > +++++++++++++++++++++++
> > lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> > lib/librte_eal/common/include/rte_memory.h | 3 ++
> > lib/librte_eal/common/malloc_heap.c | 12 +++++
> > lib/librte_eal/linuxapp/eal/eal.c | 2 +
> > lib/librte_eal/rte_eal_version.map | 1 +
> > 7 files changed, 91 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_18_11.rst
> > b/doc/guides/rel_notes/release_18_11.rst
> > index 2133a5b..c806dc6 100644
> > --- a/doc/guides/rel_notes/release_18_11.rst
> > +++ b/doc/guides/rel_notes/release_18_11.rst
> > @@ -104,6 +104,14 @@ New Features
> > the specified port. The port must be stopped before the command call
> in
> > order
> > to reconfigure queues.
> >
> > +* **Added check for ensuring allocated memory addressable by devices.**
> > +
> > + Some devices can have addressing limitations so a new function,
> > + ``rte_eal_check_dma_mask``, has been added for checking allocated
> > + memory is not out of the device range. Because now memory can be
> > + dynamically allocated after initialization, a dma mask is kept and
> > + any new allocated memory will be checked out against that dma mask
> > + and rejected if out of range. If more than one device has addressing
> > limitations, the dma mask is the more restricted one.
> >
> > API Changes
> > -----------
> > @@ -156,6 +164,8 @@ ABI Changes
> > ``rte_config`` structure on account of improving DPDK usability
> > when
> > using either ``--legacy-mem`` or ``--single-file-segments``
> flags.
> >
> > +* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more
> > restricted
> > + dma mask based on devices addressing limitations.
> >
> > Removed Items
> > -------------
> > diff --git a/lib/librte_eal/common/eal_common_memory.c
> > b/lib/librte_eal/common/eal_common_memory.c
> > index 0b69804..c482f0d 100644
> > --- a/lib/librte_eal/common/eal_common_memory.c
> > +++ b/lib/librte_eal/common/eal_common_memory.c
> > @@ -385,6 +385,66 @@ struct virtiova {
> > rte_memseg_walk(dump_memseg, f);
> > }
> >
> > +static int
> > +check_iova(const struct rte_memseg_list *msl __rte_unused,
> > + const struct rte_memseg *ms, void *arg) {
> > + uint64_t *mask = arg;
> > + rte_iova_t iova;
> > +
> > + /* higher address within segment */
> > + iova = (ms->iova + ms->len) - 1;
> > + if (!(iova & *mask))
> > + return 0;
> > +
> > + RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of
> > range\n",
> > + ms->iova, ms->len);
> > +
> > + RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
> > + return 1;
> > +}
> > +
> > +#if defined(RTE_ARCH_64)
> > +#define MAX_DMA_MASK_BITS 63
> > +#else
> > +#define MAX_DMA_MASK_BITS 31
> > +#endif
> > +
> > +/* check memseg iovas are within the required range based on dma mask
> > +*/ int __rte_experimental rte_eal_check_dma_mask(uint8_t maskbits) {
> > + struct rte_mem_config *mcfg =
> > rte_eal_get_configuration()->mem_config;
> > + uint64_t mask;
> > +
> > + /* sanity check */
> > + if (maskbits > MAX_DMA_MASK_BITS) {
> > + RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
> > + maskbits, MAX_DMA_MASK_BITS);
> > + return -1;
> > + }
> > +
> > + /* create dma mask */
> > + mask = ~((1ULL << maskbits) - 1);
> > +
> > + if (rte_memseg_walk(check_iova, &mask))
>
> [Lijuan]In my environment, testpmd halts at rte_memseg_walk() when
> maskbits is 0.
>
>
Can you explain this further?
Who is calling rte_eal_check_dma_mask with mask 0? Is this an x86_64 system?
The only explanation I can find is the IOMMU hardware reporting mgaw=0, which
I would say is something completely wrong.
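For reference, a minimal standalone sketch of the mask arithmetic used in the patch.
The helper names dma_mask_from_bits and iova_out_of_range are made up for
illustration and are not part of the patch; the expressions mirror
rte_eal_check_dma_mask() and check_iova(). With maskbits = 0 the mask is all ones,
so any memseg whose last byte sits at a nonzero address is flagged as out of range.
That arithmetic alone does not obviously explain a halt.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same expression as in the patch: set every bit at or above maskbits.
 * maskbits must be <= 63 (the patch rejects larger values), otherwise
 * the shift would be undefined.
 */
static uint64_t
dma_mask_from_bits(uint8_t maskbits)
{
	return ~((1ULL << maskbits) - 1);
}

/*
 * Mirrors check_iova(): returns 1 when the highest address of the
 * segment has a bit set outside the range addressable with maskbits.
 */
static int
iova_out_of_range(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t last = (iova + len) - 1;

	return (last & dma_mask_from_bits(maskbits)) != 0;
}
```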
> > + /*
> > + * Dma mask precludes hugepage usage.
> > + * This device can not be used and we do not need to keep
> > + * the dma mask.
> > + */
> > + return 1;
> > +
> > + /*
> > + * we need to keep the more restricted maskbit for checking
> > + * potential dynamic memory allocation in the future.
> > + */
> > + mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
> > + RTE_MIN(mcfg->dma_maskbits, maskbits);
> > +
> > + return 0;
> > +}
> > +
> > /* return the number of memory channels */ unsigned
> > rte_memory_get_nchannel(void) { diff --git
> > a/lib/librte_eal/common/include/rte_eal_memconfig.h
> > b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > index 62a21c2..b5dff70 100644
> > --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> > +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > @@ -81,6 +81,9 @@ struct rte_mem_config {
> > /* legacy mem and single file segments options are shared */
> > uint32_t legacy_mem;
> > uint32_t single_file_segments;
> > +
> > + /* keeps the more restricted dma mask */
> > + uint8_t dma_maskbits;
> > } __attribute__((__packed__));
> >
> >
> > diff --git a/lib/librte_eal/common/include/rte_memory.h
> > b/lib/librte_eal/common/include/rte_memory.h
> > index 14bd277..c349d6c 100644
> > --- a/lib/librte_eal/common/include/rte_memory.h
> > +++ b/lib/librte_eal/common/include/rte_memory.h
> > @@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct
> > rte_memseg_list *msl,
> > */
> > unsigned rte_memory_get_nrank(void);
> >
> > +/* check memsegs iovas are within a range based on dma mask */ int
> > +rte_eal_check_dma_mask(uint8_t maskbits);
> > +
> > /**
> > * Drivers based on uio will not load unless physical
> > * addresses are obtainable. It is only possible to get diff --git
> > a/lib/librte_eal/common/malloc_heap.c
> > b/lib/librte_eal/common/malloc_heap.c
> > index ac7bbb3..3b5b2b6 100644
> > --- a/lib/librte_eal/common/malloc_heap.c
> > +++ b/lib/librte_eal/common/malloc_heap.c
> > @@ -259,11 +259,13 @@ struct malloc_elem *
> > int socket, unsigned int flags, size_t align, size_t bound,
> > bool contig, struct rte_memseg **ms, int n_segs) {
> > + struct rte_mem_config *mcfg =
> > rte_eal_get_configuration()->mem_config;
> > struct rte_memseg_list *msl;
> > struct malloc_elem *elem = NULL;
> > size_t alloc_sz;
> > int allocd_pages;
> > void *ret, *map_addr;
> > + uint64_t mask;
> >
> > alloc_sz = (size_t)pg_sz * n_segs;
> >
> > @@ -291,6 +293,16 @@ struct malloc_elem *
> > goto fail;
> > }
> >
> > + if (mcfg->dma_maskbits) {
> > + mask = ~((1ULL << mcfg->dma_maskbits) - 1);
> > + if (rte_eal_check_dma_mask(mask)) {
> > + RTE_LOG(ERR, EAL,
> > + "%s(): couldn't allocate memory due to DMA
> mask\n",
> > + __func__);
> > + goto fail;
> > + }
> > + }
> > +
> > /* add newly minted memsegs to malloc heap */
> > elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> > b/lib/librte_eal/linuxapp/eal/eal.c
> > index 4a55d3b..dfe1b8c 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal.c
> > @@ -263,6 +263,8 @@ enum rte_iova_mode
> > * processes could later map the config into this exact location */
> > rte_config.mem_config->mem_cfg_addr = (uintptr_t)
> > rte_mem_cfg_addr;
> >
> > + rte_config.mem_config->dma_maskbits = 0;
> > +
> > }
> >
> > /* attach to an existing shared memory config */ diff --git
> > a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> > index 73282bb..2baefce 100644
> > --- a/lib/librte_eal/rte_eal_version.map
> > +++ b/lib/librte_eal/rte_eal_version.map
> > @@ -291,6 +291,7 @@ EXPERIMENTAL {
> > rte_devargs_parsef;
> > rte_devargs_remove;
> > rte_devargs_type_count;
> > + rte_eal_check_dma_mask;
> > rte_eal_cleanup;
> > rte_eal_hotplug_add;
> > rte_eal_hotplug_remove;
> > --
> > 1.9.1
>
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
@ 2018-10-11 12:10 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-11 12:10 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, gaetan.rivet, ophirmu, qi.z.zhang,
ferruh.yigit, ktraynor
08/10/2018 23:45, Stephen Hemminger:
> On Sun, 7 Oct 2018 11:32:39 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > This is a follow-up of an idea presented at Dublin
> > during the "hotplug talk".
> >
> > Instead of changing the existing hotplug functions, as in the RFC,
> > some new experimental functions are added.
> > The old functions lose their experimental status in order to provide
> > a non-experimental replacement for deprecated attach/detach functions.
> >
> > It has been discussed briefly in the latest technical board meeting.
> >
> >
> > Changes in v6 - after Gaetan's review:
> > - bump ABI version of all buses (because of rte_device change)
> > - unroll snprintf loop in rte_eal_hotplug_add
> >
> > Changes in v5:
> > - rte_devargs_remove is fixed in case of null devargs (patch 2)
> > - a pointer to the bus is added in rte_device (patch 3)
> > - rte_dev_remove is fixed in case of no devargs (patch 5)
> >
> > Changes in v4 - after Andrew's review:
> > - add API changes in release notes (patches 1 & 2)
> > - fix memory leak in rte_eal_hotplug_add (patch 4)
> >
> > Change in v3:
> > - fix null dereferencing in error path (patch 2)
> >
> >
> > Thomas Monjalon (5):
> > devargs: remove deprecated functions
> > devargs: simplify parameters of removal function
> > eal: add bus pointer in device structure
> > eal: remove experimental flag of hotplug functions
> > eal: simplify parameters of hotplug functions
>
> I like these changes.
>
> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Applied
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
@ 2018-10-11 14:20 4% Konstantin Ananyev
2018-11-12 12:03 0% ` Akhil Goyal
0 siblings, 1 reply; 200+ results
From: Konstantin Ananyev @ 2018-10-11 14:20 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
Below are details and reasoning for proposed changes.
1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
operate based on cryptodev device id, though inside
rte_cryptodev_sym_session device specific data is addressed
by driver id (not device id).
That creates a problem with current implementation when we have
two or more devices with the same driver used by the same session.
Consider the following example:
struct rte_cryptodev_sym_session *sess;
rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
rte_cryptodev_sym_session_clear(dev_id=X, sess);
After that point if X and Y use the same driver,
then sess can't be used by device Y any more.
The reason for that - driver specific (not device specific)
data per session, plus there is no information
how many device instances use that data.
Probably the simplest way to deal with that issue -
add a reference counter per each driver data.
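A minimal sketch of the reference-counter idea, in standalone C. All names here
(sketch_session, sketch_session_init, sketch_session_clear, the fixed
SKETCH_MAX_DRIVERS and the 64-byte driver data) are hypothetical and only
illustrate the bookkeeping; this is not the cryptodev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_DRIVERS 4

struct sketch_sess_data {
	void *data;      /* driver specific session data */
	uint16_t refcnt; /* number of devices using that data */
};

struct sketch_session {
	struct sketch_sess_data sess_data[SKETCH_MAX_DRIVERS];
};

static void
sketch_session_reset(struct sketch_session *s)
{
	memset(s, 0, sizeof(*s));
}

/* Called once per device; driver_id is the driver behind that device. */
static int
sketch_session_init(struct sketch_session *s, uint8_t driver_id)
{
	struct sketch_sess_data *d;

	if (driver_id >= SKETCH_MAX_DRIVERS)
		return -1;
	d = &s->sess_data[driver_id];
	if (d->refcnt == 0) {
		/* first device of this driver: allocate the driver data */
		d->data = calloc(1, 64);
		if (d->data == NULL)
			return -1;
	}
	d->refcnt++;
	return 0;
}

/* Driver data is released only when the last device clears the session. */
static int
sketch_session_clear(struct sketch_session *s, uint8_t driver_id)
{
	struct sketch_sess_data *d;

	if (driver_id >= SKETCH_MAX_DRIVERS)
		return -1;
	d = &s->sess_data[driver_id];
	if (d->refcnt == 0)
		return -1;
	if (--d->refcnt == 0) {
		free(d->data);
		d->data = NULL;
	}
	return 0;
}
```

With this bookkeeping, the init(X), init(Y), clear(X) sequence above leaves the
driver data in place for device Y.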
2.rte_cryptodev_sym_session_set_user_data() and
rte_cryptodev_sym_session_get_user_data() -
with current implementation there is no defined way for the user to
determine what is the max allowed size of the private data.
rte_cryptodev_sym_session_set_user_data() just blindly copies
user provided data without checking memory boundaries violation.
To overcome that issue propose to add 'uint16_t priv_size' into
rte_cryptodev_sym_session structure.
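A minimal sketch of the kind of boundary check that a stored priv_size would
allow, in standalone C. The structure and function names (sketch_priv_session,
sketch_set_user_data), the fixed 32-byte area and the choice of error codes are
assumptions for illustration only, not the cryptodev API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct sketch_priv_session {
	uint16_t priv_size; /* capacity of priv[], set once at allocation */
	uint8_t priv[32];   /* private data area (fixed size in this sketch) */
};

/* Copy user data only if it fits into the session private area. */
static int
sketch_set_user_data(struct sketch_priv_session *s, const void *data,
		uint16_t size)
{
	if (s == NULL || (data == NULL && size != 0))
		return -EINVAL;
	if (size > s->priv_size)
		return -ENOMEM; /* would overrun the private area */
	if (size != 0)
		memcpy(s->priv, data, size);
	return 0;
}
```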
3.rte_cryptodev_sym_session contains an array of variable size for
driver specific data.
Though the number of elements in that array is determined by the static
variable nb_drivers, that variable could be modified by
rte_cryptodev_allocate_driver().
That construction seems to work ok so far, as right now users register
all their PMDs at startup, though it doesn't mean that it would always
remain like that.
To make it less error prone I propose to add 'uint16_t nb_drivers'
to the rte_cryptodev_sym_session structure.
At least that allows related functions to check that a provided
driver id wouldn't overrun the variable array boundaries;
it also makes it possible to determine the size of an already allocated
session without accessing a global variable.
4.#2 and #3 above imply that each struct rte_cryptodev_sym_session
would now have some read-only data (initialized once at allocation time
and kept unmodified through the session life-time).
That requires more changes in the current cryptodev implementation:
Right now inside the cryptodev framework rte_cryptodev_sym_session
and driver specific session data are two completely different structures
(e.g. struct cryptodev_sym_session and struct null_crypto_session).
However, the current cryptodev implementation implicitly assumes that the
driver will allocate both of them from within the same mempool.
Plus this is done in a manner that they overlap each other's fields
(reuse the same space - sort of an implicit C union).
That's probably not the best programming practice,
plus it makes it impossible to have read-only fields inside both of them.
To overcome that situation I propose to change the API a bit, to allow
the use of two different mempools for these two distinct data structures.
5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
I suppose that is self-explanatory, and it might be used in a lot of places
(would be quite useful for ipsec library we develop).
The new proposed layout for rte_cryptodev_sym_session:
struct rte_cryptodev_sym_session {
        uint64_t userdata;
        /**< Can be used for external metadata */
        uint16_t nb_drivers;
        /**< number of elements in sess_data array */
        uint16_t priv_size;
        /**< session private data will be placed after sess_data */
        __extension__ struct {
                void *data;
                uint16_t refcnt;
        } sess_data[0];
        /**< Driver specific session material, variable size */
};
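The per-driver reference counter from #1 and the boundary checks from
#2 and #3 could look roughly like the sketch below. This is only an
illustration, not the cryptodev implementation: all sketch_* names are
hypothetical, calloc()/malloc() stand in for the mempools, and the
error codes are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the proposed session layout. */
struct sketch_drv_data {
	void *data;      /* driver specific session material */
	uint16_t refcnt; /* number of devices using 'data' */
};

struct sketch_session {
	uint64_t userdata;
	uint16_t nb_drivers; /* number of elements in sess_data[] */
	uint16_t priv_size;  /* private bytes placed after sess_data[] */
	struct sketch_drv_data sess_data[];
};

static struct sketch_session *
sketch_session_create(uint16_t nb_drivers, uint16_t priv_size)
{
	size_t sz = sizeof(struct sketch_session) +
		(size_t)nb_drivers * sizeof(struct sketch_drv_data) + priv_size;
	struct sketch_session *s = calloc(1, sz);

	if (s == NULL)
		return NULL;
	/* Written once here, treated as read-only afterwards. */
	s->nb_drivers = nb_drivers;
	s->priv_size = priv_size;
	return s;
}

/* Called once per device; devices with the same driver share the data. */
static int
sketch_session_init(struct sketch_session *s, uint16_t driver_id)
{
	struct sketch_drv_data *d;

	if (s == NULL || driver_id >= s->nb_drivers)
		return -EINVAL;
	d = &s->sess_data[driver_id];
	if (d->refcnt == UINT16_MAX)
		return -ERANGE;
	if (d->refcnt == 0) {
		d->data = malloc(1); /* placeholder for the driver session */
		if (d->data == NULL)
			return -ENOMEM;
	}
	d->refcnt++;
	return 0;
}

/* Driver data is released only when the last device clears it. */
static int
sketch_session_clear(struct sketch_session *s, uint16_t driver_id)
{
	struct sketch_drv_data *d;

	if (s == NULL || driver_id >= s->nb_drivers)
		return -EINVAL;
	d = &s->sess_data[driver_id];
	if (d->refcnt == 0)
		return -ENOENT;
	assert(d->data != NULL); /* refcnt > 0 implies data is set */
	if (--d->refcnt == 0) {
		free(d->data);
		d->data = NULL;
	}
	return 0;
}

/* Copies user data only if it fits into the private area. */
static int
sketch_session_set_user_data(struct sketch_session *s, const void *data,
		uint16_t size)
{
	if (s == NULL || data == NULL || size > s->priv_size)
		return -EINVAL;
	memcpy(&s->sess_data[s->nb_drivers], data, size);
	return 0;
}

/* Replays the X/Y example: both devices use driver 0 and one session. */
static int
sketch_demo(void)
{
	struct sketch_session *s = sketch_session_create(2, 8);
	int rc, usable_by_y, released;

	if (s == NULL)
		return -ENOMEM;
	rc = sketch_session_init(s, 0);   /* device X */
	rc |= sketch_session_init(s, 0);  /* device Y, same driver */
	rc |= sketch_session_clear(s, 0); /* clear for X only */
	usable_by_y = (s->sess_data[0].data != NULL);
	rc |= sketch_session_clear(s, 0); /* last user, data is freed */
	released = (s->sess_data[0].data == NULL);
	free(s);
	return (rc == 0 && usable_by_y && released) ? 0 : -1;
}
```

In sketch_demo() the clear for device X only drops the counter, so the
driver data stays valid for device Y until Y clears it as well.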
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
doc/guides/rel_notes/deprecation.rst | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d2aec64d1..998a0d92c 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -74,3 +74,12 @@ Deprecation Notices
This is due to a lack of flexibility and reliance on a type unusable with
C++ programs (struct rte_flow_desc).
+
+* cryptodev: several API and ABI changes are planned for rte_cryptodev
+ in v19.02:
+
+ - The size and layout of ``rte_cryptodev_sym_session`` will change
+ to fix existing issues.
+ - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
+ ``rte_cryptodev_queue_pair_setup`` will change to allow the use of
+ two different mempools for crypto and device private sessions.
--
2.13.6
* Re: [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
@ 2018-10-12 9:32 0% ` Shreyansh Jain
2018-10-12 9:42 0% ` Shreyansh Jain
2018-10-12 10:16 0% ` Thomas Monjalon
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
2 siblings, 2 replies; 200+ results
From: Shreyansh Jain @ 2018-10-12 9:32 UTC (permalink / raw)
To: thomas; +Cc: dev, ferruh.yigit
On Wednesday 26 September 2018 11:34 PM, Shreyansh Jain wrote:
> About the series:
>
> This series of patches upgrades the DPAA2 driver firmware to
> v10.10.10 (MC Firmware).
> As the bus/fslmc is modified, it is a dependent object for other
> drivers like net/crypto/qdma. Also, the changes are mostly tightly
> linked - thus, the patches include upgrade as well as sequential
> changes to driver.
> Once done, it would imply that DPAA2 driver won't work with any MC
> FW lower than 10.10.10.
>
> Support for this new firmware is available in publicly available
> LSDK (Layerscape SDK) release [1].
>
> Besides the FW change, there are other subtle changes as well:
> - Support reading the MAC address from NIC device, rather than
> using a default MAC
> - Adding support for QBMan 5.0 FW APIs
> - Some patches for NXP's LX2 platform specific features
> - And some bug fixes.
>
> Dependency:
>
> * These patches are based on net-next/master 58c3b609699a8c
> * Series [2] is logically related to this, but has no git/patch
> related dependency. It is a series for the upgrade of DPAA.
>
> [1] https://lsdk.github.io/index.html
> [2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
>
> Version History:
> v1->v2:
> - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
> first set of patches (MC firmware update) breaks the internal ABI
> - Added support for ordered processing APIs. These APIs are expected
> to be used in subsequent feature updates on DPAA2 ethernet driver.
> - Some internal bug fixes.
> (Patches increased from 11~15)
>
Hi Thomas,
Would you be taking this series for RC1?
(Ideally being driver code, this should have been with Ferruh but
patchwork is showing your name).
-
Shreyansh
* Re: [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
@ 2018-10-12 9:42 0% ` Shreyansh Jain
2018-10-12 10:16 0% ` Thomas Monjalon
1 sibling, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-10-12 9:42 UTC (permalink / raw)
To: thomas; +Cc: dev, ferruh.yigit
On Friday 12 October 2018 03:02 PM, Shreyansh Jain wrote:
> On Wednesday 26 September 2018 11:34 PM, Shreyansh Jain wrote:
>> About the series:
>>
>> This series of patches upgrades the DPAA2 driver firmware to
>> v10.10.10 (MC Firmware).
>> As the bus/fslmc is modified, it is a dependent object for other
>> drivers like net/crypto/qdma. Also, the changes are mostly tightly
>> linked - thus, the patches include upgrade as well as sequential
>> changes to driver.
>> Once done, it would imply that DPAA2 driver won't work with any MC
>> FW lower than 10.10.10.
>>
>> Support for this new firmware is available in publicly available
>> LSDK (Layerscape SDK) release [1].
>>
>> Besides the FW change, there are other subtle changes as well:
>> - Support reading the MAC address from NIC device, rather than
>> using a default MAC
>> - Adding support for QBMan 5.0 FW APIs
>> - Some patches for NXP's LX2 platform specific features
>> - And some bug fixes.
>>
>> Dependency:
>>
>> * These patches are based on net-next/master 58c3b609699a8c
>> * Series [2] is logically related to this, but has no git/patch
>> related dependency. It is a series for the upgrade of DPAA.
>>
>> [1] https://lsdk.github.io/index.html
>> [2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
>>
>> Version History:
>> v1->v2:
>> - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
>> first set of patches (MC firmware update) breaks the internal ABI
>> - Added support for ordered processing APIs. These APIs are expected
>> to be used in subsequent feature updates on DPAA2 ethernet driver.
>> - Some internal bug fixes.
>> (Patches increased from 11~15)
>>
>
> Hi Thomas,
>
> Would you be taking this series for RC1?
> (Ideally being driver code, this should have been with Ferruh but
> patchwork is showing your name).
Thomas,
I will send a v3; the v2 patches no longer apply because of some version
bumps done for buses on master.
* [dpdk-dev] [PATCH v3 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
@ 2018-10-12 10:04 2% ` Shreyansh Jain
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-10-12 10:04 UTC (permalink / raw)
To: thomas; +Cc: ferruh.yigit, dev, Shreyansh Jain
About the series:
This series of patches upgrades the DPAA2 driver firmware to
v10.10.10 (MC Firmware).
As the bus/fslmc is modified, it is a dependent object for other
drivers like net/crypto/qdma. Also, the changes are mostly tightly
linked - thus, the patches include upgrade as well as sequential
changes to driver.
Once done, it would imply that DPAA2 driver won't work with any MC
FW lower than 10.10.10.
Support for this new firmware is available in publicly available
LSDK (Layerscape SDK) release [1].
Besides the FW change, there are other subtle changes as well:
- Support reading the MAC address from NIC device, rather than
using a default MAC
- Adding support for QBMan 5.0 FW APIs
- Some patches for NXP's LX2 platform specific features
- And some bug fixes.
Dependency:
* These patches are based on net-next/master 58c3b609699a8c
* Series [2] is logically related to this, but has no git/patch
related dependency. It is a series for the upgrade of DPAA.
[1] https://lsdk.github.io/index.html
[2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
Version History:
v2->v3:
- Rebased over master (662e382244)
v1->v2:
- Bumped up the version of the libraries (pmd/bus/crypto/event) as the
first set of patches (MC firmware update) breaks the internal ABI
- Added support for ordered processing APIs. These APIs are expected
to be used in subsequent feature updates on DPAA2 ethernet driver.
- Some internal bug fixes.
(Patches increased from 11~15)
Hemant Agrawal (9):
net/dpaa2: fix VLAN filter enablement
bus/fslmc: upgrade mc FW APIs to 10.10.0
net/dpaa2: upgrade dpni to mc FW APIs to 10.10.0
crypto/dpaa2_sec: upgarde mc FW APIs to 10.10.0
net/dpaa2: update RSS value in mbuf for lx2 platform
net/dpaa2: optimize the fd reset in Tx path
net/dpaa2: enhance the queue memory cleanup routines
net/dpaa2: support MBUF VLAN tci population from HW parser
net/dpaa2: support Rx checksum offload in slow parsing
Nipun Gupta (4):
net/dpaa2: fix IOVA conversion for congestion memory
bus/fslmc: support memory backed portals with QBMAN 5.0
bus/fslmc: support 32 enq and deq for LX2 platform
bus/fslmc: disable annotation prefetch for LX2
Shreyansh Jain (2):
net/dpaa2: read hardware provided MAC for DPNI devices
net/dpaa2: add per queue stats get and reset support
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 +++++
drivers/bus/fslmc/mc/dpcon.c | 30 +
drivers/bus/fslmc/mc/dpdmai.c | 14 +
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 +-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 +
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 ++
drivers/bus/fslmc/portal/dpaa2_hw_dpio.c | 197 +++--
drivers/bus/fslmc/portal/dpaa2_hw_dpio.h | 4 +
drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 32 +-
drivers/bus/fslmc/qbman/include/compat.h | 3 +-
.../fslmc/qbman/include/fsl_qbman_portal.h | 33 +-
drivers/bus/fslmc/qbman/qbman_portal.c | 764 +++++++++++++++---
drivers/bus/fslmc/qbman/qbman_portal.h | 30 +-
drivers/bus/fslmc/qbman/qbman_sys.h | 100 ++-
drivers/bus/fslmc/qbman/qbman_sys_decl.h | 4 +
drivers/bus/fslmc/rte_bus_fslmc_version.map | 12 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 8 +-
drivers/crypto/dpaa2_sec/mc/dpseci.c | 128 ++-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci.h | 25 +-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci_cmd.h | 73 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/dpaa2_eventdev.c | 4 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/base/dpaa2_hw_dpni_annot.h | 40 +
drivers/net/dpaa2/dpaa2_ethdev.c | 173 +++-
drivers/net/dpaa2/dpaa2_rxtx.c | 95 ++-
drivers/net/dpaa2/mc/dpni.c | 134 ++-
drivers/net/dpaa2/mc/fsl_dpkg.h | 71 +-
drivers/net/dpaa2/mc/fsl_dpni.h | 378 +++++----
drivers/net/dpaa2/mc/fsl_dpni_cmd.h | 87 +-
drivers/net/dpaa2/mc/fsl_net.h | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
51 files changed, 2374 insertions(+), 584 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
--
2.17.1
* [dpdk-dev] [PATCH v3 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
@ 2018-10-12 10:04 2% ` Shreyansh Jain
0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-10-12 10:04 UTC (permalink / raw)
To: thomas; +Cc: ferruh.yigit, dev, Hemant Agrawal
From: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch adds support for the new Management Complex
Firmware version 10.1x.x. One of the main changes in
the APIs is support for ordered queues.
The fslmc bus lib ABI will need to be bumped to reflect
the MC FW API and structure changes.
This will also result in bumping of the ABI version of all dependent
libs as they internally use the MC FW APIs and structures.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 ++++++++++++++++++++
drivers/bus/fslmc/mc/dpcon.c | 30 +++
drivers/bus/fslmc/mc/dpdmai.c | 14 ++
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 ++++-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +++++-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 ++
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 +++++++++
drivers/bus/fslmc/rte_bus_fslmc_version.map | 10 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
29 files changed, 538 insertions(+), 33 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
diff --git a/drivers/bus/fslmc/mc/dpbp.c b/drivers/bus/fslmc/mc/dpbp.c
index 0215d22da..d9103409c 100644
--- a/drivers/bus/fslmc/mc/dpbp.c
+++ b/drivers/bus/fslmc/mc/dpbp.c
@@ -248,6 +248,16 @@ int dpbp_reset(struct fsl_mc_io *mc_io,
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpbp_get_attributes - Retrieve DPBP attributes.
+ *
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPBP object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpbp_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/dpci.c b/drivers/bus/fslmc/mc/dpci.c
index ff366bfa9..95edae9d9 100644
--- a/drivers/bus/fslmc/mc/dpci.c
+++ b/drivers/bus/fslmc/mc/dpci.c
@@ -265,6 +265,15 @@ int dpci_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpci_get_attributes() - Retrieve DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -292,6 +301,94 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpci_get_peer_attributes() - Retrieve peer DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned peer attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr)
+{
+ struct dpci_rsp_get_peer_attr *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_PEER_ATTR,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_peer_attr *)cmd.params;
+ attr->peer_id = le32_to_cpu(rsp_params->id);
+ attr->num_of_priorities = rsp_params->num_of_priorities;
+
+ return 0;
+}
+
+/**
+ * dpci_get_link_state() - Retrieve the DPCI link state.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @up: Returned link state; returns '1' if link is up, '0' otherwise
+ *
+ * DPCI can be connected to another DPCI, together they
+ * create a 'link'. In order to use the DPCI Tx and Rx queues,
+ * both objects must be enabled.
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up)
+{
+ struct dpci_rsp_get_link_state *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_LINK_STATE,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_link_state *)cmd.params;
+ *up = dpci_get_field(rsp_params->up, UP);
+
+ return 0;
+}
+
+/**
+ * dpci_set_rx_queue() - Set Rx queue configuration
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @priority: Select the queue relative to number of
+ * priorities configured at DPCI creation; use
+ * DPCI_ALL_QUEUES to configure all Rx queues
+ * identically.
+ * @cfg: Rx queue configuration
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -314,6 +411,9 @@ int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
dpci_set_field(cmd_params->dest_type,
DEST_TYPE,
cfg->dest_cfg.dest_type);
+ dpci_set_field(cmd_params->dest_type,
+ ORDER_PRESERVATION,
+ cfg->order_preservation_en);
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
@@ -438,3 +538,100 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
return 0;
}
+
+/**
+ * dpci_set_opr() - Set Order Restoration configuration.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @options: Configuration mode options
+ * can be OPR_OPT_CREATE or OPR_OPT_RETIRE
+ * @cfg: Configuration options for the OPR
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg)
+{
+ struct dpci_cmd_set_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_SET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_set_opr *)cmd.params;
+ cmd_params->index = index;
+ cmd_params->options = options;
+ cmd_params->oloe = cfg->oloe;
+ cmd_params->oeane = cfg->oeane;
+ cmd_params->olws = cfg->olws;
+ cmd_params->oa = cfg->oa;
+ cmd_params->oprrws = cfg->oprrws;
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
+/**
+ * dpci_get_opr() - Retrieve Order Restoration config and query.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @cfg: Returned OPR configuration
+ * @qry: Returned OPR query
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry)
+{
+ struct dpci_rsp_get_opr *rsp_params;
+ struct dpci_cmd_get_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_get_opr *)cmd.params;
+ cmd_params->index = index;
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_opr *)cmd.params;
+ cfg->oloe = rsp_params->oloe;
+ cfg->oeane = rsp_params->oeane;
+ cfg->olws = rsp_params->olws;
+ cfg->oa = rsp_params->oa;
+ cfg->oprrws = rsp_params->oprrws;
+ qry->rip = dpci_get_field(rsp_params->flags, RIP);
+ qry->enable = dpci_get_field(rsp_params->flags, OPR_ENABLE);
+ qry->nesn = le16_to_cpu(rsp_params->nesn);
+ qry->ndsn = le16_to_cpu(rsp_params->ndsn);
+ qry->ea_tseq = le16_to_cpu(rsp_params->ea_tseq);
+ qry->tseq_nlis = dpci_get_field(rsp_params->tseq_nlis, TSEQ_NLIS);
+ qry->ea_hseq = le16_to_cpu(rsp_params->ea_hseq);
+ qry->hseq_nlis = dpci_get_field(rsp_params->hseq_nlis, HSEQ_NLIS);
+ qry->ea_hptr = le16_to_cpu(rsp_params->ea_hptr);
+ qry->ea_tptr = le16_to_cpu(rsp_params->ea_tptr);
+ qry->opr_vid = le16_to_cpu(rsp_params->opr_vid);
+ qry->opr_id = le16_to_cpu(rsp_params->opr_id);
+
+ return 0;
+}
diff --git a/drivers/bus/fslmc/mc/dpcon.c b/drivers/bus/fslmc/mc/dpcon.c
index 3f6e04b97..92bd26512 100644
--- a/drivers/bus/fslmc/mc/dpcon.c
+++ b/drivers/bus/fslmc/mc/dpcon.c
@@ -295,6 +295,36 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpcon_set_notification() - Set DPCON notification destination
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCON object
+ * @cfg: Notification parameters
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg)
+{
+ struct dpcon_cmd_set_notification *dpcon_cmd;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCON_CMDID_SET_NOTIFICATION,
+ cmd_flags,
+ token);
+ dpcon_cmd = (struct dpcon_cmd_set_notification *)cmd.params;
+ dpcon_cmd->dpio_id = cpu_to_le32(cfg->dpio_id);
+ dpcon_cmd->priority = cfg->priority;
+ dpcon_cmd->user_ctx = cpu_to_le64(cfg->user_ctx);
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
/**
* dpcon_get_api_version - Get Data Path Concentrator API version
* @mc_io: Pointer to MC portal's DPCON object
diff --git a/drivers/bus/fslmc/mc/dpdmai.c b/drivers/bus/fslmc/mc/dpdmai.c
index 528889df3..dcb9d516a 100644
--- a/drivers/bus/fslmc/mc/dpdmai.c
+++ b/drivers/bus/fslmc/mc/dpdmai.c
@@ -113,6 +113,7 @@ int dpdmai_create(struct fsl_mc_io *mc_io,
cmd_flags,
dprc_token);
cmd_params = (struct dpdmai_cmd_create *)cmd.params;
+ cmd_params->num_queues = cfg->num_queues;
cmd_params->priorities[0] = cfg->priorities[0];
cmd_params->priorities[1] = cfg->priorities[1];
@@ -297,6 +298,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
rsp_params = (struct dpdmai_rsp_get_attr *)cmd.params;
attr->id = le32_to_cpu(rsp_params->id);
attr->num_of_priorities = rsp_params->num_of_priorities;
+ attr->num_of_queues = rsp_params->num_of_queues;
return 0;
}
@@ -306,6 +308,8 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation; use
* DPDMAI_ALL_QUEUES to configure all Rx queues
@@ -317,6 +321,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg)
{
@@ -331,6 +336,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
cmd_params->dest_id = cpu_to_le32(cfg->dest_cfg.dest_id);
cmd_params->dest_priority = cfg->dest_cfg.priority;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
cmd_params->user_ctx = cpu_to_le64(cfg->user_ctx);
cmd_params->options = cpu_to_le32(cfg->options);
dpdmai_set_field(cmd_params->dest_type,
@@ -346,6 +352,8 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Rx queue attributes
@@ -355,6 +363,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr)
{
@@ -369,6 +378,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
@@ -392,6 +402,8 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Tx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Tx queue attributes
@@ -401,6 +413,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr)
{
@@ -415,6 +428,7 @@ int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
diff --git a/drivers/bus/fslmc/mc/dpio.c b/drivers/bus/fslmc/mc/dpio.c
index 966277cc6..a3382ed14 100644
--- a/drivers/bus/fslmc/mc/dpio.c
+++ b/drivers/bus/fslmc/mc/dpio.c
@@ -268,6 +268,15 @@ int dpio_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpio_get_attributes() - Retrieve DPIO attributes
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPIO object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
int dpio_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp.h b/drivers/bus/fslmc/mc/fsl_dpbp.h
index 111836261..9d405b42c 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp.h
@@ -82,6 +82,7 @@ int dpbp_get_attributes(struct fsl_mc_io *mc_io,
/**
* BPSCN write will attempt to allocate into a cache (coherent write)
*/
+#define DPBP_NOTIF_OPT_COHERENT_WRITE 0x00000001
int dpbp_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
index 18402cedf..55c9fc9b4 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
@@ -9,13 +9,15 @@
/* DPBP Version */
#define DPBP_VER_MAJOR 3
-#define DPBP_VER_MINOR 3
+#define DPBP_VER_MINOR 4
/* Command versioning */
#define DPBP_CMD_BASE_VERSION 1
+#define DPBP_CMD_VERSION_2 2
#define DPBP_CMD_ID_OFFSET 4
#define DPBP_CMD(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_BASE_VERSION)
+#define DPBP_CMD_V2(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_VERSION_2)
/* Command IDs */
#define DPBP_CMDID_CLOSE DPBP_CMD(0x800)
@@ -37,8 +39,8 @@
#define DPBP_CMDID_GET_IRQ_STATUS DPBP_CMD(0x016)
#define DPBP_CMDID_CLEAR_IRQ_STATUS DPBP_CMD(0x017)
-#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD(0x1b0)
-#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD(0x1b1)
+#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD_V2(0x1b0)
+#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD_V2(0x1b1)
#define DPBP_CMDID_GET_FREE_BUFFERS_NUM DPBP_CMD(0x1b2)
@@ -68,8 +70,8 @@ struct dpbp_cmd_set_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
@@ -79,8 +81,8 @@ struct dpbp_rsp_get_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
diff --git a/drivers/bus/fslmc/mc/fsl_dpci.h b/drivers/bus/fslmc/mc/fsl_dpci.h
index f69ed3f33..9af9097e5 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci.h
@@ -6,6 +6,8 @@
#ifndef __FSL_DPCI_H
#define __FSL_DPCI_H
+#include <fsl_dpopr.h>
+
/* Data Path Communication Interface API
* Contains initialization APIs and runtime control APIs for DPCI
*/
@@ -17,7 +19,7 @@ struct fsl_mc_io;
/**
* Maximum number of Tx/Rx priorities per DPCI object
*/
-#define DPCI_PRIO_NUM 2
+#define DPCI_PRIO_NUM 4
/**
* Indicates an invalid frame queue
@@ -106,6 +108,27 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpci_attr *attr);
+/**
+ * struct dpci_peer_attr - Structure representing the peer DPCI attributes
+ * @peer_id: DPCI peer id; if no peer is connected returns (-1)
+ * @num_of_priorities: The peer's number of receive priorities; determines the
+ * number of transmit priorities for the local DPCI object
+ */
+struct dpci_peer_attr {
+ int peer_id;
+ uint8_t num_of_priorities;
+};
+
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr);
+
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up);
+
/**
* enum dpci_dest - DPCI destination types
* @DPCI_DEST_NONE: Unassigned destination; The queue is set in parked mode
@@ -153,6 +176,11 @@ struct dpci_dest_cfg {
*/
#define DPCI_QUEUE_OPT_DEST 0x00000002
+/**
+ * Set the queue to hold active mode.
+ */
+#define DPCI_QUEUE_OPT_HOLD_ACTIVE 0x00000004
+
/**
* struct dpci_rx_queue_cfg - Structure representing RX queue configuration
* @options: Flags representing the suggested modifications to the queue;
@@ -163,11 +191,14 @@ struct dpci_dest_cfg {
* 'options'
* @dest_cfg: Queue destination parameters;
* valid only if 'DPCI_QUEUE_OPT_DEST' is contained in 'options'
+ * @order_preservation_en: order preservation configuration for the rx queue
+ * valid only if 'DPCI_QUEUE_OPT_HOLD_ACTIVE' is contained in 'options'
*/
struct dpci_rx_queue_cfg {
uint32_t options;
uint64_t user_ctx;
struct dpci_dest_cfg dest_cfg;
+ int order_preservation_en;
};
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
@@ -217,4 +248,18 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
uint16_t *major_ver,
uint16_t *minor_ver);
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg);
+
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry);
+
#endif /* __FSL_DPCI_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
index 634248ac0..92b85a820 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
@@ -8,7 +8,7 @@
/* DPCI Version */
#define DPCI_VER_MAJOR 3
-#define DPCI_VER_MINOR 3
+#define DPCI_VER_MINOR 4
#define DPCI_CMD_BASE_VERSION 1
#define DPCI_CMD_BASE_VERSION_V2 2
@@ -35,6 +35,8 @@
#define DPCI_CMDID_GET_PEER_ATTR DPCI_CMD_V1(0x0e2)
#define DPCI_CMDID_GET_RX_QUEUE DPCI_CMD_V1(0x0e3)
#define DPCI_CMDID_GET_TX_QUEUE DPCI_CMD_V1(0x0e4)
+#define DPCI_CMDID_SET_OPR DPCI_CMD_V1(0x0e5)
+#define DPCI_CMDID_GET_OPR DPCI_CMD_V1(0x0e6)
/* Macros for accessing command fields smaller than 1byte */
#define DPCI_MASK(field) \
@@ -90,6 +92,8 @@ struct dpci_rsp_get_link_state {
#define DPCI_DEST_TYPE_SHIFT 0
#define DPCI_DEST_TYPE_SIZE 4
+#define DPCI_ORDER_PRESERVATION_SHIFT 4
+#define DPCI_ORDER_PRESERVATION_SIZE 1
struct dpci_cmd_set_rx_queue {
uint32_t dest_id;
@@ -128,5 +132,61 @@ struct dpci_rsp_get_api_version {
uint16_t minor;
};
+struct dpci_cmd_set_opr {
+ uint16_t pad0;
+ uint8_t index;
+ uint8_t options;
+ uint8_t pad1[7];
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+};
+
+struct dpci_cmd_get_opr {
+ uint16_t pad;
+ uint8_t index;
+};
+
+#define DPCI_RIP_SHIFT 0
+#define DPCI_RIP_SIZE 1
+#define DPCI_OPR_ENABLE_SHIFT 1
+#define DPCI_OPR_ENABLE_SIZE 1
+#define DPCI_TSEQ_NLIS_SHIFT 0
+#define DPCI_TSEQ_NLIS_SIZE 1
+#define DPCI_HSEQ_NLIS_SHIFT 0
+#define DPCI_HSEQ_NLIS_SIZE 1
+
+struct dpci_rsp_get_opr {
+ uint64_t pad0;
+ /* from LSB: rip:1 enable:1 */
+ uint8_t flags;
+ uint16_t pad1;
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+ uint16_t nesn;
+ uint16_t pad8;
+ uint16_t ndsn;
+ uint16_t pad2;
+ uint16_t ea_tseq;
+ /* only the LSB */
+ uint8_t tseq_nlis;
+ uint8_t pad3;
+ uint16_t ea_hseq;
+ /* only the LSB */
+ uint8_t hseq_nlis;
+ uint8_t pad4;
+ uint16_t ea_hptr;
+ uint16_t pad5;
+ uint16_t ea_tptr;
+ uint16_t pad6;
+ uint16_t opr_vid;
+ uint16_t pad7;
+ uint16_t opr_id;
+};
#pragma pack(pop)
#endif /* _FSL_DPCI_CMD_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpcon.h b/drivers/bus/fslmc/mc/fsl_dpcon.h
index 36dd5f3c1..fc0430dc1 100644
--- a/drivers/bus/fslmc/mc/fsl_dpcon.h
+++ b/drivers/bus/fslmc/mc/fsl_dpcon.h
@@ -81,6 +81,25 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpcon_attr *attr);
+/**
+ * struct dpcon_notification_cfg - Structure representing notification params
+ * @dpio_id: DPIO object ID; must be configured with a notification channel;
+ * to disable notifications set it to 'DPCON_INVALID_DPIO_ID';
+ * @priority: Priority selection within the DPIO channel; valid values
+ * are 0-7, depending on the number of priorities in that channel
+ * @user_ctx: User context value provided with each CDAN message
+ */
+struct dpcon_notification_cfg {
+ int dpio_id;
+ uint8_t priority;
+ uint64_t user_ctx;
+};
+
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg);
+
int dpcon_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai.h b/drivers/bus/fslmc/mc/fsl_dpdmai.h
index 03e46ec14..40469cc13 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai.h
@@ -39,6 +39,7 @@ int dpdmai_close(struct fsl_mc_io *mc_io,
* should be configured with 0
*/
struct dpdmai_cfg {
+ uint8_t num_queues;
uint8_t priorities[DPDMAI_PRIO_NUM];
};
@@ -78,6 +79,7 @@ int dpdmai_reset(struct fsl_mc_io *mc_io,
struct dpdmai_attr {
int id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
@@ -149,6 +151,7 @@ struct dpdmai_rx_queue_cfg {
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg);
@@ -168,6 +171,7 @@ struct dpdmai_rx_queue_attr {
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr);
@@ -183,6 +187,7 @@ struct dpdmai_tx_queue_attr {
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr);
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
index 618e19eae..7e122de4e 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
@@ -7,30 +7,32 @@
/* DPDMAI Version */
#define DPDMAI_VER_MAJOR 3
-#define DPDMAI_VER_MINOR 2
+#define DPDMAI_VER_MINOR 3
/* Command versioning */
#define DPDMAI_CMD_BASE_VERSION 1
+#define DPDMAI_CMD_VERSION_2 2
#define DPDMAI_CMD_ID_OFFSET 4
#define DPDMAI_CMD(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_BASE_VERSION)
+#define DPDMAI_CMD_V2(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_VERSION_2)
/* Command IDs */
#define DPDMAI_CMDID_CLOSE DPDMAI_CMD(0x800)
#define DPDMAI_CMDID_OPEN DPDMAI_CMD(0x80E)
-#define DPDMAI_CMDID_CREATE DPDMAI_CMD(0x90E)
+#define DPDMAI_CMDID_CREATE DPDMAI_CMD_V2(0x90E)
#define DPDMAI_CMDID_DESTROY DPDMAI_CMD(0x98E)
#define DPDMAI_CMDID_GET_API_VERSION DPDMAI_CMD(0xa0E)
#define DPDMAI_CMDID_ENABLE DPDMAI_CMD(0x002)
#define DPDMAI_CMDID_DISABLE DPDMAI_CMD(0x003)
-#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD(0x004)
+#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD_V2(0x004)
#define DPDMAI_CMDID_RESET DPDMAI_CMD(0x005)
#define DPDMAI_CMDID_IS_ENABLED DPDMAI_CMD(0x006)
-#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD(0x1A0)
-#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD(0x1A1)
-#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD(0x1A2)
+#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD_V2(0x1A0)
+#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD_V2(0x1A1)
+#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD_V2(0x1A2)
/* Macros for accessing command fields smaller than 1byte */
#define DPDMAI_MASK(field) \
@@ -47,7 +49,7 @@ struct dpdmai_cmd_open {
};
struct dpdmai_cmd_create {
- uint8_t pad;
+ uint8_t num_queues;
uint8_t priorities[2];
};
@@ -66,6 +68,7 @@ struct dpdmai_rsp_is_enabled {
struct dpdmai_rsp_get_attr {
uint32_t id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
#define DPDMAI_DEST_TYPE_SHIFT 0
@@ -77,7 +80,7 @@ struct dpdmai_cmd_set_rx_queue {
uint8_t priority;
/* from LSB: dest_type:4 */
uint8_t dest_type;
- uint8_t pad;
+ uint8_t queue_idx;
uint64_t user_ctx;
uint32_t options;
};
@@ -85,6 +88,7 @@ struct dpdmai_cmd_set_rx_queue {
struct dpdmai_cmd_get_queue {
uint8_t pad[5];
uint8_t priority;
+ uint8_t queue_idx;
};
struct dpdmai_rsp_get_rx_queue {
diff --git a/drivers/bus/fslmc/mc/fsl_dpmng.h b/drivers/bus/fslmc/mc/fsl_dpmng.h
index afaf9b711..8559bef87 100644
--- a/drivers/bus/fslmc/mc/fsl_dpmng.h
+++ b/drivers/bus/fslmc/mc/fsl_dpmng.h
@@ -18,7 +18,7 @@ struct fsl_mc_io;
* Management Complex firmware version information
*/
#define MC_VER_MAJOR 10
-#define MC_VER_MINOR 3
+#define MC_VER_MINOR 10
/**
* struct mc_version
diff --git a/drivers/bus/fslmc/mc/fsl_dpopr.h b/drivers/bus/fslmc/mc/fsl_dpopr.h
new file mode 100644
index 000000000..fd727e011
--- /dev/null
+++ b/drivers/bus/fslmc/mc/fsl_dpopr.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0)
+ *
+ * Copyright 2013-2015 Freescale Semiconductor Inc.
+ * Copyright 2018 NXP
+ *
+ */
+#ifndef __FSL_DPOPR_H_
+#define __FSL_DPOPR_H_
+
+/** @addtogroup dpopr Data Path Order Restoration API
+ * Contains initialization APIs and runtime APIs for the Order Restoration
+ * @{
+ */
+
+/** Order Restoration properties */
+
+/**
+ * Create a new Order Point Record option
+ */
+#define OPR_OPT_CREATE 0x1
+/**
+ * Retire an existing Order Point Record option
+ */
+#define OPR_OPT_RETIRE 0x2
+
+/**
+ * struct opr_cfg - Structure representing OPR configuration
+ * @oprrws: Order point record (OPR) restoration window size (0 to 5)
+ * 0 - Window size is 32 frames.
+ * 1 - Window size is 64 frames.
+ * 2 - Window size is 128 frames.
+ * 3 - Window size is 256 frames.
+ * 4 - Window size is 512 frames.
+ * 5 - Window size is 1024 frames.
+ *@oa: OPR auto advance NESN window size (0 disabled, 1 enabled)
+ *@olws: OPR acceptable late arrival window size (0 to 3)
+ * 0 - Disabled. Late arrivals are always rejected.
+ * 1 - Window size is 32 frames.
+ * 2 - Window size is the same as the OPR restoration
+ * window size configured in the OPRRWS field.
+ * 3 - Window size is 8192 frames.
+ * Late arrivals are always accepted.
+ *@oeane: Order restoration list (ORL) resource exhaustion
+ * advance NESN enable (0 disabled, 1 enabled)
+ *@oloe: OPR loose ordering enable (0 disabled, 1 enabled)
+ */
+struct opr_cfg {
+ uint8_t oprrws;
+ uint8_t oa;
+ uint8_t olws;
+ uint8_t oeane;
+ uint8_t oloe;
+};
+
+/**
+ * struct opr_qry - Structure representing OPR query results
+ * @enable: Enabled state
+ * @rip: Retirement In Progress
+ * @ndsn: Next dispensed sequence number
+ * @nesn: Next expected sequence number
+ * @ea_hseq: Early arrival head sequence number
+ * @hseq_nlis: HSEQ not last in sequence
+ * @ea_tseq: Early arrival tail sequence number
+ * @tseq_nlis: TSEQ not last in sequence
+ * @ea_tptr: Early arrival tail pointer
+ * @ea_hptr: Early arrival head pointer
+ * @opr_id: Order Point Record ID
+ * @opr_vid: Order Point Record Virtual ID
+ */
+struct opr_qry {
+ char enable;
+ char rip;
+ uint16_t ndsn;
+ uint16_t nesn;
+ uint16_t ea_hseq;
+ char hseq_nlis;
+ uint16_t ea_tseq;
+ char tseq_nlis;
+ uint16_t ea_tptr;
+ uint16_t ea_hptr;
+ uint16_t opr_id;
+ uint16_t opr_vid;
+};
+
+#endif /* __FSL_DPOPR_H_ */
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index b4a881704..8717373dd 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -117,3 +117,13 @@ DPDK_18.05 {
rte_dpaa2_memsegs;
} DPDK_18.02;
+
+DPDK_18.11 {
+ global:
+
+ dpci_get_link_state;
+ dpci_get_opr;
+ dpci_get_peer_attributes;
+ dpci_set_opr;
+
+} DPDK_18.05;
diff --git a/drivers/crypto/dpaa2_sec/Makefile b/drivers/crypto/dpaa2_sec/Makefile
index da3d8f84f..a61be49db 100644
--- a/drivers/crypto/dpaa2_sec/Makefile
+++ b/drivers/crypto/dpaa2_sec/Makefile
@@ -41,7 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_sec_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# library source files
SRCS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC) += dpaa2_sec_dpseci.c
diff --git a/drivers/crypto/dpaa2_sec/meson.build b/drivers/crypto/dpaa2_sec/meson.build
index 01afc5877..8fa4827ed 100644
--- a/drivers/crypto/dpaa2_sec/meson.build
+++ b/drivers/crypto/dpaa2_sec/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/event/dpaa2/Makefile b/drivers/event/dpaa2/Makefile
index 5e1a63200..3f85dd2be 100644
--- a/drivers/event/dpaa2/Makefile
+++ b/drivers/event/dpaa2/Makefile
@@ -27,7 +27,7 @@ CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc
# versioning export map
EXPORT_MAP := rte_pmd_dpaa2_event_version.map
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/event/dpaa2/meson.build b/drivers/event/dpaa2/meson.build
index de7a46155..c46b39e9d 100644
--- a/drivers/event/dpaa2/meson.build
+++ b/drivers/event/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/mempool/dpaa2/Makefile b/drivers/mempool/dpaa2/Makefile
index 9e4c87d79..4996a2cd1 100644
--- a/drivers/mempool/dpaa2/Makefile
+++ b/drivers/mempool/dpaa2/Makefile
@@ -19,7 +19,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_mempool_dpaa2_version.map
# Lbrary version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/mempool/dpaa2/meson.build b/drivers/mempool/dpaa2/meson.build
index 90bab6069..6b6ead617 100644
--- a/drivers/mempool/dpaa2/meson.build
+++ b/drivers/mempool/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/net/dpaa2/Makefile b/drivers/net/dpaa2/Makefile
index 9b0b14331..1d46f7f25 100644
--- a/drivers/net/dpaa2/Makefile
+++ b/drivers/net/dpaa2/Makefile
@@ -25,7 +25,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/net/dpaa2/meson.build b/drivers/net/dpaa2/meson.build
index 213f0d72f..b34595258 100644
--- a/drivers/net/dpaa2/meson.build
+++ b/drivers/net/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/raw/dpaa2_cmdif/Makefile b/drivers/raw/dpaa2_cmdif/Makefile
index 9b863dda2..0dbe5c821 100644
--- a/drivers/raw/dpaa2_cmdif/Makefile
+++ b/drivers/raw/dpaa2_cmdif/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_rawdev
EXPORT_MAP := rte_pmd_dpaa2_cmdif_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_cmdif/meson.build b/drivers/raw/dpaa2_cmdif/meson.build
index 1d146872e..37bb24a1b 100644
--- a/drivers/raw/dpaa2_cmdif/meson.build
+++ b/drivers/raw/dpaa2_cmdif/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'bus_vdev']
sources = files('dpaa2_cmdif.c')
diff --git a/drivers/raw/dpaa2_qdma/Makefile b/drivers/raw/dpaa2_qdma/Makefile
index d88809ead..645220772 100644
--- a/drivers/raw/dpaa2_qdma/Makefile
+++ b/drivers/raw/dpaa2_qdma/Makefile
@@ -25,7 +25,7 @@ LDLIBS += -lrte_ring
EXPORT_MAP := rte_pmd_dpaa2_qdma_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index 2787d3028..44503331e 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -805,7 +805,7 @@ dpaa2_dpdmai_dev_uninit(struct rte_rawdev *rawdev)
DPAA2_QDMA_ERR("dmdmai disable failed");
/* Set up the DQRR storage for Rx */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq = &(dpdmai_dev->rx_queue[i]);
if (rxq->q_storage) {
@@ -856,17 +856,17 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
ret);
goto init_err;
}
- dpdmai_dev->num_queues = attr.num_of_priorities;
+ dpdmai_dev->num_queues = attr.num_of_queues;
/* Set up Rx Queues */
- for (i = 0; i < attr.num_of_priorities; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq;
memset(&rx_queue_cfg, 0, sizeof(struct dpdmai_rx_queue_cfg));
ret = dpdmai_set_rx_queue(&dpdmai_dev->dpdmai,
CMD_PRI_LOW,
dpdmai_dev->token,
- i, &rx_queue_cfg);
+ i, 0, &rx_queue_cfg);
if (ret) {
DPAA2_QDMA_ERR("Setting Rx queue failed with err: %d",
ret);
@@ -893,9 +893,9 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
}
/* Get Rx and Tx queues FQID's */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
ret = dpdmai_get_rx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &rx_attr);
+ dpdmai_dev->token, i, 0, &rx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
@@ -904,7 +904,7 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
dpdmai_dev->rx_queue[i].fqid = rx_attr.fqid;
ret = dpdmai_get_tx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &tx_attr);
+ dpdmai_dev->token, i, 0, &tx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
index c6a057806..0cbe90255 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
@@ -11,6 +11,8 @@ struct qdma_io_meta;
#define DPAA2_QDMA_MAX_FLE 3
#define DPAA2_QDMA_MAX_SDD 2
+#define DPAA2_DPDMAI_MAX_QUEUES 8
+
/** FLE pool size: 3 Frame list + 2 source/destination descriptor */
#define QDMA_FLE_POOL_SIZE (sizeof(struct qdma_io_meta) + \
sizeof(struct qbman_fle) * DPAA2_QDMA_MAX_FLE + \
@@ -142,9 +144,9 @@ struct dpaa2_dpdmai_dev {
/** Number of queue in this DPDMAI device */
uint8_t num_queues;
/** RX queues */
- struct dpaa2_queue rx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue rx_queue[DPAA2_DPDMAI_MAX_QUEUES];
/** TX queues */
- struct dpaa2_queue tx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue tx_queue[DPAA2_DPDMAI_MAX_QUEUES];
};
#endif /* __DPAA2_QDMA_H__ */
diff --git a/drivers/raw/dpaa2_qdma/meson.build b/drivers/raw/dpaa2_qdma/meson.build
index b6a081f11..2a4b69c16 100644
--- a/drivers/raw/dpaa2_qdma/meson.build
+++ b/drivers/raw/dpaa2_qdma/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'ring']
sources = files('dpaa2_qdma.c')
--
2.17.1
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-10-12 9:42 0% ` Shreyansh Jain
@ 2018-10-12 10:16 0% ` Thomas Monjalon
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-12 10:16 UTC (permalink / raw)
To: Shreyansh Jain; +Cc: dev, ferruh.yigit
12/10/2018 11:32, Shreyansh Jain:
> On Wednesday 26 September 2018 11:34 PM, Shreyansh Jain wrote:
> > About the series:
> >
> > This series of patches upgrades the DPAA2 driver firmware to
> > v10.10.10 (MC Firmware).
> > As the bus/fslmc is modified, it is a dependent object for other
> > drivers like net/crypto/qdma. Also, the changes are mostly tightly
> > linked - thus, the patches include upgrade as well as sequential
> > changes to driver.
> > Once done, it would imply that DPAA2 driver won't work with any MC
> > FW lower than 10.10.10.
> >
> > Support for this new firmware is available in publicly available
> > LSDK (Layerscape SDK) release [1].
> >
> > Besides the FW change, there are other subtle changes as well:
> > - Support reading the MAC address from NIC device, rather than
> > using a default MAC
> > - Adding support for QBMan 5.0 FW APIs
> > - Some patches for NXP's LX2 platform specific features
> > - And some bug fixes.
> >
> > Dependency:
> >
> > * These patches are based on net-next/master 58c3b609699a8c
> > * Series [2] is logically related to this, but has no git/patch
> > related dependency. It is series for upgrade of DPAA.
> >
> > [1] https://lsdk.github.io/index.html
> > [2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
> >
> > Version History:
> > v1->v2:
> > - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
> > first set of patches (MC firmware update) breaks the internal ABI
> > - Added support for ordered processing APIs. These APIs are expected
> > to be used in subsequent feature updates on DPAA2 ethernet driver.
> > - Some internal bug fixes.
> > (Patches increased from 11 to 15)
> >
>
> Hi Thomas,
>
> Would you be taking this series for RC1?
Yes
> (Ideally being driver code, this should have been with Ferruh but
> patchwork is showing your name).
Ferruh is taking patches for drivers/net/ and related.
This series is touching a lot more.
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v17 0/6] enable hotplug on multi-process
2018-09-28 4:23 1% ` [dpdk-dev] [PATCH v16 0/6] " Qi Zhang
@ 2018-10-16 0:16 1% ` Qi Zhang
2018-10-16 0:16 2% ` [dpdk-dev] [PATCH v17 2/6] eal: " Qi Zhang
1 sibling, 1 reply; 200+ results
From: Qi Zhang @ 2018-10-16 0:16 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
v17:
- fix format in release notes
- rework build_devargs
- fix devargs memory leak in rte_dev_hotplug_add
- always use an explicit comparison: if (<check> != 0)
- rename rte_dev_hotplug_mp_init to rte_mp_dev_hotplug_init and move
the function declaration to rte_dev.h
- reword comments
v16:
- rebase to patch "simplify parameters of hotplug functions"
http://patchwork.dpdk.org/patch/45463/ include:
* keep rte_eal_hotplug_add/rte_eal_hotplug_remove unchanged.
* the IPC sync logic is moved to rte_dev_probe/rte_dev_remove.
* simplify the IPC message by removing busname and devname from
eal_dev_mp_req, since the devargs string already encodes that
information.
- combined release notes with related code changes.
- replace the do_ prefix with local_ for local-process-only probe/remove functions.
- improve comments
v15:
- fix missing return in rte_eth_dev_pci_release.
- minor fixes and more detailed comments for patch 5/7.
- update release notes for v18.11.
v14:
- rebase.
- All changes belong to patch 1/6.
1) rename rte_eth_dev_release_port_private to rte_eth_dev_release_port_secondary
since it is only used by secondary processes.
2) in rte_eth_dev_pci_generic_remove, even on the secondary process,
I think it's better to call rte_eth_dev_release_port_secondary after
dev_uninit, since the secondary process may need to release
some local resources in dev_uninit before releasing the port and returning.
This also does not break any existing user of rte_eth_dev_pci_generic_remove,
because no existing dev_uninit has special handling for the secondary
process.
3) add rte_eth_dev_release_port_secondary to rte_eth_dev_destroy as a
general step, so we don't need patches for i40e and ixgbe.
4) fix missing update of rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
when detaching a device. A device may be out of sync in
some situations, so attaching an existing device in the primary still needs
to sync with the secondaries. Also, it's not necessary to roll back if we fail
to attach an existing device or detach a nonexistent device on a secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.
v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
modify the sample code to use rte_eal_hotplug_add and
rte_eal_hotplug_remove to attach/detach devices.
v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error logs in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.
v11:
- move out common code from pci_vfio_unmap_secondary and
pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
to patch 5/19 to improve code readability.
v10:
- Since hotplug add/remove of a vdev on a secondary process now syncs on
all processes, it is not necessary to support a private vdev for
a secondary process, which is identified by a non-NULL devargs in
"--vdev". So rework all vdev driver changes to simplify the device
probe scenario on a secondary process; devargs is now ignored on
secondary processes.
- fix license header in example/multi-process/hotplug_mp/Makefile.
v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
eal_dev_hotplug_add and eal_dev_hotplug_remove; now all kinds of
devices are synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
1) The device can't be detached cleanly in a secondary process, which
also means it can't be attached again, because
/dev/vfio/<group_fd> is still busy. (see patches 3/19 and 4/19)
2) repeated detach/attach of a device causes "cannot find TAILQ entry
for PCI device" due to an incorrect PCI address comparison.
(see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue
v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.
NOTE:
Some issues that are not related to this patchset are expected when
playing with the hotplug_mp sample, as listed below.
- Attaching a PCI device twice may leave the device undetachable;
the fix below is required:
https://patches.dpdk.org/patch/42030/
- An ixgbe device can't be detached; the fix below is required:
https://patches.dpdk.org/patch/42031/
v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.
NOTE: the current patchset depends on the IPC fix below, or it may not be
able to attach a shared vdev.
https://patches.dpdk.org/patch/41647/
v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private to rte_eth_dev_pci_generic_remove for
the secondary process, so we don't need to patch a PMD if it uses the
default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a
separate thread.
v5:
- since we will keep the mp thread separate from the interrupt thread,
it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
way to prevent rte_eth_dev_release_port from being called after
rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on the secondary due to
the previous rework
- fix the issue when the first attached device is a private device from
a secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
still open and under discussion, see below.
https://mails.dpdk.org/archives/dev/2018-June/105359.html
v4:
- since the mp thread will be merged into the interrupt thread, the fix in v3
for the sync IPC deadlock will not work. The new version enables a
mechanism to invoke an mp action callback in a temporary thread to
avoid the IPC deadlock. With this, the secondary-to-primary request
implementation is also simplified, since we can use a sync request
directly in a separate thread.
v3:
- enable mp init callback registration to help non-eal modules initialize
the mp channel during rte_eal_init
- fixes for attaching a shared device from a secondary.
1) deadlock due to sync IPC being invoked in rte_malloc in the primary
process while handling a secondary request to attach a device; the
solution is for the primary process to issue shared device attach/detach
in the interrupt thread.
2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
1) rename sample to "hotplug_mp", and move to example/multi-process.
2) cleanup header include.
3) call rte_eal_cleanup before exit.
v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.
Background:
===========
Currently a secondary process only syncs ethdev from the primary
process at the init stage, so it is not aware when a device
is attached/detached on the primary process at runtime.
However, applications that use the primary-secondary process model
require this. There the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes handle the network traffic
on these devices.
Solution:
=========
So the original intention is to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in a multi-process situation. It covers the
scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.
Any failure during the attach or detach process causes an inconsistent
status between processes, so a proper rollback action should be considered.
Scenario for Case 1, 2:
attach device from primary
a) primary attaches the new device; on failure, go to h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, go to i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach failed
i) attach success
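For readers who prefer code, the attach steps a) to i) above can be sketched
roughly as below. This is only an illustrative model, not the code in the
patches: struct proc_model, model_attach(), model_detach() and
primary_attach() are hypothetical names, the IPC round trips are replaced by
direct function calls, and fail_attach exists only to simulate a failing
secondary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of one device (not a DPDK type). */
struct proc_model {
	bool attached;    /* device currently attached in this process */
	bool fail_attach; /* simulate an attach failure in this process */
};

static int
model_attach(struct proc_model *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void
model_detach(struct proc_model *p)
{
	p->attached = false;
}

/* Returns 0 on success (step i), -1 on failure (step h). */
static int
primary_attach(struct proc_model *primary,
	       struct proc_model *secondaries, size_t n)
{
	bool all_ok = true;
	size_t i;

	assert(primary != NULL && (n == 0 || secondaries != NULL));

	/* a) attach locally first; on failure nothing has to be undone */
	if (model_attach(primary) != 0)
		return -1;

	/* b), c): every secondary gets the request and replies */
	for (i = 0; i < n; i++)
		if (model_attach(&secondaries[i]) != 0)
			all_ok = false;

	/* d) all replies report success */
	if (all_ok)
		return 0;

	/* e), f): ask every secondary to roll back */
	for (i = 0; i < n; i++)
		model_detach(&secondaries[i]);

	/* g) the primary detaches as well */
	model_detach(primary);
	return -1;
}
```

In this model the rollback is sent to every secondary, including those whose
attach failed, which mirrors step e) where the rollback request goes to all
secondaries.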
detach device from primary
a) primary performs the pre-detach check; if the device is locked, go to i).
b) primary sends a pre-detach sync request to all secondaries.
c) each secondary performs the pre-detach check and sends a reply.
d) primary checks the replies; if any failed, go to i).
e) primary sends a detach sync request to all secondaries.
f) each secondary detaches the device and sends a reply (assumed not to fail).
g) primary detaches the device.
h) detach success
i) detach failed
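The detach steps a) to i) above are a two-phase sequence: check everywhere
first, then detach everywhere. A rough sketch under the same caveats as for
any illustration here: struct detach_model and primary_detach() are
hypothetical names, IPC is replaced by direct access to the modelled
processes, and the "locked" flag stands in for whatever makes the pre-detach
check fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of one device (not a DPDK type). */
struct detach_model {
	bool attached; /* device currently attached in this process */
	bool locked;   /* pre-detach check fails while this is set */
};

/* Returns 0 on success (step h), -1 on failure (step i). */
static int
primary_detach(struct detach_model *primary,
	       struct detach_model *secondaries, size_t n)
{
	size_t i;

	assert(primary != NULL && (n == 0 || secondaries != NULL));

	/* a) pre-detach check on the primary */
	if (primary->locked)
		return -1;

	/* b) to d): every secondary runs its pre-detach check; a single
	 * failure aborts before anything has been detached
	 */
	for (i = 0; i < n; i++)
		if (secondaries[i].locked)
			return -1;

	/* e), f): the secondaries detach (assumed not to fail) */
	for (i = 0; i < n; i++)
		secondaries[i].attached = false;

	/* g) the primary detaches last */
	primary->attached = false;
	return 0;
}
```

Because the check phase completes before any process detaches, a failed check
leaves every process unchanged and no rollback is needed in this flow.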
Scenario for case 3, 4:
attach device from secondary:
a) secondary sends an async request to the primary and waits on a condition
which will be released by the matching response from the primary.
b) primary receives the request and attaches the new device; on failure,
go to i).
c) primary forwards the attach request to all secondaries as an async request
(because this runs in mp thread context, a sync request would deadlock;
same reason for all following async requests.)
d) each secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeeded, go to j).
f) primary sends an attach rollback async request to all secondaries.
g) each secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as rollback action.
i) send fail response to the secondary, go to k).
j) send success response to the secondary.
k) secondary process receives the response and returns.
detach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and perform pre-detach check, if device
is locked, goto j).
c) primary send pre-detach async request to all secondary.
d) secondary perform pre-detach check and send reply.
e) primary check the reply if any fail goto j).
f) primary send detach async request to all secondary
g) secondary detach the device and send reply
h) primary detach the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receive response and return.
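
The broadcast-then-rollback part of the attach flow above (steps b to h) can be
sketched as follows. This is a toy model under stated assumptions: struct
proc_state, try_attach() and attach_all() are hypothetical names, the
attach_fails field is only a knob to simulate a failing process, and the async
IPC messages are replaced by direct function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state; not a DPDK type. */
struct proc_state {
	bool attached;     /* device attached in this process */
	bool attach_fails; /* simulation knob: make this process fail */
};

static int
try_attach(struct proc_state *p)
{
	if (p->attach_fails)
		return -1;
	p->attached = true;
	return 0;
}

/*
 * Attach a device on all processes; procs[0] stands for the primary.
 * Returns 0 if every process attached the device, -1 otherwise. On
 * failure every process is rolled back to the detached state.
 */
static int
attach_all(struct proc_state *procs, size_t n)
{
	bool all_ok = true;

	assert(n > 0);

	/* Step b): the primary attaches first; nothing to undo on failure. */
	if (try_attach(&procs[0]) != 0)
		return -1;

	/* Steps c) to e): the request goes to all secondaries. */
	for (size_t i = 1; i < n; i++)
		if (try_attach(&procs[i]) != 0)
			all_ok = false;
	if (all_ok)
		return 0;

	/* Steps f) to h): roll back the secondaries, then the primary. */
	for (size_t i = 1; i < n; i++)
		procs[i].attached = false;
	procs[0].attached = false;
	return -1;
}
```

The request is sent to every secondary before the replies are checked, so the
rollback has to cover all of them, including those that attached successfully.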
API changes:
============
The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.
PMD Impact:
===========
Currently, device removal is not handled well in secondary processes by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_secondary, which only resets the ethdev's state to
unused and does not touch shared data, so other processes are not impacted.
Since not all device drivers are meant to support the primary-secondary
process model, the patch set only fixes this for PCI devices whose drivers use
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. It can be used as a reference by other
drivers when an equivalent fix is required.
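
The difference between the two release paths can be illustrated with a toy
model. Everything here is an assumption made for illustration: struct
shared_port_data, struct local_port and both release functions are hypothetical
stand-ins, not the ethdev structures or the functions added by this patch set.

```c
#include <assert.h>
#include <string.h>

enum port_state { PORT_UNUSED = 0, PORT_ATTACHED };

/* Stand-in for data living in memory shared by all processes. */
struct shared_port_data {
	char name[32];
	unsigned int mtu;
};

/* Stand-in for the per-process view of a port. */
struct local_port {
	enum port_state state;
	struct shared_port_data *data;
};

/* Full release: also wipes the shared data, which affects every process. */
static void
release_port_full(struct local_port *port)
{
	memset(port->data, 0, sizeof(*port->data));
	port->state = PORT_UNUSED;
}

/* Secondary release: only the local state changes, shared data is kept. */
static void
release_port_local_only(struct local_port *port)
{
	port->state = PORT_UNUSED;
}
```

With two local_port views pointing at the same shared_port_data, calling
release_port_local_only() on one view leaves the other view and the shared
fields intact, while release_port_full() clears the fields for both.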
Example:
========
The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.
/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto
Command Line Example:
>help
>list
/* attach a pci device */
> attach 0000:81:00.0
/* detach the pci device */
> detach 0000:81:00.0
/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0
/* detach the vdev af_packet device */
> detach net_af_packet
Qi Zhang (6):
ethdev: add function to release port in secondary process
eal: enable hotplug on multi-process
eal: support attach or detach share device from secondary
drivers/net: enable hotplug on secondary process
drivers/net: enable device detach on secondary
examples/multi_process: add hotplug sample
doc/guides/rel_notes/release_18_11.rst | 13 +
drivers/net/af_packet/rte_eth_af_packet.c | 6 +-
drivers/net/bnxt/bnxt_ethdev.c | 6 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 6 +-
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/kni/rte_eth_kni.c | 6 +-
drivers/net/liquidio/lio_ethdev.c | 2 +-
drivers/net/null/rte_eth_null.c | 6 +-
drivers/net/octeontx/octeontx_ethdev.c | 8 +
drivers/net/pcap/rte_eth_pcap.c | 6 +-
drivers/net/tap/rte_eth_tap.c | 8 +-
drivers/net/vhost/rte_eth_vhost.c | 6 +-
drivers/net/virtio/virtio_ethdev.c | 2 +-
examples/multi_process/Makefile | 1 +
examples/multi_process/hotplug_mp/Makefile | 23 ++
examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++
examples/multi_process/hotplug_mp/commands.h | 10 +
examples/multi_process/hotplug_mp/main.c | 41 +++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 254 ++++++++++++++--
lib/librte_eal/common/eal_private.h | 22 ++
lib/librte_eal/common/hotplug_mp.c | 426 +++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 +++
lib/librte_eal/common/include/rte_dev.h | 12 +
lib/librte_eal/common/include/rte_eal.h | 9 +
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
lib/librte_ethdev/rte_ethdev.c | 17 +-
lib/librte_ethdev/rte_ethdev_driver.h | 16 +-
lib/librte_ethdev/rte_ethdev_pci.h | 10 +-
lib/librte_ethdev/rte_ethdev_version.map | 7 +
32 files changed, 1151 insertions(+), 43 deletions(-)
create mode 100644 examples/multi_process/hotplug_mp/Makefile
create mode 100644 examples/multi_process/hotplug_mp/commands.c
create mode 100644 examples/multi_process/hotplug_mp/commands.h
create mode 100644 examples/multi_process/hotplug_mp/main.c
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
--
2.13.6
^ permalink raw reply [relevance 1%]
* [dpdk-dev] [PATCH v17 2/6] eal: enable hotplug on multi-process
2018-10-16 0:16 1% ` [dpdk-dev] [PATCH v17 0/6] " Qi Zhang
@ 2018-10-16 0:16 2% ` Qi Zhang
0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-10-16 0:16 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
We are going to introduce a solution to handle hotplug in
multi-process. It covers the scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume devices are shared
by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized across all processes.
Any failure during the attach/detach process will cause inconsistent
status between processes, so a proper rollback action should be considered.
This patch covers the implementation of cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.
IPC scenario for Case 1, 2:
attach a device
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach the device and send a reply.
d) primary check the reply if all success goes to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach the device and send a reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
detach a device
a) primary send detach sync request to all secondary
b) secondary detach the device and send reply
c) primary check the reply if all success goes to f).
d) primary send detach rollback sync request to all secondary.
e) secondary receive the request and attach back device. goto g)
f) primary detach the device if success goto h), else goto d)
g) detach fail.
h) detach success.
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 13 ++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 254 +++++++++++++++++++++++++++++---
lib/librte_eal/common/eal_private.h | 22 +++
lib/librte_eal/common/hotplug_mp.c | 221 +++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 ++++++
lib/librte_eal/common/include/rte_dev.h | 12 ++
lib/librte_eal/common/include/rte_eal.h | 9 ++
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
11 files changed, 567 insertions(+), 19 deletions(-)
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 436b20e2b..da2236fea 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -159,6 +159,13 @@ New Features
this application doesn't need to launch dedicated worker threads for vhost
enqueue/dequeue operations.
+* **Support device multi-process hotplug.**
+
+ Hotplug and hot-unplug for devices will now be supported in multiprocessing
+ scenario. Any ethdev devices created in the primary process will be regarded
+ as shared and will be available for all DPDK processes. Synchronization
+ between processes will be done using DPDK IPC.
+
API Changes
-----------
@@ -213,6 +220,12 @@ API Changes
* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
has been changed from uint8_t to uint16_t.
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+ In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+ that device be attached on all processes, while ``rte_eal_hotplug_remove``
+ will guarantee device be detached on all processes.
+
ABI Changes
-----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 97bff4852..6e9bc02c5 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -62,6 +62,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_mp.c
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 7663eaa3f..2209f8843 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -19,8 +19,10 @@
#include <rte_log.h>
#include <rte_spinlock.h>
#include <rte_malloc.h>
+#include <rte_string_fns.h>
#include "eal_private.h"
+#include "hotplug_mp.h"
/**
* The device event callback description.
@@ -127,37 +129,61 @@ int rte_eal_dev_detach(struct rte_device *dev)
return ret;
}
-int
-rte_eal_hotplug_add(const char *busname, const char *devname,
- const char *drvargs)
+/* helper function to build devargs, caller should free the memory */
+static int
+build_devargs(const char *busname, const char *devname,
+ const char *drvargs, char **devargs)
{
- int ret;
- char *devargs = NULL;
int length;
+ char *da;
length = snprintf(NULL, 0, "%s:%s,%s", busname, devname, drvargs);
+
if (length < 0)
return -EINVAL;
- devargs = malloc(length + 1);
- if (devargs == NULL)
+
+ da = malloc(length + 1);
+ if (da == NULL)
return -ENOMEM;
- ret = snprintf(devargs, length + 1, "%s:%s,%s", busname, devname, drvargs);
- if (ret < 0)
+
+ if (snprintf(da, length + 1, "%s:%s,%s",
+ busname, devname, drvargs) < 0) {
+ free(da);
return -EINVAL;
+ }
- ret = rte_dev_probe(devargs);
+ *devargs = da;
+ return 0;
+}
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+ const char *drvargs)
+{
+
+ char *devargs;
+ int ret;
+
+ ret = build_devargs(busname, devname, drvargs, &devargs);
+
+ if (ret != 0)
+ return ret;
+
+ ret = rte_dev_probe(devargs);
free(devargs);
+
return ret;
}
-int __rte_experimental
-rte_dev_probe(const char *devargs)
+/* probe device at local process. */
+int
+local_dev_probe(const char *devargs, struct rte_device **new_dev)
{
struct rte_device *dev;
struct rte_devargs *da;
int ret;
+ *new_dev = NULL;
da = calloc(1, sizeof(*da));
if (da == NULL)
return -ENOMEM;
@@ -174,11 +200,11 @@ rte_dev_probe(const char *devargs)
}
ret = rte_devargs_insert(da);
- if (ret)
+ if (ret != 0)
goto err_devarg;
ret = da->bus->scan();
- if (ret)
+ if (ret != 0)
goto err_devarg;
dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
@@ -195,11 +221,13 @@ rte_dev_probe(const char *devargs)
}
ret = dev->bus->plug(dev);
- if (ret) {
+ if (ret != 0) {
RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
dev->name);
goto err_devarg;
}
+
+ *new_dev = dev;
return 0;
err_devarg:
@@ -231,8 +259,9 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
return rte_dev_remove(dev);
}
-int __rte_experimental
-rte_dev_remove(struct rte_device *dev)
+/* remove device at local process. */
+int
+local_dev_remove(struct rte_device *dev)
{
int ret;
@@ -248,10 +277,197 @@ rte_dev_remove(struct rte_device *dev)
}
ret = dev->bus->unplug(dev);
- if (ret)
+ if (ret != 0)
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
dev->name);
- rte_devargs_remove(dev->devargs);
+ else
+ rte_devargs_remove(dev->devargs);
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_probe(const char *devargs)
+{
+ struct eal_dev_mp_req req;
+ struct rte_device *dev;
+ int ret;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_ATTACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result != 0)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug add device\n");
+ return req.result;
+ }
+
+ /* attach a shared device from primary start from here: */
+
+ /* primary attach the new device itself. */
+ ret = local_dev_probe(devargs, &dev);
+
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on primary process\n");
+
+ /**
+ * It is possible that a secondary process failed to attach a
+ * device that the primary process already has from initialization,
+ * so for the -EEXIST case we still need to sync with the secondary
+ * processes.
+ */
+ if (ret != -EEXIST)
+ return ret;
+ }
+
+ /* primary send attach sync request to secondary. */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /* if any communication error, we need to rollback. */
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug add request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to attach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on secondary process\n");
+ ret = req.result;
+
+ /* for -EEXIST, we don't need to rollback. */
+ if (ret == -EEXIST)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req) != 0)
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on secondary."
+ "Devices in secondary may not sync with primary\n");
+
+ /* primary rollback itself. */
+ if (local_dev_remove(dev) != 0)
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on primary."
+ "Devices in secondary may not sync with primary\n");
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_remove(struct rte_device *dev)
+{
+ struct eal_dev_mp_req req;
+ char *devargs;
+ int ret;
+
+ ret = build_devargs(dev->devargs->bus->name, dev->name, "", &devargs);
+ if (ret != 0)
+ return ret;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_DETACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+ free(devargs);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result != 0)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug remove device\n");
+ return req.result;
+ }
+
+ /* detach a device from primary start from here: */
+
+ /* primary send detach sync request to secondary */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /**
+ * On a communication error we need to rollback, because some of
+ * the secondary processes may still have detached it successfully.
+ */
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send device detach request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to detach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on secondary process\n");
+ ret = req.result;
+ /**
+ * if -ENOENT, we don't need to rollback, since the device is
+ * already detached on the secondary process.
+ */
+ if (ret != -ENOENT)
+ goto rollback;
+ }
+
+ /* primary detach the device itself. */
+ ret = local_dev_remove(dev);
+
+ /* if primary failed, still need to consider if rollback is necessary */
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on primary process\n");
+ /* if -ENOENT, we don't need to rollback */
+ if (ret == -ENOENT)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req) != 0)
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device detach on secondary."
+ "Devices in secondary may not sync with primary\n");
+
return ret;
}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a83c..2ad94e435 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,26 @@ int
rte_devargs_layers_parse(struct rte_devargs *devargs,
const char *devstr);
+/*
+ * probe a device at local process.
+ *
+ * @param devargs
+ * Device arguments including bus, class and driver properties.
+ * @param new_dev
+ * new device be probed as output.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_probe(const char *devargs, struct rte_device **new_dev);
+
+/**
+ * Hotplug remove a given device from a specific bus at local process.
+ *
+ * @param dev
+ * Data structure of the device to remove.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_remove(struct rte_device *dev);
+
#endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
new file mode 100644
index 000000000..92d8f50d3
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include <string.h>
+
+#include <rte_eal.h>
+#include <rte_alarm.h>
+#include <rte_string_fns.h>
+#include <rte_devargs.h>
+
+#include "hotplug_mp.h"
+#include "eal_private.h"
+
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
+
+static int cmp_dev_name(const struct rte_device *dev, const void *_name)
+{
+ const char *name = _name;
+
+ return strcmp(dev->name, name);
+}
+
+struct mp_reply_bundle {
+ struct rte_mp_msg msg;
+ void *peer;
+};
+
+static int
+handle_secondary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ RTE_SET_USED(msg);
+ RTE_SET_USED(peer);
+ return -ENOTSUP;
+}
+
+static void __handle_primary_request(void *param)
+{
+ struct mp_reply_bundle *bundle = param;
+ struct rte_mp_msg *msg = &bundle->msg;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct rte_mp_msg mp_resp;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct rte_devargs *da;
+ struct rte_device *dev;
+ struct rte_bus *bus;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+
+ switch (req->t) {
+ case EAL_DEV_REQ_TYPE_ATTACH:
+ case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
+ ret = local_dev_probe(req->devargs, &dev);
+ break;
+ case EAL_DEV_REQ_TYPE_DETACH:
+ case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
+ da = calloc(1, sizeof(*da));
+ if (da == NULL) {
+ ret = -ENOMEM;
+ goto quit;
+ }
+
+ ret = rte_devargs_parse(da, req->devargs);
+ if (ret != 0)
+ goto quit;
+
+ bus = rte_bus_find_by_name(da->bus->name);
+ if (bus == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n", da->bus->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ dev = bus->find_device(NULL, cmp_dev_name, da->name);
+ if (dev == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find plugged device (%s)\n", da->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ ret = local_dev_remove(dev);
+quit:
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+ resp->result = ret;
+ if (rte_mp_reply(&mp_resp, bundle->peer) < 0)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+
+ free(bundle->peer);
+ free(bundle);
+}
+
+static int
+handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ struct rte_mp_msg mp_resp;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct mp_reply_bundle *bundle;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+
+ bundle = calloc(1, sizeof(*bundle));
+ if (bundle == NULL) {
+ resp->result = -ENOMEM;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+
+ bundle->msg = *msg;
+ /**
+ * We need to send the reply on the interrupt thread, but peer can't
+ * be parsed directly, so this is a temporary hack that needs to be
+ * fixed when it is ready.
+ */
+ bundle->peer = (void *)strdup(peer);
+
+ /**
+ * We are on the IPC callback thread, where sync IPC is not allowed
+ * because it would deadlock, so we delegate the task to the interrupt thread.
+ */
+ ret = rte_eal_alarm_set(1, __handle_primary_request, bundle);
+ if (ret != 0) {
+ resp->result = ret;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+ }
+ return 0;
+}
+
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req)
+{
+ RTE_SET_USED(req);
+ return -ENOTSUP;
+}
+
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req)
+{
+ struct rte_mp_msg mp_req;
+ struct rte_mp_reply mp_reply;
+ struct timespec ts = {.tv_sec = MP_TIMEOUT_S, .tv_nsec = 0};
+ int ret;
+ int i;
+
+ memset(&mp_req, 0, sizeof(mp_req));
+ memcpy(mp_req.param, req, sizeof(*req));
+ mp_req.len_param = sizeof(*req);
+ strlcpy(mp_req.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_req.name));
+
+ ret = rte_mp_request_sync(&mp_req, &mp_reply, &ts);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "rte_mp_request_sync failed\n");
+ return ret;
+ }
+
+ if (mp_reply.nb_sent != mp_reply.nb_received) {
+ RTE_LOG(ERR, EAL, "not all secondary reply\n");
+ return -1;
+ }
+
+ req->result = 0;
+ for (i = 0; i < mp_reply.nb_received; i++) {
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_reply.msgs[i].param;
+ if (resp->result != 0) {
+ req->result = resp->result;
+ if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
+ req->result != -EEXIST)
+ break;
+ if (req->t == EAL_DEV_REQ_TYPE_DETACH &&
+ req->result != -ENOENT)
+ break;
+ }
+ }
+
+ return 0;
+}
+
+int rte_mp_dev_hotplug_init(void)
+{
+ int ret;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_secondary_request);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ } else {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_primary_request);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ }
+
+ return 0;
+}
diff --git a/lib/librte_eal/common/hotplug_mp.h b/lib/librte_eal/common/hotplug_mp.h
new file mode 100644
index 000000000..597fde3d7
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _HOTPLUG_MP_H_
+#define _HOTPLUG_MP_H_
+
+#include "rte_dev.h"
+#include "rte_bus.h"
+
+#define EAL_DEV_MP_ACTION_REQUEST "eal_dev_mp_request"
+#define EAL_DEV_MP_ACTION_RESPONSE "eal_dev_mp_response"
+
+#define EAL_DEV_MP_DEV_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
+#define EAL_DEV_MP_BUS_NAME_MAX_LEN 32
+#define EAL_DEV_MP_DEV_ARGS_MAX_LEN 128
+
+enum eal_dev_req_type {
+ EAL_DEV_REQ_TYPE_ATTACH,
+ EAL_DEV_REQ_TYPE_DETACH,
+ EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK,
+ EAL_DEV_REQ_TYPE_DETACH_ROLLBACK,
+};
+
+struct eal_dev_mp_req {
+ enum eal_dev_req_type t;
+ char devargs[EAL_DEV_MP_DEV_ARGS_MAX_LEN];
+ int result;
+};
+
+/**
+ * This is a synchronous wrapper for a secondary process to send
+ * a request to the primary process. It is invoked when an attach
+ * or detach request is issued from a secondary process.
+ */
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req);
+
+/**
+ * This is a synchronous wrapper for the primary process to send
+ * a request to the secondary processes. It is invoked when an attach
+ * or detach request is issued from the primary process.
+ */
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req);
+
+
+#endif /* _HOTPLUG_MP_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 036180ff3..696cf7bbe 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -192,6 +192,9 @@ int rte_eal_dev_detach(struct rte_device *dev);
/**
* Hotplug add a given device to a specific bus.
*
+ * In multi-process, it will request other processes to add the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param busname
* The bus name the device is added to.
* @param devname
@@ -211,6 +214,9 @@ int rte_eal_hotplug_add(const char *busname, const char *devname,
*
* Add matching devices.
*
+ * In multi-process, it will request other processes to add the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param devargs
* Device arguments including bus, class and driver properties.
* @return
@@ -221,6 +227,9 @@ int __rte_experimental rte_dev_probe(const char *devargs);
/**
* Hotplug remove a given device from a specific bus.
*
+ * In multi-process, it will request other processes to remove the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param busname
* The bus name the device is removed from.
* @param devname
@@ -236,6 +245,9 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
*
* Remove one device.
*
+ * In multi-process, it will request other processes to remove the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param dev
* Data structure of the device to remove.
* @return
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index e114dcbdc..3ee897c1d 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -378,6 +378,15 @@ int __rte_experimental
rte_mp_reply(struct rte_mp_msg *msg, const char *peer);
/**
+ * Register all mp action callbacks for hotplug.
+ *
+ * @return
+ * 0 on success, negative on error.
+ */
+int __rte_experimental
+rte_mp_dev_hotplug_init(void);
+
+/**
* Usage function typedef used by the application usage function.
*
* Use this function typedef to define and call rte_set_application_usage_hook()
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index b7fc98499..04c414356 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -28,6 +28,7 @@ common_sources = files(
'eal_common_thread.c',
'eal_common_timer.c',
'eal_common_uuid.c',
+ 'hotplug_mp.c',
'malloc_elem.c',
'malloc_heap.c',
'malloc_mp.c',
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5c16bc40f..736bc6569 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -70,6 +70,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_mp.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 950f33f2c..d342a04f0 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -888,6 +888,12 @@ rte_eal_init(int argc, char **argv)
}
}
+ /* register multi-process action callbacks for hotplug */
+ if (rte_mp_dev_hotplug_init() < 0) {
+ rte_eal_init_alert("failed to register mp callback for hotplug\n");
+ return -1;
+ }
+
if (rte_bus_scan()) {
rte_eal_init_alert("Cannot scan the buses for devices\n");
rte_errno = ENODEV;
--
2.13.6
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v2 1/2] eal: add API that sleeps while waiting for threads
@ 2018-10-16 8:42 3% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-10-16 8:42 UTC (permalink / raw)
To: Yigit, Ferruh, Richardson, Bruce; +Cc: dev, Yigit, Ferruh, stephen
Hi Ferruh,
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Monday, October 15, 2018 11:21 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; Yigit, Ferruh <ferruh.yigit@intel.com>; stephen@networkplumber.org
> Subject: [dpdk-dev] [PATCH v2 1/2] eal: add API that sleeps while waiting for threads
>
> It is common that sample applications call rte_eal_wait_lcore() while
> waiting for worker threads to be terminated.
> Mostly master lcore keeps waiting in this function.
>
> The waiting app for termination is not a time critical task, app can
> prefer a sleep version of the waiting to consume less cycles.
>
> A sleeping version of the API, rte_eal_wait_lcore_sleep(), has been
> added which uses pthread conditions.
>
> Sample applications will be updated later to use this API.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> v2:
> * use pthread cond instead of usleep
> ---
> lib/librte_eal/bsdapp/eal/eal.c | 3 +++
> lib/librte_eal/bsdapp/eal/eal_thread.c | 7 ++++++
> lib/librte_eal/common/eal_common_launch.c | 22 ++++++++++++++++++
> lib/librte_eal/common/include/rte_launch.h | 26 ++++++++++++++++++++++
> lib/librte_eal/common/include/rte_lcore.h | 3 +++
> lib/librte_eal/linuxapp/eal/eal.c | 3 +++
> lib/librte_eal/linuxapp/eal/eal_thread.c | 7 ++++++
> lib/librte_eal/rte_eal_version.map | 1 +
> 8 files changed, 72 insertions(+)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 7735194a3..e7d676657 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -756,6 +756,9 @@ rte_eal_init(int argc, char **argv)
> snprintf(thread_name, sizeof(thread_name),
> "lcore-slave-%d", i);
> rte_thread_setname(lcore_config[i].thread_id, thread_name);
> +
> + pthread_mutex_init(&rte_eal_thread_mutex[i], NULL);
> + pthread_cond_init(&rte_eal_thread_cond[i], NULL);
> }
>
> /*
> diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
> index 309b58726..60db32d57 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> @@ -28,6 +28,9 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY;
> RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
>
> +pthread_cond_t rte_eal_thread_cond[RTE_MAX_LCORE];
> +pthread_mutex_t rte_eal_thread_mutex[RTE_MAX_LCORE];
I think it would be better to include the cond and mutex in struct lcore_config
itself; that would probably help to avoid false sharing.
Though yes, it would mean ABI breakage, I suppose.
> +
> /*
> * Send a message to a slave lcore identified by slave_id to call a
> * function f with argument arg. Once the execution is done, the
> @@ -154,6 +157,10 @@ eal_thread_loop(__attribute__((unused)) void *arg)
> lcore_config[lcore_id].ret = ret;
> rte_wmb();
> lcore_config[lcore_id].state = FINISHED;
> +
> + pthread_mutex_lock(&rte_eal_thread_mutex[lcore_id]);
> + pthread_cond_signal(&rte_eal_thread_cond[lcore_id]);
> + pthread_mutex_unlock(&rte_eal_thread_mutex[lcore_id]);
I understand it would work that way too, but if you introduce a mutex and cond
around the state, then it is better to manipulate/access the state after
grabbing the mutex.
BTW, in that case we don't need the wmb:
lcore_config[lcore_id].ret = ret;
pthread_mutex_lock(...);
lcore_config[lcore_id].state = FINISHED;
pthread_cond_signal(..);
pthread_mutex_unlock(...);
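
A self-contained sketch of that pattern, assuming plain pthreads outside of
DPDK: struct worker, worker_main(), wait_worker_sleep() and run_worker_demo()
are hypothetical names, and the value 42 stands in for the lcore function's
return value. The waiter rechecks the state in a loop under the same mutex, so
a signal sent before the wait starts is not lost and spurious wakeups are
harmless.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum worker_state { WORKER_RUNNING, WORKER_FINISHED };

/* Hypothetical stand-in for the per-lcore state discussed above. */
struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	enum worker_state state;
	int ret;
};

static void *
worker_main(void *arg)
{
	struct worker *w = arg;

	w->ret = 42; /* stand-in for the function's return value */

	/* State changes only under the mutex, so no explicit barrier. */
	pthread_mutex_lock(&w->lock);
	w->state = WORKER_FINISHED;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Sleep until the worker reports FINISHED, then return its result. */
static int
wait_worker_sleep(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	while (w->state != WORKER_FINISHED)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return w->ret;
}

/* Start one worker, wait for it by sleeping, and return its result. */
static int
run_worker_demo(void)
{
	struct worker w = { .state = WORKER_RUNNING, .ret = 0 };
	pthread_t tid;
	int ret = -1;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);
	if (pthread_create(&tid, NULL, worker_main, &w) == 0) {
		ret = wait_worker_sleep(&w);
		pthread_join(tid, NULL);
	}
	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);
	return ret;
}
```

Because ret is written before the mutex is taken and read only after the
waiter has observed FINISHED under the same mutex, the lock and unlock pair
provides the ordering that rte_wmb() provided in the original code.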
Konstantin
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] doc: show internal functions in doxygen
@ 2018-10-19 7:39 3% ` Ferruh Yigit
2018-10-22 6:15 0% ` Shreyansh Jain
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-19 7:39 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, john.mcnamara, marko.kovacevic
On 10/18/2018 6:04 PM, Thomas Monjalon wrote:
> 18/10/2018 18:22, Ferruh Yigit:
>> On 10/18/2018 5:08 PM, Thomas Monjalon wrote:
>>> Not sure we want to show the internal functions to users.
>>> It may be useful only for PMD developers.
>>> Do we vote? +1 / -1 welcome!
>>
>> What is affected from this setting, can you give an example what was not shown
>> will be shown now?
>
> For instance, most of the things in rte_ethdev_core.h.
> All the doxygen with @internal tag are affected.
rte_ethdev_core.h is not part of the API documentation, but I randomly checked
rte_lpm.h, which has some @internal structures.
But those in the lpm header are the ones for ABI versioning. I think it is
confusing to expose them to the user, and the documentation doesn't highlight that
they are internal.
So not a strong opinion, but from my side -1
* Re: [dpdk-dev] [PATCH] doc: show internal functions in doxygen
2018-10-19 7:39 3% ` Ferruh Yigit
@ 2018-10-22 6:15 0% ` Shreyansh Jain
0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-10-22 6:15 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, john.mcnamara, marko.kovacevic
On Friday 19 October 2018 01:09 PM, Ferruh Yigit wrote:
> On 10/18/2018 6:04 PM, Thomas Monjalon wrote:
>> 18/10/2018 18:22, Ferruh Yigit:
>>> On 10/18/2018 5:08 PM, Thomas Monjalon wrote:
>>>> Not sure we want to show the internal functions to users.
>>>> It may be useful only for PMD developers.
>>>> Do we vote? +1 / -1 welcome!
>>>
>>> What is affected from this setting, can you give an example what was not shown
>>> will be shown now?
>>
>> For instance, most of the things in rte_ethdev_core.h.
>> All the doxygen with @internal tag are affected.
>
> rte_ethdev_core.h is not part of API documentation but I randomly checked
> rte_lpm.h which has some @internal structures.
>
> But those in the lpm header is the ones for ABI versioning, I think it is
> confusing to expose them to the user, and documentation doesn't highlight that
> it is internal.
>
> So not a strong opinion, but from my side -1
>
-1 from me as well.
I also think it would be an overload of information in Doxygen. In addition,
some places might require re-documenting to clean up internal markers.
My opinion: reading the code directly would help more than doxygen in these cases.
* [dpdk-dev] [RFC 00/14] prefix network structures
@ 2018-10-24 8:18 1% Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
` (3 more replies)
0 siblings, 4 replies; 200+ results
From: Olivier Matz @ 2018-10-24 8:18 UTC (permalink / raw)
To: dev
This RFC targets 19.02.
The rte_net headers conflict with the libc headers, because
some definitions are duplicated, sometimes with a few differences.
This was discussed in [1], and more recently at the techboard.
Before sending the deprecation notice (target for this is 18.11),
here is a draft that can be discussed.
This RFC adds the rte_ (or RTE_) prefix to all structures, functions
and defines in the rte_net library. This is a big changeset that will
break the API of many functions, but not the ABI.
One question I'm asking is how we can manage the transition.
Initially, I hoped it was possible to have a compat layer during
one release (supporting both prefixed and unprefixed names), but
now that the patch is done, it seems the impact is too big and
affects too many libraries.
A few examples:
- rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
- many rte_flow structures use the rte_ prefixed net structures
- the mac field of virtio_net structure is rte_ether_addr
- the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
...
Therefore, it is clear that doing this would break the compilation
of many external applications.
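For discussion, here is a rough sketch of what such a compat layer could have looked like for a single structure. It is not part of this series. The struct is a simplified stand-in (packing attribute omitted), RTE_ETHER_ADDR_LEN is used as the prefixed name of the length define, and rte_mac_equal() and legacy_app_check() are hypothetical helpers that exist only for this example.

```c
#include <stdint.h>
#include <string.h>

/* New, prefixed definitions (simplified stand-ins for the rte_net ones). */
#define RTE_ETHER_ADDR_LEN 6

struct rte_ether_addr {
	uint8_t addr_bytes[RTE_ETHER_ADDR_LEN];
};

/*
 * Hypothetical transitional aliases: old, unprefixed names map to the new
 * ones. Note that these macros would also rewrite any libc declaration of
 * the same name included afterwards, which is the kind of clash this RFC
 * is trying to get rid of.
 */
#define ETHER_ADDR_LEN RTE_ETHER_ADDR_LEN
#define ether_addr rte_ether_addr

/* Library-side helper written against the new names only. */
static int rte_mac_equal(const struct rte_ether_addr *a,
			 const struct rte_ether_addr *b)
{
	return memcmp(a->addr_bytes, b->addr_bytes, RTE_ETHER_ADDR_LEN) == 0;
}

/* Application-side code still written against the old names. */
static int legacy_app_check(void)
{
	struct ether_addr a = { .addr_bytes = { 0, 1, 2, 3, 4, 5 } };
	struct ether_addr b = a;

	return rte_mac_equal(&a, &b) &&
	       sizeof(a.addr_bytes) == ETHER_ADDR_LEN;
}
```

This works for one struct in isolation, but repeating it for every structure, function and define touched by the 14 patches is where the impact becomes too large.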
Another drawback we need to take into account: it will make the
backport of patches more difficult, although this is something
that could be mitigated by a script.
While it is obviously better to have a good namespace convention,
we need to identify the issues we have today before deciding it's
worth doing the change.
Comments?
Things that are missing in this RFC:
- test with FreeBSD
- manually fix some indentation issues
Olivier Matz (14):
net: add rte prefix to arp structures
net: add rte prefix to arp defines
net: add rte prefix to ether structures
net: add rte prefix to ether functions
net: add rte prefix to ether defines
net: add rte prefix to esp structure
net: add rte prefix to gre structure
net: add rte prefix to icmp structure
net: add rte prefix to icmp defines
net: add rte prefix to ip structure
net: add rte prefix to ip defines
net: add rte prefix to sctp structure
net: add rte prefix to tcp structure
net: add rte prefix to udp structure
app/pdump/main.c | 2 +-
app/test-eventdev/test_perf_common.c | 2 +-
app/test-eventdev/test_pipeline_common.c | 2 +-
app/test-pmd/cmdline.c | 66 ++---
app/test-pmd/cmdline_flow.c | 10 +-
app/test-pmd/config.c | 34 +--
app/test-pmd/csumonly.c | 156 +++++-----
app/test-pmd/flowgen.c | 30 +-
app/test-pmd/icmpecho.c | 120 ++++----
app/test-pmd/ieee1588fwd.c | 18 +-
app/test-pmd/macfwd.c | 12 +-
app/test-pmd/macswap.c | 16 +-
app/test-pmd/parameters.c | 6 +-
app/test-pmd/testpmd.c | 22 +-
app/test-pmd/testpmd.h | 18 +-
app/test-pmd/txonly.c | 36 +--
app/test-pmd/util.c | 34 +--
doc/guides/prog_guide/bbdev.rst | 6 +-
.../prog_guide/packet_classif_access_ctrl.rst | 18 +-
doc/guides/prog_guide/rte_flow.rst | 4 +-
doc/guides/sample_app_ug/flow_classify.rst | 28 +-
doc/guides/sample_app_ug/flow_filtering.rst | 6 +-
doc/guides/sample_app_ug/ip_frag.rst | 16 +-
doc/guides/sample_app_ug/ip_reassembly.rst | 16 +-
doc/guides/sample_app_ug/ipv4_multicast.rst | 16 +-
doc/guides/sample_app_ug/l2_forward_job_stats.rst | 6 +-
.../sample_app_ug/l2_forward_real_virtual.rst | 6 +-
doc/guides/sample_app_ug/l3_forward.rst | 12 +-
doc/guides/sample_app_ug/link_status_intr.rst | 6 +-
doc/guides/sample_app_ug/ptpclient.rst | 6 +-
doc/guides/sample_app_ug/rxtx_callbacks.rst | 2 +-
doc/guides/sample_app_ug/server_node_efd.rst | 12 +-
doc/guides/sample_app_ug/skeleton.rst | 4 +-
doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 4 +-
drivers/bus/dpaa/base/fman/fman.c | 2 +-
drivers/bus/dpaa/base/fman/fman_hw.c | 2 +-
drivers/bus/dpaa/include/fman.h | 2 +-
drivers/bus/dpaa/include/netcfg.h | 4 +-
drivers/net/af_packet/rte_eth_af_packet.c | 2 +-
drivers/net/ark/ark_ethdev.c | 16 +-
drivers/net/ark/ark_ext.h | 4 +-
drivers/net/ark/ark_global.h | 4 +-
drivers/net/atlantic/atl_ethdev.c | 20 +-
drivers/net/atlantic/hw_atl/hw_atl_utils.c | 8 +-
drivers/net/atlantic/hw_atl/hw_atl_utils_fw2x.c | 4 +-
drivers/net/avf/avf.h | 4 +-
drivers/net/avf/avf_ethdev.c | 50 ++--
drivers/net/avf/avf_rxtx.c | 14 +-
drivers/net/avf/avf_vchnl.c | 8 +-
drivers/net/avf/base/avf_adminq_cmd.h | 4 +-
drivers/net/avf/base/avf_common.c | 12 +-
drivers/net/avf/base/avf_prototype.h | 4 +-
drivers/net/avp/avp_ethdev.c | 20 +-
drivers/net/avp/rte_avp_common.h | 2 +-
drivers/net/axgbe/axgbe_dev.c | 4 +-
drivers/net/axgbe/axgbe_ethdev.c | 10 +-
drivers/net/axgbe/axgbe_ethdev.h | 4 +-
drivers/net/axgbe/axgbe_rxtx.c | 2 +-
drivers/net/bnx2x/bnx2x.c | 16 +-
drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-
drivers/net/bnx2x/bnx2x_ethdev.h | 2 +-
drivers/net/bnx2x/bnx2x_vfpf.c | 8 +-
drivers/net/bnx2x/bnx2x_vfpf.h | 2 +-
drivers/net/bnx2x/ecore_sp.h | 2 +-
drivers/net/bnxt/bnxt.h | 4 +-
drivers/net/bnxt/bnxt_ethdev.c | 70 ++---
drivers/net/bnxt/bnxt_filter.c | 4 +-
drivers/net/bnxt/bnxt_filter.h | 8 +-
drivers/net/bnxt/bnxt_flow.c | 26 +-
drivers/net/bnxt/bnxt_hwrm.c | 40 +--
drivers/net/bnxt/bnxt_hwrm.h | 2 +-
drivers/net/bnxt/bnxt_ring.c | 8 +-
drivers/net/bnxt/bnxt_rxq.c | 2 +-
drivers/net/bnxt/bnxt_rxr.c | 2 +-
drivers/net/bnxt/bnxt_vnic.c | 2 +-
drivers/net/bnxt/rte_pmd_bnxt.c | 14 +-
drivers/net/bnxt/rte_pmd_bnxt.h | 4 +-
drivers/net/bonding/rte_eth_bond.h | 2 +-
drivers/net/bonding/rte_eth_bond_8023ad.c | 26 +-
drivers/net/bonding/rte_eth_bond_8023ad.h | 10 +-
drivers/net/bonding/rte_eth_bond_alb.c | 78 ++---
drivers/net/bonding/rte_eth_bond_alb.h | 10 +-
drivers/net/bonding/rte_eth_bond_api.c | 2 +-
drivers/net/bonding/rte_eth_bond_args.c | 2 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 194 ++++++-------
drivers/net/bonding/rte_eth_bond_private.h | 6 +-
drivers/net/cxgbe/base/adapter.h | 6 +-
drivers/net/cxgbe/base/t4_hw.c | 8 +-
drivers/net/cxgbe/cxgbe.h | 4 +-
drivers/net/cxgbe/cxgbe_ethdev.c | 14 +-
drivers/net/cxgbe/cxgbe_filter.h | 2 +-
drivers/net/cxgbe/cxgbe_flow.c | 10 +-
drivers/net/cxgbe/cxgbe_main.c | 4 +-
drivers/net/cxgbe/cxgbe_pfvf.h | 2 +-
drivers/net/cxgbe/cxgbevf_main.c | 2 +-
drivers/net/cxgbe/l2t.c | 8 +-
drivers/net/cxgbe/l2t.h | 2 +-
drivers/net/cxgbe/mps_tcam.c | 14 +-
drivers/net/cxgbe/mps_tcam.h | 4 +-
drivers/net/cxgbe/sge.c | 8 +-
drivers/net/dpaa/dpaa_ethdev.c | 20 +-
drivers/net/dpaa/dpaa_rxtx.c | 22 +-
drivers/net/dpaa2/dpaa2_ethdev.c | 36 +--
drivers/net/e1000/e1000_ethdev.h | 2 +-
drivers/net/e1000/em_ethdev.c | 34 +--
drivers/net/e1000/em_rxtx.c | 22 +-
drivers/net/e1000/igb_ethdev.c | 70 ++---
drivers/net/e1000/igb_flow.c | 12 +-
drivers/net/e1000/igb_pf.c | 16 +-
drivers/net/e1000/igb_rxtx.c | 18 +-
drivers/net/ena/ena_ethdev.c | 16 +-
drivers/net/ena/ena_ethdev.h | 2 +-
drivers/net/enetc/base/enetc_hw.h | 4 +-
drivers/net/enetc/enetc_ethdev.c | 6 +-
drivers/net/enic/enic.h | 2 +-
drivers/net/enic/enic_clsf.c | 40 +--
drivers/net/enic/enic_ethdev.c | 4 +-
drivers/net/enic/enic_flow.c | 100 +++----
drivers/net/enic/enic_main.c | 2 +-
drivers/net/enic/enic_res.c | 4 +-
drivers/net/failsafe/failsafe.c | 6 +-
drivers/net/failsafe/failsafe_args.c | 4 +-
drivers/net/failsafe/failsafe_ether.c | 6 +-
drivers/net/failsafe/failsafe_ops.c | 6 +-
drivers/net/failsafe/failsafe_private.h | 4 +-
drivers/net/fm10k/fm10k.h | 2 +-
drivers/net/fm10k/fm10k_ethdev.c | 18 +-
drivers/net/i40e/base/i40e_adminq_cmd.h | 4 +-
drivers/net/i40e/base/i40e_common.c | 12 +-
drivers/net/i40e/base/i40e_prototype.h | 4 +-
drivers/net/i40e/i40e_ethdev.c | 134 ++++-----
drivers/net/i40e/i40e_ethdev.h | 22 +-
drivers/net/i40e/i40e_ethdev_vf.c | 60 ++--
drivers/net/i40e/i40e_fdir.c | 126 ++++----
drivers/net/i40e/i40e_flow.c | 58 ++--
drivers/net/i40e/i40e_pf.c | 18 +-
drivers/net/i40e/i40e_rxtx.c | 28 +-
drivers/net/i40e/i40e_vf_representor.c | 2 +-
drivers/net/i40e/rte_pmd_i40e.c | 30 +-
drivers/net/i40e/rte_pmd_i40e.h | 8 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 94 +++---
drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
drivers/net/ixgbe/ixgbe_flow.c | 22 +-
drivers/net/ixgbe/ixgbe_pf.c | 14 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 14 +-
drivers/net/ixgbe/ixgbe_vf_representor.c | 4 +-
drivers/net/ixgbe/rte_pmd_ixgbe.c | 10 +-
drivers/net/ixgbe/rte_pmd_ixgbe.h | 2 +-
drivers/net/kni/rte_eth_kni.c | 4 +-
drivers/net/liquidio/lio_ethdev.c | 22 +-
drivers/net/mlx4/mlx4.c | 4 +-
drivers/net/mlx4/mlx4.h | 8 +-
drivers/net/mlx4/mlx4_ethdev.c | 8 +-
drivers/net/mlx4/mlx4_flow.c | 14 +-
drivers/net/mlx4/mlx4_rxtx.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/mlx5/mlx5.h | 14 +-
drivers/net/mlx5/mlx5_flow.c | 22 +-
drivers/net/mlx5/mlx5_flow_tcf.c | 40 +--
drivers/net/mlx5/mlx5_flow_verbs.c | 26 +-
drivers/net/mlx5/mlx5_mac.c | 18 +-
drivers/net/mlx5/mlx5_nl.c | 28 +-
drivers/net/mlx5/mlx5_rxtx.c | 6 +-
drivers/net/mlx5/mlx5_rxtx.h | 2 +-
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 +-
drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
drivers/net/mlx5/mlx5_trigger.c | 6 +-
drivers/net/mvneta/mvneta_ethdev.c | 22 +-
drivers/net/mvneta/mvneta_ethdev.h | 2 +-
drivers/net/mvpp2/mrvl_ethdev.c | 22 +-
drivers/net/mvpp2/mrvl_ethdev.h | 2 +-
drivers/net/mvpp2/mrvl_flow.c | 4 +-
drivers/net/netvsc/hn_ethdev.c | 4 +-
drivers/net/netvsc/hn_nvs.c | 2 +-
drivers/net/netvsc/hn_rndis.c | 2 +-
drivers/net/netvsc/hn_rxtx.c | 12 +-
drivers/net/netvsc/hn_var.h | 4 +-
drivers/net/netvsc/hn_vf.c | 12 +-
drivers/net/nfp/nfp_net.c | 20 +-
drivers/net/nfp/nfp_net_pmd.h | 2 +-
drivers/net/null/rte_eth_null.c | 6 +-
drivers/net/octeontx/octeontx_ethdev.c | 8 +-
drivers/net/octeontx/octeontx_ethdev.h | 2 +-
drivers/net/pcap/rte_eth_pcap.c | 22 +-
drivers/net/qede/base/bcm_osal.h | 2 +-
drivers/net/qede/base/ecore_dev.c | 4 +-
drivers/net/qede/qede_ethdev.c | 58 ++--
drivers/net/qede/qede_ethdev.h | 6 +-
drivers/net/qede/qede_filter.c | 66 ++---
drivers/net/qede/qede_if.h | 4 +-
drivers/net/qede/qede_main.c | 6 +-
drivers/net/qede/qede_rxtx.c | 32 +-
drivers/net/qede/qede_rxtx.h | 2 +-
drivers/net/ring/rte_eth_ring.c | 4 +-
drivers/net/sfc/sfc.h | 2 +-
drivers/net/sfc/sfc_ef10_tx.c | 8 +-
drivers/net/sfc/sfc_ethdev.c | 20 +-
drivers/net/sfc/sfc_flow.c | 12 +-
drivers/net/sfc/sfc_port.c | 8 +-
drivers/net/sfc/sfc_tso.c | 8 +-
drivers/net/softnic/parser.c | 18 +-
drivers/net/softnic/parser.h | 2 +-
drivers/net/softnic/rte_eth_softnic.c | 2 +-
drivers/net/softnic/rte_eth_softnic_pipeline.c | 40 +--
drivers/net/szedata2/rte_eth_szedata2.c | 8 +-
drivers/net/tap/rte_eth_tap.c | 58 ++--
drivers/net/tap/rte_eth_tap.h | 2 +-
drivers/net/tap/tap_bpf_program.c | 14 +-
drivers/net/tap/tap_flow.c | 12 +-
drivers/net/thunderx/base/nicvf_mbox.c | 4 +-
drivers/net/thunderx/base/nicvf_plat.h | 2 +-
drivers/net/thunderx/nicvf_ethdev.c | 18 +-
drivers/net/thunderx/nicvf_struct.h | 2 +-
drivers/net/vdev_netvsc/vdev_netvsc.c | 16 +-
drivers/net/vhost/rte_eth_vhost.c | 12 +-
drivers/net/virtio/virtio_ethdev.c | 70 ++---
drivers/net/virtio/virtio_pci.h | 4 +-
drivers/net/virtio/virtio_rxtx.c | 28 +-
drivers/net/virtio/virtio_user/vhost_kernel_tap.c | 2 +-
drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +-
drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +-
drivers/net/virtio/virtio_user_ethdev.c | 8 +-
drivers/net/virtio/virtqueue.h | 2 +-
drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 +-
drivers/net/vmxnet3/vmxnet3_ethdev.h | 2 +-
drivers/net/vmxnet3/vmxnet3_rxtx.c | 44 +--
examples/bbdev_app/main.c | 40 +--
examples/bond/main.c | 78 ++---
examples/distributor/main.c | 4 +-
examples/ethtool/ethtool-app/ethapp.c | 8 +-
examples/ethtool/ethtool-app/main.c | 10 +-
examples/ethtool/lib/rte_ethtool.c | 8 +-
examples/ethtool/lib/rte_ethtool.h | 6 +-
examples/eventdev_pipeline/main.c | 4 +-
examples/eventdev_pipeline/pipeline_common.h | 10 +-
examples/flow_classify/flow_classify.c | 30 +-
examples/flow_filtering/main.c | 10 +-
examples/ip_fragmentation/main.c | 62 ++--
examples/ip_pipeline/cli.c | 2 +-
examples/ip_pipeline/kni.c | 2 +-
examples/ip_pipeline/parser.c | 18 +-
examples/ip_pipeline/parser.h | 2 +-
examples/ip_pipeline/pipeline.c | 40 +--
examples/ip_reassembly/main.c | 50 ++--
examples/ipsec-secgw/esp.c | 42 +--
examples/ipsec-secgw/ipsec-secgw.c | 38 +--
examples/ipsec-secgw/sa.c | 6 +-
examples/ipv4_multicast/main.c | 58 ++--
examples/kni/main.c | 14 +-
examples/l2fwd-cat/l2fwd-cat.c | 4 +-
examples/l2fwd-crypto/main.c | 26 +-
examples/l2fwd-jobstats/main.c | 8 +-
examples/l2fwd-keepalive/main.c | 8 +-
examples/l2fwd/main.c | 8 +-
examples/l3fwd-acl/main.c | 102 +++----
examples/l3fwd-power/main.c | 100 +++----
examples/l3fwd-vf/main.c | 68 ++---
examples/l3fwd/l3fwd.h | 8 +-
examples/l3fwd/l3fwd_altivec.h | 14 +-
examples/l3fwd/l3fwd_common.h | 4 +-
examples/l3fwd/l3fwd_em.c | 44 +--
examples/l3fwd/l3fwd_em.h | 20 +-
examples/l3fwd/l3fwd_em_hlm.h | 16 +-
examples/l3fwd/l3fwd_em_hlm_neon.h | 16 +-
examples/l3fwd/l3fwd_em_hlm_sse.h | 16 +-
examples/l3fwd/l3fwd_em_sequential.h | 16 +-
examples/l3fwd/l3fwd_lpm.c | 50 ++--
examples/l3fwd/l3fwd_lpm.h | 20 +-
examples/l3fwd/l3fwd_lpm_altivec.h | 20 +-
examples/l3fwd/l3fwd_lpm_neon.h | 30 +-
examples/l3fwd/l3fwd_lpm_sse.h | 20 +-
examples/l3fwd/l3fwd_neon.h | 14 +-
examples/l3fwd/l3fwd_sse.h | 14 +-
examples/l3fwd/main.c | 20 +-
examples/link_status_interrupt/main.c | 8 +-
examples/load_balancer/runtime.c | 6 +-
.../client_server_mp/mp_server/main.c | 2 +-
examples/packet_ordering/main.c | 2 +-
examples/performance-thread/l3fwd-thread/main.c | 322 ++++++++++-----------
examples/ptpclient/ptpclient.c | 32 +-
examples/qos_meter/main.c | 4 +-
examples/qos_sched/init.c | 2 +-
examples/quota_watermark/qw/main.c | 8 +-
examples/rxtx_callbacks/main.c | 4 +-
examples/server_node_efd/node/node.c | 6 +-
examples/server_node_efd/server/main.c | 8 +-
examples/skeleton/basicfwd.c | 4 +-
examples/tep_termination/main.c | 2 +-
examples/tep_termination/main.h | 2 +-
examples/tep_termination/vxlan.c | 108 +++----
examples/tep_termination/vxlan.h | 8 +-
examples/tep_termination/vxlan_setup.c | 30 +-
examples/tep_termination/vxlan_setup.h | 2 +-
examples/vhost/main.c | 40 +--
examples/vhost/main.h | 2 +-
examples/vm_power_manager/channel_monitor.c | 2 +-
.../guest_cli/vm_power_cli_guest.c | 2 +-
examples/vm_power_manager/main.c | 6 +-
examples/vmdq/main.c | 12 +-
examples/vmdq_dcb/main.c | 12 +-
lib/librte_cmdline/cmdline_parse_etheraddr.c | 33 +--
lib/librte_ethdev/rte_eth_ctrl.h | 12 +-
lib/librte_ethdev/rte_ethdev.c | 56 ++--
lib/librte_ethdev/rte_ethdev.h | 12 +-
lib/librte_ethdev/rte_ethdev_core.h | 12 +-
lib/librte_ethdev/rte_flow.h | 32 +-
lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 +-
lib/librte_gro/gro_tcp4.c | 26 +-
lib/librte_gro/gro_tcp4.h | 20 +-
lib/librte_gro/gro_vxlan_tcp4.c | 64 ++--
lib/librte_gro/gro_vxlan_tcp4.h | 6 +-
lib/librte_gso/gso_common.h | 16 +-
lib/librte_gso/gso_tcp4.c | 12 +-
lib/librte_gso/gso_tunnel_tcp4.c | 14 +-
lib/librte_gso/gso_udp4.c | 8 +-
lib/librte_gso/rte_gso.h | 8 +-
lib/librte_hash/rte_thash.h | 2 +-
lib/librte_ip_frag/rte_ip_frag.h | 12 +-
lib/librte_ip_frag/rte_ipv4_fragmentation.c | 42 +--
lib/librte_ip_frag/rte_ipv4_reassembly.c | 14 +-
lib/librte_ip_frag/rte_ipv6_fragmentation.c | 26 +-
lib/librte_ip_frag/rte_ipv6_reassembly.c | 6 +-
lib/librte_kni/rte_kni.c | 4 +-
lib/librte_kni/rte_kni.h | 2 +-
lib/librte_net/rte_arp.c | 32 +-
lib/librte_net/rte_arp.h | 36 +--
lib/librte_net/rte_esp.h | 2 +-
lib/librte_net/rte_ether.h | 178 ++++++------
lib/librte_net/rte_gre.h | 2 +-
lib/librte_net/rte_icmp.h | 6 +-
lib/librte_net/rte_ip.h | 70 ++---
lib/librte_net/rte_net.c | 90 +++---
lib/librte_net/rte_net.h | 22 +-
lib/librte_net/rte_sctp.h | 2 +-
lib/librte_net/rte_tcp.h | 2 +-
lib/librte_net/rte_udp.h | 2 +-
lib/librte_pipeline/rte_table_action.c | 210 +++++++-------
lib/librte_pipeline/rte_table_action.h | 4 +-
lib/librte_port/rte_port_ras.c | 8 +-
lib/librte_port/rte_port_source_sink.c | 6 +-
lib/librte_vhost/vhost.h | 2 +-
lib/librte_vhost/virtio_net.c | 42 +--
test/test-acl/main.c | 2 +-
test/test-pipeline/pipeline_acl.c | 16 +-
test/test-pipeline/pipeline_hash.c | 12 +-
test/test/packet_burst_generator.c | 126 ++++----
test/test/packet_burst_generator.h | 26 +-
test/test/test_acl.c | 8 +-
test/test/test_acl.h | 122 ++++----
test/test/test_cmdline_etheraddr.c | 16 +-
test/test/test_efd.c | 20 +-
test/test/test_event_eth_rx_adapter.c | 2 +-
test/test/test_event_eth_tx_adapter.c | 2 +-
test/test/test_flow_classify.c | 68 ++---
test/test/test_hash.c | 20 +-
test/test/test_link_bonding.c | 284 +++++++++---------
test/test/test_link_bonding_mode4.c | 116 ++++----
test/test/test_link_bonding_rssconf.c | 6 +-
test/test/test_lpm.c | 76 ++---
test/test/test_lpm_perf.c | 10 +-
test/test/test_member.c | 20 +-
test/test/test_pmd_perf.c | 20 +-
test/test/test_sched.c | 20 +-
test/test/test_table_acl.c | 8 +-
test/test/test_thash.c | 12 +-
test/test/virtual_pmd.c | 6 +-
test/test/virtual_pmd.h | 2 +-
367 files changed, 3906 insertions(+), 3913 deletions(-)
--
2.11.0
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
@ 2018-10-24 14:56 0% ` Wiles, Keith
2018-10-26 7:22 0% ` Olivier Matz
2018-10-24 16:09 0% ` Stephen Hemminger
` (2 subsequent siblings)
3 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2018-10-24 14:56 UTC (permalink / raw)
To: Olivier Matz; +Cc: dpdk-dev
> On Oct 24, 2018, at 1:18 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
>
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> Few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
> Another drawback we need to take in account: it will make the
> backport of patches more difficult, although this is something
> that could be tempered by a script.
>
> While it is obviously better to have a good namespace convention,
> we need to identify the issues we have today before deciding it's
> worth doing the change.
>
> Comments?
I did not see the deprecation notice in the patches below, but I could have missed it.
>
>
> Things that are missing in RFC:
> - test with FreeBSD
> - manually fix some indentation issues
>
>
> Olivier Matz (14):
> net: add rte prefix to arp structures
> net: add rte prefix to arp defines
> net: add rte prefix to ether structures
> net: add rte prefix to ether functions
> net: add rte prefix to ether defines
> net: add rte prefix to esp structure
> net: add rte prefix to gre structure
> net: add rte prefix to icmp structure
> net: add rte prefix to icmp defines
> net: add rte prefix to ip structure
> net: add rte prefix to ip defines
> net: add rte prefix to sctp structure
> net: add rte prefix to tcp structure
> net: add rte prefix to udp structure
>
> app/pdump/main.c | 2 +-
> app/test-eventdev/test_perf_common.c | 2 +-
> app/test-eventdev/test_pipeline_common.c | 2 +-
> app/test-pmd/cmdline.c | 66 ++---
> app/test-pmd/cmdline_flow.c | 10 +-
> app/test-pmd/config.c | 34 +--
> app/test-pmd/csumonly.c | 156 +++++-----
> app/test-pmd/flowgen.c | 30 +-
> app/test-pmd/icmpecho.c | 120 ++++----
> app/test-pmd/ieee1588fwd.c | 18 +-
> app/test-pmd/macfwd.c | 12 +-
> app/test-pmd/macswap.c | 16 +-
> app/test-pmd/parameters.c | 6 +-
> app/test-pmd/testpmd.c | 22 +-
> app/test-pmd/testpmd.h | 18 +-
> app/test-pmd/txonly.c | 36 +--
> app/test-pmd/util.c | 34 +--
> doc/guides/prog_guide/bbdev.rst | 6 +-
> .../prog_guide/packet_classif_access_ctrl.rst | 18 +-
> doc/guides/prog_guide/rte_flow.rst | 4 +-
> doc/guides/sample_app_ug/flow_classify.rst | 28 +-
> doc/guides/sample_app_ug/flow_filtering.rst | 6 +-
> doc/guides/sample_app_ug/ip_frag.rst | 16 +-
> doc/guides/sample_app_ug/ip_reassembly.rst | 16 +-
> doc/guides/sample_app_ug/ipv4_multicast.rst | 16 +-
> doc/guides/sample_app_ug/l2_forward_job_stats.rst | 6 +-
> .../sample_app_ug/l2_forward_real_virtual.rst | 6 +-
> doc/guides/sample_app_ug/l3_forward.rst | 12 +-
> doc/guides/sample_app_ug/link_status_intr.rst | 6 +-
> doc/guides/sample_app_ug/ptpclient.rst | 6 +-
> doc/guides/sample_app_ug/rxtx_callbacks.rst | 2 +-
> doc/guides/sample_app_ug/server_node_efd.rst | 12 +-
> doc/guides/sample_app_ug/skeleton.rst | 4 +-
> doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 4 +-
> drivers/bus/dpaa/base/fman/fman.c | 2 +-
> drivers/bus/dpaa/base/fman/fman_hw.c | 2 +-
> drivers/bus/dpaa/include/fman.h | 2 +-
> drivers/bus/dpaa/include/netcfg.h | 4 +-
> drivers/net/af_packet/rte_eth_af_packet.c | 2 +-
> drivers/net/ark/ark_ethdev.c | 16 +-
> drivers/net/ark/ark_ext.h | 4 +-
> drivers/net/ark/ark_global.h | 4 +-
> drivers/net/atlantic/atl_ethdev.c | 20 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils.c | 8 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils_fw2x.c | 4 +-
> drivers/net/avf/avf.h | 4 +-
> drivers/net/avf/avf_ethdev.c | 50 ++--
> drivers/net/avf/avf_rxtx.c | 14 +-
> drivers/net/avf/avf_vchnl.c | 8 +-
> drivers/net/avf/base/avf_adminq_cmd.h | 4 +-
> drivers/net/avf/base/avf_common.c | 12 +-
> drivers/net/avf/base/avf_prototype.h | 4 +-
> drivers/net/avp/avp_ethdev.c | 20 +-
> drivers/net/avp/rte_avp_common.h | 2 +-
> drivers/net/axgbe/axgbe_dev.c | 4 +-
> drivers/net/axgbe/axgbe_ethdev.c | 10 +-
> drivers/net/axgbe/axgbe_ethdev.h | 4 +-
> drivers/net/axgbe/axgbe_rxtx.c | 2 +-
> drivers/net/bnx2x/bnx2x.c | 16 +-
> drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-
> drivers/net/bnx2x/bnx2x_ethdev.h | 2 +-
> drivers/net/bnx2x/bnx2x_vfpf.c | 8 +-
> drivers/net/bnx2x/bnx2x_vfpf.h | 2 +-
> drivers/net/bnx2x/ecore_sp.h | 2 +-
> drivers/net/bnxt/bnxt.h | 4 +-
> drivers/net/bnxt/bnxt_ethdev.c | 70 ++---
> drivers/net/bnxt/bnxt_filter.c | 4 +-
> drivers/net/bnxt/bnxt_filter.h | 8 +-
> drivers/net/bnxt/bnxt_flow.c | 26 +-
> drivers/net/bnxt/bnxt_hwrm.c | 40 +--
> drivers/net/bnxt/bnxt_hwrm.h | 2 +-
> drivers/net/bnxt/bnxt_ring.c | 8 +-
> drivers/net/bnxt/bnxt_rxq.c | 2 +-
> drivers/net/bnxt/bnxt_rxr.c | 2 +-
> drivers/net/bnxt/bnxt_vnic.c | 2 +-
> drivers/net/bnxt/rte_pmd_bnxt.c | 14 +-
> drivers/net/bnxt/rte_pmd_bnxt.h | 4 +-
> drivers/net/bonding/rte_eth_bond.h | 2 +-
> drivers/net/bonding/rte_eth_bond_8023ad.c | 26 +-
> drivers/net/bonding/rte_eth_bond_8023ad.h | 10 +-
> drivers/net/bonding/rte_eth_bond_alb.c | 78 ++---
> drivers/net/bonding/rte_eth_bond_alb.h | 10 +-
> drivers/net/bonding/rte_eth_bond_api.c | 2 +-
> drivers/net/bonding/rte_eth_bond_args.c | 2 +-
> drivers/net/bonding/rte_eth_bond_pmd.c | 194 ++++++-------
> drivers/net/bonding/rte_eth_bond_private.h | 6 +-
> drivers/net/cxgbe/base/adapter.h | 6 +-
> drivers/net/cxgbe/base/t4_hw.c | 8 +-
> drivers/net/cxgbe/cxgbe.h | 4 +-
> drivers/net/cxgbe/cxgbe_ethdev.c | 14 +-
> drivers/net/cxgbe/cxgbe_filter.h | 2 +-
> drivers/net/cxgbe/cxgbe_flow.c | 10 +-
> drivers/net/cxgbe/cxgbe_main.c | 4 +-
> drivers/net/cxgbe/cxgbe_pfvf.h | 2 +-
> drivers/net/cxgbe/cxgbevf_main.c | 2 +-
> drivers/net/cxgbe/l2t.c | 8 +-
> drivers/net/cxgbe/l2t.h | 2 +-
> drivers/net/cxgbe/mps_tcam.c | 14 +-
> drivers/net/cxgbe/mps_tcam.h | 4 +-
> drivers/net/cxgbe/sge.c | 8 +-
> drivers/net/dpaa/dpaa_ethdev.c | 20 +-
> drivers/net/dpaa/dpaa_rxtx.c | 22 +-
> drivers/net/dpaa2/dpaa2_ethdev.c | 36 +--
> drivers/net/e1000/e1000_ethdev.h | 2 +-
> drivers/net/e1000/em_ethdev.c | 34 +--
> drivers/net/e1000/em_rxtx.c | 22 +-
> drivers/net/e1000/igb_ethdev.c | 70 ++---
> drivers/net/e1000/igb_flow.c | 12 +-
> drivers/net/e1000/igb_pf.c | 16 +-
> drivers/net/e1000/igb_rxtx.c | 18 +-
> drivers/net/ena/ena_ethdev.c | 16 +-
> drivers/net/ena/ena_ethdev.h | 2 +-
> drivers/net/enetc/base/enetc_hw.h | 4 +-
> drivers/net/enetc/enetc_ethdev.c | 6 +-
> drivers/net/enic/enic.h | 2 +-
> drivers/net/enic/enic_clsf.c | 40 +--
> drivers/net/enic/enic_ethdev.c | 4 +-
> drivers/net/enic/enic_flow.c | 100 +++----
> drivers/net/enic/enic_main.c | 2 +-
> drivers/net/enic/enic_res.c | 4 +-
> drivers/net/failsafe/failsafe.c | 6 +-
> drivers/net/failsafe/failsafe_args.c | 4 +-
> drivers/net/failsafe/failsafe_ether.c | 6 +-
> drivers/net/failsafe/failsafe_ops.c | 6 +-
> drivers/net/failsafe/failsafe_private.h | 4 +-
> drivers/net/fm10k/fm10k.h | 2 +-
> drivers/net/fm10k/fm10k_ethdev.c | 18 +-
> drivers/net/i40e/base/i40e_adminq_cmd.h | 4 +-
> drivers/net/i40e/base/i40e_common.c | 12 +-
> drivers/net/i40e/base/i40e_prototype.h | 4 +-
> drivers/net/i40e/i40e_ethdev.c | 134 ++++-----
> drivers/net/i40e/i40e_ethdev.h | 22 +-
> drivers/net/i40e/i40e_ethdev_vf.c | 60 ++--
> drivers/net/i40e/i40e_fdir.c | 126 ++++----
> drivers/net/i40e/i40e_flow.c | 58 ++--
> drivers/net/i40e/i40e_pf.c | 18 +-
> drivers/net/i40e/i40e_rxtx.c | 28 +-
> drivers/net/i40e/i40e_vf_representor.c | 2 +-
> drivers/net/i40e/rte_pmd_i40e.c | 30 +-
> drivers/net/i40e/rte_pmd_i40e.h | 8 +-
> drivers/net/ixgbe/ixgbe_ethdev.c | 94 +++---
> drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
> drivers/net/ixgbe/ixgbe_flow.c | 22 +-
> drivers/net/ixgbe/ixgbe_pf.c | 14 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 14 +-
> drivers/net/ixgbe/ixgbe_vf_representor.c | 4 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.c | 10 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.h | 2 +-
> drivers/net/kni/rte_eth_kni.c | 4 +-
> drivers/net/liquidio/lio_ethdev.c | 22 +-
> drivers/net/mlx4/mlx4.c | 4 +-
> drivers/net/mlx4/mlx4.h | 8 +-
> drivers/net/mlx4/mlx4_ethdev.c | 8 +-
> drivers/net/mlx4/mlx4_flow.c | 14 +-
> drivers/net/mlx4/mlx4_rxtx.c | 2 +-
> drivers/net/mlx5/mlx5.c | 4 +-
> drivers/net/mlx5/mlx5.h | 14 +-
> drivers/net/mlx5/mlx5_flow.c | 22 +-
> drivers/net/mlx5/mlx5_flow_tcf.c | 40 +--
> drivers/net/mlx5/mlx5_flow_verbs.c | 26 +-
> drivers/net/mlx5/mlx5_mac.c | 18 +-
> drivers/net/mlx5/mlx5_nl.c | 28 +-
> drivers/net/mlx5/mlx5_rxtx.c | 6 +-
> drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 +-
> drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
> drivers/net/mlx5/mlx5_trigger.c | 6 +-
> drivers/net/mvneta/mvneta_ethdev.c | 22 +-
> drivers/net/mvneta/mvneta_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_ethdev.c | 22 +-
> drivers/net/mvpp2/mrvl_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_flow.c | 4 +-
> drivers/net/netvsc/hn_ethdev.c | 4 +-
> drivers/net/netvsc/hn_nvs.c | 2 +-
> drivers/net/netvsc/hn_rndis.c | 2 +-
> drivers/net/netvsc/hn_rxtx.c | 12 +-
> drivers/net/netvsc/hn_var.h | 4 +-
> drivers/net/netvsc/hn_vf.c | 12 +-
> drivers/net/nfp/nfp_net.c | 20 +-
> drivers/net/nfp/nfp_net_pmd.h | 2 +-
> drivers/net/null/rte_eth_null.c | 6 +-
> drivers/net/octeontx/octeontx_ethdev.c | 8 +-
> drivers/net/octeontx/octeontx_ethdev.h | 2 +-
> drivers/net/pcap/rte_eth_pcap.c | 22 +-
> drivers/net/qede/base/bcm_osal.h | 2 +-
> drivers/net/qede/base/ecore_dev.c | 4 +-
> drivers/net/qede/qede_ethdev.c | 58 ++--
> drivers/net/qede/qede_ethdev.h | 6 +-
> drivers/net/qede/qede_filter.c | 66 ++---
> drivers/net/qede/qede_if.h | 4 +-
> drivers/net/qede/qede_main.c | 6 +-
> drivers/net/qede/qede_rxtx.c | 32 +-
> drivers/net/qede/qede_rxtx.h | 2 +-
> drivers/net/ring/rte_eth_ring.c | 4 +-
> drivers/net/sfc/sfc.h | 2 +-
> drivers/net/sfc/sfc_ef10_tx.c | 8 +-
> drivers/net/sfc/sfc_ethdev.c | 20 +-
> drivers/net/sfc/sfc_flow.c | 12 +-
> drivers/net/sfc/sfc_port.c | 8 +-
> drivers/net/sfc/sfc_tso.c | 8 +-
> drivers/net/softnic/parser.c | 18 +-
> drivers/net/softnic/parser.h | 2 +-
> drivers/net/softnic/rte_eth_softnic.c | 2 +-
> drivers/net/softnic/rte_eth_softnic_pipeline.c | 40 +--
> drivers/net/szedata2/rte_eth_szedata2.c | 8 +-
> drivers/net/tap/rte_eth_tap.c | 58 ++--
> drivers/net/tap/rte_eth_tap.h | 2 +-
> drivers/net/tap/tap_bpf_program.c | 14 +-
> drivers/net/tap/tap_flow.c | 12 +-
> drivers/net/thunderx/base/nicvf_mbox.c | 4 +-
> drivers/net/thunderx/base/nicvf_plat.h | 2 +-
> drivers/net/thunderx/nicvf_ethdev.c | 18 +-
> drivers/net/thunderx/nicvf_struct.h | 2 +-
> drivers/net/vdev_netvsc/vdev_netvsc.c | 16 +-
> drivers/net/vhost/rte_eth_vhost.c | 12 +-
> drivers/net/virtio/virtio_ethdev.c | 70 ++---
> drivers/net/virtio/virtio_pci.h | 4 +-
> drivers/net/virtio/virtio_rxtx.c | 28 +-
> drivers/net/virtio/virtio_user/vhost_kernel_tap.c | 2 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +-
> drivers/net/virtio/virtio_user_ethdev.c | 8 +-
> drivers/net/virtio/virtqueue.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_rxtx.c | 44 +--
> examples/bbdev_app/main.c | 40 +--
> examples/bond/main.c | 78 ++---
> examples/distributor/main.c | 4 +-
> examples/ethtool/ethtool-app/ethapp.c | 8 +-
> examples/ethtool/ethtool-app/main.c | 10 +-
> examples/ethtool/lib/rte_ethtool.c | 8 +-
> examples/ethtool/lib/rte_ethtool.h | 6 +-
> examples/eventdev_pipeline/main.c | 4 +-
> examples/eventdev_pipeline/pipeline_common.h | 10 +-
> examples/flow_classify/flow_classify.c | 30 +-
> examples/flow_filtering/main.c | 10 +-
> examples/ip_fragmentation/main.c | 62 ++--
> examples/ip_pipeline/cli.c | 2 +-
> examples/ip_pipeline/kni.c | 2 +-
> examples/ip_pipeline/parser.c | 18 +-
> examples/ip_pipeline/parser.h | 2 +-
> examples/ip_pipeline/pipeline.c | 40 +--
> examples/ip_reassembly/main.c | 50 ++--
> examples/ipsec-secgw/esp.c | 42 +--
> examples/ipsec-secgw/ipsec-secgw.c | 38 +--
> examples/ipsec-secgw/sa.c | 6 +-
> examples/ipv4_multicast/main.c | 58 ++--
> examples/kni/main.c | 14 +-
> examples/l2fwd-cat/l2fwd-cat.c | 4 +-
> examples/l2fwd-crypto/main.c | 26 +-
> examples/l2fwd-jobstats/main.c | 8 +-
> examples/l2fwd-keepalive/main.c | 8 +-
> examples/l2fwd/main.c | 8 +-
> examples/l3fwd-acl/main.c | 102 +++----
> examples/l3fwd-power/main.c | 100 +++----
> examples/l3fwd-vf/main.c | 68 ++---
> examples/l3fwd/l3fwd.h | 8 +-
> examples/l3fwd/l3fwd_altivec.h | 14 +-
> examples/l3fwd/l3fwd_common.h | 4 +-
> examples/l3fwd/l3fwd_em.c | 44 +--
> examples/l3fwd/l3fwd_em.h | 20 +-
> examples/l3fwd/l3fwd_em_hlm.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_neon.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_sse.h | 16 +-
> examples/l3fwd/l3fwd_em_sequential.h | 16 +-
> examples/l3fwd/l3fwd_lpm.c | 50 ++--
> examples/l3fwd/l3fwd_lpm.h | 20 +-
> examples/l3fwd/l3fwd_lpm_altivec.h | 20 +-
> examples/l3fwd/l3fwd_lpm_neon.h | 30 +-
> examples/l3fwd/l3fwd_lpm_sse.h | 20 +-
> examples/l3fwd/l3fwd_neon.h | 14 +-
> examples/l3fwd/l3fwd_sse.h | 14 +-
> examples/l3fwd/main.c | 20 +-
> examples/link_status_interrupt/main.c | 8 +-
> examples/load_balancer/runtime.c | 6 +-
> .../client_server_mp/mp_server/main.c | 2 +-
> examples/packet_ordering/main.c | 2 +-
> examples/performance-thread/l3fwd-thread/main.c | 322 ++++++++++-----------
> examples/ptpclient/ptpclient.c | 32 +-
> examples/qos_meter/main.c | 4 +-
> examples/qos_sched/init.c | 2 +-
> examples/quota_watermark/qw/main.c | 8 +-
> examples/rxtx_callbacks/main.c | 4 +-
> examples/server_node_efd/node/node.c | 6 +-
> examples/server_node_efd/server/main.c | 8 +-
> examples/skeleton/basicfwd.c | 4 +-
> examples/tep_termination/main.c | 2 +-
> examples/tep_termination/main.h | 2 +-
> examples/tep_termination/vxlan.c | 108 +++----
> examples/tep_termination/vxlan.h | 8 +-
> examples/tep_termination/vxlan_setup.c | 30 +-
> examples/tep_termination/vxlan_setup.h | 2 +-
> examples/vhost/main.c | 40 +--
> examples/vhost/main.h | 2 +-
> examples/vm_power_manager/channel_monitor.c | 2 +-
> .../guest_cli/vm_power_cli_guest.c | 2 +-
> examples/vm_power_manager/main.c | 6 +-
> examples/vmdq/main.c | 12 +-
> examples/vmdq_dcb/main.c | 12 +-
> lib/librte_cmdline/cmdline_parse_etheraddr.c | 33 +--
> lib/librte_ethdev/rte_eth_ctrl.h | 12 +-
> lib/librte_ethdev/rte_ethdev.c | 56 ++--
> lib/librte_ethdev/rte_ethdev.h | 12 +-
> lib/librte_ethdev/rte_ethdev_core.h | 12 +-
> lib/librte_ethdev/rte_flow.h | 32 +-
> lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 +-
> lib/librte_gro/gro_tcp4.c | 26 +-
> lib/librte_gro/gro_tcp4.h | 20 +-
> lib/librte_gro/gro_vxlan_tcp4.c | 64 ++--
> lib/librte_gro/gro_vxlan_tcp4.h | 6 +-
> lib/librte_gso/gso_common.h | 16 +-
> lib/librte_gso/gso_tcp4.c | 12 +-
> lib/librte_gso/gso_tunnel_tcp4.c | 14 +-
> lib/librte_gso/gso_udp4.c | 8 +-
> lib/librte_gso/rte_gso.h | 8 +-
> lib/librte_hash/rte_thash.h | 2 +-
> lib/librte_ip_frag/rte_ip_frag.h | 12 +-
> lib/librte_ip_frag/rte_ipv4_fragmentation.c | 42 +--
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 14 +-
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 26 +-
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 6 +-
> lib/librte_kni/rte_kni.c | 4 +-
> lib/librte_kni/rte_kni.h | 2 +-
> lib/librte_net/rte_arp.c | 32 +-
> lib/librte_net/rte_arp.h | 36 +--
> lib/librte_net/rte_esp.h | 2 +-
> lib/librte_net/rte_ether.h | 178 ++++++------
> lib/librte_net/rte_gre.h | 2 +-
> lib/librte_net/rte_icmp.h | 6 +-
> lib/librte_net/rte_ip.h | 70 ++---
> lib/librte_net/rte_net.c | 90 +++---
> lib/librte_net/rte_net.h | 22 +-
> lib/librte_net/rte_sctp.h | 2 +-
> lib/librte_net/rte_tcp.h | 2 +-
> lib/librte_net/rte_udp.h | 2 +-
> lib/librte_pipeline/rte_table_action.c | 210 +++++++-------
> lib/librte_pipeline/rte_table_action.h | 4 +-
> lib/librte_port/rte_port_ras.c | 8 +-
> lib/librte_port/rte_port_source_sink.c | 6 +-
> lib/librte_vhost/vhost.h | 2 +-
> lib/librte_vhost/virtio_net.c | 42 +--
> test/test-acl/main.c | 2 +-
> test/test-pipeline/pipeline_acl.c | 16 +-
> test/test-pipeline/pipeline_hash.c | 12 +-
> test/test/packet_burst_generator.c | 126 ++++----
> test/test/packet_burst_generator.h | 26 +-
> test/test/test_acl.c | 8 +-
> test/test/test_acl.h | 122 ++++----
> test/test/test_cmdline_etheraddr.c | 16 +-
> test/test/test_efd.c | 20 +-
> test/test/test_event_eth_rx_adapter.c | 2 +-
> test/test/test_event_eth_tx_adapter.c | 2 +-
> test/test/test_flow_classify.c | 68 ++---
> test/test/test_hash.c | 20 +-
> test/test/test_link_bonding.c | 284 +++++++++---------
> test/test/test_link_bonding_mode4.c | 116 ++++----
> test/test/test_link_bonding_rssconf.c | 6 +-
> test/test/test_lpm.c | 76 ++---
> test/test/test_lpm_perf.c | 10 +-
> test/test/test_member.c | 20 +-
> test/test/test_pmd_perf.c | 20 +-
> test/test/test_sched.c | 20 +-
> test/test/test_table_acl.c | 8 +-
> test/test/test_thash.c | 12 +-
> test/test/virtual_pmd.c | 6 +-
> test/test/virtual_pmd.h | 2 +-
> 367 files changed, 3906 insertions(+), 3913 deletions(-)
>
> --
> 2.11.0
>
Regards,
Keith
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
@ 2018-10-24 16:09 0% ` Stephen Hemminger
2018-10-24 16:39 0% ` Bruce Richardson
2018-10-24 18:38 0% ` Stephen Hemminger
3 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-10-24 16:09 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Wed, 24 Oct 2018 10:18:19 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> Few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
> Another drawback we need to take in account: it will make the
> backport of patches more difficult, although this is something
> that could be tempered by a script.
>
> While it is obviously better to have a good namespace convention,
> we need to identify the issues we have today before deciding it's
> worth doing the change.
>
> Comments?
>
>
> Things that are missing in RFC:
> - test with FreeBSD
> - manually fix some indentation issues
>
>
> Olivier Matz (14):
> net: add rte prefix to arp structures
> net: add rte prefix to arp defines
> net: add rte prefix to ether structures
> net: add rte prefix to ether functions
> net: add rte prefix to ether defines
> net: add rte prefix to esp structure
> net: add rte prefix to gre structure
> net: add rte prefix to icmp structure
> net: add rte prefix to icmp defines
> net: add rte prefix to ip structure
> net: add rte prefix to ip defines
> net: add rte prefix to sctp structure
> net: add rte prefix to tcp structure
> net: add rte prefix to udp structure
>
> [...]
> 367 files changed, 3906 insertions(+), 3913 deletions(-)
>
The Linux network developers and Glibc have already agreed on how to handle
overlap. Perhaps that policy could be used/extended rather than breaking
every userspace application.
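
For illustration, the kernel/glibc arrangement relies on guard macros: the kernel's libc-compat.h sets __UAPI_DEF_* flags depending on which libc headers were already included, so that only one side emits each definition. A rough single-file sketch of that pattern follows. The MOCK_* names and both "headers" are stand-ins invented for this example, not real DPDK, glibc or kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* --- stand-in for a libc header that was included first --- */
#define MOCK_LIBC_ETHER_H 1
struct ether_addr {
	uint8_t ether_addr_octet[6];
};

/* --- stand-in for a second header that wants the same struct --- */
#if defined(MOCK_LIBC_ETHER_H)
#define MOCK_DEF_ETHER_ADDR 0	/* libc already provided the definition */
#else
#define MOCK_DEF_ETHER_ADDR 1	/* nobody did, so define it here */
#endif

#if MOCK_DEF_ETHER_ADDR
struct ether_addr {
	uint8_t ether_addr_octet[6];
};
#endif

/* Exactly one definition is visible, whichever header came first. */
static_assert(sizeof(struct ether_addr) == 6, "unexpected ether_addr size");

static unsigned
mock_ether_addr_size(void)
{
	return (unsigned)sizeof(struct ether_addr);
}
```

This only works when both sides agree on the layout and field names, which is not always the case for the rte_net definitions ("sometimes with few differences" above).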
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
2018-10-24 16:09 0% ` Stephen Hemminger
@ 2018-10-24 16:39 0% ` Bruce Richardson
2018-10-26 7:20 0% ` Olivier Matz
2018-10-24 18:38 0% ` Stephen Hemminger
3 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-24 16:39 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> Few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
Can you clarify a bit as to why we can't keep around compatibility versions
of the headers, alongside the new versions? I'm not following the logic
above. Can we not introduce completely new headers with the replacements
while leaving the old ones intact?
/Bruce
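
A minimal sketch of the kind of compatibility header being asked about, assuming mock types: struct rte_ether_addr below is a local stand-in for the prefixed type from the RFC, and mock_macaddr_get() and MOCK_LEGACY_NET_NAMES are hypothetical names, not DPDK API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The application opts in to the old spellings (hypothetical switch). */
#define MOCK_LEGACY_NET_NAMES 1

/* New, prefixed definition (mock of what the RFC proposes). */
struct rte_ether_addr {
	uint8_t addr_bytes[6];
};

/* Mock library function whose prototype now uses the prefixed type. */
static void
mock_macaddr_get(struct rte_ether_addr *mac)
{
	static const uint8_t bytes[6] = {0x02, 0x00, 0x00, 0x00, 0x00, 0x01};

	memcpy(mac->addr_bytes, bytes, sizeof(bytes));
}

/* Compat layer: map the unprefixed name onto the prefixed one. */
#ifdef MOCK_LEGACY_NET_NAMES
#define ether_addr rte_ether_addr
#endif

/* Both spellings name the same type once the alias is active. */
static_assert(sizeof(struct ether_addr) == sizeof(struct rte_ether_addr),
	      "alias must not change the layout");

/* Legacy application code, unchanged: it still spells the old name. */
static int
legacy_app_first_byte(void)
{
	struct ether_addr mac;	/* expands to struct rte_ether_addr */

	mock_macaddr_get(&mac);
	return mac.addr_bytes[0];
}
```

Because the alias is a plain macro, it would also rewrite struct ether_addr in any libc header included after it, so an alias like this would have to stay opt-in to avoid bringing back the clash the RFC is trying to remove.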
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
` (2 preceding siblings ...)
2018-10-24 16:39 0% ` Bruce Richardson
@ 2018-10-24 18:38 0% ` Stephen Hemminger
2018-10-26 7:56 0% ` Olivier Matz
3 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-10-24 18:38 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Wed, 24 Oct 2018 10:18:19 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> Few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
> Another drawback we need to take in account: it will make the
> backport of patches more difficult, although this is something
> that could be tempered by a script.
>
> While it is obviously better to have a good namespace convention,
> we need to identify the issues we have today before deciding it's
> worth doing the change.
>
> Comments?
>
>
> Things that are missing in RFC:
> - test with FreeBSD
> - manually fix some indentation issues
>
>
> Olivier Matz (14):
> net: add rte prefix to arp structures
> net: add rte prefix to arp defines
> net: add rte prefix to ether structures
> net: add rte prefix to ether functions
> net: add rte prefix to ether defines
> net: add rte prefix to esp structure
> net: add rte prefix to gre structure
> net: add rte prefix to icmp structure
> net: add rte prefix to icmp defines
> net: add rte prefix to ip structure
> net: add rte prefix to ip defines
> net: add rte prefix to sctp structure
> net: add rte prefix to tcp structure
> net: add rte prefix to udp structure
>
> app/pdump/main.c | 2 +-
> app/test-eventdev/test_perf_common.c | 2 +-
> app/test-eventdev/test_pipeline_common.c | 2 +-
> app/test-pmd/cmdline.c | 66 ++---
> app/test-pmd/cmdline_flow.c | 10 +-
> app/test-pmd/config.c | 34 +--
> app/test-pmd/csumonly.c | 156 +++++-----
> app/test-pmd/flowgen.c | 30 +-
> app/test-pmd/icmpecho.c | 120 ++++----
> app/test-pmd/ieee1588fwd.c | 18 +-
> app/test-pmd/macfwd.c | 12 +-
> app/test-pmd/macswap.c | 16 +-
> app/test-pmd/parameters.c | 6 +-
> app/test-pmd/testpmd.c | 22 +-
> app/test-pmd/testpmd.h | 18 +-
> app/test-pmd/txonly.c | 36 +--
> app/test-pmd/util.c | 34 +--
> doc/guides/prog_guide/bbdev.rst | 6 +-
> .../prog_guide/packet_classif_access_ctrl.rst | 18 +-
> doc/guides/prog_guide/rte_flow.rst | 4 +-
> doc/guides/sample_app_ug/flow_classify.rst | 28 +-
> doc/guides/sample_app_ug/flow_filtering.rst | 6 +-
> doc/guides/sample_app_ug/ip_frag.rst | 16 +-
> doc/guides/sample_app_ug/ip_reassembly.rst | 16 +-
> doc/guides/sample_app_ug/ipv4_multicast.rst | 16 +-
> doc/guides/sample_app_ug/l2_forward_job_stats.rst | 6 +-
> .../sample_app_ug/l2_forward_real_virtual.rst | 6 +-
> doc/guides/sample_app_ug/l3_forward.rst | 12 +-
> doc/guides/sample_app_ug/link_status_intr.rst | 6 +-
> doc/guides/sample_app_ug/ptpclient.rst | 6 +-
> doc/guides/sample_app_ug/rxtx_callbacks.rst | 2 +-
> doc/guides/sample_app_ug/server_node_efd.rst | 12 +-
> doc/guides/sample_app_ug/skeleton.rst | 4 +-
> doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 4 +-
> drivers/bus/dpaa/base/fman/fman.c | 2 +-
> drivers/bus/dpaa/base/fman/fman_hw.c | 2 +-
> drivers/bus/dpaa/include/fman.h | 2 +-
> drivers/bus/dpaa/include/netcfg.h | 4 +-
> drivers/net/af_packet/rte_eth_af_packet.c | 2 +-
> drivers/net/ark/ark_ethdev.c | 16 +-
> drivers/net/ark/ark_ext.h | 4 +-
> drivers/net/ark/ark_global.h | 4 +-
> drivers/net/atlantic/atl_ethdev.c | 20 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils.c | 8 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils_fw2x.c | 4 +-
> drivers/net/avf/avf.h | 4 +-
> drivers/net/avf/avf_ethdev.c | 50 ++--
> drivers/net/avf/avf_rxtx.c | 14 +-
> drivers/net/avf/avf_vchnl.c | 8 +-
> drivers/net/avf/base/avf_adminq_cmd.h | 4 +-
> drivers/net/avf/base/avf_common.c | 12 +-
> drivers/net/avf/base/avf_prototype.h | 4 +-
> drivers/net/avp/avp_ethdev.c | 20 +-
> drivers/net/avp/rte_avp_common.h | 2 +-
> drivers/net/axgbe/axgbe_dev.c | 4 +-
> drivers/net/axgbe/axgbe_ethdev.c | 10 +-
> drivers/net/axgbe/axgbe_ethdev.h | 4 +-
> drivers/net/axgbe/axgbe_rxtx.c | 2 +-
> drivers/net/bnx2x/bnx2x.c | 16 +-
> drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-
> drivers/net/bnx2x/bnx2x_ethdev.h | 2 +-
> drivers/net/bnx2x/bnx2x_vfpf.c | 8 +-
> drivers/net/bnx2x/bnx2x_vfpf.h | 2 +-
> drivers/net/bnx2x/ecore_sp.h | 2 +-
> drivers/net/bnxt/bnxt.h | 4 +-
> drivers/net/bnxt/bnxt_ethdev.c | 70 ++---
> drivers/net/bnxt/bnxt_filter.c | 4 +-
> drivers/net/bnxt/bnxt_filter.h | 8 +-
> drivers/net/bnxt/bnxt_flow.c | 26 +-
> drivers/net/bnxt/bnxt_hwrm.c | 40 +--
> drivers/net/bnxt/bnxt_hwrm.h | 2 +-
> drivers/net/bnxt/bnxt_ring.c | 8 +-
> drivers/net/bnxt/bnxt_rxq.c | 2 +-
> drivers/net/bnxt/bnxt_rxr.c | 2 +-
> drivers/net/bnxt/bnxt_vnic.c | 2 +-
> drivers/net/bnxt/rte_pmd_bnxt.c | 14 +-
> drivers/net/bnxt/rte_pmd_bnxt.h | 4 +-
> drivers/net/bonding/rte_eth_bond.h | 2 +-
> drivers/net/bonding/rte_eth_bond_8023ad.c | 26 +-
> drivers/net/bonding/rte_eth_bond_8023ad.h | 10 +-
> drivers/net/bonding/rte_eth_bond_alb.c | 78 ++---
> drivers/net/bonding/rte_eth_bond_alb.h | 10 +-
> drivers/net/bonding/rte_eth_bond_api.c | 2 +-
> drivers/net/bonding/rte_eth_bond_args.c | 2 +-
> drivers/net/bonding/rte_eth_bond_pmd.c | 194 ++++++-------
> drivers/net/bonding/rte_eth_bond_private.h | 6 +-
> drivers/net/cxgbe/base/adapter.h | 6 +-
> drivers/net/cxgbe/base/t4_hw.c | 8 +-
> drivers/net/cxgbe/cxgbe.h | 4 +-
> drivers/net/cxgbe/cxgbe_ethdev.c | 14 +-
> drivers/net/cxgbe/cxgbe_filter.h | 2 +-
> drivers/net/cxgbe/cxgbe_flow.c | 10 +-
> drivers/net/cxgbe/cxgbe_main.c | 4 +-
> drivers/net/cxgbe/cxgbe_pfvf.h | 2 +-
> drivers/net/cxgbe/cxgbevf_main.c | 2 +-
> drivers/net/cxgbe/l2t.c | 8 +-
> drivers/net/cxgbe/l2t.h | 2 +-
> drivers/net/cxgbe/mps_tcam.c | 14 +-
> drivers/net/cxgbe/mps_tcam.h | 4 +-
> drivers/net/cxgbe/sge.c | 8 +-
> drivers/net/dpaa/dpaa_ethdev.c | 20 +-
> drivers/net/dpaa/dpaa_rxtx.c | 22 +-
> drivers/net/dpaa2/dpaa2_ethdev.c | 36 +--
> drivers/net/e1000/e1000_ethdev.h | 2 +-
> drivers/net/e1000/em_ethdev.c | 34 +--
> drivers/net/e1000/em_rxtx.c | 22 +-
> drivers/net/e1000/igb_ethdev.c | 70 ++---
> drivers/net/e1000/igb_flow.c | 12 +-
> drivers/net/e1000/igb_pf.c | 16 +-
> drivers/net/e1000/igb_rxtx.c | 18 +-
> drivers/net/ena/ena_ethdev.c | 16 +-
> drivers/net/ena/ena_ethdev.h | 2 +-
> drivers/net/enetc/base/enetc_hw.h | 4 +-
> drivers/net/enetc/enetc_ethdev.c | 6 +-
> drivers/net/enic/enic.h | 2 +-
> drivers/net/enic/enic_clsf.c | 40 +--
> drivers/net/enic/enic_ethdev.c | 4 +-
> drivers/net/enic/enic_flow.c | 100 +++----
> drivers/net/enic/enic_main.c | 2 +-
> drivers/net/enic/enic_res.c | 4 +-
> drivers/net/failsafe/failsafe.c | 6 +-
> drivers/net/failsafe/failsafe_args.c | 4 +-
> drivers/net/failsafe/failsafe_ether.c | 6 +-
> drivers/net/failsafe/failsafe_ops.c | 6 +-
> drivers/net/failsafe/failsafe_private.h | 4 +-
> drivers/net/fm10k/fm10k.h | 2 +-
> drivers/net/fm10k/fm10k_ethdev.c | 18 +-
> drivers/net/i40e/base/i40e_adminq_cmd.h | 4 +-
> drivers/net/i40e/base/i40e_common.c | 12 +-
> drivers/net/i40e/base/i40e_prototype.h | 4 +-
> drivers/net/i40e/i40e_ethdev.c | 134 ++++-----
> drivers/net/i40e/i40e_ethdev.h | 22 +-
> drivers/net/i40e/i40e_ethdev_vf.c | 60 ++--
> drivers/net/i40e/i40e_fdir.c | 126 ++++----
> drivers/net/i40e/i40e_flow.c | 58 ++--
> drivers/net/i40e/i40e_pf.c | 18 +-
> drivers/net/i40e/i40e_rxtx.c | 28 +-
> drivers/net/i40e/i40e_vf_representor.c | 2 +-
> drivers/net/i40e/rte_pmd_i40e.c | 30 +-
> drivers/net/i40e/rte_pmd_i40e.h | 8 +-
> drivers/net/ixgbe/ixgbe_ethdev.c | 94 +++---
> drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
> drivers/net/ixgbe/ixgbe_flow.c | 22 +-
> drivers/net/ixgbe/ixgbe_pf.c | 14 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 14 +-
> drivers/net/ixgbe/ixgbe_vf_representor.c | 4 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.c | 10 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.h | 2 +-
> drivers/net/kni/rte_eth_kni.c | 4 +-
> drivers/net/liquidio/lio_ethdev.c | 22 +-
> drivers/net/mlx4/mlx4.c | 4 +-
> drivers/net/mlx4/mlx4.h | 8 +-
> drivers/net/mlx4/mlx4_ethdev.c | 8 +-
> drivers/net/mlx4/mlx4_flow.c | 14 +-
> drivers/net/mlx4/mlx4_rxtx.c | 2 +-
> drivers/net/mlx5/mlx5.c | 4 +-
> drivers/net/mlx5/mlx5.h | 14 +-
> drivers/net/mlx5/mlx5_flow.c | 22 +-
> drivers/net/mlx5/mlx5_flow_tcf.c | 40 +--
> drivers/net/mlx5/mlx5_flow_verbs.c | 26 +-
> drivers/net/mlx5/mlx5_mac.c | 18 +-
> drivers/net/mlx5/mlx5_nl.c | 28 +-
> drivers/net/mlx5/mlx5_rxtx.c | 6 +-
> drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 +-
> drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
> drivers/net/mlx5/mlx5_trigger.c | 6 +-
> drivers/net/mvneta/mvneta_ethdev.c | 22 +-
> drivers/net/mvneta/mvneta_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_ethdev.c | 22 +-
> drivers/net/mvpp2/mrvl_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_flow.c | 4 +-
> drivers/net/netvsc/hn_ethdev.c | 4 +-
> drivers/net/netvsc/hn_nvs.c | 2 +-
> drivers/net/netvsc/hn_rndis.c | 2 +-
> drivers/net/netvsc/hn_rxtx.c | 12 +-
> drivers/net/netvsc/hn_var.h | 4 +-
> drivers/net/netvsc/hn_vf.c | 12 +-
> drivers/net/nfp/nfp_net.c | 20 +-
> drivers/net/nfp/nfp_net_pmd.h | 2 +-
> drivers/net/null/rte_eth_null.c | 6 +-
> drivers/net/octeontx/octeontx_ethdev.c | 8 +-
> drivers/net/octeontx/octeontx_ethdev.h | 2 +-
> drivers/net/pcap/rte_eth_pcap.c | 22 +-
> drivers/net/qede/base/bcm_osal.h | 2 +-
> drivers/net/qede/base/ecore_dev.c | 4 +-
> drivers/net/qede/qede_ethdev.c | 58 ++--
> drivers/net/qede/qede_ethdev.h | 6 +-
> drivers/net/qede/qede_filter.c | 66 ++---
> drivers/net/qede/qede_if.h | 4 +-
> drivers/net/qede/qede_main.c | 6 +-
> drivers/net/qede/qede_rxtx.c | 32 +-
> drivers/net/qede/qede_rxtx.h | 2 +-
> drivers/net/ring/rte_eth_ring.c | 4 +-
> drivers/net/sfc/sfc.h | 2 +-
> drivers/net/sfc/sfc_ef10_tx.c | 8 +-
> drivers/net/sfc/sfc_ethdev.c | 20 +-
> drivers/net/sfc/sfc_flow.c | 12 +-
> drivers/net/sfc/sfc_port.c | 8 +-
> drivers/net/sfc/sfc_tso.c | 8 +-
> drivers/net/softnic/parser.c | 18 +-
> drivers/net/softnic/parser.h | 2 +-
> drivers/net/softnic/rte_eth_softnic.c | 2 +-
> drivers/net/softnic/rte_eth_softnic_pipeline.c | 40 +--
> drivers/net/szedata2/rte_eth_szedata2.c | 8 +-
> drivers/net/tap/rte_eth_tap.c | 58 ++--
> drivers/net/tap/rte_eth_tap.h | 2 +-
> drivers/net/tap/tap_bpf_program.c | 14 +-
> drivers/net/tap/tap_flow.c | 12 +-
> drivers/net/thunderx/base/nicvf_mbox.c | 4 +-
> drivers/net/thunderx/base/nicvf_plat.h | 2 +-
> drivers/net/thunderx/nicvf_ethdev.c | 18 +-
> drivers/net/thunderx/nicvf_struct.h | 2 +-
> drivers/net/vdev_netvsc/vdev_netvsc.c | 16 +-
> drivers/net/vhost/rte_eth_vhost.c | 12 +-
> drivers/net/virtio/virtio_ethdev.c | 70 ++---
> drivers/net/virtio/virtio_pci.h | 4 +-
> drivers/net/virtio/virtio_rxtx.c | 28 +-
> drivers/net/virtio/virtio_user/vhost_kernel_tap.c | 2 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +-
> drivers/net/virtio/virtio_user_ethdev.c | 8 +-
> drivers/net/virtio/virtqueue.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_rxtx.c | 44 +--
> examples/bbdev_app/main.c | 40 +--
> examples/bond/main.c | 78 ++---
> examples/distributor/main.c | 4 +-
> examples/ethtool/ethtool-app/ethapp.c | 8 +-
> examples/ethtool/ethtool-app/main.c | 10 +-
> examples/ethtool/lib/rte_ethtool.c | 8 +-
> examples/ethtool/lib/rte_ethtool.h | 6 +-
> examples/eventdev_pipeline/main.c | 4 +-
> examples/eventdev_pipeline/pipeline_common.h | 10 +-
> examples/flow_classify/flow_classify.c | 30 +-
> examples/flow_filtering/main.c | 10 +-
> examples/ip_fragmentation/main.c | 62 ++--
> examples/ip_pipeline/cli.c | 2 +-
> examples/ip_pipeline/kni.c | 2 +-
> examples/ip_pipeline/parser.c | 18 +-
> examples/ip_pipeline/parser.h | 2 +-
> examples/ip_pipeline/pipeline.c | 40 +--
> examples/ip_reassembly/main.c | 50 ++--
> examples/ipsec-secgw/esp.c | 42 +--
> examples/ipsec-secgw/ipsec-secgw.c | 38 +--
> examples/ipsec-secgw/sa.c | 6 +-
> examples/ipv4_multicast/main.c | 58 ++--
> examples/kni/main.c | 14 +-
> examples/l2fwd-cat/l2fwd-cat.c | 4 +-
> examples/l2fwd-crypto/main.c | 26 +-
> examples/l2fwd-jobstats/main.c | 8 +-
> examples/l2fwd-keepalive/main.c | 8 +-
> examples/l2fwd/main.c | 8 +-
> examples/l3fwd-acl/main.c | 102 +++----
> examples/l3fwd-power/main.c | 100 +++----
> examples/l3fwd-vf/main.c | 68 ++---
> examples/l3fwd/l3fwd.h | 8 +-
> examples/l3fwd/l3fwd_altivec.h | 14 +-
> examples/l3fwd/l3fwd_common.h | 4 +-
> examples/l3fwd/l3fwd_em.c | 44 +--
> examples/l3fwd/l3fwd_em.h | 20 +-
> examples/l3fwd/l3fwd_em_hlm.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_neon.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_sse.h | 16 +-
> examples/l3fwd/l3fwd_em_sequential.h | 16 +-
> examples/l3fwd/l3fwd_lpm.c | 50 ++--
> examples/l3fwd/l3fwd_lpm.h | 20 +-
> examples/l3fwd/l3fwd_lpm_altivec.h | 20 +-
> examples/l3fwd/l3fwd_lpm_neon.h | 30 +-
> examples/l3fwd/l3fwd_lpm_sse.h | 20 +-
> examples/l3fwd/l3fwd_neon.h | 14 +-
> examples/l3fwd/l3fwd_sse.h | 14 +-
> examples/l3fwd/main.c | 20 +-
> examples/link_status_interrupt/main.c | 8 +-
> examples/load_balancer/runtime.c | 6 +-
> .../client_server_mp/mp_server/main.c | 2 +-
> examples/packet_ordering/main.c | 2 +-
> examples/performance-thread/l3fwd-thread/main.c | 322 ++++++++++-----------
> examples/ptpclient/ptpclient.c | 32 +-
> examples/qos_meter/main.c | 4 +-
> examples/qos_sched/init.c | 2 +-
> examples/quota_watermark/qw/main.c | 8 +-
> examples/rxtx_callbacks/main.c | 4 +-
> examples/server_node_efd/node/node.c | 6 +-
> examples/server_node_efd/server/main.c | 8 +-
> examples/skeleton/basicfwd.c | 4 +-
> examples/tep_termination/main.c | 2 +-
> examples/tep_termination/main.h | 2 +-
> examples/tep_termination/vxlan.c | 108 +++----
> examples/tep_termination/vxlan.h | 8 +-
> examples/tep_termination/vxlan_setup.c | 30 +-
> examples/tep_termination/vxlan_setup.h | 2 +-
> examples/vhost/main.c | 40 +--
> examples/vhost/main.h | 2 +-
> examples/vm_power_manager/channel_monitor.c | 2 +-
> .../guest_cli/vm_power_cli_guest.c | 2 +-
> examples/vm_power_manager/main.c | 6 +-
> examples/vmdq/main.c | 12 +-
> examples/vmdq_dcb/main.c | 12 +-
> lib/librte_cmdline/cmdline_parse_etheraddr.c | 33 +--
> lib/librte_ethdev/rte_eth_ctrl.h | 12 +-
> lib/librte_ethdev/rte_ethdev.c | 56 ++--
> lib/librte_ethdev/rte_ethdev.h | 12 +-
> lib/librte_ethdev/rte_ethdev_core.h | 12 +-
> lib/librte_ethdev/rte_flow.h | 32 +-
> lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 +-
> lib/librte_gro/gro_tcp4.c | 26 +-
> lib/librte_gro/gro_tcp4.h | 20 +-
> lib/librte_gro/gro_vxlan_tcp4.c | 64 ++--
> lib/librte_gro/gro_vxlan_tcp4.h | 6 +-
> lib/librte_gso/gso_common.h | 16 +-
> lib/librte_gso/gso_tcp4.c | 12 +-
> lib/librte_gso/gso_tunnel_tcp4.c | 14 +-
> lib/librte_gso/gso_udp4.c | 8 +-
> lib/librte_gso/rte_gso.h | 8 +-
> lib/librte_hash/rte_thash.h | 2 +-
> lib/librte_ip_frag/rte_ip_frag.h | 12 +-
> lib/librte_ip_frag/rte_ipv4_fragmentation.c | 42 +--
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 14 +-
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 26 +-
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 6 +-
> lib/librte_kni/rte_kni.c | 4 +-
> lib/librte_kni/rte_kni.h | 2 +-
> lib/librte_net/rte_arp.c | 32 +-
> lib/librte_net/rte_arp.h | 36 +--
> lib/librte_net/rte_esp.h | 2 +-
> lib/librte_net/rte_ether.h | 178 ++++++------
> lib/librte_net/rte_gre.h | 2 +-
> lib/librte_net/rte_icmp.h | 6 +-
> lib/librte_net/rte_ip.h | 70 ++---
> lib/librte_net/rte_net.c | 90 +++---
> lib/librte_net/rte_net.h | 22 +-
> lib/librte_net/rte_sctp.h | 2 +-
> lib/librte_net/rte_tcp.h | 2 +-
> lib/librte_net/rte_udp.h | 2 +-
> lib/librte_pipeline/rte_table_action.c | 210 +++++++-------
> lib/librte_pipeline/rte_table_action.h | 4 +-
> lib/librte_port/rte_port_ras.c | 8 +-
> lib/librte_port/rte_port_source_sink.c | 6 +-
> lib/librte_vhost/vhost.h | 2 +-
> lib/librte_vhost/virtio_net.c | 42 +--
> test/test-acl/main.c | 2 +-
> test/test-pipeline/pipeline_acl.c | 16 +-
> test/test-pipeline/pipeline_hash.c | 12 +-
> test/test/packet_burst_generator.c | 126 ++++----
> test/test/packet_burst_generator.h | 26 +-
> test/test/test_acl.c | 8 +-
> test/test/test_acl.h | 122 ++++----
> test/test/test_cmdline_etheraddr.c | 16 +-
> test/test/test_efd.c | 20 +-
> test/test/test_event_eth_rx_adapter.c | 2 +-
> test/test/test_event_eth_tx_adapter.c | 2 +-
> test/test/test_flow_classify.c | 68 ++---
> test/test/test_hash.c | 20 +-
> test/test/test_link_bonding.c | 284 +++++++++---------
> test/test/test_link_bonding_mode4.c | 116 ++++----
> test/test/test_link_bonding_rssconf.c | 6 +-
> test/test/test_lpm.c | 76 ++---
> test/test/test_lpm_perf.c | 10 +-
> test/test/test_member.c | 20 +-
> test/test/test_pmd_perf.c | 20 +-
> test/test/test_sched.c | 20 +-
> test/test/test_table_acl.c | 8 +-
> test/test/test_thash.c | 12 +-
> test/test/virtual_pmd.c | 6 +-
> test/test/virtual_pmd.h | 2 +-
> 367 files changed, 3906 insertions(+), 3913 deletions(-)
>
Since BSD structures are available on both Linux and BSD, why is DPDK
reinventing them? There is no value in doing that.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 6/6] doc: remove internal libs from release notes
@ 2018-10-25 0:07 4% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-25 0:07 UTC (permalink / raw)
To: Ferruh Yigit
Cc: dev, Shreyansh Jain, John McNamara, Marko Kovacevic,
yipeng1.wang, pablo.de.lara.guarch
16/10/2018 13:52, Shreyansh Jain:
> On Monday 15 October 2018 08:20 PM, Ferruh Yigit wrote:
> > These libraries has exported functions but the target of those functions
> > are not user but other libraries.
> >
> > The version of these libraries doesn't mean much to the user so can be
> > dropped from release notes.
> >
> > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > ---
> > Indeed this is more a question, should we keep them or remove them?
>
> +1 for removing them.
> At least for dpaa/fslmc perspective, I don't see any additional benefit
> in release note. These libraries (dpaa/fslmc) are not actually
> 'libraries' in true (read, plugability) sense :)
> > --- a/doc/guides/rel_notes/release_18_11.rst
> > +++ b/doc/guides/rel_notes/release_18_11.rst
> > - + librte_bus_dpaa.so.2
> > - + librte_bus_fslmc.so.2
> > - + librte_bus_ifpga.so.2
> > - + librte_bus_pci.so.2
> > - + librte_bus_vdev.so.2
> > - + librte_bus_vmbus.so.2
The ABI of bus libraries is important if you want to plug a PMD
into an older DPDK: if bus ABI has changed, you cannot.
I am for keeping them.
> > - librte_pci.so.1
This is a true library!
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 16:39 0% ` Bruce Richardson
@ 2018-10-26 7:20 0% ` Olivier Matz
2018-10-26 10:15 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-10-26 7:20 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
Hi,
On Wed, Oct 24, 2018 at 05:39:09PM +0100, Bruce Richardson wrote:
> On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> > This RFC targets 19.02.
> >
> > The rte_net headers conflict with the libc headers, because
> > some definitions are duplicated, sometimes with few differences.
> > This was discussed in [1], and more recently at the techboard.
> >
> > Before sending the deprecation notice (target for this is 18.11),
> > here is a draft that can be discussed.
> >
> > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > and defines in rte_net library. This is a big changeset, that will
> > break the API of many functions, but not the ABI.
> >
> > One question I'm asking is how can we manage the transition.
> > Initially, I hoped it was possible to have a compat layer during
> > one release (supporting both prefixed and unprefixed names), but
> > now that the patch is done, it seems the impact is too big, and
> > impacts too many libraries.
> >
> > Few examples:
> > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > - many rte_flow structures use the rte_ prefixed net structures
> > - the mac field of virtio_net structure is rte_ether_addr
> > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > ...
> >
> > Therefore, it is clear that doing this would break the compilation
> > of many external applications.
> >
>
> Can you clarify a bit as to why we can't keep around compatibility versions
> of the headers, alongside the new versions? I'm not following the logic
> above. Can we not introduce completely new headers with the replacements
> while leaving the old ones intact?
This is something I tried to do. It is not in the RFC because the result
was not satisfying, but you can find it here:
http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=ba1e8e498306
With this patch, an external application can still use the unprefixed
structures, defines and functions of rte_net, unless RTE_NET_NO_COMPAT
is defined.
However, functions and structures that are not in librte_net (the
examples from my previous mail, quoted above) use the rte_ prefixed
structures in their prototypes. For instance, an application that uses
rte_eth_macaddr_get() will not compile anymore because it will pass
a (struct ether_addr *) instead of a (struct rte_ether_addr *).
I don't see any good way to fix this. Maybe we can do something with
defines, but I don't think it is possible to provide both APIs for
functions like rte_eth_macaddr_get(). I'm also not convinced it would be
that helpful. In the end, if the patchset is applied, we want the
applications to switch to the new API. To ease the transition, we can
provide a script to patch an application, very similar to the one I use
to generate the patchset.
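To make the prototype problem concrete, here is a minimal sketch of the
"something with defines" idea. It uses local stand-in definitions, not
the real DPDK headers: demo_macaddr_get(), demo_old_app_last_octet() and
DEMO_NET_NO_COMPAT are hypothetical names chosen for illustration.
Without the alias, struct ether_addr and struct rte_ether_addr are two
distinct types, so passing a pointer to one where the other is expected
is an incompatible-pointer diagnostic even though the layout is the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the prefixed structure; not the real DPDK header. */
struct rte_ether_addr {
	uint8_t addr_bytes[6];
};

/*
 * Hypothetical compat alias: after this define, the preprocessor rewrites
 * every later "ether_addr" token to "rte_ether_addr", so old application
 * code names the same type as the new prototypes. It must not be visible
 * where libc's own struct ether_addr is in use, because it would rename
 * that one as well.
 */
#ifndef DEMO_NET_NO_COMPAT
#define ether_addr rte_ether_addr
#endif

/* Stand-in for a non-librte_net function whose prototype uses the new name. */
void
demo_macaddr_get(struct rte_ether_addr *mac)
{
	static const uint8_t fixed[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };

	assert(mac != NULL);
	memcpy(mac->addr_bytes, fixed, sizeof(fixed));
}

/* "Old" application code, still written against the unprefixed name. */
uint8_t
demo_old_app_last_octet(void)
{
	struct ether_addr mac; /* expands to struct rte_ether_addr */

	demo_macaddr_get(&mac);
	return mac.addr_bytes[5];
}
```

Such an alias only works as long as the libc definition is not included
in the same translation unit, which is the very conflict the RFC is
trying to remove, so it is at best a partial answer.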
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 14:56 0% ` Wiles, Keith
@ 2018-10-26 7:22 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-10-26 7:22 UTC (permalink / raw)
To: Wiles, Keith; +Cc: dpdk-dev
Hi,
On Wed, Oct 24, 2018 at 02:56:14PM +0000, Wiles, Keith wrote:
>
>
> > On Oct 24, 2018, at 1:18 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > This RFC targets 19.02.
> >
> > The rte_net headers conflict with the libc headers, because
> > some definitions are duplicated, sometimes with few differences.
> > This was discussed in [1], and more recently at the techboard.
> >
> > Before sending the deprecation notice (target for this is 18.11),
> > here is a draft that can be discussed.
> >
> > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > and defines in rte_net library. This is a big changeset, that will
> > break the API of many functions, but not the ABI.
> >
> > One question I'm asking is how can we manage the transition.
> > Initially, I hoped it was possible to have a compat layer during
> > one release (supporting both prefixed and unprefixed names), but
> > now that the patch is done, it seems the impact is too big, and
> > impacts too many libraries.
> >
> > Few examples:
> > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > - many rte_flow structures use the rte_ prefixed net structures
> > - the mac field of virtio_net structure is rte_ether_addr
> > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > ...
> >
> > Therefore, it is clear that doing this would break the compilation
> > of many external applications.
> >
> > Another drawback we need to take in account: it will make the
> > backport of patches more difficult, although this is something
> > that could be tempered by a script.
> >
> > While it is obviously better to have a good namespace convention,
> > we need to identify the issues we have today before deciding it's
> > worth doing the change.
> >
> > Comments?
>
> I did not see the deprecation notice in the patches below, but I could have missed it.
I will send it only if we reach a consensus about the need to
apply the patchset.
Regards
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 18:38 0% ` Stephen Hemminger
@ 2018-10-26 7:56 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-10-26 7:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Hi Stephen,
On Wed, Oct 24, 2018 at 11:38:12AM -0700, Stephen Hemminger wrote:
> On Wed, 24 Oct 2018 10:18:19 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
>
> > This RFC targets 19.02.
> >
> > The rte_net headers conflict with the libc headers, because
> > some definitions are duplicated, sometimes with few differences.
> > This was discussed in [1], and more recently at the techboard.
> >
> > Before sending the deprecation notice (target for this is 18.11),
> > here is a draft that can be discussed.
> >
> > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > and defines in rte_net library. This is a big changeset, that will
> > break the API of many functions, but not the ABI.
> >
> > One question I'm asking is how can we manage the transition.
> > Initially, I hoped it was possible to have a compat layer during
> > one release (supporting both prefixed and unprefixed names), but
> > now that the patch is done, it seems the impact is too big, and
> > impacts too many libraries.
> >
> > Few examples:
> > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > - many rte_flow structures use the rte_ prefixed net structures
> > - the mac field of virtio_net structure is rte_ether_addr
> > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > ...
> >
> > Therefore, it is clear that doing this would break the compilation
> > of many external applications.
> >
> > Another drawback we need to take in account: it will make the
> > backport of patches more difficult, although this is something
> > that could be tempered by a script.
> >
> > While it is obviously better to have a good namespace convention,
> > we need to identify the issues we have today before deciding it's
> > worth doing the change.
> >
> > Comments?
> >
> >
> > Things that are missing in RFC:
> > - test with FreeBSD
> > - manually fix some indentation issues
> >
> >
> > Olivier Matz (14):
> > net: add rte prefix to arp structures
> > net: add rte prefix to arp defines
> > net: add rte prefix to ether structures
> > net: add rte prefix to ether functions
> > net: add rte prefix to ether defines
> > net: add rte prefix to esp structure
> > net: add rte prefix to gre structure
> > net: add rte prefix to icmp structure
> > net: add rte prefix to icmp defines
> > net: add rte prefix to ip structure
> > net: add rte prefix to ip defines
> > net: add rte prefix to sctp structure
> > net: add rte prefix to tcp structure
> > net: add rte prefix to udp structure
> >
>
> Since BSD structures are available on Linux and BSD why is DPDK reinventing?
> There is no value in doing that.
From what I see, some structures or defines are a bit different in Linux
and FreeBSD. Examples:
  /* Linux */
  struct ether_addr
  {
    u_int8_t ether_addr_octet[ETH_ALEN];
  } __attribute__ ((__packed__));

  /* FreeBSD */
  struct ether_addr {
    u_char octet[ETHER_ADDR_LEN];
  } __packed;
It's true that compatibility between Linux and FreeBSD is better in
glibc than it used to be, for instance since commit 7011c2622fe3
("Remove __FAVOR_BSD."), added in 2013 (glibc 2.19). It seems that musl
also supports the BSD network structures.
So I agree that using the BSD structures looks possible, at least for
the ip/ip6/tcp/udp/icmp/... structures and defines. I think we would
still need to provide some network structures for less common protocols.
The question is: are we confident that the support for the BSD network
structs/defines/funcs is good enough in every libc we want (or will
want) to use?
Since DPDK is networking software, it is not that odd to provide our
own network structures, because we keep control over them.
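As an illustration of what keeping control buys, here is a minimal
sketch of a project-owned address structure. The demo_ names are
hypothetical and this is not the DPDK header; the packed attribute
assumes GCC or Clang. The point is that the field name and the 6-byte
layout are fixed by the project, so code accessing the bytes does not
depend on whether the libc spells the field ether_addr_octet or octet.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_ETHER_ADDR_LEN 6

/* Project-owned definition: one field name on every libc and OS. */
struct demo_ether_addr {
	uint8_t addr_bytes[DEMO_ETHER_ADDR_LEN];
} __attribute__((__packed__));

/* The wire format is 6 bytes; fail the build if the layout ever drifts. */
static_assert(sizeof(struct demo_ether_addr) == DEMO_ETHER_ADDR_LEN,
	      "Ethernet address must be exactly 6 bytes");

/* Format a MAC address as "xx:xx:xx:xx:xx:xx"; returns snprintf()'s result. */
int
demo_ether_format(char *buf, size_t size, const struct demo_ether_addr *ea)
{
	const uint8_t *b = ea->addr_bytes;

	return snprintf(buf, size, "%02x:%02x:%02x:%02x:%02x:%02x",
			b[0], b[1], b[2], b[3], b[4], b[5]);
}
```

The cost, as discussed in this thread, is that the project then has to
maintain these definitions itself instead of relying on the libc.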
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table
@ 2018-10-26 10:12 3% ` Bruce Richardson
2018-10-29 5:54 0% ` Honnappa Nagarahalli
2018-10-31 4:21 3% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2018-10-26 10:12 UTC (permalink / raw)
To: Honnappa Nagarahalli; +Cc: Yipeng Wang, stephen, dev, sameh.gobriel, nd
On Fri, Oct 26, 2018 at 12:23:56AM +0000, Honnappa Nagarahalli wrote:
> >
> > On Wed, Oct 10, 2018 at 02:48:05PM -0700, Yipeng Wang wrote:
> > > This commit improves the readwrite test to consider extendable table
> > > feature and add more functional tests to cover more corner cases.
> > >
> > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> ---
> > > test/test/test_hash_readwrite.c | 70
> > > ++++++++++++++++++++++++++++++++++------- 1 file changed, 58
> > > insertions(+), 12 deletions(-)
> > >
> > With the extension of this test case, and the addition of other test cases by
> > Honnappa in the other patch sets in this release, we are building up quite a
> > large set of hash table autotests, some of whose meaning and use is a bit
> > obscure. Are there any hash tests that you feel could be removed at this point,
> > to simplify things?
> >
> (this comment does not apply to this patch)
> Looks like your concern is about maintenance of the test code.
> IMO, we need to reduce the number of configuration flags in this library which should reduce the number of test cases.
> The flags I think that are not necessary are:
> RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT - The tests prove that this gives significant performance boost. IMO, if the platform supports it, it should be enabled without user consent (I am not an expert on TSX).
> RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY - Most use cases require this support. Only use case where this is not required is a single thread doing both inserts and lookups. Even if such a use case is valid, the lock over head should be small.
>
I agree with the idea. What I suggest is that only a single flag should be
needed, and only for the uncommon case, i.e. where we do not need any
locking of the hash table. Otherwise the hash should be thread safe by
default, using the most effective locking mechanism for the platform.
Unfortunately, doing this requires an ABI change, but since it should only
affect the create function, it should be doable with function versioning to
keep backward compatibility.
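A rough sketch of the flag inversion, to show the intended create-time
decision. Everything here is hypothetical: DEMO_HASH_FLAG_NO_LOCKING,
the demo_lock_mode enum and demo_hash_pick_lock_mode() are not existing
rte_hash names, and the function versioning needed to keep the old
create function working is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opt-out flag; not an existing rte_hash flag. */
#define DEMO_HASH_FLAG_NO_LOCKING (1u << 0)

enum demo_lock_mode {
	DEMO_LOCK_NONE,   /* caller guarantees single-threaded access */
	DEMO_LOCK_RWLOCK, /* portable reader-writer lock */
	DEMO_LOCK_HTM,    /* hardware transactional memory, when available */
};

/*
 * Pick the locking scheme at create time: thread safe unless the caller
 * opts out, and the platform's best mechanism without a separate flag.
 */
enum demo_lock_mode
demo_hash_pick_lock_mode(uint32_t extra_flags, bool platform_has_htm)
{
	/* Only one flag is defined in this sketch; catch anything else. */
	assert((extra_flags & ~DEMO_HASH_FLAG_NO_LOCKING) == 0);

	if (extra_flags & DEMO_HASH_FLAG_NO_LOCKING)
		return DEMO_LOCK_NONE;
	return platform_has_htm ? DEMO_LOCK_HTM : DEMO_LOCK_RWLOCK;
}
```

With this shape, the common multi-threaded case needs no flag at all,
and the test matrix shrinks to "locking" versus "no locking".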
/Bruce
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-26 7:20 0% ` Olivier Matz
@ 2018-10-26 10:15 0% ` Bruce Richardson
2018-10-26 11:28 0% ` Olivier Matz
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-26 10:15 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Fri, Oct 26, 2018 at 09:20:15AM +0200, Olivier Matz wrote:
> Hi,
>
> On Wed, Oct 24, 2018 at 05:39:09PM +0100, Bruce Richardson wrote:
> > On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> > > This RFC targets 19.02.
> > >
> > > The rte_net headers conflict with the libc headers, because
> > > some definitions are duplicated, sometimes with few differences.
> > > This was discussed in [1], and more recently at the techboard.
> > >
> > > Before sending the deprecation notice (target for this is 18.11),
> > > here is a draft that can be discussed.
> > >
> > > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > > and defines in rte_net library. This is a big changeset, that will
> > > break the API of many functions, but not the ABI.
> > >
> > > One question I'm asking is how can we manage the transition.
> > > Initially, I hoped it was possible to have a compat layer during
> > > one release (supporting both prefixed and unprefixed names), but
> > > now that the patch is done, it seems the impact is too big, and
> > > impacts too many libraries.
> > >
> > > Few examples:
> > > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > > - many rte_flow structures use the rte_ prefixed net structures
> > > - the mac field of virtio_net structure is rte_ether_addr
> > > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > > ...
> > >
> > > Therefore, it is clear that doing this would break the compilation
> > > of many external applications.
> > >
> >
> > Can you clarify a bit as to why we can't keep around compatibility versions
> > of the headers, alongside the new versions? I'm not following the logic
> > above. Can we not introduce completely new headers with the replacements
> > while leaving the old ones intact?
>
> This is something I tried to do, it is not in the RFC because it was
> not satisfying, but you can find it there:
>
> http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=ba1e8e498306
>
> With this patch, the usage of unprefixed structures, defines and
> functions in rte net is still possible by an external application,
> except if RTE_NET_NO_COMPAT is defined.
>
> However, functions and structures that are not in librte_net (the
> examples from my previous mail, quoted above) use the rte_ prefixed
> structures in their prototypes. For instance, an application that use
> rte_eth_macaddr_get() will no compile anymore because it will pass
> a (struct ether_addr *) instead of a (struct rte_ether_addr *).
>
> I don't see any good mean to fix this. Maybe we can do something with
> defines, but I don't think it is possible to provide both APIs for
> functions like rte_eth_macaddr_get(). I'm also not convinced it will be
> that helpful. At the end, if the patchset is applied, we want the
> applications to switch to the new API. To ease the transition, we can
> provide a script to patch an application, very similar to the one I use
> to generate the patchset.
>
Out of interest, about how many non rte_net functions are we talking about here?
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-26 10:15 0% ` Bruce Richardson
@ 2018-10-26 11:28 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-10-26 11:28 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
On Fri, Oct 26, 2018 at 11:15:14AM +0100, Bruce Richardson wrote:
> On Fri, Oct 26, 2018 at 09:20:15AM +0200, Olivier Matz wrote:
> > Hi,
> >
> > On Wed, Oct 24, 2018 at 05:39:09PM +0100, Bruce Richardson wrote:
> > > On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> > > > This RFC targets 19.02.
> > > >
> > > > The rte_net headers conflict with the libc headers, because
> > > > some definitions are duplicated, sometimes with few differences.
> > > > This was discussed in [1], and more recently at the techboard.
> > > >
> > > > Before sending the deprecation notice (target for this is 18.11),
> > > > here is a draft that can be discussed.
> > > >
> > > > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > > > and defines in rte_net library. This is a big changeset, that will
> > > > break the API of many functions, but not the ABI.
> > > >
> > > > One question I'm asking is how can we manage the transition.
> > > > Initially, I hoped it was possible to have a compat layer during
> > > > one release (supporting both prefixed and unprefixed names), but
> > > > now that the patch is done, it seems the impact is too big, and
> > > > impacts too many libraries.
> > > >
> > > > Few examples:
> > > > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > > > - many rte_flow structures use the rte_ prefixed net structures
> > > > - the mac field of virtio_net structure is rte_ether_addr
> > > > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > > > ...
> > > >
> > > > Therefore, it is clear that doing this would break the compilation
> > > > of many external applications.
> > > >
> > >
> > > Can you clarify a bit as to why we can't keep around compatibility versions
> > > of the headers, alongside the new versions? I'm not following the logic
> > > above. Can we not introduce completely new headers with the replacements
> > > while leaving the old ones intact?
> >
> > This is something I tried to do, it is not in the RFC because it was
> > not satisfying, but you can find it there:
> >
> > http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=ba1e8e498306
> >
> > With this patch, the usage of unprefixed structures, defines and
> > functions in rte net is still possible by an external application,
> > except if RTE_NET_NO_COMPAT is defined.
> >
> > However, functions and structures that are not in librte_net (the
> > examples from my previous mail, quoted above) use the rte_ prefixed
> > structures in their prototypes. For instance, an application that use
> > rte_eth_macaddr_get() will no compile anymore because it will pass
> > a (struct ether_addr *) instead of a (struct rte_ether_addr *).
> >
> > I don't see any good way to fix this. Maybe we can do something with
> > defines, but I don't think it is possible to provide both APIs for
> > functions like rte_eth_macaddr_get(). I'm also not convinced it would be
> > that helpful. In the end, if the patchset is applied, we want the
> > applications to switch to the new API. To ease the transition, we can
> > provide a script to patch an application, very similar to the one I use
> > to generate the patchset.
> >
>
> Out of interest, about how many non rte_net functions are we talking about here?
I didn't count, but there are many, and not only functions but also structures.
To give an idea, here is the output of:
git diff origin/master..HEAD lib/ | filterdiff -i '*.h' -x 'a/lib/librte_net/*'
diff --git a/lib/librte_ethdev/rte_eth_ctrl.h b/lib/librte_ethdev/rte_eth_ctrl.h
index 5ea8ae24c..821d971cd 100644
--- a/lib/librte_ethdev/rte_eth_ctrl.h
+++ b/lib/librte_ethdev/rte_eth_ctrl.h
@@ -110,7 +110,7 @@ struct rte_eth_mac_filter {
uint8_t is_vf; /**< 1 for VF, 0 for port dev */
uint16_t dst_id; /**< VF ID, available when is_vf is 1*/
enum rte_mac_filter_type filter_type; /**< MAC filter type */
- struct ether_addr mac_addr;
+ struct rte_ether_addr mac_addr;
};
/**
@@ -126,7 +126,7 @@ struct rte_eth_mac_filter {
* RTE_ETH_FILTER_DELETE and RTE_ETH_FILTER_GET operations.
*/
struct rte_eth_ethertype_filter {
- struct ether_addr mac_addr; /**< Mac address to match. */
+ struct rte_ether_addr mac_addr; /**< Mac address to match. */
uint16_t ether_type; /**< Ether type to match */
uint16_t flags; /**< Flags from RTE_ETHTYPE_FLAGS_* */
uint16_t queue; /**< Queue assigned to when match*/
@@ -265,8 +265,8 @@ enum rte_tunnel_iptype {
* Tunneling Packet filter configuration.
*/
struct rte_eth_tunnel_filter_conf {
- struct ether_addr outer_mac; /**< Outer MAC address to match. */
- struct ether_addr inner_mac; /**< Inner MAC address to match. */
+ struct rte_ether_addr outer_mac; /**< Outer MAC address to match. */
+ struct rte_ether_addr inner_mac; /**< Inner MAC address to match. */
uint16_t inner_vlan; /**< Inner VLAN to match. */
enum rte_tunnel_iptype ip_type; /**< IP address type. */
/** Outer destination IP address to match if ETH_TUNNEL_FILTER_OIP
@@ -473,7 +473,7 @@ struct rte_eth_sctpv6_flow {
* A structure used to define the input for MAC VLAN flow
*/
struct rte_eth_mac_vlan_flow {
- struct ether_addr mac_addr; /**< Mac address to match. */
+ struct rte_ether_addr mac_addr; /**< Mac address to match. */
};
/**
@@ -493,7 +493,7 @@ struct rte_eth_tunnel_flow {
enum rte_eth_fdir_tunnel_type tunnel_type; /**< Tunnel type to match. */
/** Tunnel ID to match. TNI, VNI... in big endian. */
uint32_t tunnel_id;
- struct ether_addr mac_addr; /**< Mac address to match. */
+ struct rte_ether_addr mac_addr; /**< Mac address to match. */
};
/**
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index fb40c89e0..5deb4e38e 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -2159,7 +2159,7 @@ int rte_eth_dev_set_rx_queue_stats_mapping(uint16_t port_id,
* A pointer to a structure of type *ether_addr* to be filled with
* the Ethernet address of the Ethernet device.
*/
-void rte_eth_macaddr_get(uint16_t port_id, struct ether_addr *mac_addr);
+void rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr);
/**
* Retrieve the contextual information of an Ethernet device.
@@ -2843,7 +2843,7 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
* - (-ENOSPC) if no more MAC addresses can be added.
* - (-EINVAL) if MAC address is invalid.
*/
-int rte_eth_dev_mac_addr_add(uint16_t port_id, struct ether_addr *mac_addr,
+int rte_eth_dev_mac_addr_add(uint16_t port_id, struct rte_ether_addr *mac_addr,
uint32_t pool);
/**
@@ -2859,7 +2859,7 @@ int rte_eth_dev_mac_addr_add(uint16_t port_id, struct ether_addr *mac_addr,
* - (-ENODEV) if *port* invalid.
* - (-EADDRINUSE) if attempting to remove the default MAC address
*/
-int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct ether_addr *mac_addr);
+int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct rte_ether_addr *mac_addr);
/**
* Set the default MAC address.
@@ -2875,7 +2875,7 @@ int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct ether_addr *mac_addr);
* - (-EINVAL) if MAC address is invalid.
*/
int rte_eth_dev_default_mac_addr_set(uint16_t port_id,
- struct ether_addr *mac_addr);
+ struct rte_ether_addr *mac_addr);
/**
* Update Redirection Table(RETA) of Receive Side Scaling of Ethernet device.
@@ -2936,7 +2936,7 @@ int rte_eth_dev_rss_reta_query(uint16_t port_id,
* - (-EIO) if device is removed.
* - (-EINVAL) if bad parameter.
*/
-int rte_eth_dev_uc_hash_table_set(uint16_t port_id, struct ether_addr *addr,
+int rte_eth_dev_uc_hash_table_set(uint16_t port_id, struct rte_ether_addr *addr,
uint8_t on);
/**
@@ -3479,7 +3479,7 @@ rte_eth_dev_get_module_eeprom(uint16_t port_id,
* - (-ENOSPC) if *port_id* has not enough multicast filtering resources.
*/
int rte_eth_dev_set_mc_addr_list(uint16_t port_id,
- struct ether_addr *mc_addr_set,
+ struct rte_ether_addr *mc_addr_set,
uint32_t nb_mc_addr);
/**
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 0d28fd902..fa518620e 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -250,17 +250,17 @@ typedef void (*eth_mac_addr_remove_t)(struct rte_eth_dev *dev, uint32_t index);
/**< @internal Remove MAC address from receive address register */
typedef int (*eth_mac_addr_add_t)(struct rte_eth_dev *dev,
- struct ether_addr *mac_addr,
+ struct rte_ether_addr *mac_addr,
uint32_t index,
uint32_t vmdq);
/**< @internal Set a MAC address into Receive Address Address Register */
typedef int (*eth_mac_addr_set_t)(struct rte_eth_dev *dev,
- struct ether_addr *mac_addr);
+ struct rte_ether_addr *mac_addr);
/**< @internal Set a MAC address into Receive Address Address Register */
typedef int (*eth_uc_hash_table_set_t)(struct rte_eth_dev *dev,
- struct ether_addr *mac_addr,
+ struct rte_ether_addr *mac_addr,
uint8_t on);
/**< @internal Set a Unicast Hash bitmap */
@@ -292,7 +292,7 @@ typedef int (*eth_udp_tunnel_port_del_t)(struct rte_eth_dev *dev,
/**< @internal Delete tunneling UDP port */
typedef int (*eth_set_mc_addr_list_t)(struct rte_eth_dev *dev,
- struct ether_addr *mc_addr_set,
+ struct rte_ether_addr *mc_addr_set,
uint32_t nb_mc_addr);
/**< @internal set the list of multicast addresses on an Ethernet device */
@@ -597,10 +597,10 @@ struct rte_eth_dev_data {
/**< Common rx buffer size handled by all queues */
uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
- struct ether_addr* mac_addrs;/**< Device Ethernet Link address. */
+ struct rte_ether_addr* mac_addrs;/**< Device Ethernet Link address. */
uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
/** bitmap array of associating Ethernet MAC addresses to pools */
- struct ether_addr* hash_mac_addrs;
+ struct rte_ether_addr* hash_mac_addrs;
/** Device Ethernet MAC addresses of hash filtering. */
uint16_t port_id; /**< Device [external] port identifier. */
__extension__
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 26e2fcfa0..c27d590a1 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -577,8 +577,8 @@ static const struct rte_flow_item_raw rte_flow_item_raw_mask = {
* same order as on the wire.
*/
struct rte_flow_item_eth {
- struct ether_addr dst; /**< Destination MAC. */
- struct ether_addr src; /**< Source MAC. */
+ struct rte_ether_addr dst; /**< Destination MAC. */
+ struct rte_ether_addr src; /**< Source MAC. */
rte_be16_t type; /**< EtherType or TPID. */
};
@@ -597,7 +597,7 @@ static const struct rte_flow_item_eth rte_flow_item_eth_mask = {
* Matches an 802.1Q/ad VLAN tag.
*
* The corresponding standard outer EtherType (TPID) values are
- * ETHER_TYPE_VLAN or ETHER_TYPE_QINQ. It can be overridden by the preceding
+ * RTE_ETHER_TYPE_VLAN or RTE_ETHER_TYPE_QINQ. It can be overridden by the preceding
* pattern item.
*/
struct rte_flow_item_vlan {
@@ -621,7 +621,7 @@ static const struct rte_flow_item_vlan rte_flow_item_vlan_mask = {
* Note: IPv4 options are handled by dedicated pattern items.
*/
struct rte_flow_item_ipv4 {
- struct ipv4_hdr hdr; /**< IPv4 header definition. */
+ struct rte_ipv4_hdr hdr; /**< IPv4 header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_IPV4. */
@@ -643,7 +643,7 @@ static const struct rte_flow_item_ipv4 rte_flow_item_ipv4_mask = {
* RTE_FLOW_ITEM_TYPE_IPV6_EXT.
*/
struct rte_flow_item_ipv6 {
- struct ipv6_hdr hdr; /**< IPv6 header definition. */
+ struct rte_ipv6_hdr hdr; /**< IPv6 header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
@@ -666,7 +666,7 @@ static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
* Matches an ICMP header.
*/
struct rte_flow_item_icmp {
- struct icmp_hdr hdr; /**< ICMP header definition. */
+ struct rte_icmp_hdr hdr; /**< ICMP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ICMP. */
@@ -685,7 +685,7 @@ static const struct rte_flow_item_icmp rte_flow_item_icmp_mask = {
* Matches a UDP header.
*/
struct rte_flow_item_udp {
- struct udp_hdr hdr; /**< UDP header definition. */
+ struct rte_udp_hdr hdr; /**< UDP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_UDP. */
@@ -704,7 +704,7 @@ static const struct rte_flow_item_udp rte_flow_item_udp_mask = {
* Matches a TCP header.
*/
struct rte_flow_item_tcp {
- struct tcp_hdr hdr; /**< TCP header definition. */
+ struct rte_tcp_hdr hdr; /**< TCP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_TCP. */
@@ -723,7 +723,7 @@ static const struct rte_flow_item_tcp rte_flow_item_tcp_mask = {
* Matches a SCTP header.
*/
struct rte_flow_item_sctp {
- struct sctp_hdr hdr; /**< SCTP header definition. */
+ struct rte_sctp_hdr hdr; /**< SCTP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_SCTP. */
@@ -761,7 +761,7 @@ static const struct rte_flow_item_vxlan rte_flow_item_vxlan_mask = {
* Matches a E-tag header.
*
* The corresponding standard outer EtherType (TPID) value is
- * ETHER_TYPE_ETAG. It can be overridden by the preceding pattern item.
+ * RTE_ETHER_TYPE_ETAG. It can be overridden by the preceding pattern item.
*/
struct rte_flow_item_e_tag {
/**
@@ -908,7 +908,7 @@ static const struct rte_flow_item_gtp rte_flow_item_gtp_mask = {
* Matches an ESP header.
*/
struct rte_flow_item_esp {
- struct esp_hdr hdr; /**< ESP header definition. */
+ struct rte_esp_hdr hdr; /**< ESP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ESP. */
@@ -974,9 +974,9 @@ struct rte_flow_item_arp_eth_ipv4 {
uint8_t hln; /**< Hardware address length, normally 6. */
uint8_t pln; /**< Protocol address length, normally 4. */
rte_be16_t op; /**< Opcode (1 for request, 2 for reply). */
- struct ether_addr sha; /**< Sender hardware address. */
+ struct rte_ether_addr sha; /**< Sender hardware address. */
rte_be32_t spa; /**< Sender IPv4 address. */
- struct ether_addr tha; /**< Target hardware address. */
+ struct rte_ether_addr tha; /**< Target hardware address. */
rte_be32_t tpa; /**< Target IPv4 address. */
};
@@ -1120,7 +1120,7 @@ rte_flow_item_icmp6_nd_opt_mask = {
struct rte_flow_item_icmp6_nd_opt_sla_eth {
uint8_t type; /**< ND option type, normally 1. */
uint8_t length; /**< ND option length, normally 1. */
- struct ether_addr sla; /**< Source Ethernet LLA. */
+ struct rte_ether_addr sla; /**< Source Ethernet LLA. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT_SLA_ETH. */
@@ -1145,7 +1145,7 @@ rte_flow_item_icmp6_nd_opt_sla_eth_mask = {
struct rte_flow_item_icmp6_nd_opt_tla_eth {
uint8_t type; /**< ND option type, normally 2. */
uint8_t length; /**< ND option length, normally 1. */
- struct ether_addr tla; /**< Target Ethernet LLA. */
+ struct rte_ether_addr tla; /**< Target Ethernet LLA. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT_TLA_ETH. */
@@ -2036,7 +2036,7 @@ struct rte_flow_action_set_ttl {
* Set MAC address from the matched flow
*/
struct rte_flow_action_set_mac {
- uint8_t mac_addr[ETHER_ADDR_LEN];
+ uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
};
/*
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
index 6bb30cdb9..63f06bec4 100644
--- a/lib/librte_gro/gro_tcp4.h
+++ b/lib/librte_gro/gro_tcp4.h
@@ -19,8 +19,8 @@
/* Header fields representing a TCP/IPv4 flow */
struct tcp4_flow_key {
- struct ether_addr eth_saddr;
- struct ether_addr eth_daddr;
+ struct rte_ether_addr eth_saddr;
+ struct rte_ether_addr eth_daddr;
uint32_t ip_src_addr;
uint32_t ip_dst_addr;
@@ -182,8 +182,8 @@ uint32_t gro_tcp4_tbl_pkt_count(void *tbl);
static inline int
is_same_tcp4_flow(struct tcp4_flow_key k1, struct tcp4_flow_key k2)
{
- return (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
- is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) &&
+ return (rte_is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
+ rte_is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) &&
(k1.ip_src_addr == k2.ip_src_addr) &&
(k1.ip_dst_addr == k2.ip_dst_addr) &&
(k1.recv_ack == k2.recv_ack) &&
@@ -255,7 +255,7 @@ merge_two_tcp4_packets(struct gro_tcp4_item *item,
*/
static inline int
check_seq_option(struct gro_tcp4_item *item,
- struct tcp_hdr *tcph,
+ struct rte_tcp_hdr *tcph,
uint32_t sent_seq,
uint16_t ip_id,
uint16_t tcp_hl,
@@ -264,17 +264,17 @@ check_seq_option(struct gro_tcp4_item *item,
uint8_t is_atomic)
{
struct rte_mbuf *pkt_orig = item->firstseg;
- struct ipv4_hdr *iph_orig;
- struct tcp_hdr *tcph_orig;
+ struct rte_ipv4_hdr *iph_orig;
+ struct rte_tcp_hdr *tcph_orig;
uint16_t len, tcp_hl_orig;
- iph_orig = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_orig, char *) +
+ iph_orig = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt_orig, char *) +
l2_offset + pkt_orig->l2_len);
- tcph_orig = (struct tcp_hdr *)((char *)iph_orig + pkt_orig->l3_len);
+ tcph_orig = (struct rte_tcp_hdr *)((char *)iph_orig + pkt_orig->l3_len);
tcp_hl_orig = pkt_orig->l4_len;
/* Check if TCP option fields equal */
- len = RTE_MAX(tcp_hl, tcp_hl_orig) - sizeof(struct tcp_hdr);
+ len = RTE_MAX(tcp_hl, tcp_hl_orig) - sizeof(struct rte_tcp_hdr);
if ((tcp_hl != tcp_hl_orig) || ((len > 0) &&
(memcmp(tcph + 1, tcph_orig + 1,
len) != 0)))
diff --git a/lib/librte_gro/gro_vxlan_tcp4.h b/lib/librte_gro/gro_vxlan_tcp4.h
index 0cafb9211..7832942a6 100644
--- a/lib/librte_gro/gro_vxlan_tcp4.h
+++ b/lib/librte_gro/gro_vxlan_tcp4.h
@@ -12,10 +12,10 @@
/* Header fields representing a VxLAN flow */
struct vxlan_tcp4_flow_key {
struct tcp4_flow_key inner_key;
- struct vxlan_hdr vxlan_hdr;
+ struct rte_vxlan_hdr vxlan_hdr;
- struct ether_addr outer_eth_saddr;
- struct ether_addr outer_eth_daddr;
+ struct rte_ether_addr outer_eth_saddr;
+ struct rte_ether_addr outer_eth_daddr;
uint32_t outer_ip_src_addr;
uint32_t outer_ip_dst_addr;
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index 6cd764ff5..48ad1686f 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -12,8 +12,8 @@
#include <rte_tcp.h>
#include <rte_udp.h>
-#define IS_FRAGMENTED(frag_off) (((frag_off) & IPV4_HDR_OFFSET_MASK) != 0 \
- || ((frag_off) & IPV4_HDR_MF_FLAG) == IPV4_HDR_MF_FLAG)
+#define IS_FRAGMENTED(frag_off) (((frag_off) & RTE_IPV4_HDR_OFFSET_MASK) != 0 \
+ || ((frag_off) & RTE_IPV4_HDR_MF_FLAG) == RTE_IPV4_HDR_MF_FLAG)
#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
@@ -46,9 +46,9 @@
static inline void
update_udp_header(struct rte_mbuf *pkt, uint16_t udp_offset)
{
- struct udp_hdr *udp_hdr;
+ struct rte_udp_hdr *udp_hdr;
- udp_hdr = (struct udp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+ udp_hdr = (struct rte_udp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
udp_offset);
udp_hdr->dgram_len = rte_cpu_to_be_16(pkt->pkt_len - udp_offset);
}
@@ -71,9 +71,9 @@ static inline void
update_tcp_header(struct rte_mbuf *pkt, uint16_t l4_offset, uint32_t sent_seq,
uint8_t non_tail)
{
- struct tcp_hdr *tcp_hdr;
+ struct rte_tcp_hdr *tcp_hdr;
- tcp_hdr = (struct tcp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+ tcp_hdr = (struct rte_tcp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
l4_offset);
tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq);
if (likely(non_tail))
@@ -98,9 +98,9 @@ update_tcp_header(struct rte_mbuf *pkt, uint16_t l4_offset, uint32_t sent_seq,
static inline void
update_ipv4_header(struct rte_mbuf *pkt, uint16_t l3_offset, uint16_t id)
{
- struct ipv4_hdr *ipv4_hdr;
+ struct rte_ipv4_hdr *ipv4_hdr;
- ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+ ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
l3_offset);
ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len - l3_offset);
ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
diff --git a/lib/librte_gso/rte_gso.h b/lib/librte_gso/rte_gso.h
index a626a11e3..3aab297f4 100644
--- a/lib/librte_gso/rte_gso.h
+++ b/lib/librte_gso/rte_gso.h
@@ -18,12 +18,12 @@ extern "C" {
#include <rte_mbuf.h>
/* Minimum GSO segment size for TCP based packets. */
-#define RTE_GSO_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
- sizeof(struct ipv4_hdr) + sizeof(struct tcp_hdr) + 1)
+#define RTE_GSO_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_tcp_hdr) + 1)
/* Minimum GSO segment size for UDP based packets. */
-#define RTE_GSO_UDP_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
- sizeof(struct ipv4_hdr) + sizeof(struct udp_hdr) + 1)
+#define RTE_GSO_UDP_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_udp_hdr) + 1)
/* GSO flags for rte_gso_ctx. */
#define RTE_GSO_FLAG_IPID_FIXED (1ULL << 0)
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
index a6ddb7bf7..adbaf8f70 100644
--- a/lib/librte_hash/rte_thash.h
+++ b/lib/librte_hash/rte_thash.h
@@ -168,7 +168,7 @@ rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
* Pointer to rte_ipv6_tuple structure
*/
static inline void
-rte_thash_load_v6_addrs(const struct ipv6_hdr *orig, union rte_thash_tuple *targ)
+rte_thash_load_v6_addrs(const struct rte_ipv6_hdr *orig, union rte_thash_tuple *targ)
{
#ifdef RTE_ARCH_X86
__m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index 7f425f610..28ba33dac 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -210,7 +210,7 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
*/
struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
struct rte_ip_frag_death_row *dr,
- struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
+ struct rte_mbuf *mb, uint64_t tms, struct rte_ipv6_hdr *ip_hdr,
struct ipv6_extension_fragment *frag_hdr);
/**
@@ -225,7 +225,7 @@ struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
* present.
*/
static inline struct ipv6_extension_fragment *
-rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
+rte_ipv6_frag_get_ipv6_fragment_header(struct rte_ipv6_hdr *hdr)
{
if (hdr->proto == IPPROTO_FRAGMENT) {
return (struct ipv6_extension_fragment *) ++hdr;
@@ -284,7 +284,7 @@ int32_t rte_ipv4_fragment_packet(struct rte_mbuf *pkt_in,
*/
struct rte_mbuf * rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
struct rte_ip_frag_death_row *dr,
- struct rte_mbuf *mb, uint64_t tms, struct ipv4_hdr *ip_hdr);
+ struct rte_mbuf *mb, uint64_t tms, struct rte_ipv4_hdr *ip_hdr);
/**
* Check if the IPv4 packet is fragmented
@@ -295,12 +295,12 @@ struct rte_mbuf * rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
* 1 if fragmented, 0 if not fragmented
*/
static inline int
-rte_ipv4_frag_pkt_is_fragmented(const struct ipv4_hdr * hdr) {
+rte_ipv4_frag_pkt_is_fragmented(const struct rte_ipv4_hdr * hdr) {
uint16_t flag_offset, ip_flag, ip_ofs;
flag_offset = rte_be_to_cpu_16(hdr->fragment_offset);
- ip_ofs = (uint16_t)(flag_offset & IPV4_HDR_OFFSET_MASK);
- ip_flag = (uint16_t)(flag_offset & IPV4_HDR_MF_FLAG);
+ ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
+ ip_flag = (uint16_t)(flag_offset & RTE_IPV4_HDR_MF_FLAG);
return ip_flag != 0 || ip_ofs != 0;
}
diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
index 601abdfc6..ce86b19a2 100644
--- a/lib/librte_kni/rte_kni.h
+++ b/lib/librte_kni/rte_kni.h
@@ -68,7 +68,7 @@ struct rte_kni_conf {
__extension__
uint8_t force_bind : 1; /* Flag to bind kernel thread */
- char mac_addr[ETHER_ADDR_LEN]; /* MAC address assigned to KNI */
+ char mac_addr[RTE_ETHER_ADDR_LEN]; /* MAC address assigned to KNI */
uint16_t mtu;
};
diff --git a/lib/librte_pipeline/rte_table_action.h b/lib/librte_pipeline/rte_table_action.h
index c96061291..400bd5e2c 100644
--- a/lib/librte_pipeline/rte_table_action.h
+++ b/lib/librte_pipeline/rte_table_action.h
@@ -384,8 +384,8 @@ enum rte_table_action_encap_type {
/** Pre-computed Ethernet header fields for encapsulation action. */
struct rte_table_action_ether_hdr {
- struct ether_addr da; /**< Destination address. */
- struct ether_addr sa; /**< Source address. */
+ struct rte_ether_addr da; /**< Destination address. */
+ struct rte_ether_addr sa; /**< Source address. */
};
/** Pre-computed VLAN header fields for encapsulation action. */
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b4abad30c..064ebb951 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -356,7 +356,7 @@ struct virtio_net {
uint64_t log_size;
uint64_t log_base;
uint64_t log_addr;
- struct ether_addr mac;
+ struct rte_ether_addr mac;
uint16_t mtu;
struct vhost_device_ops const *notify_ops;
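To make the compile break discussed earlier in this message concrete, here is a minimal, self-contained sketch. It is not code from the RFC or from the compat commit linked above: struct rte_ether_addr is reduced to a stand-in definition, demo_macaddr_get() and demo_legacy_caller() are hypothetical names standing in for rte_eth_macaddr_get() and for an unported application, and the #define alias is only one assumed way of doing "something with defines".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the renamed structure shown in the diff. */
struct rte_ether_addr {
	uint8_t addr_bytes[6];
};

/*
 * Hypothetical compat alias: every later use of the identifier
 * `ether_addr` is rewritten to `rte_ether_addr`.
 * RTE_NET_NO_COMPAT is the opt-out mentioned in the thread.
 */
#ifndef RTE_NET_NO_COMPAT
#define ether_addr rte_ether_addr
#endif

/* Stand-in for an API outside rte_net; not the real rte_eth_macaddr_get(). */
static void
demo_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr)
{
	assert(mac_addr != NULL);
	memset(mac_addr->addr_bytes, 0, sizeof(mac_addr->addr_bytes));
	/* Fake address: the low byte of the port id goes into the last octet. */
	mac_addr->addr_bytes[5] = (uint8_t)port_id;
}

/* Legacy application code, still written against the unprefixed name. */
static uint8_t
demo_legacy_caller(uint16_t port_id)
{
	struct ether_addr mac; /* expands to struct rte_ether_addr */

	demo_macaddr_get(port_id, &mac);
	return mac.addr_bytes[5];
}
```

With the alias removed (or RTE_NET_NO_COMPAT defined), struct ether_addr in demo_legacy_caller() is an unrelated, incomplete type and the function no longer compiles, which is the failure mode described for rte_eth_macaddr_get(). The alias also rewrites every other use of the identifier ether_addr in the translation unit, so it can clash with any other header that declares a structure with the same tag; it is a transition aid at best.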
* Re: [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
2018-10-05 16:00 0% ` Timothy Redaelli
@ 2018-10-27 21:19 0% ` Thomas Monjalon
2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-27 21:19 UTC (permalink / raw)
To: Luca Boccassi
Cc: dev, bruce.richardson, tredaelli, christian.ehrhardt, mvarlese
02/10/2018 18:20, Luca Boccassi:
> As part of the effort of consolidating the DPDK installation bits and
> pieces across distros, set the default directory of lib/ where PMDs get
> installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
> subdirectory as multiple ABI revisions might be installed at the same
> time, so having a fixed name will cause trouble with the autoload
> feature.
> Small refactor with parsing and saving the major version to a variable,
> since it's now used in 3 different places.
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
Series applied, thanks
* Re: [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table
2018-10-26 10:12 3% ` Bruce Richardson
@ 2018-10-29 5:54 0% ` Honnappa Nagarahalli
2018-10-31 4:21 3% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-29 5:54 UTC (permalink / raw)
To: Bruce Richardson; +Cc: Yipeng Wang, stephen, dev, sameh.gobriel, nd
> > > On Wed, Oct 10, 2018 at 02:48:05PM -0700, Yipeng Wang wrote:
> > > > This commit improves the readwrite test to consider extendable
> > > > table feature and add more functional tests to cover more corner cases.
> > > >
> > > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> ---
> > > > test/test/test_hash_readwrite.c | 70
> > > > ++++++++++++++++++++++++++++++++++------- 1 file changed, 58
> > > > insertions(+), 12 deletions(-)
> > > >
> > > With the extension of this test case, and the addition of other test
> > > cases by Honnappa in the other patch sets in this release, we are
> > > building up quite a large set of hash table autotests, some of whose
> > > meaning and use is a bit obscure. Are there any hash tests that you
> > > feel could be removed at this point, to simplify things?
> > >
> > (this comment does not apply to this patch) Looks like your concern is
> > about maintenance of the test code.
> > IMO, we need to reduce the number of configuration flags in this library
> which should reduce the number of test cases.
> > The flags I think that are not necessary are:
> > RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT - The tests prove that
> this gives significant performance boost. IMO, if the platform supports it, it
> should be enabled without user consent (I am not an expert on TSX).
> > RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY - Most use cases require this
> support. Only use case where this is not required is a single thread doing both
> inserts and lookups. Even if such a use case is valid, the lock overhead should
> be small.
> >
> I agree with the idea. What I suggest is that only a single flag should be
> needed, and that only for the uncommon case, i.e. where we do not need any
> locking of the hash-table. Otherwise the hash should be thread safe by default
> and using the most effective locking mechanism for the platform.
>
RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF - should take care of this.
> Unfortunately, doing this requires an ABI change, but since it only should
> affect the create function, it should be doable with function versioning to
> keep backward compatibility.
>
Looks simple enough. Version the rte_hash_create function and map the existing function to 18.08. The new version of the function will always enable hw_trans_mem_support and rw_concurrency. Should we check to see if these flags are set by the user and print a warning message about deprecation of these flags in the newer version of the function?
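A rough sketch of what the flag handling in the new version of the create function could look like is given below. It only models the flag logic, not symbol versioning itself; the DEMO_* bit values, struct demo_hash_params and demo_hash_effective_flags() are made up for this illustration and are not the library's definitions. Whether to print a warning is still the open question above, so the warning here is just one possible answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative flag bits for this sketch only. */
#define DEMO_FLAG_TRANS_MEM_SUPPORT 0x01u
#define DEMO_FLAG_MULTI_WRITER_ADD  0x02u
#define DEMO_FLAG_RW_CONCURRENCY    0x04u

/* Hypothetical, reduced version of the creation parameters. */
struct demo_hash_params {
	uint32_t extra_flag;
};

/*
 * Compute the flags the new create function would actually use.
 * Both deprecated flags are forced on; *warned is set to 1 if the
 * caller still passed one of them explicitly, 0 otherwise.
 */
static uint32_t
demo_hash_effective_flags(const struct demo_hash_params *params, int *warned)
{
	const uint32_t always_on = DEMO_FLAG_TRANS_MEM_SUPPORT |
				   DEMO_FLAG_RW_CONCURRENCY;

	assert(params != NULL && warned != NULL);

	*warned = (params->extra_flag & always_on) != 0;
	if (*warned)
		fprintf(stderr, "demo_hash: TRANS_MEM_SUPPORT and "
			"RW_CONCURRENCY are deprecated; both are now "
			"always enabled\n");

	return params->extra_flag | always_on;
}
```

In this model the old symbol would keep today's behaviour and stay bound to the 18.08 version, so existing binaries are unaffected; only applications rebuilt against the new function would see the forced flags.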
> /Bruce
* Re: [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table
2018-10-26 10:12 3% ` Bruce Richardson
2018-10-29 5:54 0% ` Honnappa Nagarahalli
@ 2018-10-31 4:21 3% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-31 4:21 UTC (permalink / raw)
To: Bruce Richardson; +Cc: Yipeng Wang, stephen, dev, sameh.gobriel, nd, nd
> On Fri, Oct 26, 2018 at 12:23:56AM +0000, Honnappa Nagarahalli wrote:
> > >
> > > On Wed, Oct 10, 2018 at 02:48:05PM -0700, Yipeng Wang wrote:
> > > > This commit improves the readwrite test to consider extendable
> > > > table feature and add more functional tests to cover more corner cases.
> > > >
> > > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> ---
> > > > test/test/test_hash_readwrite.c | 70
> > > > ++++++++++++++++++++++++++++++++++------- 1 file changed, 58
> > > > insertions(+), 12 deletions(-)
> > > >
> > > With the extension of this test case, and the addition of other test
> > > cases by Honnappa in the other patch sets in this release, we are
> > > building up quite a large set of hash table autotests, some of whose
> > > meaning and use is a bit obscure. Are there any hash tests that you
> > > feel could be removed at this point, to simplify things?
> > >
> > (this comment does not apply to this patch) Looks like your concern is
> > about maintenance of the test code.
> > IMO, we need to reduce the number of configuration flags in this library
> which should reduce the number of test cases.
> > The flags I think that are not necessary are:
> > RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT - The tests prove that
> this gives significant performance boost. IMO, if the platform supports it, it
> should be enabled without user consent (I am not an expert on TSX).
> > RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY - Most use cases require this
> support. Only use case where this is not required is a single thread doing both
> inserts and lookups. Even if such a use case is valid, the lock overhead should
> be small.
> >
> I agree with the idea. What I suggest is that only a single flag should be
> needed, and that only for the uncommon case, i.e. where we do not need any
> locking of the hash-table. Otherwise the hash should be thread safe by default
> and using the most effective locking mechanism for the platform.
>
> Unfortunately, doing this requires an ABI change, but since it only should
> affect the create function, it should be doable with function versioning to
> keep backward compatibility.
>
I have made the changes. It seems to be working fine. I will post it once internal review completes.
We made this change (SHA: 9d033dac7d7cacca9559e0381f99b4c730e80979) to support 'no free on delete'. This was done by introducing another configuration flag, 'RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL'. IMO, it makes sense to always keep delete and free as two different operations and to deprecate 'free during delete' support. We can provide backward compatibility by making an ABI change instead of introducing another configuration flag.
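The separation argued for above can be sketched with a toy slot table. This is not the rte_hash implementation: struct demo_table and the demo_* functions are hypothetical, and the table is a plain array with no concurrency at all. It only shows the contract: delete hides the key but keeps the slot reserved, and a separate free call recycles the slot once the caller knows no reader still refers to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 4

enum demo_slot_state { DEMO_SLOT_FREE = 0, DEMO_SLOT_LIVE, DEMO_SLOT_DELETED };

struct demo_table {
	uint32_t key[DEMO_SLOTS];
	enum demo_slot_state state[DEMO_SLOTS];
};

/* Insert a key into the first FREE slot; return its position or -1. */
static int
demo_add(struct demo_table *t, uint32_t key)
{
	assert(t != NULL);
	for (int i = 0; i < DEMO_SLOTS; i++) {
		if (t->state[i] == DEMO_SLOT_FREE) {
			t->key[i] = key;
			t->state[i] = DEMO_SLOT_LIVE;
			return i;
		}
	}
	return -1;
}

/* Delete: the key stops matching, but the slot is NOT reusable yet. */
static int
demo_del(struct demo_table *t, uint32_t key)
{
	assert(t != NULL);
	for (int i = 0; i < DEMO_SLOTS; i++) {
		if (t->state[i] == DEMO_SLOT_LIVE && t->key[i] == key) {
			t->state[i] = DEMO_SLOT_DELETED;
			return i;
		}
	}
	return -1;
}

/* Free: recycle a previously deleted slot; return 0 on success, -1 otherwise. */
static int
demo_free_slot(struct demo_table *t, int pos)
{
	assert(t != NULL);
	if (pos < 0 || pos >= DEMO_SLOTS || t->state[pos] != DEMO_SLOT_DELETED)
		return -1;
	t->state[pos] = DEMO_SLOT_FREE;
	return 0;
}
```

A caller that wants the old 'free during delete' behaviour simply calls demo_free_slot() with the position returned by demo_del(), which is why a dedicated configuration flag may not be needed.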
> /Bruce
* Re: [dpdk-dev] [PATCH v5 3/3] ip_frag: extend IPv6 fragment header retrieval
@ 2018-10-31 19:56 3% ` Ananyev, Konstantin
2018-11-07 20:21 0% ` Cody Doucette
0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-10-31 19:56 UTC (permalink / raw)
To: Cody Doucette, Dumitrescu, Cristian; +Cc: dev, Qiaobin Fu
Hi Cody,
>
> Add the ability to parse IPv6 extenders to find the
> IPv6 fragment header, and update callers.
>
> According to RFC 8200, there is no guarantee that the IPv6
> Fragment extension header will come before any other extension
> header, even though it is recommended.
>
> Signed-off-by: Cody Doucette <doucette@bu.edu>
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> examples/ip_reassembly/main.c | 6 ++--
> lib/librte_ip_frag/rte_ip_frag.h | 23 ++++++-------
> lib/librte_ip_frag/rte_ip_frag_version.map | 1 +
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 38 +++++++++++++++++++++
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 4 +--
> lib/librte_port/rte_port_ras.c | 6 ++--
> 6 files changed, 59 insertions(+), 19 deletions(-)
>
> diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
> index 17b55d4c7..3a827bd6c 100644
> --- a/examples/ip_reassembly/main.c
> +++ b/examples/ip_reassembly/main.c
> @@ -365,12 +365,14 @@ reassemble(struct rte_mbuf *m, uint16_t portid, uint32_t queue,
> eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
> } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
> /* if packet is IPv6 */
> - struct ipv6_extension_fragment *frag_hdr;
> + const struct ipv6_extension_fragment *frag_hdr;
> + struct ipv6_extension_fragment frag_hdr_buf;
> struct ipv6_hdr *ip_hdr;
>
> ip_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
>
> - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(ip_hdr);
> + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(m,
> + ip_hdr, &frag_hdr_buf);
I looked at the patch once again, and it seems incomplete to me.
Sorry for the late comments.
Yes, rte_ipv6_frag_get_ipv6_fragment_header() can now properly
retrieve the IPv6 fragment info, but that is not enough to make things work
when the fragment header is not the first and only
extension header in the packet.
In the same function, a few lines below, we set up l3_len based on that assumption:
m->l3_len = sizeof(*ip_hdr) + sizeof(*frag_hdr);
mo = rte_ipv6_frag_reassemble_packet(tbl, dr, m, tms, ip_hdr, frag_hdr);
And inside rte_ipv6_frag_reassemble_packet() we still assume the same:
...
frag_hdr = (struct ipv6_extension_fragment *) (ip_hdr + 1);
ip_hdr->proto = frag_hdr->next_header;
I think we need a function that would allow us to get the offset of frag_hdr.
Actually, we could probably have a generic one here, which can return the offset of
any requested ext header or the total length of the IPv6 header.
Something like this:
struct rte_ipv6_get_xhdr_ofs {
uint16_t find_proto; /* header proto to find */
uint16_t next_proto; /* next header proto */
uint32_t next_ofs; /* offset to start search */
};
int
rte_ipv6_get_xhdr_ofs(struct rte_mbuf *pkt, struct rte_ipv6_get_xhdr_ofs *find);
that would walk the IPv6 ext headers until either the requested proto is found or the end of the IPv6 header is reached.
Then the user can do something like this:
/* find fragment extention */
ipv6_get_xhdr_ofs ofs = {
.find_proto = IPPROTO_FRAGMENT,
.next_proto = ipv6_hdr->proto,
.ofs = sizeof(struct ipv6_hdr),
};
rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
if(rc == 0)
frag_hdr = rte_pktmbuf_mtod_offset(m, .., ofs.ofs);
...
/* get size of IPv6 header plus all known extensions */
ipv6_get_xhdr_ofs ofs = {
.find_proto = IPPROTO_MAX,
.next_proto = ipv6_hdr->proto,
.ofs = sizeof(struct ipv6_hdr),
};
rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
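To make the idea of walking the extension-header chain concrete, here is a minimal standalone sketch. It is not the proposed DPDK function: it assumes the whole packet sits in one contiguous byte buffer (no mbuf segments, no rte_pktmbuf_read()), and the names ipv6_find_xhdr(), is_ext_hdr() and the XHDR_* constants are hypothetical. The length rules are the same ones the quoted patch uses: (hdrlen + 2) * 4 bytes for AH, 8 bytes for the fragment header, and (hdrlen + 1) * 8 bytes for the other extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_FIXED_HDR_LEN 40	/* size of the fixed IPv6 header */
#define IPV6_NEXT_HDR_OFS 6	/* offset of its Next Header field */

/* Protocol numbers of the extension headers this sketch understands. */
enum {
	XHDR_HOPOPTS = 0,
	XHDR_ROUTING = 43,
	XHDR_FRAGMENT = 44,
	XHDR_AH = 51,
	XHDR_DSTOPTS = 60,
};

static int
is_ext_hdr(uint8_t proto)
{
	return proto == XHDR_HOPOPTS || proto == XHDR_ROUTING ||
		proto == XHDR_FRAGMENT || proto == XHDR_AH ||
		proto == XHDR_DSTOPTS;
}

/*
 * Hypothetical helper: find extension header `find_proto` in a contiguous
 * IPv6 packet of `len` bytes. Returns 0 and stores the header offset in
 * *ofs on success, -1 if the header is absent or the packet is truncated.
 */
int
ipv6_find_xhdr(const uint8_t *pkt, size_t len, uint8_t find_proto, size_t *ofs)
{
	size_t cur = IPV6_FIXED_HDR_LEN;
	uint8_t proto;

	if (len < IPV6_FIXED_HDR_LEN)
		return -1;
	proto = pkt[IPV6_NEXT_HDR_OFS];

	while (is_ext_hdr(proto)) {
		size_t hlen;

		/* Every extension header handled here is at least 8 bytes. */
		if (cur > len || len - cur < 8)
			return -1;

		if (proto == find_proto) {
			*ofs = cur;
			return 0;
		}

		switch (proto) {
		case XHDR_FRAGMENT:
			hlen = 8;	/* fixed size, no length field */
			break;
		case XHDR_AH:
			hlen = ((size_t)pkt[cur + 1] + 2) * 4;
			break;
		default:
			hlen = ((size_t)pkt[cur + 1] + 1) * 8;
			break;
		}

		proto = pkt[cur];	/* first byte is always Next Header */
		cur += hlen;
	}

	return -1;
}
```

With an offset in hand, the caller could derive l3_len from it instead of assuming sizeof(*ip_hdr) + sizeof(*frag_hdr). A real implementation would also have to cope with headers that straddle mbuf segments, which this sketch ignores.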
>
> if (frag_hdr != NULL) {
> struct rte_mbuf *mo;
> diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
> index 7f425f610..6fc8106bc 100644
> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> @@ -211,28 +211,25 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
> struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> struct rte_ip_frag_death_row *dr,
> struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
> - struct ipv6_extension_fragment *frag_hdr);
> + const struct ipv6_extension_fragment *frag_hdr);
>
> /**
> * Return a pointer to the packet's fragment header, if found.
> - * It only looks at the extension header that's right after the fixed IPv6
> - * header, and doesn't follow the whole chain of extension headers.
> *
> - * @param hdr
> + * @param pkt
> + * Pointer to the mbuf of the packet.
> + * @param ip_hdr
> * Pointer to the IPv6 header.
> + * @param frag_hdr
> + * A pointer to the buffer where the fragment header
> + * will be copied if it is not contiguous in mbuf data.
> * @return
> * Pointer to the IPv6 fragment extension header, or NULL if it's not
> * present.
> */
> -static inline struct ipv6_extension_fragment *
> -rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
> -{
> - if (hdr->proto == IPPROTO_FRAGMENT) {
> - return (struct ipv6_extension_fragment *) ++hdr;
> - }
> - else
> - return NULL;
> -}
> +const struct ipv6_extension_fragment *rte_ipv6_frag_get_ipv6_fragment_header(
> + struct rte_mbuf *pkt, const struct ipv6_hdr *ip_hdr,
> + struct ipv6_extension_fragment *frag_hdr);
Another thing - wouldn't it be an API/ABI breakage?
One more question - making it non-inline - how much would it affect performance?
My guess - no big difference, but did you check?
Konstantin
>
> /**
> * IPv4 fragmentation.
> diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map b/lib/librte_ip_frag/rte_ip_frag_version.map
> index d40d5515f..8b4c82d08 100644
> --- a/lib/librte_ip_frag/rte_ip_frag_version.map
> +++ b/lib/librte_ip_frag/rte_ip_frag_version.map
> @@ -8,6 +8,7 @@ DPDK_2.0 {
> rte_ipv4_fragment_packet;
> rte_ipv6_frag_reassemble_packet;
> rte_ipv6_fragment_packet;
> + rte_ipv6_frag_get_ipv6_fragment_header;
>
> local: *;
> };
> diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> index 62a7e4e83..bd847dd3d 100644
> --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> @@ -176,3 +176,41 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
>
> return out_pkt_pos;
> }
> +
> +const struct ipv6_extension_fragment *
> +rte_ipv6_frag_get_ipv6_fragment_header(struct rte_mbuf *pkt,
> + const struct ipv6_hdr *ip_hdr,
> + struct ipv6_extension_fragment *frag_hdr)
> +{
> + size_t offset = sizeof(struct ipv6_hdr);
> + uint8_t nexthdr = ip_hdr->proto;
> +
> + while (ipv6_ext_hdr(nexthdr)) {
> + struct ipv6_opt_hdr opt;
> + const struct ipv6_opt_hdr *popt = rte_pktmbuf_read(pkt,
> + offset, sizeof(opt), &opt);
> + if (popt == NULL)
> + return NULL;
> +
> + switch (nexthdr) {
> + case IPPROTO_NONE:
> + return NULL;
> +
> + case IPPROTO_FRAGMENT:
> + return rte_pktmbuf_read(pkt, offset,
> + sizeof(*frag_hdr), frag_hdr);
> +
> + case IPPROTO_AH:
> + offset += (popt->hdrlen + 2) << 2;
> + break;
> +
> + default:
> + offset += (popt->hdrlen + 1) << 3;
> + break;
> + }
> +
> + nexthdr = popt->nexthdr;
> + }
> +
> + return NULL;
> +}
> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> index db249fe60..b2d67a3f0 100644
> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> @@ -135,8 +135,8 @@ ipv6_frag_reassemble(struct ip_frag_pkt *fp)
> #define FRAG_OFFSET(x) (rte_cpu_to_be_16(x) >> 3)
> struct rte_mbuf *
> rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t tms,
> - struct ipv6_hdr *ip_hdr, struct ipv6_extension_fragment *frag_hdr)
> + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t tms,
> + struct ipv6_hdr *ip_hdr, const struct ipv6_extension_fragment *frag_hdr)
> {
> struct ip_frag_pkt *fp;
> struct ip_frag_key key;
> diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c
> index c8b2e19bf..28764f744 100644
> --- a/lib/librte_port/rte_port_ras.c
> +++ b/lib/librte_port/rte_port_ras.c
> @@ -184,9 +184,11 @@ process_ipv6(struct rte_port_ring_writer_ras *p, struct rte_mbuf *pkt)
> /* Assume there is no ethernet header */
> struct ipv6_hdr *pkt_hdr = rte_pktmbuf_mtod(pkt, struct ipv6_hdr *);
>
> - struct ipv6_extension_fragment *frag_hdr;
> + const struct ipv6_extension_fragment *frag_hdr;
> + struct ipv6_extension_fragment frag_hdr_buf;
> uint16_t frag_data = 0;
> - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt_hdr);
> + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt, pkt_hdr,
> + &frag_hdr_buf);
> if (frag_hdr != NULL)
> frag_data = rte_be_to_cpu_16(frag_hdr->frag_data);
>
> --
> 2.17.1
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file
@ 2018-11-01 22:53 3% ` Thomas Monjalon
2018-11-02 11:50 0% ` Neil Horman
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-01 22:53 UTC (permalink / raw)
To: Neil Horman; +Cc: dev, doucette
01/11/2018 14:54, Neil Horman:
> the regex to determine the end of the map file chunk in a patch seems to
> be wrong, It was using perl regex syntax, which awk doesn't appear to
> support (I'm still not sure how it was working previously). Regardless,
> it wasn't triggering and as a result symbols were getting added to the
> mapdb that shouldn't be there.
>
> Fix it by converting the regex to use traditional posix syntax, matching
> only on the negation of the character class [^map]
>
> Tested and shown to be working on the ip_frag patch set provided by
> doucette@bu.edu
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: thomas@monjalon.net
> CC: doucette@bu.edu
> Reported-by: doucette@bu.edu
You could use these lines:
Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Reported-by: Cody Doucette <doucette@bu.edu>
> --- a/devtools/check-symbol-change.sh
> +++ b/devtools/check-symbol-change.sh
> - /[-+] a\/.*\.^(map)/ {in_map=0}
> + /[-+] a\/.*\.[^map]/ {in_map=0}
Not sure this is what you intend:
[^map] means any character except "m", "a" and "p".
I don't know whether awk supports this syntax: (?!foo)
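The bracket-expression semantics can be checked outside awk. The sketch below is only an illustration: it compiles the same pattern as a POSIX extended regex through regcomp(), and ends_map_chunk() is a hypothetical helper name, not part of the devtools script. The awk-specific `\/` escape becomes a plain `/` in the C string.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `line` matches the patched pattern
 * (i.e. the awk rule would reset in_map), 0 if it does not match, and
 * -1 if the pattern fails to compile.
 */
int
ends_map_chunk(const char *line)
{
	regex_t re;
	int rc;

	/* [^map] is a bracket expression: one char that is not m, a or p. */
	if (regcomp(&re, "[-+] a/.*\\.[^map]", REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);

	return rc == 0 ? 1 : 0;
}
```

Under this pattern a "--- a/....c" line matches and a "--- a/....map" line does not, which is the intended effect. A line for a file whose only extension starts with "m", "a" or "p" (a made-up "--- a/tools/example.py", say) does not match either, so such a file would not end the map chunk.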
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop
@ 2018-11-02 7:15 3% ` Gavin Hu (Arm Technology China)
2018-11-02 9:36 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Gavin Hu (Arm Technology China) @ 2018-11-02 7:15 UTC (permalink / raw)
To: Honnappa Nagarahalli, Stephen Hemminger
Cc: dev, thomas, olivier.matz, chaozhu, bruce.richardson,
konstantin.ananyev, jerin.jacob, stable, nd
> -----Original Message-----
> From: Honnappa Nagarahalli
> Sent: Friday, November 2, 2018 12:31 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> Hemminger <stephen@networkplumber.org>
> Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> stable@dpdk.org; nd <nd@arm.com>
> Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above the
> loop
>
> <Fixing this to make the reply inline, making email plain text>
>
>
> On Thu, 1 Nov 2018 17:53:51 +0800
> Gavin Hu <mailto:gavin.hu@arm.com> wrote:
>
> > +* **Updated the ring library with C11 memory model.**
> > +
> > + Updated the ring library with C11 memory model including the following
> changes:
> > +
> > + * Synchronize the load and store of the tail
> > + * Move the atomic load of head above the loop
> > +
>
> Does this really need to be in the release notes? Is it a user visible change or
> just an internal/optimization and fix.
>
> [Gavin] There is no api changes, but this is a significant change as ring is
> fundamental and widely used, it decreases latency by 25% in our tests, it may
> do even better for cases with more contending producers/consumers or
> deeper depth of rings.
>
> [Honnappa] I agree with Stephen. Release notes should be written from
> DPDK user perspective. In the rte_ring case, the user has the option of
> choosing between c11 and non-c11 algorithms. Performance would be one
> of the criteria to choose between these 2 algorithms. IMO, it probably makes
> sense to indicate that the performance of c11 based algorithm has been
> improved. However, I do not know what DPDK has followed historically
> regarding performance optimizations. I would prefer to follow whatever has
> been followed so far.
> I do not think that we need to document the details of the internal changes
> since it does not help the user make a decision.
I read through the online guidelines for release notes: besides API changes and new features, significant resolved issues that were not introduced in this release cycle should also be included.
In this case, the resolved issue existed for a long time, across multiple release cycles, and ring is a core lib, so it should be a candidate for the release notes.
https://doc.dpdk.org/guides-18.08/contributing/patches.html
section 5.5 says:
Important changes will require an addition to the release notes in doc/guides/rel_notes/.
See the Release Notes section of the Documentation Guidelines for details.
https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-guidelines
"Developers should include updates to the Release Notes with patch sets that relate to any of the following sections:
New Features
Resolved Issues (see below)
Known Issues
API Changes
ABI Changes
Shared Library Versions
Resolved Issues should only include issues from previous releases that have been resolved in the current release. Issues that are introduced and then fixed within a release cycle do not have to be included here."
Suggested order in release notes items:
* Core libs (EAL, mempool, ring, mbuf, buses)
* Device abstraction libs and PMDs
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop
2018-11-02 7:15 3% ` Gavin Hu (Arm Technology China)
@ 2018-11-02 9:36 0% ` Thomas Monjalon
2018-11-02 11:23 0% ` Gavin Hu (Arm Technology China)
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-02 9:36 UTC (permalink / raw)
To: Gavin Hu (Arm Technology China), Honnappa Nagarahalli
Cc: Stephen Hemminger, dev, olivier.matz, chaozhu, bruce.richardson,
konstantin.ananyev, jerin.jacob, stable, nd
02/11/2018 08:15, Gavin Hu (Arm Technology China):
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli
> > Sent: Friday, November 2, 2018 12:31 PM
> > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> > Hemminger <stephen@networkplumber.org>
> > Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> > chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> > konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> > stable@dpdk.org; nd <nd@arm.com>
> > Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above the
> > loop
> >
> > <Fixing this to make the reply inline, making email plain text>
> >
> >
> > On Thu, 1 Nov 2018 17:53:51 +0800
> > Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> >
> > > +* **Updated the ring library with C11 memory model.**
> > > +
> > > + Updated the ring library with C11 memory model including the following
> > changes:
> > > +
> > > + * Synchronize the load and store of the tail
> > > + * Move the atomic load of head above the loop
> > > +
> >
> > Does this really need to be in the release notes? Is it a user visible change or
> > just an internal/optimization and fix.
> >
> > [Gavin] There is no api changes, but this is a significant change as ring is
> > fundamental and widely used, it decreases latency by 25% in our tests, it may
> > do even better for cases with more contending producers/consumers or
> > deeper depth of rings.
> >
> > [Honnappa] I agree with Stephen. Release notes should be written from
> > DPDK user perspective. In the rte_ring case, the user has the option of
> > choosing between c11 and non-c11 algorithms. Performance would be one
> > of the criteria to choose between these 2 algorithms. IMO, it probably makes
> > sense to indicate that the performance of c11 based algorithm has been
> > improved. However, I do not know what DPDK has followed historically
> > regarding performance optimizations. I would prefer to follow whatever has
> > been followed so far.
> > I do not think that we need to document the details of the internal changes
> > since it does not help the user make a decision.
>
> I read through the online guidelines for release notes, besides API and new features, resolved issues which are significant and not newly introduced in this release cycle, should also be included.
> In this case, the resolved issue existed for long time, across multiple release cycles and ring is a core lib, so it should be a candidate for release notes.
>
> https://doc.dpdk.org/guides-18.08/contributing/patches.html
> section 5.5 says:
> Important changes will require an addition to the release notes in doc/guides/rel_notes/.
> See the Release Notes section of the Documentation Guidelines for details.
> https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-guidelines
> "Developers should include updates to the Release Notes with patch sets that relate to any of the following sections:
> New Features
> Resolved Issues (see below)
> Known Issues
> API Changes
> ABI Changes
> Shared Library Versions
> Resolved Issues should only include issues from previous releases that have been resolved in the current release. Issues that are introduced and then fixed within a release cycle do not have to be included here."
>
> Suggested order in release notes items:
> * Core libs (EAL, mempool, ring, mbuf, buses)
> * Device abstraction libs and PMDs
I agree with Honnappa.
You don't need to give details, but can explain that performance of
C11 version is improved.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop
2018-11-02 9:36 0% ` Thomas Monjalon
@ 2018-11-02 11:23 0% ` Gavin Hu (Arm Technology China)
0 siblings, 0 replies; 200+ results
From: Gavin Hu (Arm Technology China) @ 2018-11-02 11:23 UTC (permalink / raw)
To: Thomas Monjalon, Honnappa Nagarahalli
Cc: Stephen Hemminger, dev, olivier.matz, chaozhu, bruce.richardson,
konstantin.ananyev, jerin.jacob, stable, nd
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 2, 2018 5:37 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org;
> olivier.matz@6wind.com; chaozhu@linux.vnet.ibm.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> jerin.jacob@caviumnetworks.com; stable@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH v4 2/2] ring: move the atomic load of head above the
> loop
>
> 02/11/2018 08:15, Gavin Hu (Arm Technology China):
> >
> > > -----Original Message-----
> > > From: Honnappa Nagarahalli
> > > Sent: Friday, November 2, 2018 12:31 PM
> > > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> > > Hemminger <stephen@networkplumber.org>
> > > Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> > > chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> > > konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> > > stable@dpdk.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above
> > > the loop
> > >
> > > <Fixing this to make the reply inline, making email plain text>
> > >
> > >
> > > On Thu, 1 Nov 2018 17:53:51 +0800
> > > Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> > >
> > > > +* **Updated the ring library with C11 memory model.**
> > > > +
> > > > + Updated the ring library with C11 memory model including the
> > > > + following
> > > changes:
> > > > +
> > > > + * Synchronize the load and store of the tail
> > > > + * Move the atomic load of head above the loop
> > > > +
> > >
> > > Does this really need to be in the release notes? Is it a user
> > > visible change or just an internal/optimization and fix.
> > >
> > > [Gavin] There is no api changes, but this is a significant change as
> > > ring is fundamental and widely used, it decreases latency by 25% in
> > > our tests, it may do even better for cases with more contending
> > > producers/consumers or deeper depth of rings.
> > >
> > > [Honnappa] I agree with Stephen. Release notes should be written
> > > from DPDK user perspective. In the rte_ring case, the user has the
> > > option of choosing between c11 and non-c11 algorithms. Performance
> > > would be one of the criteria to choose between these 2 algorithms.
> > > IMO, it probably makes sense to indicate that the performance of c11
> > > based algorithm has been improved. However, I do not know what DPDK
> > > has followed historically regarding performance optimizations. I
> > > would prefer to follow whatever has been followed so far.
> > > I do not think that we need to document the details of the internal
> > > changes since it does not help the user make a decision.
> >
> > I read through the online guidelines for release notes, besides API and new
> features, resolved issues which are significant and not newly introduced in
> this release cycle, should also be included.
> > In this case, the resolved issue existed for long time, across multiple
> release cycles and ring is a core lib, so it should be a candidate for release
> notes.
> >
> > https://doc.dpdk.org/guides-18.08/contributing/patches.html
> > section 5.5 says:
> > Important changes will require an addition to the release notes in
> doc/guides/rel_notes/.
> > See the Release Notes section of the Documentation Guidelines for details.
> > https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-
> > guidelines "Developers should include updates to the Release Notes
> > with patch sets that relate to any of the following sections:
> > New Features
> > Resolved Issues (see below)
> > Known Issues
> > API Changes
> > ABI Changes
> > Shared Library Versions
> > Resolved Issues should only include issues from previous releases that
> have been resolved in the current release. Issues that are introduced and
> then fixed within a release cycle do not have to be included here."
> >
> > Suggested order in release notes items:
> > * Core libs (EAL, mempool, ring, mbuf, buses)
> > * Device abstraction libs and PMDs
>
> I agree with Honnappa.
> You don't need to give details, but can explain that performance of
> C11 version is improved.
>
V5 was submitted to indicate the improvement brought by the change without giving further technical details. Please have a review, thanks!
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file
2018-11-01 22:53 3% ` Thomas Monjalon
@ 2018-11-02 11:50 0% ` Neil Horman
2018-11-18 22:25 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-11-02 11:50 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, doucette
On Thu, Nov 01, 2018 at 11:53:00PM +0100, Thomas Monjalon wrote:
> 01/11/2018 14:54, Neil Horman:
> > the regex to determine the end of the map file chunk in a patch seems to
> > be wrong, It was using perl regex syntax, which awk doesn't appear to
> > support (I'm still not sure how it was working previously). Regardless,
> > it wasn't triggering and as a result symbols were getting added to the
> > mapdb that shouldn't be there.
> >
> > Fix it by converting the regex to use traditional posix syntax, matching
> > only on the negation of the character class [^map]
> >
> > Tested and shown to be working on the ip_frag patch set provided by
> > doucette@bu.edu
> >
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: thomas@monjalon.net
> > CC: doucette@bu.edu
> > Reported-by: doucette@bu.edu
>
> You could use these lines:
>
> Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
>
> Reported-by: Cody Doucette <doucette@bu.edu>
>
I'm fine with the second line, and the first is fine I guess, but I'm not sure
there is an exact correlation.
> > --- a/devtools/check-symbol-change.sh
> > +++ b/devtools/check-symbol-change.sh
> > - /[-+] a\/.*\.^(map)/ {in_map=0}
> > + /[-+] a\/.*\.[^map]/ {in_map=0}
>
> Not sure this is what you intend:
> [^map] means any character except "m", "a" and "p".
>
It's not 100%, but it's pretty close. The regex for matching exactly on "not a
specific string" is pretty large and complex. Since we have no files that
end in .m, .a or .p, this should give us what we want for the foreseeable future.
> I don't know whether awk supports this syntax: (?!foo)
>
It unfortunately doesn't; that's Perl syntax, and while I think grep supports it,
awk is more strictly POSIX compliant.
Neil
>
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 0/4] hash: deprecate lock ellision and read/write concurreny flags
@ 2018-11-02 17:38 3% ` Honnappa Nagarahalli
0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-11-02 17:38 UTC (permalink / raw)
To: Bruce Richardson
Cc: pablo.de.lara.guarch, dev, Gavin Hu (Arm Technology China),
Dharmik Thakkar, nd, yipeng1.wang, sameh.gobriel, nd
>
> On Thu, Nov 01, 2018 at 06:25:18PM -0500, Honnappa Nagarahalli wrote:
> > Various configuration flags in rte_hash library result in increase of
> > number of test cases. Configuration flags for enabling transactional
> > memory use and read/write concurrency are not required. These features
> > should be supported by default. Please refer to [1] for more context.
> >
> > This patch marks these flags for deprecation in 19.02 release and
> > cleans up the test cases.
> >
> > [1] http://mails.dpdk.org/archives/dev/2018-October/117268.html
> >
> > Honnappa Nagarahalli (4): hash: prepare for deprecation of flags hash:
> > deprecate lock ellision and read/write concurreny flags test/hash:
> > stop using lock ellision and read/write concurreny flags doc/hash:
> > deprecate lock ellision and read/write concurreny flags
> >
> While I'd like to reduce the flags and do cleanup, I'm a little concerned about
> putting this scope of changes in so late in the release. I wonder if less drastic
> changes could work as well for this release, and do the cleanup later.
Thank you, Bruce, for the review. This patch series is not fixing any user-related problems, so let us skip it for 18.11. That will also give us time to think it through and get it right.
> For example, rather than deprecating the flags now, how about just change
> the default for when no flags are set? If user has set flags, follow the existing
> path - if flags is set to zero, then have the defaults be to use RW concurrency
> or TSX.
This changes the behavior of the library and what the flags mean. It still requires an ABI change, but does not need deprecation of the flags (I guess this is what you meant). However, it will not solve the problem of losing the capability to disable TSX.
>
> /Bruce
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH] doc: document KNI limitation in release notes
@ 2018-11-05 17:09 4% Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-05 17:09 UTC (permalink / raw)
To: John McNamara, Marko Kovacevic; +Cc: dev, Ferruh Yigit, Thomas Monjalon
Commit a9460a0b2efb ("kni: fix build on Linux 4.19") disables some
ethtool commands because they are removed in newer (4.19) kernels.
This patch documents removed functionality.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 27b67e0fd..6ce276b22 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -377,6 +377,12 @@ API Changes
* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
has been changed from uint8_t to uint16_t.
+* KNI, when ethtool support enabled (``CONFIG_RTE_KNI_KMOD_ETHTOOL=y``)
+ ethtool commands ``ETHTOOL_GSET & ETHTOOL_SSET`` are no more supported for the
+ kernels that has ``ETHTOOL_GLINKSETTINGS & ETHTOOL_SLINKSETTINGS`` support.
+ This means ``ethtool "-a|--show-pause", "-s|--change"`` won't work, and
+ ``ethtool <iface>`` output will have less information.
+
ABI Changes
-----------
--
2.17.2
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH] doc: update release notes for default KNI carries status
@ 2018-11-05 17:28 4% Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-05 17:28 UTC (permalink / raw)
To: John McNamara, Marko Kovacevic
Cc: dev, Ferruh Yigit, Thomas Monjalon, Dan Gora
Commit 89397a01ce4a ("kni: set default carrier state of interface")
changes the KNI interface default carrier status, which prevents traffic
flow by default and may break some existing usage / testing.
Document this behavior change in release notes.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Dan Gora <dg@adax.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ce276b22..69c4d1bf6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -383,6 +383,13 @@ API Changes
This means ``ethtool "-a|--show-pause", "-s|--change"`` won't work, and
``ethtool <iface>`` output will have less information.
+* KNI, by default interface carrier status is ``off`` which means there won't be any traffic.
+ It can be set to ``on`` via ``rte_kni_update_link()`` API or via ``sysfs`` interface:
+ ``echo 1 > /sys/class/net/vEth0/carrier``. Note interface should be ``up`` to be able
+ to read/write sysfs interface.
+ When KNI sample application is used ``-m`` parameter can be used to automatically update
+ the carrier status for the interface.
+
ABI Changes
-----------
--
2.17.2
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH] doc/linux_gsg: fix numa lib name error
@ 2018-11-06 6:15 4% Yong Wang
0 siblings, 0 replies; 200+ results
From: Yong Wang @ 2018-11-06 6:15 UTC (permalink / raw)
To: bruce.richardson; +Cc: dev, Yong Wang
The library for handling NUMA is numactl-devel, not libnuma-devel.
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
---
doc/guides/linux_gsg/sys_reqs.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index e2230f3..1cb14b5 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -64,7 +64,7 @@ Compilation of the DPDK
x86_x32 ABI is currently supported with distribution packages only on Ubuntu
higher than 13.10 or recent Debian distribution. The only supported compiler is gcc 4.9+.
-* libnuma-devel - library for handling NUMA (Non Uniform Memory Access).
+* numactl-devel - library for handling NUMA (Non Uniform Memory Access).
* Python, version 2.7+ or 3.2+, to use various helper scripts included in the DPDK package.
--
1.8.3.1
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision
@ 2018-11-06 10:53 3% ` Burakov, Anatoly
2018-11-06 11:41 0% ` Ananyev, Konstantin
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-11-06 10:53 UTC (permalink / raw)
To: Konstantin Ananyev, dev; +Cc: stable, ryan.e.hall, alexander.v.gutkin
On 05-Nov-18 12:18 PM, Konstantin Ananyev wrote:
> Right now reassembly code relies on src_dst[] being all zeroes to
> determine is it free/occupied entry in the fragments table.
> This is suboptimal and error prone - user can crash DPDK ip_reassembly
> app by something like the following scapy script:
> x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
> frags=fragment(x, fragsize=500)
> sendp(frags, iface=...)
> To overcome that issue and reduce overhead of
> 'key invalidate' and 'key is empty' operations -
> add key_len into keys comparision procedure.
>
> Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
> Cc: stable@dpdk.org
>
> Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
> Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> @@ -44,9 +44,17 @@ struct ip_frag {
>
> /** @internal <src addr, dst_addr, id> to uniquely identify fragmented datagram. */
> struct ip_frag_key {
> - uint64_t src_dst[4]; /**< src address, first 8 bytes used for IPv4 */
> - uint32_t id; /**< dst address */
> - uint32_t key_len; /**< src/dst key length */
> + uint64_t src_dst[4];
> + /**< src and dst address, only first 8 bytes used for IPv4 */
> + RTE_STD_C11
> + union {
> + uint64_t id_key_len; /**< combined for easy fetch */
> + __extension__
> + struct {
> + uint32_t id; /**< packet id */
> + uint32_t key_len; /**< src/dst key length */
> + };
> + };
> };
Would that break ABI?
>
> /**
>
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision
2018-11-06 10:53 3% ` Burakov, Anatoly
@ 2018-11-06 11:41 0% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-11-06 11:41 UTC (permalink / raw)
To: Burakov, Anatoly, dev; +Cc: stable, Hall, Ryan E, Gutkin, Alexander V
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, November 6, 2018 10:54 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org; Hall, Ryan E <ryan.e.hall@intel.com>; Gutkin, Alexander V <alexander.v.gutkin@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision
>
> On 05-Nov-18 12:18 PM, Konstantin Ananyev wrote:
> > Right now reassembly code relies on src_dst[] being all zeroes to
> > determine is it free/occupied entry in the fragments table.
> > This is suboptimal and error prone - user can crash DPDK ip_reassembly
> > app by something like the following scapy script:
> > x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
> > frags=fragment(x, fragsize=500)
> > sendp(frags, iface=...)
> > To overcome that issue and reduce overhead of
> > 'key invalidate' and 'key is empty' operations -
> > add key_len into keys comparision procedure.
> >
> > Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
> > Cc: stable@dpdk.org
> >
> > Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
> > Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
>
>
>
> > @@ -44,9 +44,17 @@ struct ip_frag {
> >
> > /** @internal <src addr, dst_addr, id> to uniquely identify fragmented datagram. */
> > struct ip_frag_key {
> > - uint64_t src_dst[4]; /**< src address, first 8 bytes used for IPv4 */
> > - uint32_t id; /**< dst address */
> > - uint32_t key_len; /**< src/dst key length */
> > + uint64_t src_dst[4];
> > + /**< src and dst address, only first 8 bytes used for IPv4 */
> > + RTE_STD_C11
> > + union {
> > + uint64_t id_key_len; /**< combined for easy fetch */
> > + __extension__
> > + struct {
> > + uint32_t id; /**< packet id */
> > + uint32_t key_len; /**< src/dst key length */
> > + };
> > + };
> > };
>
> Would that break ABI?
No, the size and layout of the structure remain the same.
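A minimal sketch of how that claim could be checked at compile time, assuming a C11 compiler. The `_old`/`_new` struct names and the helper function are hypothetical, and the DPDK-specific RTE_STD_C11 and __extension__ markers are left out:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Layout before the patch, taken from the removed lines of the diff. */
struct ip_frag_key_old {
	uint64_t src_dst[4];
	uint32_t id;
	uint32_t key_len;
};

/* Layout after the patch: id and key_len share storage with one 64-bit word. */
struct ip_frag_key_new {
	uint64_t src_dst[4];
	union {
		uint64_t id_key_len;
		struct {
			uint32_t id;
			uint32_t key_len;
		};
	};
};

/* The build fails if the size or any field offset differs. */
static_assert(sizeof(struct ip_frag_key_old) ==
	sizeof(struct ip_frag_key_new), "size changed");
static_assert(offsetof(struct ip_frag_key_old, id) ==
	offsetof(struct ip_frag_key_new, id), "id moved");
static_assert(offsetof(struct ip_frag_key_old, key_len) ==
	offsetof(struct ip_frag_key_new, key_len), "key_len moved");

/* Hypothetical run-time mirror of the checks above; returns 1 if unchanged. */
int
ip_frag_key_layout_unchanged(void)
{
	return sizeof(struct ip_frag_key_old) ==
			sizeof(struct ip_frag_key_new) &&
		offsetof(struct ip_frag_key_old, id) ==
			offsetof(struct ip_frag_key_new, id) &&
		offsetof(struct ip_frag_key_old, key_len) ==
			offsetof(struct ip_frag_key_new, key_len);
}
```

This only compares the two layouts shown in the diff; it says nothing about code that relies on the values stored in the key.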
Konstantin
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v2] doc/linux_gsg: fix numa lib name error
@ 2018-11-07 2:40 4% Yong Wang
0 siblings, 0 replies; 200+ results
From: Yong Wang @ 2018-11-07 2:40 UTC (permalink / raw)
To: anatoly.burakov; +Cc: dev, Yong Wang
The library for handling NUMA is not libnuma-devel, but numactl-devel
in Red Hat/Fedora and libnuma-dev in Debian/Ubuntu.
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
---
v2:
* Add lib name in Ubuntu.
---
doc/guides/linux_gsg/sys_reqs.rst | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index e2230f3..fbc9d54 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -64,7 +64,11 @@ Compilation of the DPDK
x86_x32 ABI is currently supported with distribution packages only on Ubuntu
higher than 13.10 or recent Debian distribution. The only supported compiler is gcc 4.9+.
-* libnuma-devel - library for handling NUMA (Non Uniform Memory Access).
+* Library for handling NUMA (Non Uniform Memory Access).
+
+ * numactl-devel in Red Hat/Fedora;
+
+ * libnuma-dev in Debian/Ubuntu;
* Python, version 2.7+ or 3.2+, to use various helper scripts included in the DPDK package.
--
1.8.3.1
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v5 3/3] ip_frag: extend IPv6 fragment header retrieval
2018-10-31 19:56 3% ` Ananyev, Konstantin
@ 2018-11-07 20:21 0% ` Cody Doucette
2018-11-07 23:00 3% ` Ananyev, Konstantin
0 siblings, 1 reply; 200+ results
From: Cody Doucette @ 2018-11-07 20:21 UTC (permalink / raw)
To: Ananyev, Konstantin; +Cc: Dumitrescu, Cristian, dev, Fu, Qiaobin
Hey Konstantin,
Thanks for reviewing -- I see your point about this patch only removing one
of the places where the code makes assumptions about the position of the
fragmentation header.
Unfortunately at the moment I don't have the resources to dedicate to
writing the complete solution and doing a performance check, so I think I
should withdraw the patch. I hope it at least serves as a blueprint in case
someone comes back to it.
I might be able to come back to it eventually, but likely not soon.
Thanks,
Cody
On Wed, Oct 31, 2018 at 3:57 PM Ananyev, Konstantin <
konstantin.ananyev@intel.com> wrote:
> Hi Cody,
>
> >
> > Add the ability to parse IPv6 extenders to find the
> > IPv6 fragment header, and update callers.
> >
> > According to RFC 8200, there is no guarantee that the IPv6
> > Fragment extension header will come before any other extension
> > header, even though it is recommended.
> >
> > Signed-off-by: Cody Doucette <doucette@bu.edu>
> > Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> > Reviewed-by: Michel Machado <michel@digirati.com.br>
> > ---
> > examples/ip_reassembly/main.c | 6 ++--
> > lib/librte_ip_frag/rte_ip_frag.h | 23 ++++++-------
> > lib/librte_ip_frag/rte_ip_frag_version.map | 1 +
> > lib/librte_ip_frag/rte_ipv6_fragmentation.c | 38 +++++++++++++++++++++
> > lib/librte_ip_frag/rte_ipv6_reassembly.c | 4 +--
> > lib/librte_port/rte_port_ras.c | 6 ++--
> > 6 files changed, 59 insertions(+), 19 deletions(-)
> >
> > diff --git a/examples/ip_reassembly/main.c
> b/examples/ip_reassembly/main.c
> > index 17b55d4c7..3a827bd6c 100644
> > --- a/examples/ip_reassembly/main.c
> > +++ b/examples/ip_reassembly/main.c
> > @@ -365,12 +365,14 @@ reassemble(struct rte_mbuf *m, uint16_t portid,
> uint32_t queue,
> > eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
> > } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
> > /* if packet is IPv6 */
> > - struct ipv6_extension_fragment *frag_hdr;
> > + const struct ipv6_extension_fragment *frag_hdr;
> > + struct ipv6_extension_fragment frag_hdr_buf;
> > struct ipv6_hdr *ip_hdr;
> >
> > ip_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
> >
> > - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(ip_hdr);
> > + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(m,
> > + ip_hdr, &frag_hdr_buf);
>
> I looked at the patch once again, and it seems incomplete to me.
> Sorry for the late comments.
> Yes, rte_ipv6_frag_get_ipv6_fragment_header can now properly
> retrieve the IPv6 fragment info, but that is not enough to make things
> work when the fragment header is not the first and only
> extension header in the packet.
> In the same function, a few lines below, we set up l3_len based on that
> assumption:
>
> m->l3_len = sizeof(*ip_hdr) + sizeof(*frag_hdr);
>
> mo = rte_ipv6_frag_reassemble_packet(tbl, dr, m, tms, ip_hdr, frag_hdr);
>
> And inside rte_ipv6_frag_reassemble_packet() we still assume the same:
> ...
> frag_hdr = (struct ipv6_extension_fragment *) (ip_hdr + 1);
> ip_hdr->proto = frag_hdr->next_header;
>
> I think we need a function that would let us get the offset of frag_hdr.
> We could probably make it generic, so that it returns the offset of
> any requested extension header, or the total length of the IPv6 header.
> Something like this:
>
> struct rte_ipv6_get_xhdr_ofs {
> uint16_t find_proto; /* header proto to find */
> uint16_t next_proto; /* next header proto */
> uint32_t next_ofs; /* offset to start search */
> };
>
> int
> rte_ipv6_get_xhdr_ofs(struct rte_mbuf *pkt, struct rte_ipv6_get_xhdr_ofs *find);
>
> that would go through the IPv6 extension headers until either the
> requested proto is found or the end of the IPv6 header chain is reached.
> The user could then do something like this:
>
> /* find fragment extension */
> struct rte_ipv6_get_xhdr_ofs ofs = {
> .find_proto = IPPROTO_FRAGMENT,
> .next_proto = ipv6_hdr->proto,
> .next_ofs = sizeof(struct ipv6_hdr),
> };
> rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
> if (rc == 0)
> frag_hdr = rte_pktmbuf_mtod_offset(m, .., ofs.next_ofs);
> ...
>
> /* get size of IPv6 header plus all known extensions */
> struct rte_ipv6_get_xhdr_ofs ofs = {
> .find_proto = IPPROTO_MAX,
> .next_proto = ipv6_hdr->proto,
> .next_ofs = sizeof(struct ipv6_hdr),
> };
> rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
>
>
> >
> > if (frag_hdr != NULL) {
> > struct rte_mbuf *mo;
> > diff --git a/lib/librte_ip_frag/rte_ip_frag.h
> b/lib/librte_ip_frag/rte_ip_frag.h
> > index 7f425f610..6fc8106bc 100644
> > --- a/lib/librte_ip_frag/rte_ip_frag.h
> > +++ b/lib/librte_ip_frag/rte_ip_frag.h
> > @@ -211,28 +211,25 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
> > struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl
> *tbl,
> > struct rte_ip_frag_death_row *dr,
> > struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
> > - struct ipv6_extension_fragment *frag_hdr);
> > + const struct ipv6_extension_fragment *frag_hdr);
> >
> > /**
> > * Return a pointer to the packet's fragment header, if found.
> > - * It only looks at the extension header that's right after the fixed
> IPv6
> > - * header, and doesn't follow the whole chain of extension headers.
> > *
> > - * @param hdr
> > + * @param pkt
> > + * Pointer to the mbuf of the packet.
> > + * @param ip_hdr
> > * Pointer to the IPv6 header.
> > + * @param frag_hdr
> > + * A pointer to the buffer where the fragment header
> > + * will be copied if it is not contiguous in mbuf data.
> > * @return
> > * Pointer to the IPv6 fragment extension header, or NULL if it's not
> > * present.
> > */
> > -static inline struct ipv6_extension_fragment *
> > -rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
> > -{
> > - if (hdr->proto == IPPROTO_FRAGMENT) {
> > - return (struct ipv6_extension_fragment *) ++hdr;
> > - }
> > - else
> > - return NULL;
> > -}
> > +const struct ipv6_extension_fragment
> *rte_ipv6_frag_get_ipv6_fragment_header(
> > + struct rte_mbuf *pkt, const struct ipv6_hdr *ip_hdr,
> > + struct ipv6_extension_fragment *frag_hdr);
>
> Another thing - wouldn't it be an API/ABI breakage?
> One more question - how much would making it non-inline affect
> performance?
> My guess - no big difference, but did you check?
> Konstantin
>
> >
> > /**
> > * IPv4 fragmentation.
> > diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map
> b/lib/librte_ip_frag/rte_ip_frag_version.map
> > index d40d5515f..8b4c82d08 100644
> > --- a/lib/librte_ip_frag/rte_ip_frag_version.map
> > +++ b/lib/librte_ip_frag/rte_ip_frag_version.map
> > @@ -8,6 +8,7 @@ DPDK_2.0 {
> > rte_ipv4_fragment_packet;
> > rte_ipv6_frag_reassemble_packet;
> > rte_ipv6_fragment_packet;
> > + rte_ipv6_frag_get_ipv6_fragment_header;
> >
> > local: *;
> > };
> > diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> > index 62a7e4e83..bd847dd3d 100644
> > --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> > +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> > @@ -176,3 +176,41 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
> >
> > return out_pkt_pos;
> > }
> > +
> > +const struct ipv6_extension_fragment *
> > +rte_ipv6_frag_get_ipv6_fragment_header(struct rte_mbuf *pkt,
> > + const struct ipv6_hdr *ip_hdr,
> > + struct ipv6_extension_fragment *frag_hdr)
> > +{
> > + size_t offset = sizeof(struct ipv6_hdr);
> > + uint8_t nexthdr = ip_hdr->proto;
> > +
> > + while (ipv6_ext_hdr(nexthdr)) {
> > + struct ipv6_opt_hdr opt;
> > + const struct ipv6_opt_hdr *popt = rte_pktmbuf_read(pkt,
> > + offset, sizeof(opt), &opt);
> > + if (popt == NULL)
> > + return NULL;
> > +
> > + switch (nexthdr) {
> > + case IPPROTO_NONE:
> > + return NULL;
> > +
> > + case IPPROTO_FRAGMENT:
> > + return rte_pktmbuf_read(pkt, offset,
> > + sizeof(*frag_hdr), frag_hdr);
> > +
> > + case IPPROTO_AH:
> > + offset += (popt->hdrlen + 2) << 2;
> > + break;
> > +
> > + default:
> > + offset += (popt->hdrlen + 1) << 3;
> > + break;
> > + }
> > +
> > + nexthdr = popt->nexthdr;
> > + }
> > +
> > + return NULL;
> > +}
> > diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> > index db249fe60..b2d67a3f0 100644
> > --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> > +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> > @@ -135,8 +135,8 @@ ipv6_frag_reassemble(struct ip_frag_pkt *fp)
> > #define FRAG_OFFSET(x) (rte_cpu_to_be_16(x) >> 3)
> > struct rte_mbuf *
> > rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> > - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb,
> uint64_t tms,
> > - struct ipv6_hdr *ip_hdr, struct ipv6_extension_fragment
> *frag_hdr)
> > + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t
> tms,
> > + struct ipv6_hdr *ip_hdr, const struct ipv6_extension_fragment
> *frag_hdr)
> > {
> > struct ip_frag_pkt *fp;
> > struct ip_frag_key key;
> > diff --git a/lib/librte_port/rte_port_ras.c
> b/lib/librte_port/rte_port_ras.c
> > index c8b2e19bf..28764f744 100644
> > --- a/lib/librte_port/rte_port_ras.c
> > +++ b/lib/librte_port/rte_port_ras.c
> > @@ -184,9 +184,11 @@ process_ipv6(struct rte_port_ring_writer_ras *p,
> struct rte_mbuf *pkt)
> > /* Assume there is no ethernet header */
> > struct ipv6_hdr *pkt_hdr = rte_pktmbuf_mtod(pkt, struct ipv6_hdr
> *);
> >
> > - struct ipv6_extension_fragment *frag_hdr;
> > + const struct ipv6_extension_fragment *frag_hdr;
> > + struct ipv6_extension_fragment frag_hdr_buf;
> > uint16_t frag_data = 0;
> > - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt_hdr);
> > + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt, pkt_hdr,
> > + &frag_hdr_buf);
> > if (frag_hdr != NULL)
> > frag_data = rte_be_to_cpu_16(frag_hdr->frag_data);
> >
> > --
> > 2.17.1
>
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v5 3/3] ip_frag: extend IPv6 fragment header retrieval
2018-11-07 20:21 0% ` Cody Doucette
@ 2018-11-07 23:00 3% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-11-07 23:00 UTC (permalink / raw)
To: Cody Doucette; +Cc: Dumitrescu, Cristian, dev, Fu, Qiaobin
Hi Cody,
>Hey Konstantin,
>Thanks for reviewing -- I see your point about this patch only removing one of the places where the code makes assumptions about the position of the fragmentation header.
>Unfortunately at the moment I don't have the resources to dedicate to writing the complete solution and doing a performance check, so I think I should withdraw the patch.
OK, no problem, that's understandable.
Again, I provided my comments quite late.
> I hope it at least serves as a blueprint in case someone comes back to it.
Yes, I think it definitely will be useful.
>I might be able to come back to it eventually, but likely not soon.
Hopefully you will :)
Thanks
Konstantin
>
> Add the ability to parse IPv6 extenders to find the
> IPv6 fragment header, and update callers.
>
> According to RFC 8200, there is no guarantee that the IPv6
> Fragment extension header will come before any other extension
> header, even though it is recommended.
>
> Signed-off-by: Cody Doucette <doucette@bu.edu>
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> examples/ip_reassembly/main.c | 6 ++--
> lib/librte_ip_frag/rte_ip_frag.h | 23 ++++++-------
> lib/librte_ip_frag/rte_ip_frag_version.map | 1 +
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 38 +++++++++++++++++++++
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 4 +--
> lib/librte_port/rte_port_ras.c | 6 ++--
> 6 files changed, 59 insertions(+), 19 deletions(-)
>
> diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
> index 17b55d4c7..3a827bd6c 100644
> --- a/examples/ip_reassembly/main.c
> +++ b/examples/ip_reassembly/main.c
> @@ -365,12 +365,14 @@ reassemble(struct rte_mbuf *m, uint16_t portid, uint32_t queue,
> eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
> } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
> /* if packet is IPv6 */
> - struct ipv6_extension_fragment *frag_hdr;
> + const struct ipv6_extension_fragment *frag_hdr;
> + struct ipv6_extension_fragment frag_hdr_buf;
> struct ipv6_hdr *ip_hdr;
>
> ip_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
>
> - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(ip_hdr);
> + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(m,
> + ip_hdr, &frag_hdr_buf);
I looked at the patch once again, and it seems incomplete to me.
Sorry for the late comments.
Yes, rte_ipv6_frag_get_ipv6_fragment_header can now properly
retrieve the IPv6 fragment info, but that is not enough to make things
work when the fragment header is not the first and only
extension header in the packet.
In the same function, a few lines below, we set up l3_len based on that assumption:
m->l3_len = sizeof(*ip_hdr) + sizeof(*frag_hdr);
mo = rte_ipv6_frag_reassemble_packet(tbl, dr, m, tms, ip_hdr, frag_hdr);
And inside rte_ipv6_frag_reassemble_packet() we still assume the same:
...
frag_hdr = (struct ipv6_extension_fragment *) (ip_hdr + 1);
ip_hdr->proto = frag_hdr->next_header;
I think we need a function that would let us get the offset of frag_hdr.
We could probably make it generic, so that it returns the offset of
any requested extension header, or the total length of the IPv6 header.
Something like this:
struct rte_ipv6_get_xhdr_ofs {
uint16_t find_proto; /* header proto to find */
uint16_t next_proto; /* next header proto */
uint32_t next_ofs; /* offset to start search */
};
int
rte_ipv6_get_xhdr_ofs(struct rte_mbuf *pkt, struct rte_ipv6_get_xhdr_ofs *find);
that would go through the IPv6 extension headers until either the requested proto is found or the end of the IPv6 header chain is reached.
The user could then do something like this:
/* find fragment extension */
struct rte_ipv6_get_xhdr_ofs ofs = {
.find_proto = IPPROTO_FRAGMENT,
.next_proto = ipv6_hdr->proto,
.next_ofs = sizeof(struct ipv6_hdr),
};
rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
if (rc == 0)
frag_hdr = rte_pktmbuf_mtod_offset(m, .., ofs.next_ofs);
...
/* get size of IPv6 header plus all known extensions */
struct rte_ipv6_get_xhdr_ofs ofs = {
.find_proto = IPPROTO_MAX,
.next_proto = ipv6_hdr->proto,
.next_ofs = sizeof(struct ipv6_hdr),
};
rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
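For illustration only, the walk described above can be sketched over a flat byte buffer instead of an mbuf. Nothing below is an existing DPDK API: `struct xhdr_search`, `ipv6_find_xhdr()` and the `XHDR_*` constants are hypothetical names, the return convention (0 found, 1 end of chain, -1 truncated) is an assumption, and a real version would have to read through `rte_pktmbuf_read()` to cope with segmented mbufs. The length rules follow the patch under review: AH uses 4-byte units plus 2, the other headers use 8-byte units plus 1, and the fragment header is fixed at 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Next-header values for the extension headers walked by this sketch. */
#define XHDR_HOPOPTS      0
#define XHDR_ROUTING      43
#define XHDR_FRAGMENT     44
#define XHDR_AH           51
#define XHDR_DSTOPTS      60
#define XHDR_WALK_TO_END  256 /* hypothetical: matches no 8-bit proto */

/* Hypothetical search state, modelled on the struct proposed above. */
struct xhdr_search {
	uint16_t find_proto; /* header proto to find */
	uint16_t next_proto; /* in: proto at next_ofs, out: proto reached */
	uint32_t next_ofs;   /* in: offset to start at, out: offset reached */
};

static int
is_ext_hdr(uint16_t proto)
{
	return proto == XHDR_HOPOPTS || proto == XHDR_ROUTING ||
		proto == XHDR_FRAGMENT || proto == XHDR_AH ||
		proto == XHDR_DSTOPTS;
}

/*
 * Walk the extension header chain of a contiguous packet.
 * Returns 0 if find_proto was reached (next_ofs is its offset),
 * 1 if the chain ended first (next_ofs is the total header length),
 * -1 if the buffer is too short to hold a header in the chain.
 */
int
ipv6_find_xhdr(const uint8_t *pkt, size_t len, struct xhdr_search *s)
{
	assert(pkt != NULL && s != NULL);

	for (;;) {
		uint32_t ofs = s->next_ofs;
		uint16_t proto = s->next_proto;
		size_t hlen;

		if (proto == s->find_proto)
			return 0;
		if (!is_ext_hdr(proto))
			return 1;
		/* Need at least the next-header and length bytes. */
		if ((size_t)ofs + 2 > len)
			return -1;

		if (proto == XHDR_FRAGMENT)
			hlen = 8;
		else if (proto == XHDR_AH)
			hlen = ((size_t)pkt[ofs + 1] + 2) << 2;
		else
			hlen = ((size_t)pkt[ofs + 1] + 1) << 3;

		if ((size_t)ofs + hlen > len)
			return -1;

		s->next_proto = pkt[ofs];
		s->next_ofs = ofs + (uint32_t)hlen;
	}
}
```

With such a helper, l3_len could be taken from the "walk to end" result instead of assuming that the fragment header directly follows the fixed IPv6 header.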
>
> if (frag_hdr != NULL) {
> struct rte_mbuf *mo;
> diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
> index 7f425f610..6fc8106bc 100644
> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> @@ -211,28 +211,25 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
> struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> struct rte_ip_frag_death_row *dr,
> struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
> - struct ipv6_extension_fragment *frag_hdr);
> + const struct ipv6_extension_fragment *frag_hdr);
>
> /**
> * Return a pointer to the packet's fragment header, if found.
> - * It only looks at the extension header that's right after the fixed IPv6
> - * header, and doesn't follow the whole chain of extension headers.
> *
> - * @param hdr
> + * @param pkt
> + * Pointer to the mbuf of the packet.
> + * @param ip_hdr
> * Pointer to the IPv6 header.
> + * @param frag_hdr
> + * A pointer to the buffer where the fragment header
> + * will be copied if it is not contiguous in mbuf data.
> * @return
> * Pointer to the IPv6 fragment extension header, or NULL if it's not
> * present.
> */
> -static inline struct ipv6_extension_fragment *
> -rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
> -{
> - if (hdr->proto == IPPROTO_FRAGMENT) {
> - return (struct ipv6_extension_fragment *) ++hdr;
> - }
> - else
> - return NULL;
> -}
> +const struct ipv6_extension_fragment *rte_ipv6_frag_get_ipv6_fragment_header(
> + struct rte_mbuf *pkt, const struct ipv6_hdr *ip_hdr,
> + struct ipv6_extension_fragment *frag_hdr);
Another thing - wouldn't it be an API/ABI breakage?
One more question - how much would making it non-inline affect performance?
My guess - no big difference, but did you check?
Konstantin
>
> /**
> * IPv4 fragmentation.
> diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map b/lib/librte_ip_frag/rte_ip_frag_version.map
> index d40d5515f..8b4c82d08 100644
> --- a/lib/librte_ip_frag/rte_ip_frag_version.map
> +++ b/lib/librte_ip_frag/rte_ip_frag_version.map
> @@ -8,6 +8,7 @@ DPDK_2.0 {
> rte_ipv4_fragment_packet;
> rte_ipv6_frag_reassemble_packet;
> rte_ipv6_fragment_packet;
> + rte_ipv6_frag_get_ipv6_fragment_header;
>
> local: *;
> };
> diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> index 62a7e4e83..bd847dd3d 100644
> --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> @@ -176,3 +176,41 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
>
> return out_pkt_pos;
> }
> +
> +const struct ipv6_extension_fragment *
> +rte_ipv6_frag_get_ipv6_fragment_header(struct rte_mbuf *pkt,
> + const struct ipv6_hdr *ip_hdr,
> + struct ipv6_extension_fragment *frag_hdr)
> +{
> + size_t offset = sizeof(struct ipv6_hdr);
> + uint8_t nexthdr = ip_hdr->proto;
> +
> + while (ipv6_ext_hdr(nexthdr)) {
> + struct ipv6_opt_hdr opt;
> + const struct ipv6_opt_hdr *popt = rte_pktmbuf_read(pkt,
> + offset, sizeof(opt), &opt);
> + if (popt == NULL)
> + return NULL;
> +
> + switch (nexthdr) {
> + case IPPROTO_NONE:
> + return NULL;
> +
> + case IPPROTO_FRAGMENT:
> + return rte_pktmbuf_read(pkt, offset,
> + sizeof(*frag_hdr), frag_hdr);
> +
> + case IPPROTO_AH:
> + offset += (popt->hdrlen + 2) << 2;
> + break;
> +
> + default:
> + offset += (popt->hdrlen + 1) << 3;
> + break;
> + }
> +
> + nexthdr = popt->nexthdr;
> + }
> +
> + return NULL;
> +}
> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> index db249fe60..b2d67a3f0 100644
> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> @@ -135,8 +135,8 @@ ipv6_frag_reassemble(struct ip_frag_pkt *fp)
> #define FRAG_OFFSET(x) (rte_cpu_to_be_16(x) >> 3)
> struct rte_mbuf *
> rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t tms,
> - struct ipv6_hdr *ip_hdr, struct ipv6_extension_fragment *frag_hdr)
> + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t tms,
> + struct ipv6_hdr *ip_hdr, const struct ipv6_extension_fragment *frag_hdr)
> {
> struct ip_frag_pkt *fp;
> struct ip_frag_key key;
> diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c
> index c8b2e19bf..28764f744 100644
> --- a/lib/librte_port/rte_port_ras.c
> +++ b/lib/librte_port/rte_port_ras.c
> @@ -184,9 +184,11 @@ process_ipv6(struct rte_port_ring_writer_ras *p, struct rte_mbuf *pkt)
> /* Assume there is no ethernet header */
> struct ipv6_hdr *pkt_hdr = rte_pktmbuf_mtod(pkt, struct ipv6_hdr *);
>
> - struct ipv6_extension_fragment *frag_hdr;
> + const struct ipv6_extension_fragment *frag_hdr;
> + struct ipv6_extension_fragment frag_hdr_buf;
> uint16_t frag_data = 0;
> - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt_hdr);
> + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt, pkt_hdr,
> + &frag_hdr_buf);
> if (frag_hdr != NULL)
> frag_data = rte_be_to_cpu_16(frag_hdr->frag_data);
>
> --
> 2.17.1
^ permalink raw reply [relevance 3%]
* [dpdk-dev] Which counters are set by rte_eth_stats_get
@ 2018-11-09 8:28 3% Tom Barbette
2018-11-09 8:38 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Tom Barbette @ 2018-11-09 8:28 UTC (permalink / raw)
To: dev; +Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Hi ethdev maintainers,
Driver support for the fields in rte_eth_stats is inconsistent and is not mentioned anywhere in the doc. A quick survey showed me:
ipackets : implemented by all drivers
ibytes : all except null, ring
ierror : all except af_packet, ark, avf, axgbe, fm10k, kni, null, pcap, ring, szedata2, vhost
imissed : *only* af_packet, avp, axgbe, fm10k, kni, liquidio, mlx4, mlx5, null, pcap, ring, szedata2, tap, vhost, virtio
rx_nombuf : *only* bnx2x,bnxt,bonding,ena,enic,failsafe,mlx4,mlx5,netvsc,nfp,qede,szedata2,tap,virtio
From a DPDK user's point of view, there is no way to know whether a given counter can be relied on, other than grepping the driver code. Unless I missed something?
Also, the doc of rte_eth_stats_get only mentions input/output packets, bytes and errors, not the other fields, and the way it is written lets the reader think every field is always supported if the function returns 0.
I can update the doc to reflect the state of things. But maybe we could make that function return a bitmask which tells which counter has been set. But that would break the ABI... We could also have the bitmask set through a passed pointer, so it does not break code checking the return value is 0. Or maybe have the bitmask elsewhere, like for the offloads? Which fields are supported is probably a constant. So that may make more sense.
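A minimal sketch of the "bitmask elsewhere, like for the offloads" option, to make the idea concrete. Everything here is hypothetical: the `STATS_CAP_*` flags, `struct port_sketch` and `stats_sketch_supported()` do not exist in ethdev, and the sketch assumes the set of supported counters is a per-driver constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability flags, one bit per counter from the survey. */
#define STATS_CAP_IPACKETS   (UINT64_C(1) << 0)
#define STATS_CAP_IBYTES     (UINT64_C(1) << 1)
#define STATS_CAP_IERRORS    (UINT64_C(1) << 2)
#define STATS_CAP_IMISSED    (UINT64_C(1) << 3)
#define STATS_CAP_RX_NOMBUF  (UINT64_C(1) << 4)

/* Hypothetical per-port info; a driver would fill stats_capa once. */
struct port_sketch {
	uint64_t stats_capa; /* counters this driver actually maintains */
};

/* Returns 1 if every counter in 'wanted' is maintained, 0 otherwise. */
int
stats_sketch_supported(const struct port_sketch *port, uint64_t wanted)
{
	assert(port != NULL);
	return (port->stats_capa & wanted) == wanted;
}
```

An application would check the mask once at setup time and then ignore, or report as unavailable, any counter whose bit is clear, without changing the return value of the stats call itself.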
If you give me directions, I can propose a patch.
Cheers,
Tom
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] Which counters are set by rte_eth_stats_get
2018-11-09 8:28 3% [dpdk-dev] Which counters are set by rte_eth_stats_get Tom Barbette
@ 2018-11-09 8:38 0% ` Thomas Monjalon
2018-11-09 16:23 0% ` Stephen Hemminger
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-09 8:38 UTC (permalink / raw)
To: Tom Barbette; +Cc: dev, Ferruh Yigit, Andrew Rybchenko
09/11/2018 09:28, Tom Barbette:
> Hi ethdev maintainers,
>
>
> Support of drivers for the fields in rte_eth_stats is a bit random, and never mentioned in the doc. A quick survey showed me :
>
>
> ipackets : implemented by all drivers
> ibytes : all except null, ring
> ierror : all except af_packet, ark, avf, axgbe, fm10k, kni, null, pcap, ring, szedata2, vhost
> imissed : *only* af_packet, avp, axgbe, fm10k, kni, liquidio, mlx4, mlx5, null, pcap, ring, szedata2, tap, vhost, virtio
> rx_nombuf : *only* bnx2x,bnxt,bonding,ena,enic,failsafe,mlx4,mlx5,netvsc,nfp,qede,szedata2,tap,virtio
>
> With no way to know if we can rely on the value or not, as a DPDK user pov. The only way to know if we can rely on a given counter is to grep the driver code. Except if I missed something?
>
> Also, the doc of rte_eth_stats_get only mentions input/output packets, bytes and errors, not the other fields, and the way it is written lets the reader think every field is always supported if the function returns 0.
>
> I can update the doc to reflect the state of things. But maybe we could make that function return a bitmask which tells which counter has been set. But that would break the ABI... We could also have the bitmask set through a passed pointer, so it does not break code checking the return value is 0. Or maybe have the bitmask elsewhere, like for the offloads? Which fields are supported is probably a constant. So that may make more sense.
I think having capabilities, as for offloads, is reasonable.
The other option would be to push for implementing all basic stats
in all drivers, and consider an unimplemented stat as a bug.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] Which counters are set by rte_eth_stats_get
2018-11-09 8:38 0% ` Thomas Monjalon
@ 2018-11-09 16:23 0% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-11-09 16:23 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: Tom Barbette, dev, Ferruh Yigit, Andrew Rybchenko
On Fri, 09 Nov 2018 09:38:46 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:
> 09/11/2018 09:28, Tom Barbette:
> > Hi ethdev maintainers,
> >
> >
> > Support of drivers for the fields in rte_eth_stats is a bit random, and never mentioned in the doc. A quick survey showed me :
> >
> >
> > ipackets : implemented by all drivers
> > ibytes : all except null, ring
> > ierror : all except af_packet, ark, avf, axgbe, fm10k, kni, null, pcap, ring, szedata2, vhost
> > imissed : *only* af_packet, avp, axgbe, fm10k, kni, liquidio, mlx4, mlx5, null, pcap, ring, szedata2, tap, vhost, virtio
> > rx_nombuf : *only* bnx2x,bnxt,bonding,ena,enic,failsafe,mlx4,mlx5,netvsc,nfp,qede,szedata2,tap,virtio
> >
> > With no way to know if we can rely on the value or not, as a DPDK user pov. The only way to know if we can rely on a given counter is to grep the driver code. Except if I missed something?
> >
> > Also, the doc of rte_eth_stats_get only mentions input/output packets, bytes and errors, not the other fields, and the way it is written lets the reader think every field is always supported if the function returns 0.
> >
> > I can update the doc to reflect the state of things. But maybe we could make that function return a bitmask which tells which counter has been set. But that would break the ABI... We could also have the bitmask set through a passed pointer, so it does not break code checking the return value is 0. Or maybe have the bitmask elsewhere, like for the offloads? Which fields are supported is probably a constant. So that may make more sense.
>
> I think having capabilities, as for offloads, is reasonable.
> The other option would be to push for implementing all basic stats
> in all drivers, and consider an unimplemented stat as a bug.
>
>
>
More capabilities make it harder for applications.
For the examples you give, some of these are just *bugs* in the drivers, like the missing ibytes field.
Let's fix the bugs rather than expect applications to work around them.
For the others, if a driver has no place where it allocates mbufs or drops packets, I see no
reason for that driver to do anything with those fields.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
2018-10-11 14:20 4% [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes Konstantin Ananyev
@ 2018-11-12 12:03 0% ` Akhil Goyal
2018-11-14 0:50 0% ` Trahe, Fiona
2018-11-14 3:15 0% ` Joseph, Anoob
0 siblings, 2 replies; 200+ results
From: Akhil Goyal @ 2018-11-12 12:03 UTC (permalink / raw)
To: Konstantin Ananyev, dev, Ravi Kumar, Jerin Jacob, Anoob Joseph,
Declan Doherty, Fiona Trahe, Tomasz Duszynski, Dmitri Epshtein,
Natalie Samsonov, Jay Zhou
On 10/11/2018 7:50 PM, Konstantin Ananyev wrote:
> Below are details and reasoning for proposed changes.
>
> 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> operate based on cryptodev device id, though inside
> rte_cryptodev_sym_session device specific data is addressed
> by driver id (not device id).
> That creates a problem with current implementation when we have
> two or more devices with the same driver used by the same session.
> Consider the following example:
>
> struct rte_cryptodev_sym_session *sess;
> rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> rte_cryptodev_sym_session_clear(dev_id=X, sess);
>
> After that point if X and Y uses the same driver,
> then sess can't be used by device Y any more.
> The reason for that - driver specific (not device specific)
> data per session, plus there is no information
> how many device instances use that data.
> Probably the simplest way to deal with that issue -
> add a reference counter per each driver data.
>
> 2.rte_cryptodev_sym_session_set_user_data() and
> rte_cryptodev_sym_session_get_user_data() -
> with current implementation there is no defined way for the user to
> determine what is the max allowed size of the private data.
> rte_cryptodev_sym_session_set_user_data() just blindly copies
> user provided data without checking memory boundaries violation.
> To overcome that issue propose to add 'uint16_t priv_size' into
> rte_cryptodev_sym_session structure.
>
> 3.rte_cryptodev_sym_session contains an array of variable size for
> driver specific data.
> Though number of elements in that array is determined by static
> variable nb_drivers, that could be modified by
> rte_cryptodev_allocate_driver().
> That construction seems to work ok so far, as right now users register
> all their PMDs at startup, though it doesn't mean that it would always
> remain like that.
> To make it less error prone propose to add 'uint16_t nb_drivers'
> into the rte_cryptodev_sym_session structure.
> At least that allows related functions to check that provided
> driver id wouldn't overrun variable array boundaries,
> again it allows to determine size of already allocated session
> without accessing global variable.
>
> 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> would have sort of readonly type data (init once at allocation time,
> keep unmodified through session life-time).
> That requires more changes in current cryptodev implementation:
> Right now inside cryptodev framework both rte_cryptodev_sym_session
> and driver specific session data are two completely different structures
> (e.g. struct cryptodev_sym_session and struct null_crypto_session).
> Though current cryptodev implementation implicitly assumes that driver
> will allocate both of them from within the same mempool.
> Plus this is done in a manner that they override each other fields
> (reuse the same space - sort of implicit C union).
> That's probably not the best programming practice,
> plus make impossible to have readonly fields inside both of them.
> To overcome that situation, propose to change the API a bit, to allow
> to use two different mempools for these two distinct data structures.
>
> 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> I suppose that self-explanatory, and might be used in a lot of places
> (would be quite useful for ipsec library we develop).
>
> The new proposed layout for rte_cryptodev_sym_session:
> struct rte_cryptodev_sym_session {
> uint64_t userdata;
> /**< Can be used for external metadata */
> uint16_t nb_drivers;
> /**< number of elements in sess_data array */
> uint16_t priv_size;
> /**< session private data will be placed after sess_data */
> __extension__ struct {
> void *data;
> uint16_t refcnt;
> } sess_data[0];
> /**< Driver specific session material, variable size */
> };
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Adding maintainers to ack this deprecation notice. These changes will
impact all the PMDs, and everyone should agree to them.
from NXP dpaa_sec, dpaa2_sec, caam_jr PMDs:
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
> ---
> doc/guides/rel_notes/deprecation.rst | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index d2aec64d1..998a0d92c 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -74,3 +74,12 @@ Deprecation Notices
>
> This is due to a lack of flexibility and reliance on a type unusable with
> C++ programs (struct rte_flow_desc).
> +
> +* cryptodev: several API and ABI changes are planned for rte_cryptodev
> + in v19.02:
> +
> + - The size and layout of ``rte_cryptodev_sym_session`` will change
> + to fix existing issues.
> + - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
> + ``rte_cryptodev_queue_pair_setup`` will change to allow the use of
> + two different mempools for crypto and device private sessions.
* Re: [dpdk-dev] [dpdk-techboard] DPDK techboard minutes of October 24
@ 2018-11-12 16:43 3% ` Stephen Hemminger
2018-11-12 16:55 3% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-11-12 16:43 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Richardson, Bruce, Burakov, Anatoly, Jerin Jacob, dev, techboard
On Mon, 12 Nov 2018 12:36:45 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Monday, November 12, 2018 12:22 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Jerin Jacob
> > <jerin.jacob@caviumnetworks.com>; dev@dpdk.org
> > Cc: techboard@dpdk.org
> > Subject: RE: [dpdk-techboard] [dpdk-dev] DPDK techboard minutes of October 24
> >
> >
> >
> > > -----Original Message-----
> > > From: techboard [mailto:techboard-bounces@dpdk.org] On Behalf Of Ananyev,
> > > Konstantin
> > > Sent: Monday, November 12, 2018 11:24 AM
> > > To: Burakov, Anatoly <anatoly.burakov@intel.com>; Jerin Jacob
> > > <jerin.jacob@caviumnetworks.com>; dev@dpdk.org
> > > Cc: techboard@dpdk.org
> > > Subject: Re: [dpdk-techboard] [dpdk-dev] DPDK techboard minutes of October
> > > 24
> > >
> > >
> > > Hi Anatoly,
> > >
> > > > > Meeting notes for the DPDK technical board meeting held on
> > > > > 2018-10-24
> > > > >
> > > > > Attendees:
> > > > > - Bruce Richardson
> > > > > - Ferruh Yigit
> > > > > - Hemant Agrawal
> > > > > - Jerin Jacob
> > > > > - Konstantin Ananyev
> > > > > - Maxime Coquelin
> > > > > - Olivier Matz
> > > > > - Stephen Hemminger
> > > > > - Thomas Monjalon
> > > > >
> > > > > 0) DPDK acceptance policy on un-implemented API
> > > > > > - New APIs without an implementation are not accepted.
> > > > > > - In order to accept a new API, at minimum:
> > > > > > a) Need to provide a unit test case or example application
> > > > > b) If the API is about HW abstraction, at least one driver should be
> > > > > implemented. Preferably two.
> > > > > c) If there are strong objections on ML about the need for more than
> > > > > one driver for a specific API then the technical board can make a
> > > > > decision.
> > > > > - Konstantin volunteered to send existing un-implemented API to the
> > > > > mailing list.
> > > > > - The existing un-implemented APIs will be deprecated in v19.05.
> > > > > - Deprecated un-implemented API will be removed in v19.08
> > > > >
> > > >
> > > > Does this also apply to unimplemented parts of the existing API? For
> > > > example, malloc API has long had a "name" parameter which goes
> > > > unimplemented through entire lifetime of DPDK project. It would be
> > > > good to drop this thing entirely as it's clear it's not going to be
> > > > implemented any time soon :)
> > > >
> > >
> > > Sounds like a good idea to me.
> > > Konstantin
> >
> > While a good idea in theory, I'm not sure the cost-benefit pays off for this one. Given the fact that the extra parameter is rather harmless,
> > the benefit seems minimal compared to the effort which would be involved for everyone to have to change every rte_malloc call in every
> > app!
>
> I agree about the massive amount of changes, though I thought Anatoly was sort of volunteering for it :)
> About benefit - it would save us spilling/restoring one register for each rte_malloc() call.
> Probably not that important, as rte_malloc() usually is used from data-path, but still.
> Plus it doesn't look good to have a function with a parameter that would never be used.
> Konstantin
>
>
I agree, we should do this kind of cleanup, but only on ABI breaking releases.
It is too late now for 18.11, and the next one is probably 19.11.
* Re: [dpdk-dev] [dpdk-techboard] DPDK techboard minutes of October 24
2018-11-12 16:43 3% ` Stephen Hemminger
@ 2018-11-12 16:55 3% ` Thomas Monjalon
2018-11-13 9:33 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-12 16:55 UTC (permalink / raw)
To: Stephen Hemminger
Cc: techboard, Ananyev, Konstantin, Richardson, Bruce, Burakov,
Anatoly, Jerin Jacob, dev
12/11/2018 17:43, Stephen Hemminger:
> On Mon, 12 Nov 2018 12:36:45 +0000
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > From: Richardson, Bruce
> > > From: techboard [mailto:techboard-bounces@dpdk.org] On Behalf Of Ananyev,
> > > > Konstantin
> > > >
> > > > Hi Anatoly,
> > > >
> > > > > > Meeting notes for the DPDK technical board meeting held on
> > > > > > 2018-10-24
[...]
> > > > > > 0) DPDK acceptance policy on un-implemented API
> > > > > > > - New APIs without an implementation are not accepted.
> > > > > > > - In order to accept a new API, at minimum:
> > > > > > > a) Need to provide a unit test case or example application
> > > > > > b) If the API is about HW abstraction, at least one driver should be
> > > > > > implemented. Preferably two.
> > > > > > c) If there are strong objections on ML about the need for more than
> > > > > > one driver for a specific API then the technical board can make a
> > > > > > decision.
> > > > > > - Konstantin volunteered to send existing un-implemented API to the
> > > > > > mailing list.
> > > > > > - The existing un-implemented APIs will be deprecated in v19.05.
> > > > > > - Deprecated un-implemented API will be removed in v19.08
> > > > > >
> > > > >
> > > > > Does this also apply to unimplemented parts of the existing API? For
> > > > > example, malloc API has long had a "name" parameter which goes
> > > > > unimplemented through entire lifetime of DPDK project. It would be
> > > > > good to drop this thing entirely as it's clear it's not going to be
> > > > > implemented any time soon :)
> > > > >
> > > >
> > > > Sounds like a good idea to me.
> > > > Konstantin
> > >
> > > While a good idea in theory, I'm not sure the cost-benefit pays off for this one. Given the fact that the extra parameter is rather harmless,
> > > the benefit seems minimal compared to the effort which would be involved for everyone to have to change every rte_malloc call in every
> > > app!
> >
> > I agree about the massive amount of changes, though I thought Anatoly was sort of volunteering for it :)
> > About benefit - it would save us spilling/restoring one register for each rte_malloc() call.
> > Probably not that important, as rte_malloc() usually is used from data-path, but still.
> > Plus it doesn't look good to have a function with a parameter that would never be used.
> > Konstantin
> >
> >
>
> I agree, we should do this kind of cleanup, but only on ABI breaking releases.
> It is too late now for 18.11, and the next one is probably 19.11.
We can discuss which release can break ABI.
I think 19.05 is a good candidate.
* Re: [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
2018-10-05 11:05 0% ` Ananyev, Konstantin
@ 2018-11-12 21:01 0% ` Trahe, Fiona
2018-11-13 18:56 0% ` Ananyev, Konstantin
1 sibling, 1 reply; 200+ results
From: Trahe, Fiona @ 2018-11-12 21:01 UTC (permalink / raw)
To: Ananyev, Konstantin, dev
Cc: De Lara Guarch, Pablo, Akhil Goyal, Doherty, Declan, Ravi Kumar,
Jerin Jacob, Zhang, Roy Fan, Tomasz Duszynski, Hemant Agrawal,
Natalie Samsonov, Dmitri Epshtein, Jay Zhou, Trahe, Fiona
Hi Konstantin,
Sorry for the delay in reviewing this and thanks for your proposals.
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, August 24, 2018 10:48 AM
> To: dev@dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Doherty, Declan
> <declan.doherty@intel.com>; Ravi Kumar <ravi1.kumar@amd.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>; Zhang, Roy Fan <roy.fan.zhang@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Tomasz Duszynski <tdu@semihalf.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>; Natalie Samsonov <nsamsono@marvell.com>; Dmitri Epshtein
> <dima@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>
> Subject: [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
>
> This RFC proposes several changes inside rte_cryptodev_sym_session.
> Note that this is just an RFC, not a complete patch, so for now
> I modified only librte_cryptodev itself,
> some cryptodev PMDs, test-crypto-perf and the ipsec-secgw example.
> The proposed changes mean ABI/API breakage inside cryptodev,
> so I am looking for feedback from cryptodev lib and crypto PMD maintainers.
> Below are details and reasoning for proposed changes.
>
> 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> operate based on cryptodev device id, though inside
> rte_cryptodev_sym_session device specific data is addressed
> by driver id (not device id).
> That creates a problem with current implementation when we have
> two or more devices with the same driver used by the same session.
> Consider the following example:
>
> struct rte_cryptodev_sym_session *sess;
> rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> rte_cryptodev_sym_session_clear(dev_id=X, sess);
>
> After that point, if X and Y use the same driver,
> then sess can't be used by device Y any more.
> The reason for that - driver specific (not device specific)
> data per session, plus there is no information
> how many device instances use that data.
> Probably the simplest way to deal with that issue -
> add a reference counter per each driver data.
[Fiona] Ok, I agree with this issue and proposed fix.
We also need to document that it's the user's responsibility
not to call either init() or clear() twice on the same device, as
that would break the ref count.
The same should be added to asym_session - though I accept
it's outside of the scope of this patch.
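For illustration only, the per-driver reference counting discussed above
could behave roughly as in the standalone sketch below. All demo_* names
are hypothetical and not part of cryptodev; the guard against an
unbalanced clear() is an extra assumption of this sketch, not something
the RFC does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NB_DRIVERS 2	/* hypothetical number of registered drivers */

/* Hypothetical per-driver slot, loosely mirroring sess_data[] in the RFC. */
struct demo_slot {
	void *data;
	uint16_t refcnt;
};

struct demo_session {
	struct demo_slot slot[DEMO_NB_DRIVERS];
};

static int demo_drv_priv;	/* stand-in for driver private session data */

/* Attach one device that uses driver 'drv'; configure only on first use. */
static int
demo_session_init(struct demo_session *s, uint16_t drv)
{
	if (s == NULL || drv >= DEMO_NB_DRIVERS)
		return -EINVAL;
	if (s->slot[drv].refcnt == 0)
		s->slot[drv].data = &demo_drv_priv;
	s->slot[drv].refcnt++;
	return 0;
}

/* Detach one device; release driver data only when the last user leaves. */
static int
demo_session_clear(struct demo_session *s, uint16_t drv)
{
	if (s == NULL || drv >= DEMO_NB_DRIVERS)
		return -EINVAL;
	if (s->slot[drv].refcnt == 0)
		return -EINVAL;	/* unbalanced clear(), sketch-only guard */
	if (--s->slot[drv].refcnt != 0)
		return -EBUSY;	/* still used by another device */
	s->slot[drv].data = NULL;
	return 0;
}
```

With two devices X and Y on the same driver, the first clear() returns
-EBUSY and leaves the driver data in place; only the second one
releases it.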
> 2.rte_cryptodev_sym_session_set_user_data() and
> rte_cryptodev_sym_session_get_user_data() -
> with current implementation there is no defined way for the user to
> determine what is the max allowed size of the private data.
> Even within rte_cryptodev_sym_session_set_user_data() we just blindly
> copy user provided data without checking for memory boundary violations.
> To overcome that issue I added 'uint16_t priv_size' into
> rte_cryptodev_sym_session structure.
[Fiona] I agree, this is needed.
But I propose to call it user_data_sz NOT priv_size.
See https://patches.dpdk.org/patch/42515/ to understand why.
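As a rough sketch of the bounds check being asked for (standalone C,
not the cryptodev code): the session records its user data capacity
once, and the setter rejects anything larger. The struct, the fixed
buffer and the demo_* names are assumptions made for the example;
user_data_sz is the field name suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_USER_DATA_MAX 16	/* hypothetical storage reserved per session */

struct demo_ud_session {
	uint16_t user_data_sz;	/* capacity, fixed at allocation time */
	uint8_t user_data[DEMO_USER_DATA_MAX];
};

/* Copy user data into the session, refusing sizes above the capacity. */
static int
demo_set_user_data(struct demo_ud_session *s, const void *data, uint16_t size)
{
	if (s == NULL || data == NULL || size > s->user_data_sz)
		return -EINVAL;
	memcpy(s->user_data, data, size);
	return 0;
}
```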
> 3.rte_cryptodev_sym_session contains an array of variable size for
> driver specific data.
> Though number of elements in that array is determined by static
> variable nb_drivers, that could be modified by
> rte_cryptodev_allocate_driver().
> That construction seems to work ok so far, as right now users register
> all their PMDs at startup, though it doesn't mean that it would always
> remain like that.
> To make it less error prone I added 'uint16_t nb_drivers' into the
> rte_cryptodev_sym_session structure.
> At least that allows related functions to check that provided
> driver id wouldn't overrun variable array boundaries,
> again it allows to determine size of already allocated session
> without accessing global variable.
[Fiona] I agree with both issue and solution.
The same should be added to asym_session - though again
it's outside of the scope of this patch.
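A minimal standalone sketch of the idea, assuming a session that
snapshots the driver count when it is allocated (demo_fa_* names and
the calloc-based allocation are hypothetical, the real code uses a
mempool). Note that valid indexes are 0 .. nb_drivers - 1, so the check
has to reject driver_id == nb_drivers as well.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_fa_session {
	uint16_t nb_drivers;	/* snapshot taken when the session is allocated */
	void *drv_data[];	/* one slot per driver, C99 flexible array member */
};

/* Allocate a zeroed session sized for 'nb_drivers' slots. */
static struct demo_fa_session *
demo_fa_alloc(uint16_t nb_drivers)
{
	struct demo_fa_session *s;

	s = calloc(1, sizeof(*s) + nb_drivers * sizeof(s->drv_data[0]));
	if (s != NULL)
		s->nb_drivers = nb_drivers;
	return s;
}

/* Bounds-checked getter: no access to any global driver counter. */
static void *
demo_fa_get(const struct demo_fa_session *s, uint16_t driver_id)
{
	if (s == NULL || driver_id >= s->nb_drivers)
		return NULL;
	return s->drv_data[driver_id];
}

/* Bounds-checked setter. */
static int
demo_fa_set(struct demo_fa_session *s, uint16_t driver_id, void *data)
{
	if (s == NULL || driver_id >= s->nb_drivers)
		return -EINVAL;
	s->drv_data[driver_id] = data;
	return 0;
}
```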
> 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> would have sort of readonly type data (init once at allocation time,
> keep unmodified through session life-time).
> That requires more changes in current cryptodev implementation:
> Right now inside cryptodev framework both rte_cryptodev_sym_session
> and driver specific session data are two completely different structures
> (e.g. struct rte_cryptodev_sym_session and struct null_crypto_session).
> Though current cryptodev implementation implicitly assumes that driver
> will allocate both of them from within the same mempool.
> Plus this is done in a manner that they override each other fields
> (reuse the same space - sort of implicit C union).
> That's probably not the best programming practice,
> plus it makes it impossible to have readonly fields inside both of them.
> So to overcome that situation I changed an API a bit, to allow
> to use two different mempools for these two distinct data structures.
>
> 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> I suppose that is self-explanatory, and it might be used in a lot of places
> (would be quite useful for ipsec library we develop).
[Fiona] Seems unnecessary - the set_user_data can be used. Why have 2
separate user data spaces in the session? - it's confusing.
If there is some good reason, then a different name should be used for clarity.
Not private. And not user. Possibly opaque data. Though afaik we had this in the op
and removed it, as it was felt appending user_data was enough.
> So the new proposed layout for rte_cryptodev_sym_session:
> struct rte_cryptodev_sym_session {
> uint64_t userdata;
> /**< Can be used for external metadata */
> uint16_t nb_drivers;
> /**< number of elements in sess_data array */
> uint16_t priv_size;
> /**< session private data will be placed after sess_data */
> __extension__ struct {
> void *data;
> uint16_t refcnt;
> } sess_data[0];
> /**< Driver specific session material, variable size */
> };
>
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> app/test-crypto-perf/cperf.h | 1 +
> app/test-crypto-perf/cperf_ops.c | 11 +-
> app/test-crypto-perf/cperf_ops.h | 2 +-
> app/test-crypto-perf/cperf_test_latency.c | 5 +-
> app/test-crypto-perf/cperf_test_latency.h | 1 +
> app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 5 +-
> app/test-crypto-perf/cperf_test_pmd_cyclecount.h | 1 +
> app/test-crypto-perf/cperf_test_throughput.c | 5 +-
> app/test-crypto-perf/cperf_test_throughput.h | 1 +
> app/test-crypto-perf/cperf_test_verify.c | 5 +-
> app/test-crypto-perf/cperf_test_verify.h | 1 +
> app/test-crypto-perf/main.c | 111 +++++++++++------
> drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 10 +-
> drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 5 +-
> drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 4 +-
> drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 10 +-
> drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c | 5 +-
> drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h | 4 +-
> drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 3 +-
> drivers/crypto/dpaa_sec/dpaa_sec.c | 3 +-
> drivers/crypto/null/null_crypto_pmd.c | 14 ++-
> drivers/crypto/null/null_crypto_pmd_ops.c | 5 +-
> drivers/crypto/null/null_crypto_pmd_private.h | 4 +-
> drivers/crypto/scheduler/scheduler_pmd_ops.c | 5 +-
> drivers/crypto/virtio/virtio_cryptodev.c | 6 +-
> examples/ipsec-secgw/ipsec-secgw.c | 116 ++++++++++++------
> examples/ipsec-secgw/ipsec.h | 2 +
> lib/librte_cryptodev/rte_cryptodev.c | 134 ++++++++++++---------
> lib/librte_cryptodev/rte_cryptodev.h | 53 ++++++--
> lib/librte_cryptodev/rte_cryptodev_pmd.h | 16 ++-
> 30 files changed, 356 insertions(+), 192 deletions(-)
>
///snip///
> struct cnt_blk {
> diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
> index 63ae23f00..e25282445 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.c
> +++ b/lib/librte_cryptodev/rte_cryptodev.c
> @@ -943,8 +943,7 @@ rte_cryptodev_close(uint8_t dev_id)
>
> int
> rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> - const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
> - struct rte_mempool *session_pool)
> + const struct rte_cryptodev_qp_conf *qp_conf, int socket_id)
>
> {
> struct rte_cryptodev *dev;
> @@ -954,6 +953,12 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> return -EINVAL;
> }
>
> + if (qp_conf == NULL || qp_conf->sess_pool == NULL ||
> + qp_conf->priv_sess_pool == NULL) {
> + CDEV_LOG_ERR("Invalid queue_pair config");
> + return -EINVAL;
> + }
> +
> dev = &rte_crypto_devices[dev_id];
> if (queue_pair_id >= dev->data->nb_queue_pairs) {
> CDEV_LOG_ERR("Invalid queue_pair_id=%d", queue_pair_id);
> @@ -969,7 +974,7 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_pair_setup, -ENOTSUP);
>
> return (*dev->dev_ops->queue_pair_setup)(dev, queue_pair_id, qp_conf,
> - socket_id, session_pool);
> + socket_id);
> }
>
>
> @@ -1146,6 +1151,41 @@ rte_cryptodev_pmd_callback_process(struct rte_cryptodev *dev,
> rte_spinlock_unlock(&rte_cryptodev_cb_lock);
> }
>
> +static void
> +cryptodev_sym_session_init_elem(__rte_unused struct rte_mempool *pool,
> + void *arg, void *obj, __rte_unused uint32_t idx)
> +{
> + struct rte_cryptodev_sym_session *ds;
> + const struct rte_cryptodev_sym_session *ss;
> +
> + ds = obj;
> + ss = arg;
> +
> + *ds = *ss;
> + memset(ds->sess_data, 0, rte_cryptodev_sym_session_data_size(ds));
> +}
> +
> +struct rte_mempool *
> +rte_cryptodev_sym_session_pool_create(const char *name,
> + uint32_t nb_elts, uint32_t cache_size, uint16_t priv_size,
> + int socket_id)
> +{
> + struct rte_mempool *mp;
> + uint32_t elt_size;
> + struct rte_cryptodev_sym_session s = {
> + .nb_drivers = nb_drivers,
> + .priv_size = priv_size,
> + };
> +
> + elt_size = rte_cryptodev_sym_session_max_size(priv_size);
> + mp = rte_mempool_create(name, nb_elts, elt_size, cache_size, 0,
> + NULL, NULL, cryptodev_sym_session_init_elem, &s,
> + socket_id, 0);
> + if (mp == NULL)
> + CDEV_LOG_ERR("%s(name=%s) failed, rte_errno=%d\n",
> + __func__, name, rte_errno);
> + return mp;
> +}
>
> int
> rte_cryptodev_sym_session_init(uint8_t dev_id,
> @@ -1163,12 +1203,15 @@ rte_cryptodev_sym_session_init(uint8_t dev_id,
> return -EINVAL;
>
> index = dev->driver_id;
> + if (index >= sess->nb_drivers)
> + return -EINVAL;
>
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_session_configure, -ENOTSUP);
>
> - if (sess->sess_private_data[index] == NULL) {
> + if (sess->sess_data[index].refcnt == 0) {
> ret = dev->dev_ops->sym_session_configure(dev, xforms,
> - sess, mp);
> + sess, mp);
> +
> if (ret < 0) {
> CDEV_LOG_ERR(
> "dev_id %d failed to configure session details",
> @@ -1177,6 +1220,7 @@ rte_cryptodev_sym_session_init(uint8_t dev_id,
> }
> }
>
> + sess->sess_data[index].refcnt++;
> return 0;
> }
>
> @@ -1229,8 +1273,7 @@ rte_cryptodev_sym_session_create(struct rte_mempool *mp)
> /* Clear device session pointer.
> * Include the flag indicating presence of user data
> */
> - memset(sess, 0, (sizeof(void *) * nb_drivers) + sizeof(uint8_t));
> -
> + memset(sess->sess_data, 0, rte_cryptodev_sym_session_data_size(sess));
> return sess;
> }
>
> @@ -1258,16 +1301,20 @@ rte_cryptodev_sym_session_clear(uint8_t dev_id,
> struct rte_cryptodev_sym_session *sess)
> {
> struct rte_cryptodev *dev;
> + uint32_t idx;
>
> dev = rte_cryptodev_pmd_get_dev(dev_id);
>
> - if (dev == NULL || sess == NULL)
> + if (dev == NULL || sess == NULL || dev->driver_id >= sess->nb_drivers)
> return -EINVAL;
>
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_session_clear, -ENOTSUP);
>
> - dev->dev_ops->sym_session_clear(dev, sess);
> + idx = dev->driver_id;
> + if (--sess->sess_data[idx].refcnt != 0)
> + return -EBUSY;
>
> + dev->dev_ops->sym_session_clear(dev, sess);
> return 0;
> }
>
> @@ -1285,7 +1332,6 @@ rte_cryptodev_asym_session_clear(uint8_t dev_id,
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->asym_session_clear, -ENOTSUP);
>
> dev->dev_ops->asym_session_clear(dev, sess);
> -
> return 0;
> }
>
> @@ -1293,7 +1339,6 @@ int
> rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
> {
> uint8_t i;
> - void *sess_priv;
> struct rte_mempool *sess_mp;
>
> if (sess == NULL)
> @@ -1301,8 +1346,7 @@ rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
>
> /* Check that all device private data has been freed */
> for (i = 0; i < nb_drivers; i++) {
[Fiona] Use the sess.nb_drivers rather than the global.
Actually maybe name slightly differently to reduce the chance of that mistake happening, e.g.
rename sess.nb_drivers to sess.num_drivers?
> - sess_priv = get_sym_session_private_data(sess, i);
> - if (sess_priv != NULL)
> + if (sess->sess_data[i].refcnt != 0)
> return -EBUSY;
> }
>
> @@ -1313,6 +1357,23 @@ rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
> return 0;
> }
>
> +unsigned int
> +rte_cryptodev_sym_session_max_data_size(void)
[Fiona] Suggest renaming this
rte_cryptodev_sym_session_max_array_size()
> +{
> + struct rte_cryptodev_sym_session *sess = NULL;
> +
> + return (sizeof(sess->sess_data[0]) * nb_drivers);
> +}
> +
> +size_t
> +rte_cryptodev_sym_session_max_size(uint16_t priv_size)
> +{
> + struct rte_cryptodev_sym_session *sess = NULL;
> +
> + return (sizeof(*sess) + priv_size +
> + rte_cryptodev_sym_session_max_data_size());
> +}
> +
> int __rte_experimental
> rte_cryptodev_asym_session_free(struct rte_cryptodev_asym_session *sess)
> {
> @@ -1337,18 +1398,6 @@ rte_cryptodev_asym_session_free(struct rte_cryptodev_asym_session *sess)
> return 0;
> }
>
> -
> -unsigned int
> -rte_cryptodev_sym_get_header_session_size(void)
> -{
> - /*
> - * Header contains pointers to the private data
> - * of all registered drivers, and a flag which
> - * indicates presence of user data
> - */
> - return ((sizeof(void *) * nb_drivers) + sizeof(uint8_t));
> -}
> -
> unsigned int __rte_experimental
> rte_cryptodev_asym_get_header_session_size(void)
> {
> @@ -1361,11 +1410,9 @@ rte_cryptodev_asym_get_header_session_size(void)
> }
>
> unsigned int
> -rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
> +rte_cryptodev_sym_private_session_size(uint8_t dev_id)
> {
> struct rte_cryptodev *dev;
> - unsigned int header_size = sizeof(void *) * nb_drivers;
> - unsigned int priv_sess_size;
>
> if (!rte_cryptodev_pmd_is_valid_dev(dev_id))
> return 0;
> @@ -1375,18 +1422,7 @@ rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
> if (*dev->dev_ops->sym_session_get_size == NULL)
> return 0;
>
> - priv_sess_size = (*dev->dev_ops->sym_session_get_size)(dev);
> -
> - /*
> - * If size is less than session header size,
> - * return the latter, as this guarantees that
> - * sessionless operations will work
> - */
> - if (priv_sess_size < header_size)
> - return header_size;
> -
> - return priv_sess_size;
> -
> + return (*dev->dev_ops->sym_session_get_size)(dev);
> }
>
> unsigned int __rte_experimental
> @@ -1409,7 +1445,6 @@ rte_cryptodev_asym_get_private_session_size(uint8_t dev_id)
> return header_size;
>
> return priv_sess_size;
> -
> }
>
> int __rte_experimental
> @@ -1418,15 +1453,10 @@ rte_cryptodev_sym_session_set_user_data(
> void *data,
> uint16_t size)
> {
> - uint16_t off_set = sizeof(void *) * nb_drivers;
> - uint8_t *user_data_present = (uint8_t *)sess + off_set;
> -
> - if (sess == NULL)
> + if (sess == NULL || sess->priv_size < size)
> return -EINVAL;
>
> - *user_data_present = 1;
> - off_set += sizeof(uint8_t);
> - rte_memcpy((uint8_t *)sess + off_set, data, size);
> + rte_memcpy(sess->sess_data + sess->nb_drivers, data, size);
> return 0;
> }
>
> @@ -1434,14 +1464,10 @@ void * __rte_experimental
> rte_cryptodev_sym_session_get_user_data(
> struct rte_cryptodev_sym_session *sess)
> {
> - uint16_t off_set = sizeof(void *) * nb_drivers;
> - uint8_t *user_data_present = (uint8_t *)sess + off_set;
> -
> - if (sess == NULL || !*user_data_present)
> + if (sess == NULL || sess->priv_size == 0)
> return NULL;
>
> - off_set += sizeof(uint8_t);
> - return (uint8_t *)sess + off_set;
> + return (sess->sess_data + sess->nb_drivers);
> }
>
> /** Initialise rte_crypto_op mempool element */
> diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> index 4099823f1..d88454f02 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -495,6 +495,14 @@ enum rte_cryptodev_event_type {
> /** Crypto device queue pair configuration structure. */
> struct rte_cryptodev_qp_conf {
> uint32_t nb_descriptors; /**< Number of descriptors per queue pair */
> + struct rte_mempool *sess_pool;
> + /**< Pointer to crypto sessions mempool,
> + * used for session-less operations.
> + */
> + struct rte_mempool *priv_sess_pool;
> + /**< Pointer to device specific sessions mempool,
> + * used for session-less operations.
> + */
> };
>
> /**
> @@ -680,17 +688,13 @@ rte_cryptodev_close(uint8_t dev_id);
> * *SOCKET_ID_ANY* if there is no NUMA constraint
> * for the DMA memory allocated for the receive
> * queue pair.
> - * @param session_pool Pointer to device session mempool, used
> - * for session-less operations.
> - *
> * @return
> * - 0: Success, queue pair correctly set up.
> * - <0: Queue pair configuration failed
> */
> extern int
> rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> - const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
> - struct rte_mempool *session_pool);
> + const struct rte_cryptodev_qp_conf *qp_conf, int socket_id);
>
> /**
> * Get the number of queue pairs on a specific crypto device
> @@ -954,10 +958,43 @@ rte_cryptodev_enqueue_burst(uint8_t dev_id, uint16_t qp_id,
> * has a fixed algo, key, op-type, digest_len etc.
> */
> struct rte_cryptodev_sym_session {
> - __extension__ void *sess_private_data[0];
> - /**< Private symmetric session material */
> + uint64_t userdata;
> + /**< Can be used for external metadata */
> + uint16_t nb_drivers;
> + /**< number of elements in sess_data array */
> + uint16_t priv_size;
> + /**< session private data will be placed after sess_data */
> + __extension__ struct {
> + void *data;
> + uint16_t refcnt;
> + } sess_data[0];
> + /**< Driver specific session material, variable size */
> };
>
> +static inline size_t
> +rte_cryptodev_sym_session_data_size(const struct rte_cryptodev_sym_session *s)
> +{
> + return (sizeof(s->sess_data[0]) * s->nb_drivers);
> +}
> +
> +static inline size_t
> +rte_cryptodev_sym_session_size(const struct rte_cryptodev_sym_session *s)
> +{
> + return (sizeof(*s) + (s)->priv_size +
> + rte_cryptodev_sym_session_data_size(s));
> +}
> +
[Fiona] Are above 2 fns used?
Look like duplicates of the "max" fns?
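To make the comparison concrete, here is a standalone sketch of the two
kinds of size calculation (demo_* names and the global counter are
stand-ins, not the cryptodev symbols). The per-session variant uses the
count stored in the session, the "max" variant uses the current global
driver count, so the two only agree as long as no driver was registered
after the session was allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver element, as in the proposed sess_data[]. */
struct demo_sz_elem {
	void *data;
	uint16_t refcnt;
};

struct demo_sz_session {
	uint64_t userdata;
	uint16_t nb_drivers;
	uint16_t priv_size;
	struct demo_sz_elem sess_data[];
};

/* Stand-in for the library-wide driver counter. */
static uint16_t demo_global_nb_drivers = 2;

/* Size of an already allocated session, from its own header fields. */
static size_t
demo_session_size(const struct demo_sz_session *s)
{
	return sizeof(*s) + sizeof(s->sess_data[0]) * s->nb_drivers +
		s->priv_size;
}

/* Size needed for a new session, from the current global driver count. */
static size_t
demo_session_max_size(uint16_t priv_size)
{
	return sizeof(struct demo_sz_session) +
		sizeof(struct demo_sz_elem) * demo_global_nb_drivers +
		priv_size;
}
```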
> +unsigned int
> +rte_cryptodev_sym_session_max_data_size(void);
> +
> +size_t
> +rte_cryptodev_sym_session_max_size(uint16_t priv_size);
> +
> +struct rte_mempool *
> +rte_cryptodev_sym_session_pool_create(const char *name,
> + uint32_t nb_elts, uint32_t cache_size, uint16_t priv_size,
> + int socket_id);
> +
> /** Cryptodev asymmetric crypto session */
> struct rte_cryptodev_asym_session {
> __extension__ void *sess_private_data[0];
> @@ -1123,7 +1160,7 @@ rte_cryptodev_asym_get_header_session_size(void);
> * symmetric session
> */
> unsigned int
> -rte_cryptodev_sym_get_private_session_size(uint8_t dev_id);
> +rte_cryptodev_sym_private_session_size(uint8_t dev_id);
>
> /**
> * Get the size of the private data for asymmetric session
> diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> index 6ff49d64d..2f98f65d1 100644
> --- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
> +++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
> @@ -191,13 +191,12 @@ typedef void (*cryptodev_info_get_t)(struct rte_cryptodev *dev,
> * @param qp_id Queue Pair Index
> * @param qp_conf Queue configuration structure
> * @param socket_id Socket Index
> - * @param session_pool Pointer to device session mempool
> *
> * @return Returns 0 on success.
> */
> typedef int (*cryptodev_queue_pair_setup_t)(struct rte_cryptodev *dev,
> uint16_t qp_id, const struct rte_cryptodev_qp_conf *qp_conf,
> - int socket_id, struct rte_mempool *session_pool);
> + int socket_id);
>
> /**
> * Release memory resources allocated by given queue pair.
> @@ -478,20 +477,25 @@ RTE_INIT(init_ ##driver_id)\
>
> static inline void *
> get_sym_session_private_data(const struct rte_cryptodev_sym_session *sess,
> - uint8_t driver_id) {
> - return sess->sess_private_data[driver_id];
> + uint8_t driver_id)
> +{
> + if (driver_id < sess->nb_drivers)
> + return sess->sess_data[driver_id].data;
> + return NULL;
> }
>
> static inline void
> set_sym_session_private_data(struct rte_cryptodev_sym_session *sess,
> uint8_t driver_id, void *private_data)
> {
> - sess->sess_private_data[driver_id] = private_data;
> + if (driver_id < sess->nb_drivers)
> + sess->sess_data[driver_id].data = private_data;
> }
>
> static inline void *
> get_asym_session_private_data(const struct rte_cryptodev_asym_session *sess,
> - uint8_t driver_id) {
> + uint8_t driver_id)
> +{
> return sess->sess_private_data[driver_id];
> }
>
> --
> 2.13.6
* Re: [dpdk-dev] [dpdk-techboard] DPDK techboard minutes of October 24
2018-11-12 16:55 3% ` Thomas Monjalon
@ 2018-11-13 9:33 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-11-13 9:33 UTC (permalink / raw)
To: Thomas Monjalon, Stephen Hemminger
Cc: techboard, Ananyev, Konstantin, Richardson, Bruce, Jerin Jacob, dev
On 12-Nov-18 4:55 PM, Thomas Monjalon wrote:
> 12/11/2018 17:43, Stephen Hemminger:
>> On Mon, 12 Nov 2018 12:36:45 +0000
>> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
>>> From: Richardson, Bruce
>>>> From: techboard [mailto:techboard-bounces@dpdk.org] On Behalf Of Ananyev,
>>>>> Konstantin
>>>>>
>>>>> Hi Anatoly,
>>>>>
>>>>>>> Meeting notes for the DPDK technical board meeting held on
>>>>>>> 2018-10-24
> [...]
>>>>>>> 0) DPDK acceptance policy on un-implemented API
>>>>>>> - New APIs without an implementation are not accepted.
>>>>>>> - In order to accept a new API, at minimum:
>>>>>>> a) Need to provide a unit test case or example application
>>>>>>> b) If the API is about HW abstraction, at least one driver should be
>>>>>>> implemented. Preferably two.
>>>>>>> c) If there are strong objections on ML about the need for more than
>>>>>>> one driver for a specific API then the technical board can make a
>>>>>>> decision.
>>>>>>> - Konstantin volunteered to send existing un-implemented API to the
>>>>>>> mailing list.
>>>>>>> - The existing un-implemented APIs will be deprecated in v19.05.
>>>>>>> - Deprecated un-implemented API will be removed in v19.08
>>>>>>>
>>>>>>
>>>>>> Does this also apply to unimplemented parts of the existing API? For
>>>>>> example, malloc API has long had a "name" parameter which goes
>>>>>> unimplemented through entire lifetime of DPDK project. It would be
>>>>>> good to drop this thing entirely as it's clear it's not going to be
>>>>>> implemented any time soon :)
>>>>>>
>>>>>
>>>>> Sounds like a good idea to me.
>>>>> Konstantin
>>>>
>>>> While a good idea in theory, I'm not sure the cost-benefit pays off for this one. Given the fact that the extra parameter is rather harmless,
>>>> the benefit seems minimal compared to the effort which would be involved for everyone to have to change every rte_malloc call in every
>>>> app!
>>>
>>> I agree about the massive amount of changes, though I thought Anatoly was sort of volunteering for it :)
>>> About benefit - it would save us spilling/restoring one register for each rte_malloc() call.
>>> Probably not that important, as rte_malloc() usually is used from data-path, but still.
>>> Plus it doesn't look good to have a function with a parameter that would never be used.
>>> Konstantin
>>>
>>>
>>
>> I agree, we should do this kind of cleanup, but only on ABI breaking releases.
>> Too late now for 18.11 and next one is probably 19.11
>
> We can discuss which release can break ABI.
> I think 19.05 is a good candidate.
>
There's not much *actual* work involved in the rte_malloc change -
mostly search-and-replace. Given the head-start, I can go on with this
in the background so that it doesn't take away from my day-to-day
activities, and get it ready for 19.05 in time.
--
Thanks,
Anatoly
* Re: [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
2018-11-12 21:01 0% ` Trahe, Fiona
@ 2018-11-13 18:56 0% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-11-13 18:56 UTC (permalink / raw)
To: Trahe, Fiona, dev
Cc: De Lara Guarch, Pablo, Akhil Goyal, Doherty, Declan, Ravi Kumar,
Jerin Jacob, Zhang, Roy Fan, Tomasz Duszynski, Hemant Agrawal,
Natalie Samsonov, Dmitri Epshtein, Jay Zhou
Hi Fiona,
> -----Original Message-----
> From: Trahe, Fiona
> Sent: Monday, November 12, 2018 9:01 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Doherty, Declan
> <declan.doherty@intel.com>; Ravi Kumar <ravi1.kumar@amd.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; Tomasz Duszynski <tdu@semihalf.com>; Hemant Agrawal <hemant.agrawal@nxp.com>; Natalie Samsonov
> <nsamsono@marvell.com>; Dmitri Epshtein <dima@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>; Trahe, Fiona
> <fiona.trahe@intel.com>
> Subject: RE: [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
>
> Hi Konstantin,
> Sorry for the delay in reviewing this and thanks for your proposals.
NP, thanks for review.
My comments/answers inline.
Can you also have a look at related deprecation note:
http://patches.dpdk.org/patch/46633/
and provide the feedback?
Konstantin
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, August 24, 2018 10:48 AM
> > To: dev@dpdk.org
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Doherty, Declan
> > <declan.doherty@intel.com>; Ravi Kumar <ravi1.kumar@amd.com>; Jerin Jacob
> > <jerin.jacob@caviumnetworks.com>; Zhang, Roy Fan <roy.fan.zhang@intel.com>; Trahe, Fiona
> > <fiona.trahe@intel.com>; Tomasz Duszynski <tdu@semihalf.com>; Hemant Agrawal
> > <hemant.agrawal@nxp.com>; Natalie Samsonov <nsamsono@marvell.com>; Dmitri Epshtein
> > <dima@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>
> > Subject: [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
> >
> > This RFC proposes several changes inside rte_cryptodev_sym_session.
> > Note that this is just an RFC, not a complete patch, so for now
> > I modified only the librte_cryptodev itself,
> > some cryptodev PMDs, test-crypto-perf and the ipsec-secgw example.
> > The proposed changes mean ABI/API breakage inside cryptodev,
> > so I am looking for feedback from cryptodev lib and crypto-PMD maintainers.
> > Below are details and reasoning for proposed changes.
> >
> > 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> > operate based on cryptodev device id, though inside
> > rte_cryptodev_sym_session device specific data is addressed
> > by driver id (not device id).
> > That creates a problem with current implementation when we have
> > two or more devices with the same driver used by the same session.
> > Consider the following example:
> >
> > struct rte_cryptodev_sym_session *sess;
> > rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> > rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> > rte_cryptodev_sym_session_clear(dev_id=X, sess);
> >
> > After that point if X and Y use the same driver,
> > then sess can't be used by device Y any more.
> > The reason for that - driver specific (not device specific)
> > data per session, plus there is no information
> > how many device instances use that data.
> > Probably the simplest way to deal with that issue -
> > add a reference counter per each driver data.
> [Fiona] Ok, I agree with this issue and proposed fix.
> We need to also document that it's user's responsibility
> not to call either init() or clear() twice on same device, as
> that would break the ref count.
I suppose it is an obvious constraint, but sure, extra wording
can be put into the comments/docs, np with that.
> The same should be added to asym_session - though I accept
> it's outside of the scope of this patch.
Agree on both - yes similar changes need to be done for asym,
and yes that patch targets sym session only.
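To illustrate the per-driver reference counter, here is a minimal standalone sketch.
It is not the patch itself: rc_session, rc_session_init(), rc_session_clear() and
rc_configure() are hypothetical names, the driver count is fixed at 2 for simplicity,
and the driver callback is a stub. It only shows that clearing the session for one
device leaves the driver data intact while another device of the same driver still
holds a reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RC_NB_DRIVERS 2 /* assumption: fixed driver count for this sketch */

/* One slot per driver: private data plus the number of devices using it. */
struct rc_sess_data {
	void *data;
	uint16_t refcnt;
};

struct rc_session {
	struct rc_sess_data sess_data[RC_NB_DRIVERS];
};

/* Stub standing in for the driver's session configure callback. */
static int
rc_configure(struct rc_session *s, uint8_t drv_id)
{
	static int drv_priv[RC_NB_DRIVERS];

	s->sess_data[drv_id].data = &drv_priv[drv_id];
	return 0;
}

/* Configure the driver data on first use only, then take a reference. */
static int
rc_session_init(struct rc_session *s, uint8_t drv_id)
{
	/* valid driver ids are 0 .. RC_NB_DRIVERS - 1, hence '>=' */
	if (s == NULL || drv_id >= RC_NB_DRIVERS)
		return -EINVAL;
	if (s->sess_data[drv_id].refcnt == 0 && rc_configure(s, drv_id) < 0)
		return -EINVAL;
	s->sess_data[drv_id].refcnt++;
	return 0;
}

/*
 * Drop one reference. The driver data is released only when the last
 * device using it clears the session; otherwise -EBUSY is returned.
 */
static int
rc_session_clear(struct rc_session *s, uint8_t drv_id)
{
	struct rc_sess_data *sd;

	if (s == NULL || drv_id >= RC_NB_DRIVERS)
		return -EINVAL;
	sd = &s->sess_data[drv_id];
	if (sd->refcnt == 0)
		return -EINVAL;
	assert(sd->data != NULL); /* a referenced slot is always configured */
	if (--sd->refcnt != 0)
		return -EBUSY;
	sd->data = NULL;
	return 0;
}
```

With devices X and Y sharing driver 0, two init calls followed by one clear leave
the slot configured; only the second clear releases it.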
>
>
> > 2.rte_cryptodev_sym_session_set_user_data() and
> > rte_cryptodev_sym_session_get_user_data() -
> > with current implementation there is no defined way for the user to
> > determine what is the max allowed size of the private data.
> > Even within rte_cryptodev_sym_session_set_user_data() we just blindly
> > copy user-provided data without checking for memory boundary violations.
> > To overcome that issue I added 'uint16_t priv_size' into
> > rte_cryptodev_sym_session structure.
> [Fiona] I agree, this is needed.
> But I propose to call it user_data_sz NOT priv_size.
> See https://patches.dpdk.org/patch/42515/ to understand why.
Hmm, that differs from the mbuf naming scheme
(which I tried to follow here), but ok -
I don't have a strong opinion here.
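A minimal standalone sketch of the bounds check discussed in #2, under stated
assumptions: ud_session, ud_session_set_user_data() and ud_session_get_user_data()
are hypothetical names, and the user area is a fixed 64-byte array here rather than
space placed after sess_data[] as in the RFC. The point is only that the copy is
refused when the requested size exceeds the size recorded in the session.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UD_USER_DATA_MAX 64 /* assumption: storage cap for this sketch */

struct ud_session {
	uint16_t user_data_sz; /* usable size, fixed when the session is allocated */
	uint8_t user_data[UD_USER_DATA_MAX];
};

/* Copy user data into the session only if it fits the recorded size. */
static int
ud_session_set_user_data(struct ud_session *s, const void *data, uint16_t size)
{
	if (s == NULL || data == NULL || size > s->user_data_sz)
		return -EINVAL;
	assert(s->user_data_sz <= UD_USER_DATA_MAX);
	memcpy(s->user_data, data, size);
	return 0;
}

/* Return the user area, or NULL if the session was allocated without one. */
static void *
ud_session_get_user_data(struct ud_session *s)
{
	if (s == NULL || s->user_data_sz == 0)
		return NULL;
	return s->user_data;
}
```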
>
>
> > 3.rte_cryptodev_sym_session contains an array of variable size for
> > driver specific data.
> > Though number of elements in that array is determined by static
> > variable nb_drivers, that could be modified by
> > rte_cryptodev_allocate_driver().
> > That construction seems to work ok so far, as right now users register
> > all their PMDs at startup, though it doesn't mean that it would always
> > remain like that.
> > To make it less error prone I added 'uint16_t nb_drivers' into the
> > rte_cryptodev_sym_session structure.
> > At least that allows related functions to check that provided
> > driver id wouldn't overrun variable array boundaries,
> > again it allows to determine size of already allocated session
> > without accessing global variable.
> [Fiona] I agree with both issue and solution.
> The same should be added to asym_session - though again
> it's outside of the scope of this patch.
>
> > 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> > would have sort of readonly type data (init once at allocation time,
> > keep unmodified through session life-time).
> > That requires more changes in current cryptodev implementation:
> > Right now inside cryptodev framework both rte_cryptodev_sym_session
> > and driver specific session data are two completely different structures
> > (e.g. struct rte_cryptodev_sym_session and struct null_crypto_session).
> > Though the current cryptodev implementation implicitly assumes that the driver
> > will allocate both of them from within the same mempool.
> > Plus this is done in a manner that they overlay each other's fields
> > (reuse the same space - sort of an implicit C union).
> > That's probably not the best programming practice,
> > plus it makes it impossible to have readonly fields inside both of them.
> > So to overcome that situation I changed the API a bit, to allow
> > the use of two different mempools for these two distinct data structures.
> >
> > 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> > I suppose that self-explanatory, and might be used in a lot of places
> > (would be quite useful for ipsec library we develop).
> [Fiona] Seems unnecessary - the set_user_data can be used. Why have 2
> separate user data spaces in the session? - it's confusing.
It allows to quickly set/get external metadata associated with that session.
As an example - we plan to use it for a pointer to the ipsec SA associated
with a given session.
Storing it inside the priv_data section (user_data in your naming convention)
has several limitations:
- extra function call and extra memory dereference.
- each app would have to take that field into account when calculating the size
for the session mempool element.
Also note that several session pools could exist inside one app,
possibly with different layouts for user private data,
unknown to generic libs.
Again, here I just used the current mbuf approach:
userdata - (pointer to) some external metadata
(possibly temporarily) associated with a given mbuf.
priv_size (you suggest to call it user_data_sz) -
size of the application private data for a given mbuf.
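A rough sketch of how such a 64-bit field could carry a pointer to external
metadata. All names here (md_sa, md_session, md_session_set_sa(),
md_session_get_sa()) are hypothetical and only the one field is modelled; it is
meant to show the access pattern, not the final API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application object, e.g. an IPsec SA. */
struct md_sa {
	uint32_t spi;
};

/* Only the field relevant here; the proposed session has more members. */
struct md_session {
	uint64_t userdata;
};

/* Attach external metadata: a plain store into the session header. */
static inline void
md_session_set_sa(struct md_session *s, struct md_sa *sa)
{
	/* a pointer must fit into the 64-bit field */
	assert(sizeof(uintptr_t) <= sizeof(uint64_t));
	s->userdata = (uint64_t)(uintptr_t)sa;
}

/* Fetch it back: a plain load, independent of the private data layout. */
static inline struct md_sa *
md_session_get_sa(const struct md_session *s)
{
	return (struct md_sa *)(uintptr_t)s->userdata;
}
```

Since the field sits at a fixed offset, a generic library can read it without
knowing how a given pool lays out its private data.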
> If there is some good reason, then a different name should be used for clarity.
> Not private. And not user. Possibly opaque data.
Ok.
> Though afaik we had this in the op
> and removed it as it was felt appending user_data was enough.
>
> > So the new proposed layout for rte_cryptodev_sym_session:
> > struct rte_cryptodev_sym_session {
> > uint64_t userdata;
> > /**< Can be used for external metadata */
> > uint16_t nb_drivers;
> > /**< number of elements in sess_data array */
> > uint16_t priv_size;
> > /**< session private data will be placed after sess_data */
> > __extension__ struct {
> > void *data;
> > uint16_t refcnt;
> > } sess_data[0];
> > /**< Driver specific session material, variable size */
> > };
> >
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> > app/test-crypto-perf/cperf.h | 1 +
> > app/test-crypto-perf/cperf_ops.c | 11 +-
> > app/test-crypto-perf/cperf_ops.h | 2 +-
> > app/test-crypto-perf/cperf_test_latency.c | 5 +-
> > app/test-crypto-perf/cperf_test_latency.h | 1 +
> > app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 5 +-
> > app/test-crypto-perf/cperf_test_pmd_cyclecount.h | 1 +
> > app/test-crypto-perf/cperf_test_throughput.c | 5 +-
> > app/test-crypto-perf/cperf_test_throughput.h | 1 +
> > app/test-crypto-perf/cperf_test_verify.c | 5 +-
> > app/test-crypto-perf/cperf_test_verify.h | 1 +
> > app/test-crypto-perf/main.c | 111 +++++++++++------
> > drivers/crypto/aesni_gcm/aesni_gcm_pmd.c | 10 +-
> > drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c | 5 +-
> > drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h | 4 +-
> > drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c | 10 +-
> > drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c | 5 +-
> > drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h | 4 +-
> > drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 3 +-
> > drivers/crypto/dpaa_sec/dpaa_sec.c | 3 +-
> > drivers/crypto/null/null_crypto_pmd.c | 14 ++-
> > drivers/crypto/null/null_crypto_pmd_ops.c | 5 +-
> > drivers/crypto/null/null_crypto_pmd_private.h | 4 +-
> > drivers/crypto/scheduler/scheduler_pmd_ops.c | 5 +-
> > drivers/crypto/virtio/virtio_cryptodev.c | 6 +-
> > examples/ipsec-secgw/ipsec-secgw.c | 116 ++++++++++++------
> > examples/ipsec-secgw/ipsec.h | 2 +
> > lib/librte_cryptodev/rte_cryptodev.c | 134 ++++++++++++---------
> > lib/librte_cryptodev/rte_cryptodev.h | 53 ++++++--
> > lib/librte_cryptodev/rte_cryptodev_pmd.h | 16 ++-
> > 30 files changed, 356 insertions(+), 192 deletions(-)
> >
> ///snip///
>
> > struct cnt_blk {
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
> > index 63ae23f00..e25282445 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.c
> > +++ b/lib/librte_cryptodev/rte_cryptodev.c
> > @@ -943,8 +943,7 @@ rte_cryptodev_close(uint8_t dev_id)
> >
> > int
> > rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> > - const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
> > - struct rte_mempool *session_pool)
> > + const struct rte_cryptodev_qp_conf *qp_conf, int socket_id)
> >
> > {
> > struct rte_cryptodev *dev;
> > @@ -954,6 +953,12 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> > return -EINVAL;
> > }
> >
> > + if (qp_conf == NULL || qp_conf->sess_pool == NULL ||
> > + qp_conf->priv_sess_pool == NULL) {
> > + CDEV_LOG_ERR("Invalid queue_pair config");
> > + return -EINVAL;
> > + }
> > +
> > dev = &rte_crypto_devices[dev_id];
> > if (queue_pair_id >= dev->data->nb_queue_pairs) {
> > CDEV_LOG_ERR("Invalid queue_pair_id=%d", queue_pair_id);
> > @@ -969,7 +974,7 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_pair_setup, -ENOTSUP);
> >
> > return (*dev->dev_ops->queue_pair_setup)(dev, queue_pair_id, qp_conf,
> > - socket_id, session_pool);
> > + socket_id);
> > }
> >
> >
> > @@ -1146,6 +1151,41 @@ rte_cryptodev_pmd_callback_process(struct rte_cryptodev *dev,
> > rte_spinlock_unlock(&rte_cryptodev_cb_lock);
> > }
> >
> > +static void
> > +cryptodev_sym_session_init_elem(__rte_unused struct rte_mempool *pool,
> > + void *arg, void *obj, __rte_unused uint32_t idx)
> > +{
> > + struct rte_cryptodev_sym_session *ds;
> > + const struct rte_cryptodev_sym_session *ss;
> > +
> > + ds = obj;
> > + ss = arg;
> > +
> > + *ds = *ss;
> > + memset(ds->sess_data, 0, rte_cryptodev_sym_session_data_size(ds));
> > +}
> > +
> > +struct rte_mempool *
> > +rte_cryptodev_sym_session_pool_create(const char *name,
> > + uint32_t nb_elts, uint32_t cache_size, uint16_t priv_size,
> > + int socket_id)
> > +{
> > + struct rte_mempool *mp;
> > + uint32_t elt_size;
> > + struct rte_cryptodev_sym_session s = {
> > + .nb_drivers = nb_drivers,
> > + .priv_size = priv_size,
> > + };
> > +
> > + elt_size = rte_cryptodev_sym_session_max_size(priv_size);
> > + mp = rte_mempool_create(name, nb_elts, elt_size, cache_size, 0,
> > + NULL, NULL, cryptodev_sym_session_init_elem, &s,
> > + socket_id, 0);
> > + if (mp == NULL)
> > + CDEV_LOG_ERR("%s(name=%s) failed, rte_errno=%d\n",
> > + __func__, name, rte_errno);
> > + return mp;
> > +}
> >
> > int
> > rte_cryptodev_sym_session_init(uint8_t dev_id,
> > @@ -1163,12 +1203,15 @@ rte_cryptodev_sym_session_init(uint8_t dev_id,
> > return -EINVAL;
> >
> > index = dev->driver_id;
> > + if (index > sess->nb_drivers)
> > + return -EINVAL;
> >
> > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_session_configure, -ENOTSUP);
> >
> > - if (sess->sess_private_data[index] == NULL) {
> > + if (sess->sess_data[index].refcnt == 0) {
> > ret = dev->dev_ops->sym_session_configure(dev, xforms,
> > - sess, mp);
> > + sess, mp);
> > +
> > if (ret < 0) {
> > CDEV_LOG_ERR(
> > "dev_id %d failed to configure session details",
> > @@ -1177,6 +1220,7 @@ rte_cryptodev_sym_session_init(uint8_t dev_id,
> > }
> > }
> >
> > + sess->sess_data[index].refcnt++;
> > return 0;
> > }
> >
> > @@ -1229,8 +1273,7 @@ rte_cryptodev_sym_session_create(struct rte_mempool *mp)
> > /* Clear device session pointer.
> > * Include the flag indicating presence of user data
> > */
> > - memset(sess, 0, (sizeof(void *) * nb_drivers) + sizeof(uint8_t));
> > -
> > + memset(sess->sess_data, 0, rte_cryptodev_sym_session_data_size(sess));
> > return sess;
> > }
> >
> > @@ -1258,16 +1301,20 @@ rte_cryptodev_sym_session_clear(uint8_t dev_id,
> > struct rte_cryptodev_sym_session *sess)
> > {
> > struct rte_cryptodev *dev;
> > + uint32_t idx;
> >
> > dev = rte_cryptodev_pmd_get_dev(dev_id);
> >
> > - if (dev == NULL || sess == NULL)
> > + if (dev == NULL || sess == NULL || dev->driver_id > sess->nb_drivers)
> > return -EINVAL;
> >
> > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_session_clear, -ENOTSUP);
> >
> > - dev->dev_ops->sym_session_clear(dev, sess);
> > + idx = dev->driver_id;
> > + if (--sess->sess_data[idx].refcnt != 0)
> > + return -EBUSY;
> >
> > + dev->dev_ops->sym_session_clear(dev, sess);
> > return 0;
> > }
> >
> > @@ -1285,7 +1332,6 @@ rte_cryptodev_asym_session_clear(uint8_t dev_id,
> > RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->asym_session_clear, -ENOTSUP);
> >
> > dev->dev_ops->asym_session_clear(dev, sess);
> > -
> > return 0;
> > }
> >
> > @@ -1293,7 +1339,6 @@ int
> > rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
> > {
> > uint8_t i;
> > - void *sess_priv;
> > struct rte_mempool *sess_mp;
> >
> > if (sess == NULL)
> > @@ -1301,8 +1346,7 @@ rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
> >
> > /* Check that all device private data has been freed */
> > for (i = 0; i < nb_drivers; i++) {
> [Fiona] Use the sess.nb_drivers rather than the global.
Ok, though doesn't really matter here.
get_sym_session_private_data() will return NULL for invalid index anyway.
> Actually maybe name slightly differently to reduce the chance of that mistake happening, e.g.
> rename sess.nb_drivers to sess.num_drivers?
>
> > - sess_priv = get_sym_session_private_data(sess, i);
> > - if (sess_priv != NULL)
> > + if (sess->sess_data[i].refcnt != 0)
> > return -EBUSY;
> > }
> >
> > @@ -1313,6 +1357,23 @@ rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
> > return 0;
> > }
> >
> > +unsigned int
> > +rte_cryptodev_sym_session_max_data_size(void)
> [Fiona] Suggest renaming this
> rte_cryptodev_sym_session_max_array_size()
I usually don't care much about names, but that seems just confusing:
totally unclear which array we are talking about.
Any better naming ideas?
>
> > +{
> > + struct rte_cryptodev_sym_session *sess = NULL;
> > +
> > + return (sizeof(sess->sess_data[0]) * nb_drivers);
> > +}
> > +
> > +size_t
> > +rte_cryptodev_sym_session_max_size(uint16_t priv_size)
> > +{
> > + struct rte_cryptodev_sym_session *sess = NULL;
> > +
> > + return (sizeof(*sess) + priv_size +
> > + rte_cryptodev_sym_session_max_data_size());
> > +}
> > +
> > int __rte_experimental
> > rte_cryptodev_asym_session_free(struct rte_cryptodev_asym_session *sess)
> > {
> > @@ -1337,18 +1398,6 @@ rte_cryptodev_asym_session_free(struct rte_cryptodev_asym_session *sess)
> > return 0;
> > }
> >
> > -
> > -unsigned int
> > -rte_cryptodev_sym_get_header_session_size(void)
> > -{
> > - /*
> > - * Header contains pointers to the private data
> > - * of all registered drivers, and a flag which
> > - * indicates presence of user data
> > - */
> > - return ((sizeof(void *) * nb_drivers) + sizeof(uint8_t));
> > -}
> > -
> > unsigned int __rte_experimental
> > rte_cryptodev_asym_get_header_session_size(void)
> > {
> > @@ -1361,11 +1410,9 @@ rte_cryptodev_asym_get_header_session_size(void)
> > }
> >
> > unsigned int
> > -rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
> > +rte_cryptodev_sym_private_session_size(uint8_t dev_id)
> > {
> > struct rte_cryptodev *dev;
> > - unsigned int header_size = sizeof(void *) * nb_drivers;
> > - unsigned int priv_sess_size;
> >
> > if (!rte_cryptodev_pmd_is_valid_dev(dev_id))
> > return 0;
> > @@ -1375,18 +1422,7 @@ rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
> > if (*dev->dev_ops->sym_session_get_size == NULL)
> > return 0;
> >
> > - priv_sess_size = (*dev->dev_ops->sym_session_get_size)(dev);
> > -
> > - /*
> > - * If size is less than session header size,
> > - * return the latter, as this guarantees that
> > - * sessionless operations will work
> > - */
> > - if (priv_sess_size < header_size)
> > - return header_size;
> > -
> > - return priv_sess_size;
> > -
> > + return (*dev->dev_ops->sym_session_get_size)(dev);
> > }
> >
> > unsigned int __rte_experimental
> > @@ -1409,7 +1445,6 @@ rte_cryptodev_asym_get_private_session_size(uint8_t dev_id)
> > return header_size;
> >
> > return priv_sess_size;
> > -
> > }
> >
> > int __rte_experimental
> > @@ -1418,15 +1453,10 @@ rte_cryptodev_sym_session_set_user_data(
> > void *data,
> > uint16_t size)
> > {
> > - uint16_t off_set = sizeof(void *) * nb_drivers;
> > - uint8_t *user_data_present = (uint8_t *)sess + off_set;
> > -
> > - if (sess == NULL)
> > + if (sess == NULL || sess->priv_size < size)
> > return -EINVAL;
> >
> > - *user_data_present = 1;
> > - off_set += sizeof(uint8_t);
> > - rte_memcpy((uint8_t *)sess + off_set, data, size);
> > + rte_memcpy(sess->sess_data + sess->nb_drivers, data, size);
> > return 0;
> > }
> >
> > @@ -1434,14 +1464,10 @@ void * __rte_experimental
> > rte_cryptodev_sym_session_get_user_data(
> > struct rte_cryptodev_sym_session *sess)
> > {
> > - uint16_t off_set = sizeof(void *) * nb_drivers;
> > - uint8_t *user_data_present = (uint8_t *)sess + off_set;
> > -
> > - if (sess == NULL || !*user_data_present)
> > + if (sess == NULL || sess->priv_size == 0)
> > return NULL;
> >
> > - off_set += sizeof(uint8_t);
> > - return (uint8_t *)sess + off_set;
> > + return (sess->sess_data + sess->nb_drivers);
> > }
> >
> > /** Initialise rte_crypto_op mempool element */
> > diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> > index 4099823f1..d88454f02 100644
> > --- a/lib/librte_cryptodev/rte_cryptodev.h
> > +++ b/lib/librte_cryptodev/rte_cryptodev.h
> > @@ -495,6 +495,14 @@ enum rte_cryptodev_event_type {
> > /** Crypto device queue pair configuration structure. */
> > struct rte_cryptodev_qp_conf {
> > uint32_t nb_descriptors; /**< Number of descriptors per queue pair */
> > + struct rte_mempool *sess_pool;
> > + /**< Pointer to crypto sessions mempool,
> > + * used for session-less operations.
> > + */
> > + struct rte_mempool *priv_sess_pool;
> > + /**< Pointer to device specific sessions mempool,
> > + * used for session-less operations.
> > + */
> > };
> >
> > /**
> > @@ -680,17 +688,13 @@ rte_cryptodev_close(uint8_t dev_id);
> > * *SOCKET_ID_ANY* if there is no NUMA constraint
> > * for the DMA memory allocated for the receive
> > * queue pair.
> > - * @param session_pool Pointer to device session mempool, used
> > - * for session-less operations.
> > - *
> > * @return
> > * - 0: Success, queue pair correctly set up.
> > * - <0: Queue pair configuration failed
> > */
> > extern int
> > rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> > - const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
> > - struct rte_mempool *session_pool);
> > + const struct rte_cryptodev_qp_conf *qp_conf, int socket_id);
> >
> > /**
> > * Get the number of queue pairs on a specific crypto device
> > @@ -954,10 +958,43 @@ rte_cryptodev_enqueue_burst(uint8_t dev_id, uint16_t qp_id,
> > * has a fixed algo, key, op-type, digest_len etc.
> > */
> > struct rte_cryptodev_sym_session {
> > - __extension__ void *sess_private_data[0];
> > - /**< Private symmetric session material */
> > + uint64_t userdata;
> > + /**< Can be used for external metadata */
> > + uint16_t nb_drivers;
> > + /**< number of elements in sess_data array */
> > + uint16_t priv_size;
> > + /**< session private data will be placed after sess_data */
> > + __extension__ struct {
> > + void *data;
> > + uint16_t refcnt;
> > + } sess_data[0];
> > + /**< Driver specific session material, variable size */
> > };
> >
> > +static inline size_t
> > +rte_cryptodev_sym_session_data_size(const struct rte_cryptodev_sym_session *s)
> > +{
> > + return (sizeof(s->sess_data[0]) * s->nb_drivers);
> > +}
> > +
> > +static inline size_t
> > +rte_cryptodev_sym_session_size(const struct rte_cryptodev_sym_session *s)
> > +{
> > + return (sizeof(*s) + (s)->priv_size +
> > + rte_cryptodev_sym_session_data_size(s));
> > +}
> > +
> [Fiona] Are above 2 fns used?
> Look like duplicates of the "max" fns?
Not really: rte_cryptodev_sym_session_max_data_size() and
rte_cryptodev_sym_session_max_size() use the global var nb_drivers.
They return the amount of space required to create a session that
can work with all drivers attached at that moment.
Planned to be used in rte_cryptodev_sym_session_max_size().
While these 2 funcs return the size of a particular session object.
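The difference can be sketched with a small standalone model. The names
(sz_session, sz_session_max_size(), sz_session_size(), sz_session_alloc(),
sz_global_nb_drivers) are hypothetical, and calloc() stands in for the mempool;
the struct layout follows the one proposed above. The "max" helper depends on the
global driver count, while the per-session helper depends only on values stored
in the session at allocation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sz_sess_data {
	void *data;
	uint16_t refcnt;
};

struct sz_session {
	uint64_t userdata;
	uint16_t nb_drivers; /* number of elements in sess_data[] */
	uint16_t priv_size;  /* private area placed after sess_data[] */
	struct sz_sess_data sess_data[]; /* C99 flexible array member */
};

/* Stand-in for the library-wide driver counter, which may grow later. */
static unsigned int sz_global_nb_drivers = 3;

/* Size a new session needs to serve all currently registered drivers. */
static size_t
sz_session_max_size(uint16_t priv_size)
{
	return sizeof(struct sz_session) +
		sizeof(struct sz_sess_data) * sz_global_nb_drivers + priv_size;
}

/* Size of one existing session, computed from its own fields only. */
static size_t
sz_session_size(const struct sz_session *s)
{
	return sizeof(*s) + sizeof(s->sess_data[0]) * s->nb_drivers +
		s->priv_size;
}

/* Allocate a zeroed session and record the sizes it was created with. */
static struct sz_session *
sz_session_alloc(uint16_t priv_size)
{
	struct sz_session *s = calloc(1, sz_session_max_size(priv_size));

	if (s == NULL)
		return NULL;
	s->nb_drivers = (uint16_t)sz_global_nb_drivers;
	s->priv_size = priv_size;
	assert(sz_session_size(s) == sz_session_max_size(priv_size));
	return s;
}
```

If more drivers register after a session was allocated, sz_session_max_size()
grows but sz_session_size() of the existing session stays the same, which is why
both kinds of helper are useful.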
* Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
2018-11-12 12:03 0% ` Akhil Goyal
@ 2018-11-14 0:50 0% ` Trahe, Fiona
2018-11-14 3:15 0% ` Joseph, Anoob
1 sibling, 0 replies; 200+ results
From: Trahe, Fiona @ 2018-11-14 0:50 UTC (permalink / raw)
To: Akhil Goyal, Ananyev, Konstantin, dev, Ravi Kumar, Jerin Jacob,
Anoob Joseph, Doherty, Declan, Tomasz Duszynski, Dmitri Epshtein,
Natalie Samsonov, Jay Zhou
Cc: Trahe, Fiona, Zhang, Roy Fan
> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Monday, November 12, 2018 5:04 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org; Ravi Kumar
> <ravi1.kumar@amd.com>; Jerin Jacob <jerin.jacob@caviumnetworks.com>; Anoob Joseph
> <anoob.joseph@caviumnetworks.com>; Doherty, Declan <declan.doherty@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Tomasz Duszynski <tdu@semihalf.com>; Dmitri Epshtein <dima@marvell.com>;
> Natalie Samsonov <nsamsono@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
>
>
>
> On 10/11/2018 7:50 PM, Konstantin Ananyev wrote:
> > Below are details and reasoning for proposed changes.
> >
> > 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> > > operate based on cryptodev device id, though inside
> > rte_cryptodev_sym_session device specific data is addressed
> > by driver id (not device id).
> > That creates a problem with current implementation when we have
> > two or more devices with the same driver used by the same session.
> > Consider the following example:
> >
> > struct rte_cryptodev_sym_session *sess;
> > rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> > rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> > rte_cryptodev_sym_session_clear(dev_id=X, sess);
> >
> > > After that point if X and Y use the same driver,
> > then sess can't be used by device Y any more.
> > The reason for that - driver specific (not device specific)
> > data per session, plus there is no information
> > how many device instances use that data.
> > Probably the simplest way to deal with that issue -
> > add a reference counter per each driver data.
> >
> > 2.rte_cryptodev_sym_session_set_user_data() and
> > rte_cryptodev_sym_session_get_user_data() -
> > with current implementation there is no defined way for the user to
> > determine what is the max allowed size of the private data.
> > rte_cryptodev_sym_session_set_user_data() just blindly copies
> > user provided data without checking memory boundaries violation.
> > To overcome that issue propose to add 'uint16_t priv_size' into
> > rte_cryptodev_sym_session structure.
> >
> > 3.rte_cryptodev_sym_session contains an array of variable size for
> > driver specific data.
> > Though number of elements in that array is determined by static
> > variable nb_drivers, that could be modified by
> > rte_cryptodev_allocate_driver().
> > That construction seems to work ok so far, as right now users register
> > all their PMDs at startup, though it doesn't mean that it would always
> > remain like that.
> > To make it less error prone propose to add 'uint16_t nb_drivers'
> > into the rte_cryptodev_sym_session structure.
> > At least that allows related functions to check that provided
> > driver id wouldn't overrun variable array boundaries,
> > again it allows to determine size of already allocated session
> > without accessing global variable.
> >
> > 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> > would have sort of readonly type data (init once at allocation time,
> > keep unmodified through session life-time).
> > That requires more changes in current cryptodev implementation:
> > Right now inside cryptodev framework both rte_cryptodev_sym_session
> > > and driver specific session data are two completely different structures
> > > (e.g. struct rte_cryptodev_sym_session and struct null_crypto_session).
> > Though current cryptodev implementation implicitly assumes that driver
> > will allocate both of them from within the same mempool.
> > Plus this is done in a manner that they override each other fields
> > (reuse the same space - sort of implicit C union).
> > That's probably not the best programming practice,
> > > plus it makes it impossible to have readonly fields inside both of them.
> > > To overcome that situation propose to change the API a bit, to allow
> > > the use of two different mempools for these two distinct data structures.
> >
> > 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> > I suppose that self-explanatory, and might be used in a lot of places
> > (would be quite useful for ipsec library we develop).
> >
> > The new proposed layout for rte_cryptodev_sym_session:
> > struct rte_cryptodev_sym_session {
> > uint64_t userdata;
> > /**< Can be used for external metadata */
> > uint16_t nb_drivers;
> > /**< number of elements in sess_data array */
> > uint16_t priv_size;
> > /**< session private data will be placed after sess_data */
> > __extension__ struct {
> > void *data;
> > uint16_t refcnt;
> > } sess_data[0];
> > /**< Driver specific session material, variable size */
> > };
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
> Adding maintainers to ack this deprecation notice. These changes will
> impact all the PMDs and everyone
> should agree to these changes.
>
> from NXP dpaa_sec, dpaa2_sec, caam_jr PMDs:
>
> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
> > ---
> > doc/guides/rel_notes/deprecation.rst | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index d2aec64d1..998a0d92c 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -74,3 +74,12 @@ Deprecation Notices
> >
> > This is due to a lack of flexibility and reliance on a type unusable with
> > C++ programs (struct rte_flow_desc).
> > +
> > +* cryptodev: several API and ABI changes are planned for rte_cryptodev
> > + in v19.02:
> > +
> > + - The size and layout of ``rte_cryptodev_sym_session`` will change
> > + to fix existing issues.
> > + - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
> > > + ``rte_cryptodev_queue_pair_setup`` will change to allow the use of
> > + two different mempools for crypto and device private sessions.
Along with the naming changes agreed in the related RFC thread
Acked-by: Fiona Trahe <fiona.trahe@intel.com>
* Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
2018-11-12 12:03 0% ` Akhil Goyal
2018-11-14 0:50 0% ` Trahe, Fiona
@ 2018-11-14 3:15 0% ` Joseph, Anoob
2018-11-14 10:08 0% ` Ananyev, Konstantin
1 sibling, 1 reply; 200+ results
From: Joseph, Anoob @ 2018-11-14 3:15 UTC (permalink / raw)
To: Akhil Goyal, Konstantin Ananyev, dev, Ravi Kumar, Jacob, Jerin,
Declan Doherty, Fiona Trahe, Tomasz Duszynski, Dmitri Epshtein,
Natalie Samsonov, Jay Zhou
Cc: Athreya, Narayana Prasad, Verma, Shally, Dwivedi, Ankur
Hi Akhil, Konstantin,
Wouldn't the new element, userdata, conflict with the one referred by
rte_cryptodev_sym_session_set_user_data()
rte_cryptodev_sym_session_get_user_data()
Do you mind a name change for either? Or do you have a clear picture of when one should be used over the other?
Thanks,
Anoob
> -----Original Message-----
> From: Akhil Goyal <akhil.goyal@nxp.com>
> Sent: 12 November 2018 17:34
> To: Konstantin Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org; Ravi
> Kumar <ravi1.kumar@amd.com>; Jacob, Jerin
> <Jerin.JacobKollanukkaran@cavium.com>; Joseph, Anoob
> <Anoob.Joseph@cavium.com>; Declan Doherty <declan.doherty@intel.com>;
> Fiona Trahe <fiona.trahe@intel.com>; Tomasz Duszynski <tdu@semihalf.com>;
> Dmitri Epshtein <dima@marvell.com>; Natalie Samsonov
> <nsamsono@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym
> session changes
>
> External Email
>
> On 10/11/2018 7:50 PM, Konstantin Ananyev wrote:
> > Below are details and reasoning for proposed changes.
> >
> > 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> > operate based on cryptodev device id, though inside
> > rte_cryptodev_sym_session device specific data is addressed
> > by driver id (not device id).
> > That creates a problem with current implementation when we have
> > two or more devices with the same driver used by the same session.
> > Consider the following example:
> >
> > struct rte_cryptodev_sym_session *sess;
> > rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> > rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> > rte_cryptodev_sym_session_clear(dev_id=X, sess);
> >
> > After that point if X and Y uses the same driver,
> > then sess can't be used by device Y any more.
> > The reason for that - driver specific (not device specific)
> > data per session, plus there is no information
> > how many device instances use that data.
> > Probably the simplest way to deal with that issue -
> > add a reference counter per each driver data.
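To make the failure mode in point 1 concrete, here is a minimal sketch of the per-driver reference counter idea. It is only an illustration under stated assumptions: `demo_session`, `demo_drv_slot`, `demo_session_init()` and `demo_session_clear()` are made-up names, not cryptodev APIs, and the fixed `DEMO_NB_DRIVERS` array stands in for the variable-size `sess_data[]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NB_DRIVERS 2 /* hypothetical; the real array is variable size */

/* Hypothetical per-driver slot, mirroring the proposed {data, refcnt} pair. */
struct demo_drv_slot {
	void *data;
	uint16_t refcnt;
};

struct demo_session {
	struct demo_drv_slot sess_data[DEMO_NB_DRIVERS];
};

static int demo_storage; /* placeholder for driver private session data */

/* Attach the session to one device instance served by driver drv_id. */
static int
demo_session_init(struct demo_session *s, unsigned int drv_id)
{
	if (drv_id >= DEMO_NB_DRIVERS)
		return -1; /* bounds check against the slot array */
	if (s->sess_data[drv_id].refcnt == 0)
		s->sess_data[drv_id].data = &demo_storage;
	s->sess_data[drv_id].refcnt++;
	return 0;
}

/* Detach one device instance; drop driver data only on the last detach. */
static int
demo_session_clear(struct demo_session *s, unsigned int drv_id)
{
	if (drv_id >= DEMO_NB_DRIVERS || s->sess_data[drv_id].refcnt == 0)
		return -1;
	if (--s->sess_data[drv_id].refcnt == 0)
		s->sess_data[drv_id].data = NULL;
	return 0;
}
```

With two devices X and Y that share driver 0, init(X), init(Y), clear(X) leaves the slot populated with a count of 1, so Y can keep using the session. The same bounds check also covers the overrun concern raised in point 3.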
> >
> > 2.rte_cryptodev_sym_session_set_user_data() and
> > rte_cryptodev_sym_session_get_user_data() -
> > with current implementation there is no defined way for the user to
> > determine what is the max allowed size of the private data.
> > rte_cryptodev_sym_session_set_user_data() just blindly copies
> > user provided data without checking memory boundaries violation.
> > To overcome that issue propose to add 'uint16_t priv_size' into
> > rte_cryptodev_sym_session structure.
> >
> > 3.rte_cryptodev_sym_session contains an array of variable size for
> > driver specific data.
> > Though number of elements in that array is determined by static
> > variable nb_drivers, that could be modified by
> > rte_cryptodev_allocate_driver().
> > That construction seems to work ok so far, as right now users register
> > all their PMDs at startup, though it doesn't mean that it would always
> > remain like that.
> > To make it less error prone propose to add 'uint16_t nb_drivers'
> > into the rte_cryptodev_sym_session structure.
> > At least that allows related functions to check that provided
> > driver id wouldn't overrun variable array boundaries,
> > again it allows to determine size of already allocated session
> > without accessing global variable.
> >
> > 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> > would have sort of readonly type data (init once at allocation time,
> > keep unmodified through session life-time).
> > That requires more changes in current cryptodev implementation:
> > Right now inside cryptodev framework both rte_cryptodev_sym_session
> > and driver specific session data are two completely different structures
> > (e.g. struct cryptodev_sym_session and struct null_crypto_session).
> > Though current cryptodev implementation implicitly assumes that driver
> > will allocate both of them from within the same mempool.
> > Plus this is done in a manner that they override each other fields
> > (reuse the same space - sort of implicit C union).
> > That's probably not the best programming practice,
> > plus make impossible to have readonly fields inside both of them.
> > To overcome that situation propose to changed an API a bit, to allow
> > to use two different mempools for these two distinct data structures.
> >
> > 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> > I suppose that self-explanatory, and might be used in a lot of places
> > (would be quite useful for ipsec library we develop).
> >
> > The new proposed layout for rte_cryptodev_sym_session:
> > struct rte_cryptodev_sym_session {
> > uint64_t userdata;
> > /**< Can be used for external metadata */
> > uint16_t nb_drivers;
> > /**< number of elements in sess_data array */
> > uint16_t priv_size;
> > /**< session private data will be placed after sess_data */
> > __extension__ struct {
> > void *data;
> > uint16_t refcnt;
> > } sess_data[0];
> > /**< Driver specific session material, variable size */ };
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
> Adding maintainers to ack this deprecation notice. These changes will impact all
> the PMDs and everyone should agree to these changes.
>
> from NXP dpaa_sec, dpaa2_sec, caam_jr PMDs:
>
> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
> > ---
> > doc/guides/rel_notes/deprecation.rst | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> > b/doc/guides/rel_notes/deprecation.rst
> > index d2aec64d1..998a0d92c 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -74,3 +74,12 @@ Deprecation Notices
> >
> > This is due to a lack of flexibility and reliance on a type unusable with
> > C++ programs (struct rte_flow_desc).
> > +
> > +* cryptodev: several API and ABI changes are planned for
> > +rte_cryptodev
> > + in v19.02:
> > +
> > + - The size and layout of ``rte_cryptodev_sym_session`` will change
> > + to fix existing issues.
> > + - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
> > + ``rte_cryptodev_queue_pair_setup`` will change to to allow to use
> > + two different mempools for crypto and device private sessions.
* Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
2018-11-14 3:15 0% ` Joseph, Anoob
@ 2018-11-14 10:08 0% ` Ananyev, Konstantin
2018-11-14 10:12 0% ` Joseph, Anoob
0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-11-14 10:08 UTC (permalink / raw)
To: Joseph, Anoob, Akhil Goyal, dev, Ravi Kumar, Jacob, Jerin,
Doherty, Declan, Trahe, Fiona, Tomasz Duszynski, Dmitri Epshtein,
Natalie Samsonov, Jay Zhou
Cc: Athreya, Narayana Prasad, Verma, Shally, Dwivedi, Ankur
Hi Anoob,
>
> Hi Akhil, Konstantin,
>
> Wouldn't the new element, userdata, conflict with the one referred by
>
> rte_cryptodev_sym_session_set_user_data()
> rte_cryptodev_sym_session_get_user_data()
>
> Do you mind a name change for either? Or do you have a clear picture of when one should be used over the other?
Yes, Fiona also pointed to that naming collision.
Current suggestion is to name this new element 'opaque_data' or so.
Konstantin
>
> Thanks,
> Anoob
>
> > -----Original Message-----
> > From: Akhil Goyal <akhil.goyal@nxp.com>
> > Sent: 12 November 2018 17:34
> > To: Konstantin Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org; Ravi
> > Kumar <ravi1.kumar@amd.com>; Jacob, Jerin
> > <Jerin.JacobKollanukkaran@cavium.com>; Joseph, Anoob
> > <Anoob.Joseph@cavium.com>; Declan Doherty <declan.doherty@intel.com>;
> > Fiona Trahe <fiona.trahe@intel.com>; Tomasz Duszynski <tdu@semihalf.com>;
> > Dmitri Epshtein <dima@marvell.com>; Natalie Samsonov
> > <nsamsono@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>
> > Subject: Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym
> > session changes
> >
> > On 10/11/2018 7:50 PM, Konstantin Ananyev wrote:
> > > Below are details and reasoning for proposed changes.
> > >
> > > 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> > > operate based on cryptodev device id, though inside
> > > rte_cryptodev_sym_session device specific data is addressed
> > > by driver id (not device id).
> > > That creates a problem with current implementation when we have
> > > two or more devices with the same driver used by the same session.
> > > Consider the following example:
> > >
> > > struct rte_cryptodev_sym_session *sess;
> > > rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> > > rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> > > rte_cryptodev_sym_session_clear(dev_id=X, sess);
> > >
> > > After that point if X and Y uses the same driver,
> > > then sess can't be used by device Y any more.
> > > The reason for that - driver specific (not device specific)
> > > data per session, plus there is no information
> > > how many device instances use that data.
> > > Probably the simplest way to deal with that issue -
> > > add a reference counter per each driver data.
> > >
> > > 2.rte_cryptodev_sym_session_set_user_data() and
> > > rte_cryptodev_sym_session_get_user_data() -
> > > with current implementation there is no defined way for the user to
> > > determine what is the max allowed size of the private data.
> > > rte_cryptodev_sym_session_set_user_data() just blindly copies
> > > user provided data without checking memory boundaries violation.
> > > To overcome that issue propose to add 'uint16_t priv_size' into
> > > rte_cryptodev_sym_session structure.
> > >
> > > 3.rte_cryptodev_sym_session contains an array of variable size for
> > > driver specific data.
> > > Though number of elements in that array is determined by static
> > > variable nb_drivers, that could be modified by
> > > rte_cryptodev_allocate_driver().
> > > That construction seems to work ok so far, as right now users register
> > > all their PMDs at startup, though it doesn't mean that it would always
> > > remain like that.
> > > To make it less error prone propose to add 'uint16_t nb_drivers'
> > > into the rte_cryptodev_sym_session structure.
> > > At least that allows related functions to check that provided
> > > driver id wouldn't overrun variable array boundaries,
> > > again it allows to determine size of already allocated session
> > > without accessing global variable.
> > >
> > > 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> > > would have sort of readonly type data (init once at allocation time,
> > > keep unmodified through session life-time).
> > > That requires more changes in current cryptodev implementation:
> > > Right now inside cryptodev framework both rte_cryptodev_sym_session
> > > and driver specific session data are two completely different structures
> > > (e.g. struct cryptodev_sym_session and struct null_crypto_session).
> > > Though current cryptodev implementation implicitly assumes that driver
> > > will allocate both of them from within the same mempool.
> > > Plus this is done in a manner that they override each other fields
> > > (reuse the same space - sort of implicit C union).
> > > That's probably not the best programming practice,
> > > plus make impossible to have readonly fields inside both of them.
> > > To overcome that situation propose to changed an API a bit, to allow
> > > to use two different mempools for these two distinct data structures.
> > >
> > > 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> > > I suppose that self-explanatory, and might be used in a lot of places
> > > (would be quite useful for ipsec library we develop).
> > >
> > > The new proposed layout for rte_cryptodev_sym_session:
> > > struct rte_cryptodev_sym_session {
> > > uint64_t userdata;
> > > /**< Can be used for external metadata */
> > > uint16_t nb_drivers;
> > > /**< number of elements in sess_data array */
> > > uint16_t priv_size;
> > > /**< session private data will be placed after sess_data */
> > > __extension__ struct {
> > > void *data;
> > > uint16_t refcnt;
> > > } sess_data[0];
> > > /**< Driver specific session material, variable size */ };
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Adding maintainers to ack this deprecation notice. These changes will impact all
> > the PMDs and everyone should agree to these changes.
> >
> > from NXP dpaa_sec, dpaa2_sec, caam_jr PMDs:
> >
> > Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
> > > ---
> > > doc/guides/rel_notes/deprecation.rst | 9 +++++++++
> > > 1 file changed, 9 insertions(+)
> > >
> > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > > b/doc/guides/rel_notes/deprecation.rst
> > > index d2aec64d1..998a0d92c 100644
> > > --- a/doc/guides/rel_notes/deprecation.rst
> > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > @@ -74,3 +74,12 @@ Deprecation Notices
> > >
> > > This is due to a lack of flexibility and reliance on a type unusable with
> > > C++ programs (struct rte_flow_desc).
> > > +
> > > +* cryptodev: several API and ABI changes are planned for
> > > +rte_cryptodev
> > > + in v19.02:
> > > +
> > > + - The size and layout of ``rte_cryptodev_sym_session`` will change
> > > + to fix existing issues.
> > > + - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
> > > + ``rte_cryptodev_queue_pair_setup`` will change to to allow to use
> > > + two different mempools for crypto and device private sessions.
* Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
2018-11-14 10:08 0% ` Ananyev, Konstantin
@ 2018-11-14 10:12 0% ` Joseph, Anoob
0 siblings, 0 replies; 200+ results
From: Joseph, Anoob @ 2018-11-14 10:12 UTC (permalink / raw)
To: Ananyev, Konstantin, Akhil Goyal, dev, Ravi Kumar, Jacob, Jerin,
Doherty, Declan, Trahe, Fiona, Tomasz Duszynski, Dmitri Epshtein,
Natalie Samsonov, Jay Zhou
Cc: Athreya, Narayana Prasad, Verma, Shally, Dwivedi, Ankur
> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: 14 November 2018 15:38
> To: Joseph, Anoob <Anoob.Joseph@cavium.com>; Akhil Goyal
> <akhil.goyal@nxp.com>; dev@dpdk.org; Ravi Kumar <ravi1.kumar@amd.com>;
> Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; Doherty, Declan
> <declan.doherty@intel.com>; Trahe, Fiona <fiona.trahe@intel.com>; Tomasz
> Duszynski <tdu@semihalf.com>; Dmitri Epshtein <dima@marvell.com>; Natalie
> Samsonov <nsamsono@marvell.com>; Jay Zhou <jianjay.zhou@huawei.com>
> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya@cavium.com>; Verma,
> Shally <Shally.Verma@cavium.com>; Dwivedi, Ankur
> <Ankur.Dwivedi@cavium.com>
> Subject: RE: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym
> session changes
>
> Hi Anoob,
>
> >
> > Hi Akhil, Konstantin,
> >
> > Wouldn't the new element, userdata, conflict with the one referred by
> >
> > rte_cryptodev_sym_session_set_user_data()
> > rte_cryptodev_sym_session_get_user_data()
> >
> > Do you mind a name change for either? Or do you have a clear picture of when one should be used over the other?
>
> Yes, Fiona also pointed to that naming collision.
> Current suggestion is to name this new element 'opaque_data' or so.
> Konstantin
>
> >
> > Thanks,
> > Anoob
> >
> > > -----Original Message-----
> > > From: Akhil Goyal <akhil.goyal@nxp.com>
> > > Sent: 12 November 2018 17:34
> > > To: Konstantin Ananyev <konstantin.ananyev@intel.com>; dev@dpdk.org;
> > > Ravi Kumar <ravi1.kumar@amd.com>; Jacob, Jerin
> > > <Jerin.JacobKollanukkaran@cavium.com>; Joseph, Anoob
> > > <Anoob.Joseph@cavium.com>; Declan Doherty
> > > <declan.doherty@intel.com>; Fiona Trahe <fiona.trahe@intel.com>;
> > > Tomasz Duszynski <tdu@semihalf.com>; Dmitri Epshtein
> > > <dima@marvell.com>; Natalie Samsonov <nsamsono@marvell.com>; Jay
> > > Zhou <jianjay.zhou@huawei.com>
> > > Subject: Re: [dpdk-dev] [PATCH] doc: cryptodev deprecation notice
> > > for sym session changes
> > >
> > > On 10/11/2018 7:50 PM, Konstantin Ananyev wrote:
> > > > Below are details and reasoning for proposed changes.
> > > >
> > > > 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> > > > operate based on cryptodev device id, though inside
> > > > rte_cryptodev_sym_session device specific data is addressed
> > > > by driver id (not device id).
> > > > That creates a problem with current implementation when we have
> > > > two or more devices with the same driver used by the same session.
> > > > Consider the following example:
> > > >
> > > > struct rte_cryptodev_sym_session *sess;
> > > > rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> > > > rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> > > > rte_cryptodev_sym_session_clear(dev_id=X, sess);
> > > >
> > > > After that point if X and Y uses the same driver,
> > > > then sess can't be used by device Y any more.
> > > > The reason for that - driver specific (not device specific)
> > > > data per session, plus there is no information
> > > > how many device instances use that data.
> > > > Probably the simplest way to deal with that issue -
> > > > add a reference counter per each driver data.
> > > >
> > > > 2.rte_cryptodev_sym_session_set_user_data() and
> > > > rte_cryptodev_sym_session_get_user_data() -
> > > > with current implementation there is no defined way for the user to
> > > > determine what is the max allowed size of the private data.
> > > > rte_cryptodev_sym_session_set_user_data() just blindly copies
> > > > user provided data without checking memory boundaries violation.
> > > > To overcome that issue propose to add 'uint16_t priv_size' into
> > > > rte_cryptodev_sym_session structure.
> > > >
> > > > 3.rte_cryptodev_sym_session contains an array of variable size for
> > > > driver specific data.
> > > > Though number of elements in that array is determined by static
> > > > variable nb_drivers, that could be modified by
> > > > rte_cryptodev_allocate_driver().
> > > > That construction seems to work ok so far, as right now users register
> > > > all their PMDs at startup, though it doesn't mean that it would always
> > > > remain like that.
> > > > To make it less error prone propose to add 'uint16_t nb_drivers'
> > > > into the rte_cryptodev_sym_session structure.
> > > > At least that allows related functions to check that provided
> > > > driver id wouldn't overrun variable array boundaries,
> > > > again it allows to determine size of already allocated session
> > > > without accessing global variable.
> > > >
> > > > 4.#2 and #3 above implies that now each struct rte_cryptodev_sym_session
> > > > would have sort of readonly type data (init once at allocation time,
> > > > keep unmodified through session life-time).
> > > > That requires more changes in current cryptodev implementation:
> > > > Right now inside cryptodev framework both rte_cryptodev_sym_session
> > > > and driver specific session data are two completely different structures
> > > > (e.g. struct cryptodev_sym_session and struct null_crypto_session).
> > > > Though current cryptodev implementation implicitly assumes that driver
> > > > will allocate both of them from within the same mempool.
> > > > Plus this is done in a manner that they override each other fields
> > > > (reuse the same space - sort of implicit C union).
> > > > That's probably not the best programming practice,
> > > > plus make impossible to have readonly fields inside both of them.
> > > > To overcome that situation propose to changed an API a bit, to allow
> > > > to use two different mempools for these two distinct data structures.
> > > >
> > > > 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> > > > I suppose that self-explanatory, and might be used in a lot of places
> > > > (would be quite useful for ipsec library we develop).
> > > >
> > > > The new proposed layout for rte_cryptodev_sym_session:
> > > > struct rte_cryptodev_sym_session {
> > > > uint64_t userdata;
> > > > /**< Can be used for external metadata */
> > > > uint16_t nb_drivers;
> > > > /**< number of elements in sess_data array */
> > > > uint16_t priv_size;
> > > > /**< session private data will be placed after sess_data */
> > > > __extension__ struct {
> > > > void *data;
> > > > uint16_t refcnt;
> > > > } sess_data[0];
> > > > /**< Driver specific session material, variable size */
> > > > };
> > > >
> > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > >
> > > Adding maintainers to ack this deprecation notice. These changes
> > > will impact all the PMDs and everyone should agree to these changes.
> > >
> > > from NXP dpaa_sec, dpaa2_sec, caam_jr PMDs:
> > >
> > > Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
With the naming changes,
Acked-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
> > > > ---
> > > > doc/guides/rel_notes/deprecation.rst | 9 +++++++++
> > > > 1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/doc/guides/rel_notes/deprecation.rst
> > > > b/doc/guides/rel_notes/deprecation.rst
> > > > index d2aec64d1..998a0d92c 100644
> > > > --- a/doc/guides/rel_notes/deprecation.rst
> > > > +++ b/doc/guides/rel_notes/deprecation.rst
> > > > @@ -74,3 +74,12 @@ Deprecation Notices
> > > >
> > > > This is due to a lack of flexibility and reliance on a type unusable with
> > > > C++ programs (struct rte_flow_desc).
> > > > +
> > > > +* cryptodev: several API and ABI changes are planned for
> > > > +rte_cryptodev
> > > > + in v19.02:
> > > > +
> > > > + - The size and layout of ``rte_cryptodev_sym_session`` will change
> > > > + to fix existing issues.
> > > > + - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
> > > > + ``rte_cryptodev_queue_pair_setup`` will change to to allow to use
> > > > + two different mempools for crypto and device private sessions.
* [dpdk-dev] [PATCH] doc: security deprecation notice for session changes
@ 2018-11-14 11:23 5% Konstantin Ananyev
2018-11-14 11:32 0% ` Mohammad Abdul Awal
2018-11-14 12:39 0% ` Akhil Goyal
0 siblings, 2 replies; 200+ results
From: Konstantin Ananyev @ 2018-11-14 11:23 UTC (permalink / raw)
To: dev; +Cc: akhil.goyal, declan.doherty, Konstantin Ananyev
Add 'uint64_t opaque_data' inside struct rte_security_session.
That allows upper layer to easily associate some user defined
data with the session.
Proposed new layout for:
struct rte_security_session {
void *sess_private_data;
/**< Private session material */
+ uint64_t opaque_data;
+ /**< Opaque user defined data */
};
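As a rough illustration of how an upper layer might use such a field, the sketch below stores a pointer to its own per-session object in `opaque_data` and reads it back. This is an assumption-laden example, not library code: `demo_security_session` only mirrors the proposed layout, and `demo_sa`, `demo_session_set_sa()` and `demo_session_get_sa()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed layout; not the real rte_security definition. */
struct demo_security_session {
	void *sess_private_data; /* private session material */
	uint64_t opaque_data;    /* opaque user defined data */
};

/* Hypothetical upper-layer object, e.g. an IPsec SA. */
struct demo_sa {
	uint32_t spi;
};

/* Associate an SA with the session; uintptr_t keeps the cast portable. */
static inline void
demo_session_set_sa(struct demo_security_session *ses, struct demo_sa *sa)
{
	ses->opaque_data = (uint64_t)(uintptr_t)sa;
}

/* Recover the SA (NULL if nothing was associated). */
static inline struct demo_sa *
demo_session_get_sa(const struct demo_security_session *ses)
{
	return (struct demo_sa *)(uintptr_t)ses->opaque_data;
}
```

A 64-bit integer rather than a `void *` lets the caller store either a pointer or a plain index, at the cost of an explicit cast on each access.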
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
doc/guides/rel_notes/deprecation.rst | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 34b28234c..0cdc42842 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -55,3 +55,9 @@ Deprecation Notices
- ``rte_pdump_set_socket_dir`` will be removed;
- The parameter, ``path``, of ``rte_pdump_init`` will be removed;
- The enum ``rte_pdump_socktype`` will be removed.
+
+* security: ABI change:
+
+ New field ``uint64_t opaque_data`` is planned to add into
+ ``rte_security_session`` structure. That would allow upper layer to easily
+ associate/de-associate some user defined data with the security session.
--
2.17.1
* Re: [dpdk-dev] [PATCH] doc: security deprecation notice for session changes
2018-11-14 11:23 5% [dpdk-dev] [PATCH] doc: security deprecation notice for session changes Konstantin Ananyev
@ 2018-11-14 11:32 0% ` Mohammad Abdul Awal
2018-11-14 12:39 0% ` Akhil Goyal
1 sibling, 0 replies; 200+ results
From: Mohammad Abdul Awal @ 2018-11-14 11:32 UTC (permalink / raw)
To: Konstantin Ananyev, dev; +Cc: akhil.goyal, declan.doherty
On 14/11/2018 11:23, Konstantin Ananyev wrote:
> Add 'uint64_t opaque_data' inside struct rte_security_session.
> That allows upper layer to easily associate some user defined
> data with the session.
> Proposed new layout for:
> struct rte_security_session {
> void *sess_private_data;
> /**< Private session material */
> + uint64_t opaque_data;
> + /**< Opaque user defined data */
> };
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> doc/guides/rel_notes/deprecation.rst | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 34b28234c..0cdc42842 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -55,3 +55,9 @@ Deprecation Notices
> - ``rte_pdump_set_socket_dir`` will be removed;
> - The parameter, ``path``, of ``rte_pdump_init`` will be removed;
> - The enum ``rte_pdump_socktype`` will be removed.
> +
> +* security: ABI change:
> +
> + New field ``uint64_t opaque_data`` is planned to add into
> + ``rte_security_session`` structure. That would allow upper layer to easily
> + associate/de-associate some user defined data with the security session.
Acked-by: Mohammad Abdul Awal <mohammad.abdul.awal@intel.com>
* Re: [dpdk-dev] [PATCH] doc: security deprecation notice for session changes
2018-11-14 11:23 5% [dpdk-dev] [PATCH] doc: security deprecation notice for session changes Konstantin Ananyev
2018-11-14 11:32 0% ` Mohammad Abdul Awal
@ 2018-11-14 12:39 0% ` Akhil Goyal
2018-11-14 13:02 0% ` Ananyev, Konstantin
1 sibling, 1 reply; 200+ results
From: Akhil Goyal @ 2018-11-14 12:39 UTC (permalink / raw)
To: Konstantin Ananyev, dev; +Cc: declan.doherty
On 11/14/2018 4:53 PM, Konstantin Ananyev wrote:
> Add 'uint64_t opaque_data' inside struct rte_security_session.
> That allows upper layer to easily associate some user defined
> data with the session.
> Proposed new layout for:
> struct rte_security_session {
> void *sess_private_data;
> /**< Private session material */
> + uint64_t opaque_data;
> + /**< Opaque user defined data */
> };
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Does this also mean you have given the Ack for removing the experimental
tag from security library? Because otherwise there is no point of this
deprecation notice if the library is not formal.
> doc/guides/rel_notes/deprecation.rst | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 34b28234c..0cdc42842 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -55,3 +55,9 @@ Deprecation Notices
> - ``rte_pdump_set_socket_dir`` will be removed;
> - The parameter, ``path``, of ``rte_pdump_init`` will be removed;
> - The enum ``rte_pdump_socktype`` will be removed.
> +
> +* security: ABI change:
> +
> + New field ``uint64_t opaque_data`` is planned to add into
> + ``rte_security_session`` structure. That would allow upper layer to easily
> + associate/de-associate some user defined data with the security session.
* Re: [dpdk-dev] [PATCH] doc: security deprecation notice for session changes
2018-11-14 12:39 0% ` Akhil Goyal
@ 2018-11-14 13:02 0% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-11-14 13:02 UTC (permalink / raw)
To: Akhil Goyal, dev; +Cc: Doherty, Declan
> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.goyal@nxp.com]
> Sent: Wednesday, November 14, 2018 12:40 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: Doherty, Declan <declan.doherty@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] doc: security deprecation notice for session changes
>
>
>
> On 11/14/2018 4:53 PM, Konstantin Ananyev wrote:
> > Add 'uint64_t opaque_data' inside struct rte_security_session.
> > That allows upper layer to easily associate some user defined
> > data with the session.
> > Proposed new layout for:
> > struct rte_security_session {
> > void *sess_private_data;
> > /**< Private session material */
> > + uint64_t opaque_data;
> > + /**< Opaque user defined data */
> > };
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
>
> Does this also mean you have given the Ack for removing the experimental
> tag from security library? Because otherwise there is no point of this
> deprecation notice if the library is not formal.
For the whole library - yes.
Though I still suggest to keep 'experimental' for non-implemented functions
(rte_security_get_userdata()).
Hope that wouldn't block you guys.
Konstantin
> > doc/guides/rel_notes/deprecation.rst | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index 34b28234c..0cdc42842 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -55,3 +55,9 @@ Deprecation Notices
> > - ``rte_pdump_set_socket_dir`` will be removed;
> > - The parameter, ``path``, of ``rte_pdump_init`` will be removed;
> > - The enum ``rte_pdump_socktype`` will be removed.
> > +
> > +* security: ABI change:
> > +
> > + New field ``uint64_t opaque_data`` is planned to add into
> > + ``rte_security_session`` structure. That would allow upper layer to easily
> > + associate/de-associate some user defined data with the security session.
* [dpdk-dev] [PATCH 0/9] ipsec: new library for IPsec data-path processing
2018-10-09 18:23 2% ` [dpdk-dev] [RFC v2 0/9] " Konstantin Ananyev
@ 2018-11-15 23:53 2% ` Konstantin Ananyev
1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-11-15 23:53 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
This patch series targets 19.02 release.
This patch series depends on the patch:
http://patches.dpdk.org/patch/48044/
to be applied first.
RFCv2 -> v1
- Changes per Jerin comments
- Implement transport mode
- Several bug fixes
- UT largely reworked and extended
This patch introduces a new library within DPDK: librte_ipsec.
The aim is to provide a DPDK-native, high-performance library for IPsec
data-path processing.
The library is supposed to utilize the existing DPDK crypto-dev and
security APIs to provide the application with a transparent IPsec
processing API.
The library concentrates on data-path protocol processing (ESP and AH);
IKE protocol(s) implementation is out of scope for this library.
Current patch introduces SA-level API.
SA (low) level API
==================
API described below operates on SA level.
It provides functionality that allows the user to process
inbound and outbound IPsec packets for a given SA.
To be more specific:
- for inbound ESP/AH packets perform decryption, authentication,
integrity checking, remove ESP/AH related headers
- for outbound packets perform payload encryption, attach ICV,
update/add IP headers, add ESP/AH headers/trailers,
set up related mbuf fields (ol_flags, tx_offloads, etc.).
- initialize/un-initialize given SA based on user provided parameters.
The following functionality:
- match inbound/outbound packets to particular SA
- manage crypto/security devices
- provide SAD/SPD related functionality
- determine what crypto/security device has to be used
for given packet(s)
is out of scope for SA-level API.
SA-level API is based on top of crypto-dev/security API and relies on them
to perform actual cipher and integrity checking.
To make it easy to map crypto/security sessions to the related
IPsec SA, an opaque userdata field was added into the
rte_cryptodev_sym_session and rte_security_session structures.
That implies an ABI change for both librte_cryptodev and librte_security.
Due to the nature of the crypto-dev API (enqueue/dequeue model) we use
an asynchronous API for IPsec packets destined to be processed
by a crypto-device.
Expected API call sequence would be:
/* enqueue for processing by crypto-device */
rte_ipsec_pkt_crypto_prepare(...);
rte_cryptodev_enqueue_burst(...);
/* dequeue from crypto-device and do final processing (if any) */
rte_cryptodev_dequeue_burst(...);
rte_ipsec_pkt_crypto_group(...); /* optional */
rte_ipsec_pkt_process(...);
For packets destined for inline processing, though, no extra overhead
is required and a synchronous API call, rte_ipsec_pkt_process(),
is sufficient.
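The asynchronous sequence above can be modelled with a toy queue, shown here purely as a sketch. Nothing in it is a DPDK type or call: `demo_pkt`, `demo_queue` and the four `demo_*()` helpers are hypothetical stand-ins for the prepare, enqueue, dequeue and process steps, and the "device" is just a ring that hands packets back in order.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_QUEUE_SIZE 8

/* Hypothetical stand-ins; none of these are DPDK types. */
struct demo_pkt {
	uint32_t sa_id; /* which SA the packet belongs to */
	int prepared;   /* set by the prepare step */
	int processed;  /* set by the final process step */
};

struct demo_queue {
	struct demo_pkt *slot[DEMO_QUEUE_SIZE];
	unsigned int head, tail;
};

/* Step 1: fill in the per-packet crypto request (models *crypto_prepare*). */
static unsigned int
demo_prepare(struct demo_pkt *pkts[], unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		pkts[i]->prepared = 1;
	return n;
}

/* Step 2: hand requests to the device; returns how many were accepted. */
static unsigned int
demo_enqueue(struct demo_queue *q, struct demo_pkt *pkts[], unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n && q->tail - q->head < DEMO_QUEUE_SIZE; i++)
		q->slot[q->tail++ % DEMO_QUEUE_SIZE] = pkts[i];
	return i;
}

/* Step 3: collect completed requests, possibly in a later burst. */
static unsigned int
demo_dequeue(struct demo_queue *q, struct demo_pkt *out[], unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n && q->head != q->tail; i++)
		out[i] = q->slot[q->head++ % DEMO_QUEUE_SIZE];
	return i;
}

/* Step 4: final per-packet processing (models *process*). */
static unsigned int
demo_process(struct demo_pkt *pkts[], unsigned int n)
{
	unsigned int k = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (pkts[i]->prepared) {
			pkts[i]->processed = 1;
			k++;
		}
	}
	return k;
}
```

The point of the split is that a packet is not finished when the enqueue call returns; final processing happens only after the dequeue, which may occur in a different burst. In the inline case the middle two steps disappear and only the process step remains.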
The current implementation supports all four currently defined rte_security types.
To accommodate future custom implementations, a function-pointer
model is used for both the *crypto_prepare* and *process* implementations.
Implemented:
------------
- ESP tunnel mode support (both IPv4/IPv6)
- ESP transport mode support (both IPv4/IPv6)
- Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL
- Anti-Replay window and ESN support
- Unit Test
TODO list
---------
- update examples/ipsec-secgw to use librte_ipsec
(will be subject of a separate patch).
Konstantin Ananyev (9):
cryptodev: add opaque userdata pointer into crypto sym session
security: add opaque userdata pointer into security session
net: add ESP trailer structure definition
lib: introduce ipsec library
ipsec: add SA data-path API
ipsec: implement SA data-path API
ipsec: rework SA replay window/SQN for MT environment
ipsec: helper functions to group completed crypto-ops
test/ipsec: introduce functional test
config/common_base | 5 +
lib/Makefile | 2 +
lib/librte_cryptodev/rte_cryptodev.h | 2 +
lib/librte_ipsec/Makefile | 27 +
lib/librte_ipsec/crypto.h | 119 ++
lib/librte_ipsec/iph.h | 63 +
lib/librte_ipsec/ipsec_sqn.h | 343 ++++
lib/librte_ipsec/meson.build | 10 +
lib/librte_ipsec/pad.h | 45 +
lib/librte_ipsec/rte_ipsec.h | 156 ++
lib/librte_ipsec/rte_ipsec_group.h | 151 ++
lib/librte_ipsec/rte_ipsec_sa.h | 166 ++
lib/librte_ipsec/rte_ipsec_version.map | 15 +
lib/librte_ipsec/sa.c | 1387 +++++++++++++++
lib/librte_ipsec/sa.h | 98 ++
lib/librte_ipsec/ses.c | 45 +
lib/librte_net/rte_esp.h | 10 +-
lib/librte_security/rte_security.h | 2 +
lib/meson.build | 2 +
mk/rte.app.mk | 2 +
test/test/Makefile | 3 +
test/test/meson.build | 3 +
test/test/test_ipsec.c | 2209 ++++++++++++++++++++++++
23 files changed, 4864 insertions(+), 1 deletion(-)
create mode 100644 lib/librte_ipsec/Makefile
create mode 100644 lib/librte_ipsec/crypto.h
create mode 100644 lib/librte_ipsec/iph.h
create mode 100644 lib/librte_ipsec/ipsec_sqn.h
create mode 100644 lib/librte_ipsec/meson.build
create mode 100644 lib/librte_ipsec/pad.h
create mode 100644 lib/librte_ipsec/rte_ipsec.h
create mode 100644 lib/librte_ipsec/rte_ipsec_group.h
create mode 100644 lib/librte_ipsec/rte_ipsec_sa.h
create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
create mode 100644 lib/librte_ipsec/sa.c
create mode 100644 lib/librte_ipsec/sa.h
create mode 100644 lib/librte_ipsec/ses.c
create mode 100644 test/test/test_ipsec.c
--
2.17.1
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] Direct using of 'rte_eth_devices' in DPDK apps.
@ 2018-11-16 9:51 3% ` Ananyev, Konstantin
2018-11-16 14:16 4% ` Wiles, Keith
2018-11-16 14:19 3% ` Wiles, Keith
0 siblings, 2 replies; 200+ results
From: Ananyev, Konstantin @ 2018-11-16 9:51 UTC (permalink / raw)
To: Thomas Monjalon, Ilya Maximets
Cc: dev, Yigit, Ferruh, ovs-dev, Stokes, Ian, Kevin Traynor,
Ophir Munk, Shahaf Shuler, Eelco Chaudron, arybchenko
Hi everyone,
>
> Hi,
>
> 16/11/2018 09:42, Ilya Maximets:
> > Hi,
> > While discussing the ways to enable DPDK 18.11 new features in OVS
> > there was suggestions to use 'rte_eth_devices[]' array directly.
> > But this array is marked as '@internal' and also it located in
> > the internal header 'lib/librte_ethdev/rte_ethdev_core.h' with the
> > following disclaimer:
> >
> > /**
> > * @file
> > *
> > * RTE Ethernet Device internal header.
> > *
> > * This header contains internal data types. But they are still part of the
> > * public API because they are used by inline functions in the published API.
> > *
> > * Applications should not use these directly.
> > *
> > */
> >
> > From the other hand, test-pmd and some example apps in DPDK source
> > tree are using this array for various reasons.
> >
> > So, is it OK to use this array directly or not?
>
> Good question :)
> Thanks for bringing this discussion.
>
> As you said, it is public because of inline functions using it directly
> for performance purpose. The DPDK API is bad for separating public and
> internal stuff. And over time, there is not a lot of attention on trying
> to not use internal symbols in applications.
>
> > In general we need to change the API, i.e. make 'rte_eth_devices' part
> > of a public API. Or change the test-pmd and example apps to stop
> > using it.
>
> I agree we need to decide an option and make it clear.
>
> We can try to make this variable private and add more public functions
> to use it (I'm thinking at more iterators like sibling ones).
> It would clarify the API.
> It can be evaluated what is the real cost after compiler optimization
> for Rx/Tx functions. It can also be evaluated to uninline functions.
>
> On the other hand, we can wonder what is the real benefit of trying to
> hide access to internal resources. Should we make all public?
In that case every change in any of such structures will be an ABI breakage.
Even now any change in rte_eth_dev is quite problematic because of that.
I think we better keep them private as much as possible and cleanup
our examples and testpmd code.
Konstantin
>
> > One more related question: Is it OK to access internal device
> > stuff using 'device' pointer obtained by 'rte_eth_dev_info'?
> > This looks really dangerous. It's unclear why pointers like this
> > exposed to user.
>
> It's a lot easier to expose pointers than doing a good API for all uses.
> We need to question what is really dangerous and what we want to avoid?
>
>
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] Direct using of 'rte_eth_devices' in DPDK apps.
2018-11-16 9:51 3% ` Ananyev, Konstantin
@ 2018-11-16 14:16 4% ` Wiles, Keith
2018-11-16 14:19 3% ` Wiles, Keith
1 sibling, 0 replies; 200+ results
From: Wiles, Keith @ 2018-11-16 14:16 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Thomas Monjalon, Ilya Maximets, dev, Yigit, Ferruh, ovs-dev,
Stokes, Ian, Kevin Traynor, Ophir Munk, Shahaf Shuler,
Eelco Chaudron, arybchenko
> On Nov 16, 2018, at 3:51 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
>
> Hi everyone,
>
>>
>> Hi,
>>
>> 16/11/2018 09:42, Ilya Maximets:
>>> Hi,
>>> While discussing the ways to enable DPDK 18.11 new features in OVS
>>> there was suggestions to use 'rte_eth_devices[]' array directly.
>>> But this array is marked as '@internal' and also it located in
>>> the internal header 'lib/librte_ethdev/rte_ethdev_core.h' with the
>>> following disclaimer:
>>>
>>> /**
>>> * @file
>>> *
>>> * RTE Ethernet Device internal header.
>>> *
>>> * This header contains internal data types. But they are still part of the
>>> * public API because they are used by inline functions in the published API.
>>> *
>>> * Applications should not use these directly.
>>> *
>>> */
>>>
>>> From the other hand, test-pmd and some example apps in DPDK source
>>> tree are using this array for various reasons.
>>>
>>> So, is it OK to use this array directly or not?
>>
>> Good question :)
>> Thanks for bringing this discussion.
>>
>> As you said, it is public because of inline functions using it directly
>> for performance purpose. The DPDK API is bad for separating public and
>> internal stuff. And over time, there is not a lot of attention on trying
>> to not use internal symbols in applications.
>>
>>> In general we need to change the API, i.e. make 'rte_eth_devices' part
>>> of a public API. Or change the test-pmd and example apps to stop
>>> using it.
>>
>> I agree we need to decide an option and make it clear.
>>
>> We can try to make this variable private and add more public functions
>> to use it (I'm thinking at more iterators like sibling ones).
>> It would clarify the API.
>> It can be evaluated what is the real cost after compiler optimization
>> for Rx/Tx functions. It can also be evaluated to uninline functions.
>>
>> On the other hand, we can wonder what is the real benefit of trying to
>> hide access to internal resources. Should we make all public?
>
> In that case every change in any of such structures will be an ABI breakage.
> Even now any change in rte_eth_dev is quite problematic because of that.
> I think we better keep them private as much as possible and cleanup
> our examples and testpmd code.
> Konstantin
I agree here. I have noticed a few places where we allow direct access to internal data structures. We need to restrict that access by making the structures private behind getter/setter functions, or at least by providing getter/setter functions even where we cannot make the structures private. Then, when members are moved or added, we can state that it is not an ABI breakage as long as everyone uses the getter/setter functions. These functions could not be inline functions, correct, as that would still break the API?
>
>>
>>> One more related question: Is it OK to access internal device
>>> stuff using 'device' pointer obtained by 'rte_eth_dev_info'?
>>> This looks really dangerous. It's unclear why pointers like this
>>> exposed to user.
>>
>> It's a lot easier to expose pointers than doing a good API for all uses.
>> We need to question what is really dangerous and what we want to avoid?
Regards,
Keith
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] Direct using of 'rte_eth_devices' in DPDK apps.
2018-11-16 9:51 3% ` Ananyev, Konstantin
2018-11-16 14:16 4% ` Wiles, Keith
@ 2018-11-16 14:19 3% ` Wiles, Keith
1 sibling, 0 replies; 200+ results
From: Wiles, Keith @ 2018-11-16 14:19 UTC (permalink / raw)
To: Ananyev, Konstantin
Cc: Thomas Monjalon, Ilya Maximets, dev, Yigit, Ferruh, ovs-dev,
Stokes, Ian, Kevin Traynor, Ophir Munk, Shahaf Shuler,
Eelco Chaudron, arybchenko
> On Nov 16, 2018, at 3:51 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
>
> Hi everyone,
>
>>
>> Hi,
>>
>> 16/11/2018 09:42, Ilya Maximets:
>>> Hi,
>>> While discussing the ways to enable DPDK 18.11 new features in OVS
>>> there was suggestions to use 'rte_eth_devices[]' array directly.
>>> But this array is marked as '@internal' and also it located in
>>> the internal header 'lib/librte_ethdev/rte_ethdev_core.h' with the
>>> following disclaimer:
>>>
>>> /**
>>> * @file
>>> *
>>> * RTE Ethernet Device internal header.
>>> *
>>> * This header contains internal data types. But they are still part of the
>>> * public API because they are used by inline functions in the published API.
>>> *
>>> * Applications should not use these directly.
>>> *
>>> */
>>>
>>> From the other hand, test-pmd and some example apps in DPDK source
>>> tree are using this array for various reasons.
>>>
>>> So, is it OK to use this array directly or not?
>>
>> Good question :)
>> Thanks for bringing this discussion.
>>
>> As you said, it is public because of inline functions using it directly
>> for performance purpose. The DPDK API is bad for separating public and
>> internal stuff. And over time, there is not a lot of attention on trying
>> to not use internal symbols in applications.
>>
>>> In general we need to change the API, i.e. make 'rte_eth_devices' part
>>> of a public API. Or change the test-pmd and example apps to stop
>>> using it.
>>
>> I agree we need to decide an option and make it clear.
>>
>> We can try to make this variable private and add more public functions
>> to use it (I'm thinking at more iterators like sibling ones).
>> It would clarify the API.
>> It can be evaluated what is the real cost after compiler optimization
>> for Rx/Tx functions. It can also be evaluated to uninline functions.
>>
>> On the other hand, we can wonder what is the real benefit of trying to
>> hide access to internal resources. Should we make all public?
>
> In that case every change in any of such structures will be an ABI breakage.
> Even now any change in rte_eth_dev is quite problematic because of that.
> I think we better keep them private as much as possible and cleanup
> our examples and testpmd code.
> Konstantin
I agree here. I have noticed a few places where we allow direct access to internal data structures. We need to restrict that access by making the structures private behind getter/setter functions, or at least by providing getter/setter functions even where we cannot make the structures private. Then, when members are moved or added, we can state that it is not an ABI breakage as long as everyone uses the getter/setter functions. These functions could not be inline functions, correct, as that would still break the API?
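A minimal sketch of the getter/setter idea, assuming a hypothetical `demo_dev` object (none of these names exist in ethdev): the public header would carry only the forward declaration and the function prototypes, while the structure definition stays in the library's own source file.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * What a public header would expose: only a forward declaration and
 * non-inline accessors. All names here are hypothetical.
 */
struct demo_dev;                        /* layout hidden from applications */
struct demo_dev *demo_dev_create(uint16_t port_id, uint16_t mtu);
uint16_t demo_dev_get_mtu(const struct demo_dev *dev);
int demo_dev_set_mtu(struct demo_dev *dev, uint16_t mtu);
void demo_dev_destroy(struct demo_dev *dev);

/*
 * What would live in the library's private .c file. Members can be
 * reordered or added here without changing what applications compiled
 * against, because applications never see offsets or the struct size.
 */
struct demo_dev {
	uint16_t port_id;
	uint16_t mtu;
};

struct demo_dev *
demo_dev_create(uint16_t port_id, uint16_t mtu)
{
	struct demo_dev *dev = malloc(sizeof(*dev));

	if (dev == NULL)
		return NULL;
	dev->port_id = port_id;
	dev->mtu = mtu;
	return dev;
}

uint16_t
demo_dev_get_mtu(const struct demo_dev *dev)
{
	return dev->mtu;
}

int
demo_dev_set_mtu(struct demo_dev *dev, uint16_t mtu)
{
	if (mtu < 68)	/* arbitrary lower bound chosen for this sketch */
		return -1;
	dev->mtu = mtu;
	return 0;
}

void
demo_dev_destroy(struct demo_dev *dev)
{
	free(dev);
}
```

On the inline question: an accessor defined inline in a public header needs the full structure definition to be visible there, so the member offsets would again be compiled into every application. That is why the accessors in this sketch are ordinary exported functions.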
>
>>
>>> One more related question: Is it OK to access internal device
>>> stuff using 'device' pointer obtained by 'rte_eth_dev_info'?
>>> This looks really dangerous. It's unclear why pointers like this
>>> exposed to user.
>>
>> It's a lot easier to expose pointers than doing a good API for all uses.
>> We need to question what is really dangerous and what we want to avoid?
Regards,
Keith
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file
2018-11-02 11:50 0% ` Neil Horman
@ 2018-11-18 22:25 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-11-18 22:25 UTC (permalink / raw)
To: Neil Horman; +Cc: dev, doucette
02/11/2018 12:50, Neil Horman:
> On Thu, Nov 01, 2018 at 11:53:00PM +0100, Thomas Monjalon wrote:
> > 01/11/2018 14:54, Neil Horman:
> > > the regex to determine the end of the map file chunk in a patch seems to
> > > be wrong, It was using perl regex syntax, which awk doesn't appear to
> > > support (I'm still not sure how it was working previously). Regardless,
> > > it wasn't triggering and as a result symbols were getting added to the
> > > mapdb that shouldn't be there.
> > >
> > > Fix it by converting the regex to use traditional posix syntax, matching
> > > only on the negation of the character class [^map]
> > >
> > > Tested and shown to be working on the ip_frag patch set provided by
> > > doucette@bu.edu
> > >
> > > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > > CC: thomas@monjalon.net
> > > CC: doucette@bu.edu
> > > Reported-by: doucette@bu.edu
> >
> > You could use these lines:
> >
> > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> >
> > Reported-by: Cody Doucette <doucette@bu.edu>
> >
> I'm fine with the second line, and the first is fine I guess, but I'm not sure
> there is an exact correlation
>
> > > --- a/devtools/check-symbol-change.sh
> > > +++ b/devtools/check-symbol-change.sh
> > > - /[-+] a\/.*\.^(map)/ {in_map=0}
> > > + /[-+] a\/.*\.[^map]/ {in_map=0}
> >
> > Not sure this is what you intend:
> > [^map] means any character except "m", "a" and "p".
> >
> Its not 100%, but its pretty close. The regex for exact matching on not a
> specific string is pretty large and complex. Since we have no files that that
> end in .m .a or .p, this should give us what we want for the forseeable future.
>
> > I don't know whether awk supports this syntax: (?!foo)
> >
> It unfortunately doesn't, thats perl syntax, and while grep I think supports it,
> awk is more strictly posix compliant.
I understand now.
Applied, thanks
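For reference, the behaviour of the bracket expression discussed above can be checked with a small C sketch. It uses POSIX regcomp()/regexec() with extended syntax as a stand-in for awk's regex engine, which is an assumption of this example; the helper name `line_matches`, the macro `END_OF_MAP_RE` and the sample paths are made up for illustration.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if 'line' matches the POSIX extended regex 'pattern',
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int
line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

/* The expression from the patch, written as a C string. */
#define END_OF_MAP_RE "[-+] a/.*\\.[^map]"
```

A line naming a `.c` file matches, so `in_map` would be reset. A line naming a `.map` file does not match. A line naming a `.py` file does not match either, because `p` is in the excluded set. That last case is the imprecision acknowledged in the thread.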
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] ethdev: deprecate DEFERRED device state
@ 2018-11-20 11:52 3% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-20 11:52 UTC (permalink / raw)
To: Andrew Rybchenko, Neil Horman, John McNamara, Marko Kovacevic
Cc: dev, Thomas Monjalon, Matan Azrad
On 8/27/2018 4:00 PM, Andrew Rybchenko wrote:
> On 08/24/2018 05:51 PM, Ferruh Yigit wrote:
>> Add a deprecation notice to remove RTE_ETH_DEV_DEFERRED state, but this
>> is mostly a reminder because of a missing target.
>> It doesn't worth to break the ABI because of this change and removal
>> can be done when ethdev ABI version increased.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> Cc: Thomas Monjalon <thomas@monjalon.net>
>> Cc: Andrew Rybchenko <arybchenko@solarflare.com>
>> Cc: Matan Azrad <matan@mellanox.com>
>> ---
>> doc/guides/rel_notes/deprecation.rst | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>> index e2dbee317..9cd12ccd8 100644
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> @@ -95,3 +95,7 @@ Deprecation Notices
>>
>> This is due to a lack of flexibility and reliance on a type unusable with
>> C++ programs (struct rte_flow_desc).
>> +
>> +* ethdev: remove deprecated RTE_ETH_DEV_DEFERRED device state.
>> + Since this is an enum filed in the middle, removing this field will break
>> + the ABI, so removing postponed to next ethdev ABI version increase.
>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>
In this release we already break the ABI for ethdev. Instead of putting this
deprecation notice in, I will send a patch to remove RTE_ETH_DEV_DEFERRED. Since
it is not used in the current code, it should be a trivial and safe change.
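To illustrate why removing an enumerator from the middle is an ABI break, here is a small sketch. The `OLD_`/`NEW_` names are hypothetical stand-ins for the device state values, not the real identifiers.

```c
/* Numbering before the removal. */
enum old_dev_state {
	OLD_DEV_UNUSED = 0,
	OLD_DEV_ATTACHED,
	OLD_DEV_DEFERRED,
	OLD_DEV_REMOVED,
};

/* Numbering after dropping the middle entry. */
enum new_dev_state {
	NEW_DEV_UNUSED = 0,
	NEW_DEV_ATTACHED,
	NEW_DEV_REMOVED,
};

/*
 * Interpret a state value produced by a binary built against the old
 * enum using the new numbering. Returns 1 only if the library would
 * still read it as "removed".
 */
static int
old_removed_still_means_removed(void)
{
	int value_from_old_binary = OLD_DEV_REMOVED;	/* 3 */

	return value_from_old_binary == NEW_DEV_REMOVED;	/* 3 vs 2 */
}
```

Every enumerator after the removed one shifts down by one, so an application built against the old header and a library built against the new one would disagree about the meaning of the value.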
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH] ethdev: remove unused DEFERRED device state
@ 2018-11-20 12:02 3% ` Ferruh Yigit
2018-11-20 14:15 0% ` Matan Azrad
1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-11-20 12:02 UTC (permalink / raw)
To: Thomas Monjalon, Andrew Rybchenko; +Cc: dev, Ferruh Yigit, Matan Azrad
The DEFERRED state was replaced by the ownership concept and is no longer
used, as the code comment states.
The ethdev ABI is already broken in this release; use this opportunity to
remove the DEFERRED state.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Matan Azrad <matan@mellanox.com>
---
lib/librte_ethdev/rte_ethdev.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 8a92d91e3..1960f3a2d 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1306,8 +1306,6 @@ enum rte_eth_dev_state {
RTE_ETH_DEV_UNUSED = 0,
/** Device is attached when allocated in probing. */
RTE_ETH_DEV_ATTACHED,
- /** The deferred state is useless and replaced by ownership. */
- RTE_ETH_DEV_DEFERRED,
/** Device is in removed state when plug-out is detected. */
RTE_ETH_DEV_REMOVED,
};
--
2.17.2
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] ethdev: remove unused DEFERRED device state
2018-11-20 12:02 3% ` [dpdk-dev] [PATCH] ethdev: remove unused " Ferruh Yigit
@ 2018-11-20 14:15 0% ` Matan Azrad
2018-11-21 15:20 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Matan Azrad @ 2018-11-20 14:15 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon, Andrew Rybchenko; +Cc: dev
From: Ferruh Yigit
> DEFERRED state replaced by ownership concept and it is no more used as
> code comment states.
>
> ethdev ABI broken on this release use this opportunity to remove DEFERRED
> state.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Matan Azrad <matan@mellanox.com>
^ permalink raw reply [relevance 0%]
* [dpdk-dev] Last call for deprecation notices
@ 2018-11-21 11:45 4% Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-21 11:45 UTC (permalink / raw)
To: dpdk-dev; +Cc: Thomas Monjalon
If there are any planned API/ABI changes for v19.02, deprecation notice patches
for them should be sent, approved and merged within the v18.11 scope, which is only
a few days away.
If you will be working on a feature for v19.02, please take some time to think
about whether it will cause any API/ABI change.
If it will, please send the deprecation notice for it ASAP; otherwise
your feature can be blocked for the release because of the API/ABI breakage it causes.
Thanks,
ferruh
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH] ethdev: remove unused DEFERRED device state
2018-11-20 14:15 0% ` Matan Azrad
@ 2018-11-21 15:20 0% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-21 15:20 UTC (permalink / raw)
To: Matan Azrad, Thomas Monjalon, Andrew Rybchenko; +Cc: dev
On 11/20/2018 2:15 PM, Matan Azrad wrote:
>
>
> From: Ferruh Yigit
>> DEFERRED state replaced by ownership concept and it is no more used as
>> code comment states.
>>
>> ethdev ABI broken on this release use this opportunity to remove DEFERRED
>> state.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> Acked-by: Matan Azrad <matan@mellanox.com>
Applied to dpdk-next-net/master, thanks.
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH] doc: announce kvargs API change
@ 2018-11-21 15:45 5% Thomas Monjalon
2018-11-22 10:32 5% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-21 15:45 UTC (permalink / raw)
To: dev; +Cc: olivier.matz
In some usages, kvlist is processed one time in rte_kvargs_process(),
and it is processed a second time if there is a need to check whether
the key was matched.
In order to simplify implementation of kvargs checks, a new callback
may be used for "no match" cases.
int
rte_kvargs_process(const struct rte_kvargs *kvlist,
const char *key_match,
- arg_handler_t handler,
+ arg_handler_t match_handler,
+ arg_handler_t no_match_handler,
void *opaque_arg)
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
doc/guides/rel_notes/deprecation.rst | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 34b28234c..7af65cd4b 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,6 +11,10 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
+* kvargs: The function ``rte_kvargs_process`` will get a new parameter
+ for a function pointer called in case of no match of the key.
+ It will ease implementation of default values or check for mandatory keys.
+
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
--
2.19.0
^ permalink raw reply [relevance 5%]
* [dpdk-dev] [RFC] pdump: remove deprecated APIs
@ 2018-11-22 2:55 3% Tiwei Bie
0 siblings, 0 replies; 200+ results
From: Tiwei Bie @ 2018-11-22 2:55 UTC (permalink / raw)
To: dev; +Cc: reshma.pattan
We already changed to use generic IPC in pdump since below commit:
commit 660098d61f57 ("pdump: use generic multi-process channel")
The `rte_pdump_set_socket_dir()`, the `path` parameter of
`rte_pdump_init()` and the `enum rte_pdump_socktype` have been
deprecated since then. This commit removes these deprecated
APIs and also bumps the pdump ABI.
Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
This patch is marked as RFC because the API and ABI changes should
also be documented by this patch in the `release_19_02.rst` which
doesn't exist currently. I will send a new version once we have it.
app/test-pmd/testpmd.c | 2 +-
doc/guides/prog_guide/pdump_lib.rst | 14 ++------------
doc/guides/rel_notes/deprecation.rst | 7 -------
lib/librte_pdump/Makefile | 2 +-
lib/librte_pdump/meson.build | 2 +-
lib/librte_pdump/rte_pdump.c | 9 +--------
lib/librte_pdump/rte_pdump.h | 34 +---------------------------------
lib/librte_pdump/rte_pdump_version.map | 1 -
8 files changed, 7 insertions(+), 64 deletions(-)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 4c75587d0..a10bc40bb 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -3104,7 +3104,7 @@ main(int argc, char** argv)
#ifdef RTE_LIBRTE_PDUMP
/* initialize packet capture framework */
- rte_pdump_init(NULL);
+ rte_pdump_init();
#endif
count = 0;
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index ed3c15e58..afb5b3411 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -34,10 +34,6 @@ or disable the packet capture, and to uninitialize it:
* ``rte_pdump_uninit()``:
This API uninitializes the packet capture framework.
-* ``rte_pdump_set_socket_dir()``:
- This API sets the server and client socket paths.
- Note: This API is not thread-safe.
-
Operation
---------
@@ -60,8 +56,8 @@ enabling or disabling the packet capture.
Implementation Details
----------------------
-The library API ``rte_pdump_init()``, initializes the packet capture framework by creating the pthread and the server
-socket. The server socket in the pthread context will be listening to the client requests to enable or disable the
+The library API ``rte_pdump_init()``, initializes the packet capture framework by creating the pdump server by calling
+``rte_mp_action_register()`` function. The server will listen to the client requests to enable or disable the
packet capture.
The library APIs ``rte_pdump_enable()`` and ``rte_pdump_enable_by_deviceid()`` enables the packet capture.
@@ -82,12 +78,6 @@ received from the server, the client socket is closed.
The library API ``rte_pdump_uninit()``, uninitializes the packet capture framework by closing the pthread and the
server socket.
-The library API ``rte_pdump_set_socket_dir()``, sets the given path as either server socket path
-or client socket path based on the ``type`` argument of the API.
-If the given path is ``NULL``, default path will be selected, i.e. either ``/var/run/.dpdk`` for root user or ``~/.dpdk``
-for non root user. Clients also need to call this API to set their server socket path if the server socket
-path is different from default path.
-
Use Case: Packet Capturing
--------------------------
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 34b28234c..586bf98c5 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -48,10 +48,3 @@ Deprecation Notices
PMDs that implement the latter.
Target release for removal of the legacy API will be defined once most
PMDs have switched to rte_flow.
-
-* pdump: As we changed to use generic IPC, some changes in APIs and structure
- are expected in subsequent release.
-
- - ``rte_pdump_set_socket_dir`` will be removed;
- - The parameter, ``path``, of ``rte_pdump_init`` will be removed;
- - The enum ``rte_pdump_socktype`` will be removed.
diff --git a/lib/librte_pdump/Makefile b/lib/librte_pdump/Makefile
index b241151dc..89593689a 100644
--- a/lib/librte_pdump/Makefile
+++ b/lib/librte_pdump/Makefile
@@ -12,7 +12,7 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev
EXPORT_MAP := rte_pdump_version.map
-LIBABIVER := 2
+LIBABIVER := 3
# all source are stored in SRCS-y
SRCS-$(CONFIG_RTE_LIBRTE_PDUMP) := rte_pdump.c
diff --git a/lib/librte_pdump/meson.build b/lib/librte_pdump/meson.build
index be80904b9..b4b4f26c5 100644
--- a/lib/librte_pdump/meson.build
+++ b/lib/librte_pdump/meson.build
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
-version = 2
+version = 3
sources = files('rte_pdump.c')
headers = files('rte_pdump.h')
allow_experimental_apis = true
diff --git a/lib/librte_pdump/rte_pdump.c b/lib/librte_pdump/rte_pdump.c
index 6c3a88581..4f38ac58b 100644
--- a/lib/librte_pdump/rte_pdump.c
+++ b/lib/librte_pdump/rte_pdump.c
@@ -406,7 +406,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
}
int
-rte_pdump_init(const char *path __rte_unused)
+rte_pdump_init(void)
{
return rte_mp_action_register(PDUMP_MP, pdump_server);
}
@@ -616,10 +616,3 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
return ret;
}
-
-int
-rte_pdump_set_socket_dir(const char *path __rte_unused,
- enum rte_pdump_socktype type __rte_unused)
-{
- return 0;
-}
diff --git a/lib/librte_pdump/rte_pdump.h b/lib/librte_pdump/rte_pdump.h
index 673a2b070..6b00fc17a 100644
--- a/lib/librte_pdump/rte_pdump.h
+++ b/lib/librte_pdump/rte_pdump.h
@@ -29,25 +29,16 @@ enum {
RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
};
-enum rte_pdump_socktype {
- RTE_PDUMP_SOCKET_SERVER = 1,
- RTE_PDUMP_SOCKET_CLIENT = 2
-};
-
/**
* Initialize packet capturing handling
*
* Register the IPC action for communication with target (primary) process.
*
- * @param path
- * This parameter is going to be deprecated; it was used for specifying the
- * directory path for server socket.
- *
* @return
* 0 on success, -1 on error
*/
int
-rte_pdump_init(const char *path);
+rte_pdump_init(void);
/**
* Un initialize packet capturing handling
@@ -162,29 +153,6 @@ int
rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
uint32_t flags);
-/**
- * @deprecated
- * Allows applications to set server and client socket paths.
- * If specified path is null default path will be selected, i.e.
- *"/var/run/" for root user and "$HOME" for non root user.
- * Clients also need to call this API to set their server path if the
- * server path is different from default path.
- * This API is not thread-safe.
- *
- * @param path
- * directory path for server or client socket.
- * @param type
- * specifies RTE_PDUMP_SOCKET_SERVER if socket path is for server.
- * (or)
- * specifies RTE_PDUMP_SOCKET_CLIENT if socket path is for client.
- *
- * @return
- * 0 on success, -EINVAL on error
- *
- */
-__rte_deprecated int
-rte_pdump_set_socket_dir(const char *path, enum rte_pdump_socktype type);
-
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_pdump/rte_pdump_version.map b/lib/librte_pdump/rte_pdump_version.map
index edec99a41..3e744f301 100644
--- a/lib/librte_pdump/rte_pdump_version.map
+++ b/lib/librte_pdump/rte_pdump_version.map
@@ -6,7 +6,6 @@ DPDK_16.07 {
rte_pdump_enable;
rte_pdump_enable_by_deviceid;
rte_pdump_init;
- rte_pdump_set_socket_dir;
rte_pdump_uninit;
local: *;
--
2.14.5
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
@ 2018-11-22 9:58 3% ` Stokes, Ian
0 siblings, 0 replies; 200+ results
From: Stokes, Ian @ 2018-11-22 9:58 UTC (permalink / raw)
To: Stephen Hemminger, Andrew Rybchenko; +Cc: dev, Shahaf Shuler
> On Thu, 6 Sep 2018 09:29:32 +0300
> Andrew Rybchenko <arybchenko@solarflare.com> wrote:
>
> > On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> > > This addresses the usability issue raised by OVS at DPDK Userspace
> > > summit. It adds general min/max mtu into device info. For
> > > compatibility, and to save space, it fits in a hole in existing
> structure.
> >
> > It is true for amd64, but it looks like it is false on 32-bit. So, ABI
> > breakage.
>
> Yes it is ABI change on 32 bit, but 18.11 is a major release where this is
> allowed/expected.
Thanks for this work, Stephen. I've tested it with OVS DPDK and it resolves the issues as described. If it's to be part of DPDK 19.02, I guess there should be an ABI breakage notification in 18.11?
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v2] doc: announce kvargs API change
2018-11-21 15:45 5% [dpdk-dev] [PATCH] doc: announce kvargs API change Thomas Monjalon
@ 2018-11-22 10:32 5% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-11-22 10:32 UTC (permalink / raw)
To: olivier.matz; +Cc: dev
After processing a kvlist in rte_kvargs_process(),
it may be needed to loop again over kvlist in order to know
whether the key is matched or not.
In order to simplify implementation of kvargs checks,
a new pointer parameter may be used to get the match count.
The change of the function prototype would be as below:
int
rte_kvargs_process(const struct rte_kvargs *kvlist,
const char *key_match,
+ int *match_count,
arg_handler_t handler,
void *opaque_arg)
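A rough sketch of how a caller might use such a match count to apply a default value. It is self-contained and does not use the real kvargs library: the `demo_` types and functions are simplified, hypothetical stand-ins, and the real function may behave differently (for example in how it treats a NULL key).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kvargs types. */
struct demo_pair {
	const char *key;
	const char *value;
};

struct demo_kvlist {
	const struct demo_pair *pairs;
	unsigned int count;
};

typedef int (*demo_handler_t)(const char *key, const char *value, void *opaque);

/*
 * Mock of the proposed prototype: call 'handler' for every pair whose key
 * equals 'key_match' and report the number of matches through
 * 'match_count' (which may be NULL if the caller does not care).
 */
static int
demo_kvargs_process(const struct demo_kvlist *kvlist, const char *key_match,
		int *match_count, demo_handler_t handler, void *opaque)
{
	unsigned int i;
	int matches = 0;

	for (i = 0; i < kvlist->count; i++) {
		const struct demo_pair *p = &kvlist->pairs[i];

		if (strcmp(p->key, key_match) != 0)
			continue;
		matches++;
		if (handler(p->key, p->value, opaque) < 0)
			return -1;
	}
	if (match_count != NULL)
		*match_count = matches;
	return 0;
}

/* Example handler: remember the last value seen for the key. */
static int
store_value(const char *key, const char *value, void *opaque)
{
	(void)key;
	*(const char **)opaque = value;
	return 0;
}

/* Look up 'key', falling back to 'dflt' when the key is absent. */
static const char *
demo_get_or_default(const struct demo_kvlist *kvlist, const char *key,
		const char *dflt)
{
	const char *value = NULL;
	int matched = 0;

	if (demo_kvargs_process(kvlist, key, &matched, store_value, &value) < 0)
		return NULL;
	return matched > 0 ? value : dflt;
}
```

The caller gets the no-match information from the same pass over the list, so no second loop over kvlist is needed.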
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
v1: callback for no-match
v2: integer for match count (Olivier suggestion)
---
doc/guides/rel_notes/deprecation.rst | 3 +++
1 file changed, 3 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 34b28234c..dccf7bee6 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,6 +11,9 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
+* kvargs: The function ``rte_kvargs_process`` will get a new parameter
+ for returning key match count. It will ease handling of no-match case.
+
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
--
2.19.0
^ permalink raw reply [relevance 5%]
* [dpdk-dev] [PATCH v1 1/1] doc: announce ethdev ABI change for rte_eth_dev_info.
@ 2018-11-22 12:09 13% Ian Stokes
0 siblings, 0 replies; 200+ results
From: Ian Stokes @ 2018-11-22 12:09 UTC (permalink / raw)
To: dev; +Cc: stephen, arybchenko, Ian Stokes
Maximum and minimum MTU values vary between hardware devices. In
hardware agnostic DPDK applications access to such information would
allow a more accurate way of validating and setting supported MTU values on
a per device basis rather than using a defined default for all devices.
The following solution was proposed:
http://mails.dpdk.org/archives/dev/2018-September/110959.html
This patch adds a deprecation notice for ``rte_eth_dev_info``, as new
members will be added to represent the min and max MTU values. These can be
added to fit a hole in the existing structure for amd64 but not for 32 bit;
as such, an ABI change will occur because the size of the structure will be impacted.
Signed-off-by: Ian Stokes <ian.stokes@intel.com>
---
doc/guides/rel_notes/deprecation.rst | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 34b28234c..da2b1ce15 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -49,6 +49,18 @@ Deprecation Notices
Target release for removal of the legacy API will be defined once most
PMDs have switched to rte_flow.
+* ethdev: Maximum and minimum MTU values vary between hardware devices. In
+ hardware agnostic DPDK applications access to such information would allow
+ a more accurate way of validating and setting supported MTU values on a per
+ device basis rather than using a defined default for all devices. To
+ resolve this, the following members will be added to ``rte_eth_dev_info``.
+ Note: these can be added to fit a hole in the existing structure for amd64
+ but not for 32 bit, as such ABI change will occur as size of the structure
+ will increase.
+
+ - Member ``uint16_t min_mtu`` the minimum MTU allowed.
+ - Member ``uint16_t max_mtu`` the maximum MTU allowed.
+
* pdump: As we changed to use generic IPC, some changes in APIs and structure
are expected in subsequent release.
--
2.13.6
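
As an illustration of the per-device validation the notice describes, the following sketch checks a requested MTU against a device's reported range. It is only a sketch under stated assumptions: `dev_info_sketch` and `mtu_check` are hypothetical names, and the struct carries nothing but the two members named in the notice rather than the real ``rte_eth_dev_info`` layout.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical subset of the device info; only the two members named in
 * the deprecation notice are modelled here.
 */
struct dev_info_sketch {
	uint16_t min_mtu; /* the minimum MTU allowed */
	uint16_t max_mtu; /* the maximum MTU allowed */
};

/*
 * Return 0 if mtu lies within [min_mtu, max_mtu] for this device,
 * -EINVAL if it does not, or if info is NULL or the range is inverted.
 */
int
mtu_check(const struct dev_info_sketch *info, uint16_t mtu)
{
	if (info == NULL || info->min_mtu > info->max_mtu)
		return -EINVAL;
	if (mtu < info->min_mtu || mtu > info->max_mtu)
		return -EINVAL;
	return 0;
}
```

An application would fill the range from whatever the device reports and call the check before attempting to set the MTU, instead of comparing against one default for all devices.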
2018-03-07 17:44 [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
2018-10-04 15:43 9% ` [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default Ferruh Yigit
2018-10-04 14:49 0% ` Luca Boccassi
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
2018-10-04 15:10 0% ` Thomas Monjalon
2018-10-04 15:28 0% ` Ferruh Yigit
2018-10-04 15:55 0% ` Thomas Monjalon
2018-10-05 9:13 0% ` Bruce Richardson
2018-10-05 10:17 0% ` Ferruh Yigit
2018-10-05 11:30 3% ` Neil Horman
2018-10-05 12:35 0% ` Ferruh Yigit
2018-06-07 12:38 [dpdk-dev] [PATCH 00/22] enable hotplug on multi-process Qi Zhang
2018-09-28 4:23 1% ` [dpdk-dev] [PATCH v16 0/6] " Qi Zhang
2018-09-28 4:23 2% ` [dpdk-dev] [PATCH v16 2/6] eal: " Qi Zhang
2018-10-16 0:16 1% ` [dpdk-dev] [PATCH v17 0/6] " Qi Zhang
2018-10-16 0:16 2% ` [dpdk-dev] [PATCH v17 2/6] eal: " Qi Zhang
2018-08-17 10:51 [dpdk-dev] [PATCH v1 0/5] Enable hotplug in vfio Jeff Guo
2018-08-17 10:51 ` [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device Jeff Guo
2018-09-26 12:22 3% ` Burakov, Anatoly
2018-09-29 6:15 3% ` Jeff Guo
2018-10-01 7:51 3% ` Burakov, Anatoly
2018-08-23 12:08 [dpdk-dev] [PATCH 00/11] introduce telemetry library Ciara Power
2018-10-03 17:36 ` [dpdk-dev] [PATCH v2 00/10] " Kevin Laatz
2018-10-04 13:25 ` Van Haaren, Harry
2018-10-04 15:53 ` Thomas Monjalon
2018-10-09 10:33 3% ` Van Haaren, Harry
2018-10-09 11:41 0% ` Thomas Monjalon
2018-08-24 14:51 [dpdk-dev] [PATCH] ethdev: deprecate DEFERRED device state Ferruh Yigit
2018-08-27 15:00 ` Andrew Rybchenko
2018-11-20 11:52 3% ` Ferruh Yigit
2018-11-20 12:02 3% ` [dpdk-dev] [PATCH] ethdev: remove unused " Ferruh Yigit
2018-11-20 14:15 0% ` Matan Azrad
2018-11-21 15:20 0% ` Ferruh Yigit
2018-08-24 16:47 [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority Konstantin Ananyev
2018-09-16 9:56 ` Thomas Monjalon
2018-09-25 12:22 3% ` Luca Boccassi
2018-09-25 12:57 3% ` Thomas Monjalon
2018-09-25 14:34 0% ` Ananyev, Konstantin
2018-10-03 16:18 0% ` Luca Boccassi
2018-08-24 16:53 [dpdk-dev] [RFC] ipsec: new library for IPsec data-path processing Konstantin Ananyev
2018-10-09 18:23 2% ` [dpdk-dev] [RFC v2 0/9] " Konstantin Ananyev
2018-11-15 23:53 2% ` [dpdk-dev] [PATCH " Konstantin Ananyev
2018-08-24 17:48 [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session Konstantin Ananyev
2018-10-05 11:05 0% ` Ananyev, Konstantin
2018-11-12 21:01 0% ` Trahe, Fiona
2018-11-13 18:56 0% ` Ananyev, Konstantin
2018-08-27 12:24 [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries Anatoly Burakov
2018-09-20 15:41 ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
2018-10-03 22:05 0% ` Thomas Monjalon
2018-10-04 9:17 0% ` Burakov, Anatoly
2018-10-04 9:18 0% ` Thomas Monjalon
2018-10-04 10:46 0% ` Ferruh Yigit
2018-10-05 9:04 0% ` Burakov, Anatoly
2018-08-31 12:50 [dpdk-dev] [PATCH v2 0/5] use IOVAs check based on DMA mask Alejandro Lucero
2018-08-31 12:50 ` [dpdk-dev] [PATCH v2 1/5] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-10-03 12:43 3% ` Burakov, Anatoly
[not found] ` <CAD+H991m6qauwX+P=muKe6bAjNLUrcBaGbxFXkMV60OVNvRgPg@mail.gmail.com>
2018-10-04 12:59 0% ` [dpdk-dev] Fwd: " Alejandro Lucero
2018-08-31 16:51 [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries Qiaobin Fu
2018-09-02 22:05 ` Honnappa Nagarahalli
2018-09-04 19:36 ` Michel Machado
2018-09-05 22:13 ` Honnappa Nagarahalli
2018-09-06 14:28 ` Michel Machado
2018-09-12 20:37 ` Honnappa Nagarahalli
2018-09-20 19:50 0% ` Michel Machado
2018-10-09 19:29 ` [dpdk-dev] [PATCH v4 1/2] hash table: fix a bug in rte_hash_iterate() Qiaobin Fu
2018-10-09 19:29 2% ` [dpdk-dev] [PATCH v4 2/2] hash table: add an iterator over conflicting entries Qiaobin Fu
2018-09-04 10:12 [dpdk-dev] [PATCH v2] ethdev: make default behavior CRC strip on Rx Ferruh Yigit
2018-09-24 17:31 4% ` [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes Ferruh Yigit
2018-09-24 17:12 0% ` David Marchand
2018-09-05 12:21 [dpdk-dev] [PATCH 1/2] eventdev: fix port id argument in Rx adapter caps API Nikhil Rao
2018-09-25 8:49 4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
2018-09-25 9:15 0% ` Jerin Jacob
2018-09-25 9:50 0% ` Thomas Monjalon
2018-09-25 9:56 0% ` Jerin Jacob
2018-09-25 9:49 4% ` [dpdk-dev] [PATCH v3] " Nikhil Rao
2018-09-05 16:41 [dpdk-dev] [RFC] ethdev: add min/max MTU to device info Stephen Hemminger
2018-09-06 6:29 ` Andrew Rybchenko
2018-09-06 10:52 ` Stephen Hemminger
2018-11-22 9:58 3% ` Stokes, Ian
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
2018-09-28 1:00 3% ` Wang, Yipeng1
2018-09-28 8:26 4% ` Bruce Richardson
2018-09-28 8:55 4% ` Van Haaren, Harry
2018-09-30 22:33 0% ` Honnappa Nagarahalli
2018-10-02 13:17 3% ` Van Haaren, Harry
2018-10-02 23:58 0% ` Wang, Yipeng1
2018-10-03 17:32 0% ` Honnappa Nagarahalli
2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-03 23:05 5% ` Ola Liljedahl
2018-10-04 3:32 0% ` Honnappa Nagarahalli
2018-10-04 3:54 0% ` Honnappa Nagarahalli
2018-10-04 19:16 0% ` Wang, Yipeng1
2018-09-30 23:05 0% ` Honnappa Nagarahalli
2018-09-07 22:27 [dpdk-dev] [RFC] eal: simplify parameters of hotplug functions Thomas Monjalon
2018-10-03 23:10 ` [dpdk-dev] [PATCH v5 0/5] eal: simplify devargs and " Thomas Monjalon
2018-10-03 23:10 4% ` [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure Thomas Monjalon
2018-10-04 9:31 4% ` Gaëtan Rivet
2018-10-04 9:48 3% ` Thomas Monjalon
2018-10-07 9:32 3% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Thomas Monjalon
2018-10-07 9:32 4% ` [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure Thomas Monjalon
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
2018-10-11 12:10 0% ` Thomas Monjalon
2018-09-17 10:36 [dpdk-dev] [PATCH 00/11] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-10-12 9:42 0% ` Shreyansh Jain
2018-10-12 10:16 0% ` Thomas Monjalon
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-09-21 16:13 [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-20 11:36 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-09-21 16:13 16% ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-21 16:13 4% ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
2018-10-02 13:34 13% ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-02 13:34 10% ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-02 13:34 5% ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-02 13:34 8% ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-02 13:34 7% ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating " Anatoly Burakov
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-01 17:01 3% ` Stephen Hemminger
2018-10-02 9:03 0% ` Burakov, Anatoly
2018-10-01 12:56 13% ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-01 12:56 5% ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-01 12:56 8% ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 12:56 7% ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating " Anatoly Burakov
2018-10-01 11:04 16% ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-01 11:04 4% ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-10-01 11:04 9% ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 11:05 4% ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating " Anatoly Burakov
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-27 11:03 0% ` Shreyansh Jain
2018-09-27 11:08 0% ` Burakov, Anatoly
2018-09-27 11:12 0% ` Shreyansh Jain
2018-09-27 11:29 0% ` Burakov, Anatoly
2018-09-29 0:09 0% ` Yongseok Koh
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-27 13:14 0% ` Alejandro Lucero
2018-09-27 13:21 0% ` Burakov, Anatoly
2018-09-27 13:42 0% ` Alejandro Lucero
2018-09-27 14:04 0% ` Burakov, Anatoly
2018-09-27 10:41 9% ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:22 9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
2018-09-25 15:25 3% [dpdk-dev] [PATCH v1] doc: remove unused release note file John McNamara
2018-09-26 21:00 [dpdk-dev] [PATCH v2 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions Ori Kam
2018-10-07 12:57 ` [dpdk-dev] [PATCH v3 " Ori Kam
2018-10-09 16:48 ` Ferruh Yigit
2018-10-10 6:45 ` Andrew Rybchenko
2018-10-10 9:00 ` Ori Kam
2018-10-10 12:02 2% ` Adrien Mazarguil
2018-10-10 13:17 0% ` Ori Kam
2018-10-10 16:10 0% ` Adrien Mazarguil
2018-10-11 8:48 0% ` Ori Kam
2018-09-27 11:26 [dpdk-dev] [PATCH] ethdev: add field for device data per process Alejandro Lucero
2018-10-03 20:44 ` Thomas Monjalon
2018-10-05 13:01 ` Ferruh Yigit
2018-10-05 13:17 ` Alejandro Lucero
2018-10-05 13:26 4% ` Ferruh Yigit
2018-10-05 14:47 0% ` Thomas Monjalon
2018-09-28 17:58 [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option Luca Boccassi
2018-10-01 9:17 ` Bruce Richardson
2018-10-01 9:25 ` Bruce Richardson
2018-10-01 9:46 4% ` Luca Boccassi
2018-10-01 10:01 0% ` Bruce Richardson
2018-10-01 10:42 0% ` Timothy Redaelli
2018-10-01 11:06 0% ` Bruce Richardson
2018-10-01 11:24 0% ` Luca Boccassi
2018-10-02 11:02 0% ` Marco Varlese
2018-10-02 12:23 0% ` Bruce Richardson
2018-10-02 13:07 0% ` Luca Boccassi
2018-10-02 13:06 3% ` [dpdk-dev] [PATCH v2 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY Luca Boccassi
2018-10-02 15:25 3% ` [dpdk-dev] [PATCH v3 " Luca Boccassi
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
2018-10-05 16:00 0% ` Timothy Redaelli
2018-10-27 21:19 0% ` Thomas Monjalon
2018-10-05 12:06 3% [dpdk-dev] [PATCH v2 0/6] use IOVAs check based on DMA mask Alejandro Lucero
2018-10-05 12:06 4% ` [dpdk-dev] [PATCH v2 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-10-05 12:45 3% [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
2018-10-05 12:45 4% ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-10-10 8:56 0% ` Tu, Lijuan
2018-10-11 9:26 0% ` Alejandro Lucero
2018-10-10 21:48 [dpdk-dev] [PATCH v1 0/3] Improvements over rte hash and tests Yipeng Wang
2018-10-10 21:48 ` [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table Yipeng Wang
2018-10-24 20:36 ` Bruce Richardson
2018-10-26 0:23 ` Honnappa Nagarahalli
2018-10-26 10:12 3% ` Bruce Richardson
2018-10-29 5:54 0% ` Honnappa Nagarahalli
2018-10-31 4:21 3% ` Honnappa Nagarahalli
2018-10-11 14:20 4% [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes Konstantin Ananyev
2018-11-12 12:03 0% ` Akhil Goyal
2018-11-14 0:50 0% ` Trahe, Fiona
2018-11-14 3:15 0% ` Joseph, Anoob
2018-11-14 10:08 0% ` Ananyev, Konstantin
2018-11-14 10:12 0% ` Joseph, Anoob
2018-10-11 19:57 [dpdk-dev] [PATCH 1/2] eal: add API that sleeps while waiting for threads Ferruh Yigit
2018-10-15 22:21 ` [dpdk-dev] [PATCH v2 " Ferruh Yigit
2018-10-16 8:42 3% ` Ananyev, Konstantin
2018-10-15 14:50 [dpdk-dev] [PATCH 1/6] doc: add missing shared library versions to release notes Ferruh Yigit
2018-10-15 14:50 ` [dpdk-dev] [PATCH 6/6] doc: remove internal libs from " Ferruh Yigit
2018-10-16 11:52 ` Shreyansh Jain
2018-10-25 0:07 4% ` Thomas Monjalon
2018-10-18 16:08 [dpdk-dev] [PATCH] doc: show internal functions in doxygen Thomas Monjalon
2018-10-18 16:22 ` Ferruh Yigit
2018-10-18 17:04 ` Thomas Monjalon
2018-10-19 7:39 3% ` Ferruh Yigit
2018-10-22 6:15 0% ` Shreyansh Jain
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
2018-10-26 7:22 0% ` Olivier Matz
2018-10-24 16:09 0% ` Stephen Hemminger
2018-10-24 16:39 0% ` Bruce Richardson
2018-10-26 7:20 0% ` Olivier Matz
2018-10-26 10:15 0% ` Bruce Richardson
2018-10-26 11:28 0% ` Olivier Matz
2018-10-24 18:38 0% ` Stephen Hemminger
2018-10-26 7:56 0% ` Olivier Matz
2018-10-31 2:17 [dpdk-dev] [PATCH v5 0/3] Extend rte_ipv6_frag_get_ipv6_fragment_header() Cody Doucette
2018-10-31 2:17 ` [dpdk-dev] [PATCH v5 3/3] ip_frag: extend IPv6 fragment header retrieval Cody Doucette
2018-10-31 19:56 3% ` Ananyev, Konstantin
2018-11-07 20:21 0% ` Cody Doucette
2018-11-07 23:00 3% ` Ananyev, Konstantin
2018-11-01 4:54 [dpdk-dev] [PATCH 1/3] hash: deprecate lock ellision and read/write concurreny flags Honnappa Nagarahalli
2018-11-01 23:25 ` [dpdk-dev] [PATCH v2 0/4] " Honnappa Nagarahalli
2018-11-02 11:25 ` Bruce Richardson
2018-11-02 17:38 3% ` Honnappa Nagarahalli
2018-11-01 9:53 [dpdk-dev] [PATCH v4 0/2] ring library with c11 memory model bug fix and optimization Gavin Hu
2018-11-01 9:53 ` [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-11-01 17:26 ` Stephen Hemminger
2018-11-02 0:53 ` Gavin Hu (Arm Technology China)
2018-11-02 4:30 ` Honnappa Nagarahalli
2018-11-02 7:15 3% ` Gavin Hu (Arm Technology China)
2018-11-02 9:36 0% ` Thomas Monjalon
2018-11-02 11:23 0% ` Gavin Hu (Arm Technology China)
2018-11-01 13:54 [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file Neil Horman
2018-11-01 22:53 3% ` Thomas Monjalon
2018-11-02 11:50 0% ` Neil Horman
2018-11-18 22:25 0% ` Thomas Monjalon
2018-11-05 10:26 [dpdk-dev] [PATCH 0/2] ip_frag: two fixes in reassembly code Konstantin Ananyev
2018-11-05 12:18 ` [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision Konstantin Ananyev
2018-11-06 10:53 3% ` Burakov, Anatoly
2018-11-06 11:41 0% ` Ananyev, Konstantin
2018-11-05 17:09 4% [dpdk-dev] [PATCH] doc: document KNI limitation in release notes Ferruh Yigit
2018-11-05 17:28 4% [dpdk-dev] [PATCH] doc: update release notes for default KNI carries status Ferruh Yigit
2018-11-06 6:15 4% [dpdk-dev] [PATCH] doc/linux_gsg: fix numa lib name error Yong Wang
2018-11-07 2:40 4% [dpdk-dev] [PATCH v2] " Yong Wang
2018-11-09 8:28 3% [dpdk-dev] Which counters are set by rte_eth_stats_get Tom Barbette
2018-11-09 8:38 0% ` Thomas Monjalon
2018-11-09 16:23 0% ` Stephen Hemminger
2018-11-10 9:18 [dpdk-dev] DPDK techboard minutes of October 24 Jerin Jacob
2018-11-12 9:34 ` Burakov, Anatoly
2018-11-12 11:24 ` [dpdk-dev] [dpdk-techboard] " Ananyev, Konstantin
2018-11-12 12:21 ` Richardson, Bruce
2018-11-12 12:36 ` Ananyev, Konstantin
2018-11-12 16:43 3% ` Stephen Hemminger
2018-11-12 16:55 3% ` Thomas Monjalon
2018-11-13 9:33 0% ` Burakov, Anatoly
2018-11-14 11:23 5% [dpdk-dev] [PATCH] doc: security deprecation notice for session changes Konstantin Ananyev
2018-11-14 11:32 0% ` Mohammad Abdul Awal
2018-11-14 12:39 0% ` Akhil Goyal
2018-11-14 13:02 0% ` Ananyev, Konstantin
[not found] <CGME20181116084233eucas1p2ae806fd36b2fa1ea77d1a450facb0922@eucas1p2.samsung.com>
2018-11-16 8:42 ` [dpdk-dev] Direct using of 'rte_eth_devices' in DPDK apps Ilya Maximets
2018-11-16 9:29 ` Thomas Monjalon
2018-11-16 9:51 3% ` Ananyev, Konstantin
2018-11-16 14:16 4% ` Wiles, Keith
2018-11-16 14:19 3% ` Wiles, Keith
2018-11-21 11:45 4% [dpdk-dev] Last call for deprecation notices Ferruh Yigit
2018-11-21 15:45 5% [dpdk-dev] [PATCH] doc: announce kvargs API change Thomas Monjalon
2018-11-22 10:32 5% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
2018-11-22 2:55 3% [dpdk-dev] [RFC] pdump: remove deprecated APIs Tiwei Bie
2018-11-22 12:09 13% [dpdk-dev] [PATCH v1 1/1] doc: announce ethdev ABI change for rte_eth_dev_info Ian Stokes