DPDK patches and discussions
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 10:40 16%   ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
  2018-09-27 11:03  0%     ` Shreyansh Jain
@ 2018-09-29  0:09  0%     ` Yongseok Koh
  1 sibling, 0 replies; 200+ results
From: Yongseok Koh @ 2018-09-29  0:09 UTC
  To: Anatoly Burakov
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, Thomas Monjalon, alejandro.lucero

On Thu, Sep 27, 2018 at 11:40:59AM +0100, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
> 
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
> 
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
> 
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
> 
> Notes:
>     v3:
>     - Add comment to explain the process of picking up minimum
>       page sizes for mempool
>     
>     v2:
>     - Add documentation changes and ABI break
>     
>     v1:
>     - Adjust all calls to memseg walk functions to ignore external
>       segments where it made sense to do so
> 
>  doc/guides/rel_notes/deprecation.rst          | 15 --------
>  doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
>  drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
>  drivers/net/mlx4/mlx4_mr.c                    |  3 ++
>  drivers/net/mlx5/mlx5.c                       |  5 ++-
>  drivers/net/mlx5/mlx5_mr.c                    |  3 ++
>  drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
>  lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
>  lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
>  lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
>  lib/librte_eal/common/eal_common_memory.c     |  3 ++
>  .../common/include/rte_eal_memconfig.h        |  1 +
>  lib/librte_eal/common/include/rte_memory.h    |  9 +++++
>  lib/librte_eal/common/malloc_elem.c           | 10 ++++--
>  lib/librte_eal/common/malloc_heap.c           |  9 +++--
>  lib/librte_eal/common/rte_malloc.c            |  2 +-
>  lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
>  lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
>  lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
>  lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
>  lib/librte_eal/meson.build                    |  2 +-
>  lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
>  test/test/test_malloc.c                       |  3 ++
>  test/test/test_memzone.c                      |  3 ++
>  24 files changed, 134 insertions(+), 44 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 138335dfb..d2aec64d1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
>  
> -* eal: certain structures will change in EAL on account of upcoming external
> -  memory support. Aside from internal changes leading to an ABI break, the
> -  following externally visible changes will also be implemented:
> -
> -  - ``rte_memseg_list`` will change to include a boolean flag indicating
> -    whether a particular memseg list is externally allocated. This will have
> -    implications for any users of memseg-walk-related functions, as they will
> -    now have to skip externally allocated segments in most cases if the intent
> -    is to only iterate over internal DPDK memory.
> -  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> -    as some socket ID's will now be representing externally allocated memory. No
> -    changes will be required for existing code as backwards compatibility will
> -    be kept, and those who do not use this feature will not see these extra
> -    socket ID's.
> -
>  * eal: both declaring and identifying devices will be streamlined in v18.11.
>    New functions will appear to query a specific port from buses, classes of
>    device and device drivers. Device declaration will be made coherent with the
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index bc9b74ec4..5fc71e208 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -91,6 +91,13 @@ API Changes
>    flag the MAC can be properly configured in any case. This is particularly
>    important for bonding.
>  
> +* eal: The following API changes were made in 18.11:
> +
> +  - ``rte_memseg_list`` structure now has an additional flag indicating whether
> +    the memseg list is externally allocated. This will have implications for any
> +    users of memseg-walk-related functions, as they will now have to skip
> +    externally allocated segments in most cases if the intent is to only iterate
> +    over internal DPDK memory.
>  
>  ABI Changes
>  -----------
> @@ -107,6 +114,10 @@ ABI Changes
>     =========================================================
>  
>  
> +* eal: EAL library ABI version was changed due to previously announced work on
> +       supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
> +       a new flag indicating whether the memseg list refers to external memory.
> +
>  Removed Items
>  -------------
>  
> @@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
>       librte_compressdev.so.1
>       librte_cryptodev.so.5
>       librte_distributor.so.1
> -     librte_eal.so.8
> +   + librte_eal.so.9
>       librte_ethdev.so.10
>       librte_eventdev.so.4
>       librte_flow_classify.so.1
> diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
> index 4c2cd2a87..2e9244fb7 100644
> --- a/drivers/bus/fslmc/fslmc_vfio.c
> +++ b/drivers/bus/fslmc/fslmc_vfio.c
> @@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
>  }
>  
>  static int
> -fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
> -		 const struct rte_memseg *ms, void *arg)
> +fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> +		void *arg)
>  {
>  	int *n_segs = arg;
>  	int ret;
>  
> +	if (msl->external)
> +		return 0;
> +
>  	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
>  	if (ret)
>  		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
> diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
> index d23d3c613..9f5d790b6 100644
> --- a/drivers/net/mlx4/mlx4_mr.c
> +++ b/drivers/net/mlx4/mlx4_mr.c
> @@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
>  {
>  	struct mr_find_contig_memsegs_data *data = arg;
>  
> +	if (msl->external)
> +		return 0;
> +

Because a memory free event is available for external memory, the current
design of mlx4/mlx5 memory management can accommodate the new external memory
support. So please remove this check so that the PMD can traverse external
memory as well.
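
To illustrate the difference with a minimal sketch (the types and the walker
below are simplified stand-ins made up for illustration, not the real
rte_memseg_list, rte_memseg or rte_memseg_walk): a callback that returns 0
early for external lists never sees that memory, while a callback without the
check visits every segment.

```c
#include <stddef.h>

/* Simplified stand-ins for the DPDK types; only the fields used here. */
struct memseg_list {
	int external;		/* non-zero if the list is externally allocated */
	size_t n_segs;
	const size_t *seg_len;	/* length of each segment in the list */
};

typedef int (*seg_walk_cb)(const struct memseg_list *msl, size_t len,
			   void *arg);

/* Hypothetical walker: calls cb for every segment, stops on non-zero. */
static int
walk_segs(const struct memseg_list *lists, size_t n_lists,
	  seg_walk_cb cb, void *arg)
{
	size_t i, j;
	int ret;

	for (i = 0; i < n_lists; i++) {
		for (j = 0; j < lists[i].n_segs; j++) {
			ret = cb(&lists[i], lists[i].seg_len[j], arg);
			if (ret != 0)
				return ret;
		}
	}
	return 0;
}

/* Counts bytes in internal lists only (the kind of check the patch adds). */
static int
sum_internal_cb(const struct memseg_list *msl, size_t len, void *arg)
{
	size_t *total = arg;

	if (msl->external)
		return 0;	/* skip this segment, keep walking */
	*total += len;
	return 0;
}

/* Counts bytes in every list, external ones included. */
static int
sum_all_cb(const struct memseg_list *msl, size_t len, void *arg)
{
	size_t *total = arg;

	(void)msl;
	*total += len;
	return 0;
}
```

With the check removed, the mr_find_contig_memsegs_cb callbacks would behave
like sum_all_cb here and see external segments too.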

>  	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
>  		return 0;
>  	/* Found, save it and stop walking. */
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 30d4e70a7..c90e1d8ce 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
>  static void *uar_base;
>  
>  static int
> -find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
> +find_lower_va_bound(const struct rte_memseg_list *msl,
>  		const struct rte_memseg *ms, void *arg)
>  {
>  	void **addr = arg;
>  
> +	if (msl->external)
> +		return 0;
> +

This one is fine.
But can you please remove the blank line?
That's a rule by former maintainers. :-)

>  	if (*addr == NULL)
>  		*addr = ms->addr;
>  	else
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 1d1bcb5fe..fd4345f9c 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
>  {
>  	struct mr_find_contig_memsegs_data *data = arg;
>  
> +	if (msl->external)
> +		return 0;
> +

As mentioned above, please remove this check too.

If those two changes in mlx4_mr.c and mlx5_mr.c are removed, then for the
whole patch,

Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks
Yongseok


* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
  2018-09-28  8:26  4%     ` Bruce Richardson
@ 2018-09-28  8:55  4%       ` Van Haaren, Harry
  0 siblings, 0 replies; 200+ results
From: Van Haaren, Harry @ 2018-09-28  8:55 UTC
  To: Richardson, Bruce, Wang, Yipeng1
  Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
	steve.capper, ola.liljedahl, nd

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Friday, September 28, 2018 9:26 AM
> To: Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
> 
> On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> > Reply inlined:
> >
> > >-----Original Message-----
> > >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> > >Sent: Thursday, September 6, 2018 10:12 AM
> > >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> > >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com;
> > >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
> > >
> > >Reader-writer concurrency issue, caused by moving the keys
> > >to their alternative locations during key insert, is solved
> > >by introducing a global counter(tbl_chng_cnt) indicating a
> > >change in table.

<snip>

> > > /**
> > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash
> *h, const void *key,
> > >  *     array of user data. This value is unique for this key.
> > >  */
> > > int32_t
> > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > >
> > > /**
> > >  * Add a key to an existing hash table.
> > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void
> *key);
> > >  *     array of user data. This value is unique for this key.
> > >  */
> > > int32_t
> > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >
> > > /
> >
> > I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
> 
> Just removing a const should not change the ABI, I believe, since the const
> is just an advisory hint to the compiler. Actual parameter size and count
> remain unchanged so I don't believe there is an issue.
> [ABI experts, please correct me if I'm wrong on this]


[Certainly no ABI expert, but...]

I think this is an API break, not ABI break.

Given application code as follows, it will fail to compile - even though
running the new code as a .so wouldn't cause any issues (AFAIK).

void do_hash_stuff(const struct rte_hash *h, ...)
{
    /* parameter passed in is const, but updated function prototype is non-const */
    rte_hash_add_key_with_hash(h, ...);
}

This means that we can't recompile apps against latest patch without application
code changes, if the app was passing a const rte_hash struct as the first parameter.
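
A self-contained sketch of the same situation, using a made-up struct table
and table_add() rather than the real rte_hash API: with a C compiler the
commented-out call draws a discarded-qualifier diagnostic (fatal under
-Werror, and a hard error in C++), so the caller either casts the const away
or, better, drops const from its own prototype.

```c
#include <stddef.h>

/* Hypothetical table type, standing in for struct rte_hash. */
struct table {
	size_t count;
};

/* New-style prototype: non-const, because adding a key mutates the table. */
static int
table_add(struct table *t, int key)
{
	(void)key;
	t->count++;
	return (int)t->count - 1;	/* index of the new entry */
}

/* Caller written against the old const prototype. */
static int
caller_old_style(const struct table *t)
{
	/* table_add(t, 42); would discard the const qualifier. */

	/*
	 * Workaround: explicit cast. Only valid when the object that
	 * t points to was not itself defined const.
	 */
	return table_add((struct table *)t, 42);
}

/* Preferred fix: propagate the non-const pointer through the caller. */
static int
caller_updated(struct table *t)
{
	return table_add(t, 43);
}
```

Either way the application source has to change, which is what makes this an
API break even though the compiled calling convention stays the same.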


-Harry


* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
  2018-09-28  1:00  3%   ` Wang, Yipeng1
@ 2018-09-28  8:26  4%     ` Bruce Richardson
  2018-09-28  8:55  4%       ` Van Haaren, Harry
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-28  8:26 UTC
  To: Wang, Yipeng1
  Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
	steve.capper, ola.liljedahl, nd

On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> Reply inlined:
> 
> >-----Original Message-----
> >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> >Sent: Thursday, September 6, 2018 10:12 AM
> >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
> >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
> >
> >Reader-writer concurrency issue, caused by moving the keys
> >to their alternative locations during key insert, is solved
> >by introducing a global counter(tbl_chng_cnt) indicating a
> >change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> > 		curr_bkt = curr_node->bkt;
> > 	}
> >
> >+	/* Inform the previous move. The current move need
> >+	 * not be informed now as the current bucket entry
> >+	 * is present in both primary and secondary.
> >+	 * Since there is one writer, load acquires on
> >+	 * tbl_chng_cnt are not required.
> >+	 */
> >+	__atomic_store_n(&h->tbl_chng_cnt,
> >+			 h->tbl_chng_cnt + 1,
> >+			 __ATOMIC_RELEASE);
> >+	/* The stores to sig_alt and sig_current should not
> >+	 * move above the store to tbl_chng_cnt.
> >+	 */
> >+	__atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe for X86 this fence should not be compiled to any code, otherwise
> we need macros for the compile time check.
> 
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> > 	uint32_t bucket_idx;
> > 	hash_sig_t alt_hash;
> > 	struct rte_hash_bucket *bkt;
> >+	uint32_t cnt_b, cnt_a;
> > 	int ret;
> >
> >-	bucket_idx = sig & h->bucket_bitmask;
> >-	bkt = &h->buckets[bucket_idx];
> >-
> > 	__hash_rw_reader_lock(h);
> >
> >-	/* Check if key is in primary location */
> >-	ret = search_one_bucket(h, key, sig, data, bkt);
> >-	if (ret != -1) {
> >-		__hash_rw_reader_unlock(h);
> >-		return ret;
> >-	}
> >-	/* Calculate secondary hash */
> >-	alt_hash = rte_hash_secondary_hash(sig);
> >-	bucket_idx = alt_hash & h->bucket_bitmask;
> >-	bkt = &h->buckets[bucket_idx];
> >+	do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
> MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap uses similar version counter to implement read-write concurrency for hash table,
> but one difference is reader checks even/odd of the version counter to make sure there is no
> concurrent writer. Could you just double check and confirm that this is not needed for your implementation?
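
For reference, the even/odd scheme described above is usually written along
these lines. This is a generic single-writer sequence-counter sketch in C11
atomics; struct versioned_pair and its functions are hypothetical names used
for illustration only, and this is not the code in this patch, MemC3 or OvS.

```c
#include <stdatomic.h>

/* Two values protected by a version counter (odd = write in progress). */
struct versioned_pair {
	atomic_uint version;
	atomic_uint a;
	atomic_uint b;
};

/* Single writer assumed: no other thread modifies version. */
static void
pair_write(struct versioned_pair *p, unsigned int a, unsigned int b)
{
	unsigned int v;

	v = atomic_load_explicit(&p->version, memory_order_relaxed);
	/* Make the counter odd so readers know a write has started. */
	atomic_store_explicit(&p->version, v + 1, memory_order_relaxed);
	/* Keep the data stores below from moving above the odd store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, b, memory_order_relaxed);
	/* Make the counter even again; publishes the data stores. */
	atomic_store_explicit(&p->version, v + 2, memory_order_release);
}

/* Retries until it observes an even counter that did not change. */
static void
pair_read(struct versioned_pair *p, unsigned int *a, unsigned int *b)
{
	unsigned int before, after;

	do {
		before = atomic_load_explicit(&p->version,
					      memory_order_acquire);
		*a = atomic_load_explicit(&p->a, memory_order_relaxed);
		*b = atomic_load_explicit(&p->b, memory_order_relaxed);
		/* Keep the data loads above from moving below the re-check. */
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&p->version,
					     memory_order_relaxed);
	} while ((before & 1u) != 0 || before != after);
}
```

The odd check lets a reader detect a write that is still in flight, whereas a
plain before/after comparison only detects a write that completed in between.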
> 
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> >  *   - -ENOSPC if there is no space in the hash for this key.
> >  */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
> >
> > /**
> >  * Add a key-value pair with a pre-computed hash value
> >@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> >  *   - -ENOSPC if there is no space in the hash for this key.
> >  */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > 						hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> >  *     array of user data. This value is unique for this key.
> >  */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> >  * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> >  *     array of user data. This value is unique for this key.
> >  */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
> >
> > /
> 
> I think the above changes will break ABI by changing the parameter type? Other people may know better on this.

Just removing a const should not change the ABI, I believe, since the const
is just an advisory hint to the compiler. Actual parameter size and count
remain unchanged so I don't believe there is an issue.
[ABI experts, please correct me if I'm wrong on this]

/Bruce


* [dpdk-dev] [PATCH v16 2/6] eal: enable hotplug on multi-process
  2018-09-28  4:23  1% ` [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process Qi Zhang
@ 2018-09-28  4:23  2%   ` Qi Zhang
  0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-09-28  4:23 UTC
  To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

This series introduces a solution for handling hotplug in multi-process.
It covers the following scenarios:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume devices are shared
by default. That means attaching or detaching a device in any process
is broadcast to all other processes through the mp channel, so that
device information is synchronized across all processes.

Any failure while attaching or detaching leaves the processes in an
inconsistent state, so a proper rollback action is needed.

This patch implements cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.

IPC scenario for cases 1 and 2:

attach a device
a) primary attaches the new device; on failure, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as a rollback action.
h) attach fail
i) attach success

detach a device
a) primary sends a detach sync request to all secondaries.
b) each secondary detaches the device and sends a reply.
c) primary checks the replies; if all succeeded, goto f).
d) primary sends a detach rollback sync request to all secondaries.
e) each secondary receives the request and attaches the device back; goto g).
f) primary detaches the device; on success goto h), else goto d).
g) detach fail.
h) detach success.
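
The attach sequence above can be sketched as follows. This is only an
illustration: struct proc and attach_all() are hypothetical stand-ins, direct
function calls replace the IPC requests, and only the secondaries that
attached successfully are rolled back, whereas the patch broadcasts the
rollback request to all secondaries.

```c
#include <stddef.h>

/* Hypothetical model of one process taking part in hotplug. */
struct proc {
	int fail_attach;	/* knob: make attach fail on this process */
	int attached;		/* 1 while the device is attached here */
};

static int
proc_attach(struct proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = 1;
	return 0;
}

static void
proc_detach(struct proc *p)
{
	p->attached = 0;
}

/* Attach on the primary, then on each secondary; undo all on failure. */
static int
attach_all(struct proc *primary, struct proc *sec, size_t n)
{
	size_t i, done;
	int ret;

	ret = proc_attach(primary);		/* a) */
	if (ret != 0)
		return ret;			/* h) */

	for (done = 0; done < n; done++) {	/* b), c) */
		ret = proc_attach(&sec[done]);
		if (ret != 0)
			break;
	}
	if (done == n)				/* d) */
		return 0;			/* i) */

	for (i = 0; i < done; i++)		/* e), f) */
		proc_detach(&sec[i]);
	proc_detach(primary);			/* g) */
	return ret;				/* h) */
}
```

The detach sequence is the mirror image, except that the secondaries go first
and the primary detaches last.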

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst  |  11 ++
 lib/librte_eal/bsdapp/eal/Makefile      |   1 +
 lib/librte_eal/common/eal_common_dev.c  | 225 ++++++++++++++++++++++++++++++--
 lib/librte_eal/common/eal_private.h     |  30 +++++
 lib/librte_eal/common/hotplug_mp.c      | 221 +++++++++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h      |  46 +++++++
 lib/librte_eal/common/include/rte_dev.h |   9 ++
 lib/librte_eal/common/meson.build       |   1 +
 lib/librte_eal/linuxapp/eal/Makefile    |   1 +
 lib/librte_eal/linuxapp/eal/eal.c       |   6 +
 10 files changed, 542 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..f88910c7f 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,12 @@ New Features
   SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
   vdev_netvsc, tap, and failsafe drivers combination.
 
+* **Support device multi-process hotplug.**
+
+  Hotplug and hot-unplug for devices will now be supported in multiprocessing
+  scenario. Any ethdev devices created in the primary process will be regarded
+  as shared and will be available for all DPDK processes. Synchronization
+  between processes will be done using DPDK IPC.
 
 API Changes
 -----------
@@ -91,6 +97,11 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+  In the primary-secondary process model, ``rte_eal_hotplug_add`` guarantees
+  that the device is attached on all processes, and
+  ``rte_eal_hotplug_remove`` guarantees that it is detached on all processes.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..4351c6a20 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -62,6 +62,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_proc.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_fbarray.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_uuid.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += hotplug_mp.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_mp.c
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 85eb1569f..314266041 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -19,8 +19,10 @@
 #include <rte_log.h>
 #include <rte_spinlock.h>
 #include <rte_malloc.h>
+#include <rte_string_fns.h>
 
 #include "eal_private.h"
+#include "hotplug_mp.h"
 
 /**
  * The device event callback description.
@@ -127,9 +129,10 @@ int rte_eal_dev_detach(struct rte_device *dev)
 	return ret;
 }
 
-int
-rte_eal_hotplug_add(const char *busname, const char *devname,
-                    const char *drvargs)
+/* helper function to build devargs; caller should free the memory */
+static char *
+build_devargs(const char *busname, const char *devname,
+	      const char *drvargs)
 {
 	char *devargs = NULL;
 	int size, length = -1;
@@ -140,19 +143,33 @@ rte_eal_hotplug_add(const char *busname, const char *devname,
 		if (length >= size)
 			devargs = malloc(length + 1);
 		if (devargs == NULL)
-			return -ENOMEM;
+			break;
 	} while (size == 0);
 
+	return devargs;
+}
+
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+		    const char *drvargs)
+{
+	char *devargs = build_devargs(busname, devname, drvargs);
+
+	if (devargs == NULL)
+		return -ENOMEM;
+
 	return rte_dev_probe(devargs);
 }
 
-int __rte_experimental
-rte_dev_probe(const char *devargs)
+/* probe device at local process. */
+int
+local_dev_probe(const char *devargs, struct rte_device **new_dev)
 {
 	struct rte_device *dev;
 	struct rte_devargs *da;
 	int ret;
 
+	*new_dev = NULL;
 	da = calloc(1, sizeof(*da));
 	if (da == NULL)
 		return -ENOMEM;
@@ -195,6 +212,8 @@ rte_dev_probe(const char *devargs)
 			dev->name);
 		goto err_devarg;
 	}
+
+	*new_dev = dev;
 	return 0;
 
 err_devarg:
@@ -226,8 +245,9 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
 	return rte_dev_remove(dev);
 }
 
-int __rte_experimental
-rte_dev_remove(struct rte_device *dev)
+/* remove device at local process. */
+int
+local_dev_remove(struct rte_device *dev)
 {
 	struct rte_bus *bus;
 	int ret;
@@ -248,7 +268,194 @@ rte_dev_remove(struct rte_device *dev)
 	if (ret)
 		RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
 			dev->name);
-	rte_devargs_remove(dev->devargs);
+	else
+		rte_devargs_remove(dev->devargs);
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_probe(const char *devargs)
+{
+	struct eal_dev_mp_req req;
+	struct rte_device *dev;
+	int ret;
+
+	memset(&req, 0, sizeof(req));
+	req.t = EAL_DEV_REQ_TYPE_ATTACH;
+	strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		/**
+		 * If in secondary process, just send IPC request to
+		 * primary process.
+		 */
+		ret = eal_dev_hotplug_request_to_primary(&req);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"Failed to send hotplug request to primary\n");
+			return -ENOMSG;
+		}
+		if (req.result)
+			RTE_LOG(ERR, EAL,
+				"Failed to hotplug add device\n");
+		return req.result;
+	}
+
+	/* attach a shared device from primary start from here: */
+
+	/* primary attach the new device itself. */
+	ret = local_dev_probe(devargs, &dev);
+
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Failed to attach device on primary process\n");
+
+		/**
+		 * it is possible that a secondary process failed to attach a
+		 * device that the primary process attached during initialization,
+		 * so for -EEXIST case, we still need to sync with secondary
+		 * process.
+		 */
+		if (ret != -EEXIST)
+			return ret;
+	}
+
+	/* primary send attach sync request to secondary. */
+	ret = eal_dev_hotplug_request_to_secondary(&req);
+
+	/* if any communication error occurs, we need to rollback. */
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Failed to send hotplug add request to secondary\n");
+		ret = -ENOMSG;
+		goto rollback;
+	}
+
+	/**
+	 * if any secondary failed to attach, we need to consider if rollback
+	 * is necessary.
+	 */
+	if (req.result) {
+		RTE_LOG(ERR, EAL,
+			"Failed to attach device on secondary process\n");
+		ret = req.result;
+
+		/* for -EEXIST, we don't need to rollback. */
+		if (ret == -EEXIST)
+			return ret;
+		goto rollback;
+	}
+
+	return 0;
+
+rollback:
+	req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
+
+	/* primary send rollback request to secondary. */
+	if (eal_dev_hotplug_request_to_secondary(&req))
+		RTE_LOG(WARNING, EAL,
+			"Failed to rollback device attach on secondary. "
+			"Devices in secondary may not sync with primary\n");
+
+	/* primary rollback itself. */
+	if (local_dev_remove(dev))
+		RTE_LOG(WARNING, EAL,
+			"Failed to rollback device attach on primary. "
+			"Devices in secondary may not sync with primary\n");
+
+	return ret;
+}
+
+int __rte_experimental
+rte_dev_remove(struct rte_device *dev)
+{
+	struct eal_dev_mp_req req;
+	char *devargs;
+	int ret;
+
+	devargs = build_devargs(dev->devargs->bus->name, dev->name, "");
+	if (devargs == NULL)
+		return -ENOMEM;
+
+	memset(&req, 0, sizeof(req));
+	req.t = EAL_DEV_REQ_TYPE_DETACH;
+	strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+	free(devargs);
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+		/**
+		 * If in secondary process, just send IPC request to
+		 * primary process.
+		 */
+		ret = eal_dev_hotplug_request_to_primary(&req);
+		if (ret) {
+			RTE_LOG(ERR, EAL,
+				"Failed to send hotplug request to primary\n");
+			return -ENOMSG;
+		}
+		if (req.result)
+			RTE_LOG(ERR, EAL,
+				"Failed to hotplug remove device\n");
+		return req.result;
+	}
+
+	/* detach a device from primary start from here: */
+
+	/* primary send detach sync request to secondary */
+	ret = eal_dev_hotplug_request_to_secondary(&req);
+
+	/**
+	 * on a communication error we need to rollback, because it is possible
+	 * that some of the secondary processes still detached it successfully.
+	 */
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Failed to send device detach request to secondary\n");
+		ret = -ENOMSG;
+		goto rollback;
+	}
+
+	/**
+	 * if any secondary failed to detach, we need to consider if rollback
+	 * is necessary.
+	 */
+	if (req.result) {
+		RTE_LOG(ERR, EAL,
+			"Failed to detach device on secondary process\n");
+		ret = req.result;
+		/**
+		 * if -ENOENT, we don't need to rollback, since the device is
+		 * already detached on secondary process.
+		 */
+		if (ret != -ENOENT)
+			goto rollback;
+	}
+
+	/* primary detach the device itself. */
+	ret = local_dev_remove(dev);
+
+	/* if primary failed, still need to consider if rollback is necessary */
+	if (ret) {
+		RTE_LOG(ERR, EAL,
+			"Failed to detach device on primary process\n");
+		/* if -ENOENT, we don't need to rollback */
+		if (ret == -ENOENT)
+			return ret;
+		goto rollback;
+	}
+
+	return 0;
+
+rollback:
+	req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
+
+	/* primary send rollback request to secondary. */
+	if (eal_dev_hotplug_request_to_secondary(&req))
+		RTE_LOG(WARNING, EAL,
+			"Failed to rollback device detach on secondary. "
+			"Devices in secondary may not sync with primary\n");
+
 	return ret;
 }
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a83c..83f10a9f8 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,34 @@ int
 rte_devargs_layers_parse(struct rte_devargs *devargs,
 			 const char *devstr);
 
+/*
+ * probe a device at local process.
+ *
+ * @param devargs
+ *   Device arguments including bus, class and driver properties.
+ * @param new_dev
+ *   new device be probed as output.
+ * @return
+ *   0 on success, negative on error.
+ */
+int local_dev_probe(const char *devargs, struct rte_device **new_dev);
+
+/**
+ * Hotplug remove a given device from a specific bus at local process.
+ *
+ * @param dev
+ *   Data structure of the device to remove.
+ * @return
+ *   0 on success, negative on error.
+ */
+int local_dev_remove(struct rte_device *dev);
+
+/**
+ * Register all mp action callbacks for hotplug.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+int rte_dev_hotplug_mp_init(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
new file mode 100644
index 000000000..1c92e44cb
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include <string.h>
+
+#include <rte_eal.h>
+#include <rte_alarm.h>
+#include <rte_string_fns.h>
+#include <rte_devargs.h>
+
+#include "hotplug_mp.h"
+#include "eal_private.h"
+
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
+
+static int cmp_dev_name(const struct rte_device *dev, const void *_name)
+{
+	const char *name = _name;
+
+	return strcmp(dev->name, name);
+}
+
+struct mp_reply_bundle {
+	struct rte_mp_msg msg;
+	void *peer;
+};
+
+static int
+handle_secondary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+	RTE_SET_USED(msg);
+	RTE_SET_USED(peer);
+	return -ENOTSUP;
+}
+
+static void __handle_primary_request(void *param)
+{
+	struct mp_reply_bundle *bundle = param;
+	struct rte_mp_msg *msg = &bundle->msg;
+	const struct eal_dev_mp_req *req =
+		(const struct eal_dev_mp_req *)msg->param;
+	struct rte_mp_msg mp_resp;
+	struct eal_dev_mp_req *resp =
+		(struct eal_dev_mp_req *)mp_resp.param;
+	struct rte_devargs *da;
+	struct rte_device *dev;
+	struct rte_bus *bus;
+	int ret = 0;
+
+	memset(&mp_resp, 0, sizeof(mp_resp));
+
+	switch (req->t) {
+	case EAL_DEV_REQ_TYPE_ATTACH:
+	case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
+		ret = local_dev_probe(req->devargs, &dev);
+		break;
+	case EAL_DEV_REQ_TYPE_DETACH:
+	case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
+		da = calloc(1, sizeof(*da));
+		if (da == NULL) {
+			ret = -ENOMEM;
+			goto quit;
+		}
+
+		ret = rte_devargs_parse(da, req->devargs);
+		if (ret)
+			goto quit;
+
+		bus = rte_bus_find_by_name(da->bus->name);
+		if (bus == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n", da->bus->name);
+			ret = -ENOENT;
+			goto quit;
+		}
+
+		dev = bus->find_device(NULL, cmp_dev_name, da->name);
+		if (dev == NULL) {
+			RTE_LOG(ERR, EAL, "Cannot find plugged device (%s)\n", da->name);
+			ret = -ENOENT;
+			goto quit;
+		}
+
+		ret = local_dev_remove(dev);
+quit:
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+	mp_resp.len_param = sizeof(*req);
+	memcpy(resp, req, sizeof(*resp));
+	resp->result = ret;
+	if (rte_mp_reply(&mp_resp, bundle->peer) < 0)
+		RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+
+	free(bundle->peer);
+	free(bundle);
+}
+
+static int
+handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+	struct rte_mp_msg mp_resp;
+	const struct eal_dev_mp_req *req =
+		(const struct eal_dev_mp_req *)msg->param;
+	struct eal_dev_mp_req *resp =
+		(struct eal_dev_mp_req *)mp_resp.param;
+	struct mp_reply_bundle *bundle;
+	int ret = 0;
+
+	memset(&mp_resp, 0, sizeof(mp_resp));
+	strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+	mp_resp.len_param = sizeof(*req);
+	memcpy(resp, req, sizeof(*resp));
+
+	bundle = calloc(1, sizeof(*bundle));
+	if (bundle == NULL) {
+		resp->result = -ENOMEM;
+		ret = rte_mp_reply(&mp_resp, peer);
+		if (ret)
+			RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+		return ret;
+	}
+
+	bundle->msg = *msg;
+	/**
+	 * We need to send the reply from the interrupt thread, but peer
+	 * can't be parsed directly, so this is a temporary hack that
+	 * needs to be fixed when it is ready.
+	 */
+	bundle->peer = (void *)strdup(peer);
+
+	/**
+	 * We are in the IPC callback thread, where sync IPC would
+	 * deadlock, so we delegate the task to the interrupt thread.
+	 */
+	ret = rte_eal_alarm_set(1, __handle_primary_request, bundle);
+	if (ret) {
+		resp->result = ret;
+		ret = rte_mp_reply(&mp_resp, peer);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+			return ret;
+		}
+	}
+	return 0;
+}
+
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req)
+{
+	RTE_SET_USED(req);
+	return -ENOTSUP;
+}
+
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req)
+{
+	struct rte_mp_msg mp_req;
+	struct rte_mp_reply mp_reply;
+	struct timespec ts = {.tv_sec = MP_TIMEOUT_S, .tv_nsec = 0};
+	int ret;
+	int i;
+
+	memset(&mp_req, 0, sizeof(mp_req));
+	memcpy(mp_req.param, req, sizeof(*req));
+	mp_req.len_param = sizeof(*req);
+	strlcpy(mp_req.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_req.name));
+
+	ret = rte_mp_request_sync(&mp_req, &mp_reply, &ts);
+	if (ret) {
+		RTE_LOG(ERR, EAL, "rte_mp_request_sync failed\n");
+		return ret;
+	}
+
+	if (mp_reply.nb_sent != mp_reply.nb_received) {
+		RTE_LOG(ERR, EAL, "not all secondaries replied\n");
+		return -1;
+	}
+
+	req->result = 0;
+	for (i = 0; i < mp_reply.nb_received; i++) {
+		struct eal_dev_mp_req *resp =
+			(struct eal_dev_mp_req *)mp_reply.msgs[i].param;
+		if (resp->result) {
+			req->result = resp->result;
+			if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
+				req->result != -EEXIST)
+				break;
+			if (req->t == EAL_DEV_REQ_TYPE_DETACH &&
+				req->result != -ENOENT)
+				break;
+		}
+	}
+
+	return 0;
+}
+
+int rte_dev_hotplug_mp_init(void)
+{
+	int ret;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+					handle_secondary_request);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+				EAL_DEV_MP_ACTION_REQUEST);
+			return ret;
+		}
+	} else {
+		ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+					handle_primary_request);
+		if (ret) {
+			RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+				EAL_DEV_MP_ACTION_REQUEST);
+			return ret;
+		}
+	}
+
+	return 0;
+}
diff --git a/lib/librte_eal/common/hotplug_mp.h b/lib/librte_eal/common/hotplug_mp.h
new file mode 100644
index 000000000..c95c8f1fb
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _HOTPLUG_MP_H_
+#define _HOTPLUG_MP_H_
+
+#include <rte_dev.h>
+#include <rte_bus.h>
+
+#define EAL_DEV_MP_ACTION_REQUEST      "eal_dev_mp_request"
+#define EAL_DEV_MP_ACTION_RESPONSE     "eal_dev_mp_response"
+
+#define EAL_DEV_MP_DEV_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
+#define EAL_DEV_MP_BUS_NAME_MAX_LEN 32
+#define EAL_DEV_MP_DEV_ARGS_MAX_LEN 128
+
+enum eal_dev_req_type {
+	EAL_DEV_REQ_TYPE_ATTACH,
+	EAL_DEV_REQ_TYPE_DETACH,
+	EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK,
+	EAL_DEV_REQ_TYPE_DETACH_ROLLBACK,
+};
+
+struct eal_dev_mp_req {
+	enum eal_dev_req_type t;
+	char devargs[EAL_DEV_MP_DEV_ARGS_MAX_LEN];
+	int result;
+};
+
+/**
+ * Synchronous wrapper used by a secondary process to send a
+ * request to the primary process. It is invoked when an attach
+ * or detach request is issued from a secondary process.
+ */
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req);
+
+/**
+ * Synchronous wrapper used by the primary process to send a
+ * request to all secondary processes. It is invoked when the
+ * primary process handles an attach or detach request.
+ */
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req);
+
+
+#endif /* _HOTPLUG_MP_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 7a30362c0..266331acd 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -190,6 +190,9 @@ int rte_eal_dev_detach(struct rte_device *dev);
 
 /**
  * Hotplug add a given device to a specific bus.
+ * In multi-process, this function will inform all other processes
+ * to hotplug add the same device. Any failure on other process
+ * will rollback the action.
  *
  * @param busname
  *   The bus name the device is added to.
@@ -219,6 +222,9 @@ int __rte_experimental rte_dev_probe(const char *devargs);
 
 /**
  * Hotplug remove a given device from a specific bus.
+ * In multi-process, this function will inform all other processes
+ * to hotplug remove the same device. Any failure on other process
+ * will rollback the action.
  *
  * @param busname
  *   The bus name the device is removed from.
@@ -234,6 +240,9 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
  * @b EXPERIMENTAL: this API may change without prior notice
  *
  * Remove one device.
+ * In multi-process, this function will inform all other processes
+ * to hotplug remove the same device. Any failure on other process
+ * will rollback the action.
  *
  * @param dev
  *   Data structure of the device to remove.
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index b7fc98499..04c414356 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -28,6 +28,7 @@ common_sources = files(
 	'eal_common_thread.c',
 	'eal_common_timer.c',
 	'eal_common_uuid.c',
+	'hotplug_mp.c',
 	'malloc_elem.c',
 	'malloc_heap.c',
 	'malloc_mp.c',
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..58455c1a6 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -70,6 +70,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_proc.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_fbarray.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_uuid.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += hotplug_mp.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_elem.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_heap.c
 SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_mp.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..f2c90c528 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -865,6 +865,12 @@ rte_eal_init(int argc, char **argv)
 		}
 	}
 
+	/* register mp action callbacks for hotplug */
+	if (rte_dev_hotplug_mp_init() < 0) {
+		rte_eal_init_alert("failed to register mp callback for hotplug\n");
+		return -1;
+	}
+
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices\n");
 		rte_errno = ENODEV;
-- 
2.13.6


* [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process
                     ` (7 preceding siblings ...)
  2018-08-16  3:04  1% ` [dpdk-dev] [PATCH v15 0/7] enable hotplug on multi-process Qi Zhang
@ 2018-09-28  4:23  1% ` Qi Zhang
  2018-09-28  4:23  2%   ` [dpdk-dev] [PATCH v16 2/6] eal: " Qi Zhang
  8 siblings, 1 reply; 200+ results
From: Qi Zhang @ 2018-09-28  4:23 UTC (permalink / raw)
  To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v16:
- rebase onto patch "simplify parameters of hotplug functions"
  http://patchwork.dpdk.org/patch/45463/, this includes:
  * keep rte_eal_hotplug_add/rte_eal_hotplug_remove unchanged.
  * move the IPC sync logic to rte_dev_probe/rte_dev_remove.
  * simplify the IPC message by removing busname and devname from
    eal_dev_mp_req, since the devargs string already encodes that
    information.
- combine release notes with the related code changes.
- replace the do_ prefix with local_ for the local-process-only
  probe/remove functions.
- improve comments.

v15:
- fix missing return in rte_eth_dev_pci_release.
- minor fix and more detail comments for patch 5/7.
- update release notes for v18.11.

v14:
- rebase.
- The following changes all belong to patch 1/6:
  1) rename rte_eth_dev_release_port_private to
     rte_eth_dev_release_port_secondary, since it is only used by
     secondary processes.
  2) in rte_eth_dev_pci_generic_remove, even on a secondary process,
     I think it is better to call rte_eth_dev_release_port_secondary
     after dev_uninit, since a secondary process may need to release
     some local resources in dev_uninit before releasing the port and
     returning. This does not break any existing user of
     rte_eth_dev_pci_generic_remove, because no existing dev_uninit
     has special handling for secondary processes.
  3) add rte_eth_dev_release_port_secondary into rte_eth_dev_destroy as a
     general step, so we don't need patches for i40e and ixgbe.
  4) fix missing update on rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
  when detaching a device. A device may be out of sync in some
  situations, so attaching an existing device in the primary still
  needs to sync with the secondaries. Also, it is not necessary to roll
  back if we fail to attach an existing device or detach a nonexistent
  device on a secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.
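
The -EEXIST/-ENOENT handling described for v14 can be sketched roughly
as below. This is a standalone illustration, not the code from the
patch: enum req_type, aggregate_replies() and needs_rollback() are
made-up names, and the per-secondary results are passed in as a plain
array instead of IPC replies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request kinds; only the two that matter here. */
enum req_type { REQ_ATTACH, REQ_DETACH };

/*
 * Fold per-secondary results into one value. A secondary that already
 * has the device (-EEXIST on attach) or never had it (-ENOENT on
 * detach) is recorded but does not stop the scan; any other error
 * ends the scan immediately.
 */
static int
aggregate_replies(enum req_type t, const int *results, size_t n)
{
	int result = 0;
	size_t i;

	assert(results != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (results[i] == 0)
			continue;
		result = results[i];
		if (t == REQ_ATTACH && result != -EEXIST)
			break;
		if (t == REQ_DETACH && result != -ENOENT)
			break;
	}
	return result;
}

/* Rollback is only needed when the aggregated error is a real failure. */
static int
needs_rollback(enum req_type t, int result)
{
	if (result == 0)
		return 0;
	if (t == REQ_ATTACH && result == -EEXIST)
		return 0;
	if (t == REQ_DETACH && result == -ENOENT)
		return 0;
	return 1;
}
```

With results {0, -EEXIST, 0} for an attach, the aggregate is -EEXIST
and no rollback is requested; with {0, -ENOMEM, -EEXIST} the scan stops
at -ENOMEM and a rollback is requested.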

v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
  modify the sample code to use rte_eal_hotplug_add and
  rte_eal_hotplug_remove to attach/detach devices.

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error logs in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
  any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process is now
  synced on all processes, it is no longer necessary to support a
  private vdev for a secondary process, which is identified by a
  non-NULL devargs in "--vdev". So rework all vdev driver changes to
  simplify the device probe scenario on a secondary process; devargs
  is now ignored on secondary processes.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, so all kinds of
  devices are now synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, because
     /dev/vfio/<group_fd> is still busy (see patches 3/19 and 4/19).
  2) Repeated detach/attach of a device causes "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address comparison
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issues.

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues that are not related to this patchset are expected when
  playing with the hotplug_mp sample, as listed below.

- Attaching a PCI device twice may leave the device impossible to
  detach; the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: current patchset depends on below IPC fix, or it may not be able
  to attach a shared vdev.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove
  for secondary processes, so a PMD that uses the default remove
  function does not need to be patched.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync
  request in a separate thread.

v5:
- since we will keep the mp thread separate from the interrupt thread,
  a temporary thread is not necessary; we use rte_eal_alarm_set instead.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on a secondary due
  to the previous rework.
- fix the issue when the first attached device is a private device from
  a secondary (patch 8/24).
- work around replying to a sync request in a separate thread; this is
  still open and under discussion, see:
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the v3
  fix for the sync IPC deadlock will not work. The new version enables
  a mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules
  initialize the mp channel during rte_eal_init.
- fixes for attaching a shared device from a secondary:
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process while handling a secondary's request to attach a device;
     the solution is for the primary process to issue the shared device
     attach/detach in the interrupt thread.
  2) the returned port_id was not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling in attach_on_secondary.
- improve clean_lock_callback to only lock/unlock the spinlock once.
- improve the error code returned by check-reply during async IPC.
- remove the rte_ prefix of internal functions in ethdev_mp.c.
- sample code improvements:
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling 
- fix meson.build.
- improve comments. 

Background:
===========

Currently a secondary process only syncs ethdevs from the primary
process at init stage; it is not aware of devices being
attached/detached on the primary process at runtime.

However, there is a requirement from applications that use the
primary-secondary process model: the primary process works as a
resource management process and creates/destroys virtual devices
at runtime, while the secondary processes deal with the network stuff
using these devices.

Solution:
=========

The original intention was to fix this gap, but beyond that the patch
set provides a more comprehensive solution that handles the different
hotplug cases in a multi-process situation. It covers the scenarios
below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any
process is broadcast to all other processes through the mp channel, so
that device information is synchronized on all processes.

Any failure while attaching or detaching leaves the processes in an
inconsistent state, so a proper rollback action has to be considered.

Scenarios for cases 1 and 2:

attach device from primary:
a) primary attaches the new device; if that fails, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and
   sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and
   sends a reply.
g) primary receives the replies and detaches the device as the
   rollback action.
h) attach fails.
i) attach succeeds.
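
The attach-from-primary steps above can be sketched as follows. This is
only an in-memory model under simplifying assumptions: struct
proc_state, proc_attach() and attach_from_primary() are hypothetical
names, each "process" is just a struct, and the sync IPC round trips
are replaced by direct function calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state. fail_attach forces an attach error. */
struct proc_state {
	bool attached;
	bool fail_attach;
};

static int
proc_attach(struct proc_state *p)
{
	if (p->fail_attach)
		return -EIO;
	p->attached = true;
	return 0;
}

static void
proc_detach(struct proc_state *p)
{
	p->attached = false;
}

/*
 * Steps a) to i): attach on the primary, ask every secondary to
 * attach, and undo everything if any secondary reports a failure.
 */
static int
attach_from_primary(struct proc_state *primary,
		    struct proc_state *sec, size_t n)
{
	size_t i;
	int ret;
	int failed = 0;

	assert(primary != NULL && (sec != NULL || n == 0));

	ret = proc_attach(primary);		/* a) */
	if (ret != 0)
		return ret;			/* h) */

	for (i = 0; i < n; i++) {		/* b), c) */
		ret = proc_attach(&sec[i]);
		if (ret != 0)
			failed = ret;
	}
	if (failed == 0)			/* d) */
		return 0;			/* i) */

	for (i = 0; i < n; i++)			/* e), f) */
		proc_detach(&sec[i]);
	proc_detach(primary);			/* g) */
	return failed;				/* h) */
}
```

If one secondary has fail_attach set, the call returns -EIO and every
process ends up with attached == false again.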

detach device from primary:
a) primary performs the pre-detach check; if the device is locked,
   goto i).
b) primary sends a pre-detach sync request to all secondaries.
c) each secondary performs the pre-detach check and sends a reply.
d) primary checks the replies; if any failed, goto i).
e) primary sends a detach sync request to all secondaries.
f) each secondary detaches the device and sends a reply (assume no
   failure).
g) primary detaches the device.
h) detach succeeds.
i) detach fails.
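
The detach-from-primary steps can be modelled in the same way. The
sketch below follows the list as written, including the pre-detach
lock check; note that the v9 changelog says the device lock was
removed, so the check phase may not exist in the current code. struct
detach_state and detach_from_primary() are hypothetical names and the
IPC is again replaced by direct calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state for one device. */
struct detach_state {
	bool attached;
	bool locked;	/* a locked device must not be detached */
};

/*
 * Two phases: first every process is asked whether it can detach,
 * and only if all agree is the device actually detached everywhere.
 */
static int
detach_from_primary(struct detach_state *primary,
		    struct detach_state *sec, size_t n)
{
	size_t i;

	assert(primary != NULL && (sec != NULL || n == 0));

	if (primary->locked)			/* a) */
		return -EBUSY;			/* i) */

	for (i = 0; i < n; i++)			/* b), c), d) */
		if (sec[i].locked)
			return -EBUSY;		/* i) */

	for (i = 0; i < n; i++)			/* e), f) */
		sec[i].attached = false;
	primary->attached = false;		/* g) */
	return 0;				/* h) */
}
```

Because nothing is detached until every check has passed, a refusal in
the first phase needs no rollback.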

Scenarios for cases 3 and 4:

attach device from secondary:
a) secondary sends an async request to the primary and waits on a
   condition, which is released by the matching response from the
   primary.
b) primary receives the request and attaches the new device; if that
   fails, goto i).
c) primary forwards the attach request to all secondaries as an async
   request (because this runs in mp thread context, a sync request
   would deadlock; the same reason applies to all following async
   requests).
d) each secondary receives the request, attaches the device and
   sends a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback async request to all secondaries.
g) each secondary receives the request, detaches the device and
   sends a reply.
h) primary receives the replies and detaches the device as the
   rollback action.
i) primary sends a fail response to the secondary, goto k).
j) primary sends a success response to the secondary.
k) secondary receives the response and returns.
 
detach device from secondary:
a) secondary sends an async request to the primary and waits on a
   condition, which is released by the matching response from the
   primary.
b) primary receives the request and performs the pre-detach check; if
   the device is locked, goto j).
c) primary sends a pre-detach async request to all secondaries.
d) each secondary performs the pre-detach check and sends a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach async request to all secondaries.
g) each secondary detaches the device and sends a reply.
h) primary detaches the device.
i) primary sends a success response to the secondary, goto k).
j) primary sends a fail response to the secondary.
k) secondary receives the response and returns.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while
rte_eal_hotplug_remove guarantees that the device is detached on all
processes.


PMD Impact:
===========

Currently device removal is not handled well in secondary processes by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new
API, rte_eth_dev_release_port_secondary, which only resets the ethdev's
state to unused and does not touch shared data, so other processes are
not impacted.
Since not every device driver targets the primary-secondary process
model, the patch set only fixes this for PCI devices whose drivers use
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. Other drivers can use it as a
reference when an equivalent fix is required.
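
The difference between the two release paths can be illustrated with
the sketch below. It does not use the real ethdev structures: struct
shared_data, struct port, release_port() and release_port_secondary()
are simplified, hypothetical stand-ins for data shared between
processes and per-process port state.

```c
#include <assert.h>
#include <string.h>

enum port_state { PORT_UNUSED, PORT_ATTACHED };

/* Hypothetical stand-in for data visible to every process. */
struct shared_data {
	char name[32];
	int nb_queues;
};

/* Hypothetical per-process view of a port. */
struct port {
	enum port_state state;
	struct shared_data *shared;
};

/* Full release: also wipes the shared data, so it affects everyone. */
static void
release_port(struct port *p)
{
	assert(p != NULL && p->shared != NULL);
	memset(p->shared, 0, sizeof(*p->shared));
	p->state = PORT_UNUSED;
}

/* Secondary release: mark the local port unused, leave shared data. */
static void
release_port_secondary(struct port *p)
{
	assert(p != NULL);
	p->state = PORT_UNUSED;
}
```

Calling release_port_secondary() on one process leaves the shared data
intact for the others, which is the property the paragraph above asks
for.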

Example:
========

The patchset also contains an example that demonstrates device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach a pci device */
> attach 0000:81:00.0

/* detach the pci device */
> detach 0000:81:00.0

/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0

/* detach the vdev af_packet device */
> detach net_af_packet

Qi Zhang (6):
  ethdev: add function to release port in secondary process
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  drivers/net: enable hotplug on secondary process
  drivers/net: enable device detach on secondary
  examples/multi_process: add hotplug sample

 doc/guides/rel_notes/release_18_11.rst       |  11 +
 drivers/net/af_packet/rte_eth_af_packet.c    |   6 +-
 drivers/net/bnxt/bnxt_ethdev.c               |   6 +-
 drivers/net/bonding/rte_eth_bond_pmd.c       |   6 +-
 drivers/net/ena/ena_ethdev.c                 |   2 +-
 drivers/net/kni/rte_eth_kni.c                |   6 +-
 drivers/net/liquidio/lio_ethdev.c            |   2 +-
 drivers/net/null/rte_eth_null.c              |   6 +-
 drivers/net/octeontx/octeontx_ethdev.c       |   8 +
 drivers/net/pcap/rte_eth_pcap.c              |   6 +-
 drivers/net/tap/rte_eth_tap.c                |   8 +-
 drivers/net/vhost/rte_eth_vhost.c            |   6 +-
 drivers/net/virtio/virtio_ethdev.c           |   2 +-
 examples/multi_process/Makefile              |   1 +
 examples/multi_process/hotplug_mp/Makefile   |  23 ++
 examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++
 examples/multi_process/hotplug_mp/commands.h |  10 +
 examples/multi_process/hotplug_mp/main.c     |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile           |   1 +
 lib/librte_eal/common/eal_common_dev.c       | 225 +++++++++++++-
 lib/librte_eal/common/eal_private.h          |  30 ++
 lib/librte_eal/common/hotplug_mp.c           | 426 +++++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h           |  46 +++
 lib/librte_eal/common/include/rte_dev.h      |   9 +
 lib/librte_eal/common/meson.build            |   1 +
 lib/librte_eal/linuxapp/eal/Makefile         |   1 +
 lib/librte_eal/linuxapp/eal/eal.c            |   6 +
 lib/librte_ethdev/rte_ethdev.c               |  17 +-
 lib/librte_ethdev/rte_ethdev_driver.h        |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h           |  10 +-
 lib/librte_ethdev/rte_ethdev_version.map     |   7 +
 31 files changed, 1126 insertions(+), 33 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6


* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
  @ 2018-09-28  1:00  3%   ` Wang, Yipeng1
  2018-09-28  8:26  4%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Wang, Yipeng1 @ 2018-09-28  1:00 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, gavin.hu, steve.capper, ola.liljedahl, nd

Reply inlined:

>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
>Sent: Thursday, September 6, 2018 10:12 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
>nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>Reader-writer concurrency issue, caused by moving the keys
>to their alternative locations during key insert, is solved
>by introducing a global counter(tbl_chng_cnt) indicating a
>change in table.
>
>@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> 		curr_bkt = curr_node->bkt;
> 	}
>
>+	/* Inform the previous move. The current move need
>+	 * not be informed now as the current bucket entry
>+	 * is present in both primary and secondary.
>+	 * Since there is one writer, load acquires on
>+	 * tbl_chng_cnt are not required.
>+	 */
>+	__atomic_store_n(&h->tbl_chng_cnt,
>+			 h->tbl_chng_cnt + 1,
>+			 __ATOMIC_RELEASE);
>+	/* The stores to sig_alt and sig_current should not
>+	 * move above the store to tbl_chng_cnt.
>+	 */
>+	__atomic_thread_fence(__ATOMIC_RELEASE);
>+
[Wang, Yipeng] I believe for X86 this fence should not be compiled to any code, otherwise
we need macros for the compile time check.

>@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> 	uint32_t bucket_idx;
> 	hash_sig_t alt_hash;
> 	struct rte_hash_bucket *bkt;
>+	uint32_t cnt_b, cnt_a;
> 	int ret;
>
>-	bucket_idx = sig & h->bucket_bitmask;
>-	bkt = &h->buckets[bucket_idx];
>-
> 	__hash_rw_reader_lock(h);
>
>-	/* Check if key is in primary location */
>-	ret = search_one_bucket(h, key, sig, data, bkt);
>-	if (ret != -1) {
>-		__hash_rw_reader_unlock(h);
>-		return ret;
>-	}
>-	/* Calculate secondary hash */
>-	alt_hash = rte_hash_secondary_hash(sig);
>-	bucket_idx = alt_hash & h->bucket_bitmask;
>-	bkt = &h->buckets[bucket_idx];
>+	do {
[Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
MemCache with Dumber Caching and Smarter Hashing"
as well as OvS cmap use a similar version counter to implement read-write concurrency for hash tables,
but one difference is that the reader checks whether the version counter is even or odd to make sure there is no
concurrent writer. Could you double check and confirm that this is not needed for your implementation?
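
For reference, the even/odd version counter scheme mentioned above can
be sketched as below. This is a generic single-writer illustration in
C11 atomics, not the code in this patch and not taken from MemC3 or
OvS cmap; struct versioned and its functions are made-up names, and
the two payload fields stand in for whatever the writer moves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct versioned {
	atomic_uint version;	/* odd while a write is in progress */
	_Atomic uint32_t a;	/* payload; relaxed atomics keep the */
	_Atomic uint32_t b;	/* racing reads well defined */
};

static void
versioned_init(struct versioned *v)
{
	atomic_init(&v->version, 0u);
	atomic_init(&v->a, 0u);
	atomic_init(&v->b, 0u);
}

/* Single writer only: make the counter odd, update, make it even. */
static void
versioned_write(struct versioned *v, uint32_t a, uint32_t b)
{
	unsigned int ver =
		atomic_load_explicit(&v->version, memory_order_relaxed);

	assert((ver & 1u) == 0);	/* no write already in progress */

	atomic_store_explicit(&v->version, ver + 1, memory_order_relaxed);
	/* Payload stores must not move above the odd counter store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&v->a, a, memory_order_relaxed);
	atomic_store_explicit(&v->b, b, memory_order_relaxed);
	/* Publish: payload stores are visible before the even counter. */
	atomic_store_explicit(&v->version, ver + 2, memory_order_release);
}

/* Reader retries if a write was in progress or happened meanwhile. */
static void
versioned_read(struct versioned *v, uint32_t *a, uint32_t *b)
{
	unsigned int before, after;

	do {
		before = atomic_load_explicit(&v->version,
					      memory_order_acquire);
		*a = atomic_load_explicit(&v->a, memory_order_relaxed);
		*b = atomic_load_explicit(&v->b, memory_order_relaxed);
		/* Payload loads must complete before the re-check. */
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&v->version,
					     memory_order_relaxed);
	} while ((before & 1u) != 0 || before != after);
}
```

The odd check lets a reader notice a write that has started but not
finished; comparing the two counter samples catches a write that both
started and finished during the read.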

>--- a/lib/librte_hash/rte_hash.h
>+++ b/lib/librte_hash/rte_hash.h
>@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
>  *   - -ENOSPC if there is no space in the hash for this key.
>  */
> int
>-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
>+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
>
> /**
>  * Add a key-value pair with a pre-computed hash value
>@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
>  *   - -ENOSPC if there is no space in the hash for this key.
>  */
> int32_t
>-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
>+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> 						hash_sig_t sig, void *data);
>
> /**
>@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
>  *     array of user data. This value is unique for this key.
>  */
> int32_t
>-rte_hash_add_key(const struct rte_hash *h, const void *key);
>+rte_hash_add_key(struct rte_hash *h, const void *key);
>
> /**
>  * Add a key to an existing hash table.
>@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
>  *     array of user data. This value is unique for this key.
>  */
> int32_t
>-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
>
> /

I think the above changes will break ABI by changing the parameter type? Other people may know better on this.


* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 13:42  0%         ` Alejandro Lucero
@ 2018-09-27 14:04  0%           ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-27 14:04 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On 27-Sep-18 2:42 PM, Alejandro Lucero wrote:
> 
> 
> On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
>      > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov
>     <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>>
>      > wrote:
>      >
>      >> We will be assigning "invalid" socket ID's to external heap, and
>      >> malloc will now be able to verify if a supplied socket ID is in
>      >> fact a valid one, rendering parameter checks for sockets
>      >> obsolete.
>      >>
>      >> This changes the semantics of what we understand by "socket ID",
>      >> so document the change in the release notes.
>      >>
>      >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com
>     <mailto:anatoly.burakov@intel.com>>
>      >> ---
>      >>   doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
>      >>   lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>      >>   lib/librte_eal/common/malloc_heap.c        | 2 +-
>      >>   lib/librte_eal/common/rte_malloc.c         | 4 ----
>      >>   4 files changed, 13 insertions(+), 8 deletions(-)
>      >>
>      >> diff --git a/doc/guides/rel_notes/release_18_11.rst
>      >> b/doc/guides/rel_notes/release_18_11.rst
>      >> index 5fc71e208..6ee236302 100644
>      >> --- a/doc/guides/rel_notes/release_18_11.rst
>      >> +++ b/doc/guides/rel_notes/release_18_11.rst
>      >> @@ -98,6 +98,13 @@ API Changes
>      >>       users of memseg-walk-related functions, as they will now have to skip
>      >>       externally allocated segments in most cases if the intent is to only iterate
>      >>       over internal DPDK memory.
>      >> +  - ``socket_id`` parameter across the entire DPDK has gained additional
>      >> +    meaning, as some socket ID's will now be representing externally allocated
>      >> +    memory. No changes will be required for existing code as backwards
>      >> +    compatibility will be kept, and those who do not use this feature will not
>      >> +    see these extra socket ID's. Any new API's must not check socket ID
>      >> +    parameters themselves, and must instead leave it to the memory subsystem to
>      >> +    decide whether socket ID is a valid one.
>      >>
>      >>   ABI Changes
>      >>   -----------
>      >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
>      >> b/lib/librte_eal/common/eal_common_memzone.c
>      >> index 7300fe05d..b7081afbf 100644
>      >> --- a/lib/librte_eal/common/eal_common_memzone.c
>      >> +++ b/lib/librte_eal/common/eal_common_memzone.c
>      >> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
>      >>                  return NULL;
>      >>          }
>      >>
>      >> -       if ((socket_id != SOCKET_ID_ANY) &&
>      >> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
>      >> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>      >>
>      >
>      > Would it not be better to use RTE_MAX_HEAPS instead of removing
>      > the check?
> 
>     First of all, maximum number of heaps should not concern the rest of
>     the
>     code - this is purely internal detail of rte_malloc.
> 
> 
> In a previous patch you say that:
> 
> "Switch over all parts of EAL to use heap ID instead of NUMA node
> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> node's index within the detected NUMA node list. Heap ID for
> external heaps will be order of their creation."
> 
> If I understand this right, heaps linked to physical sockets get a heap 
> ID, and then external heaps will get IDs starting from the highest 
> socket/heap ID + 1.

Yes and no.

Socket ID is an externally visible identification of "where to allocate 
from" (a heap). Heap ID is used internally. Normally, there is a 1:1 
correspondence of NUMA node to heap ID, but there may be cases where 
e.g. only NUMA nodes 0 and 7 are detected, so you'll have socket 0 and 7 
as valid socket ID's. However, these socket ID's will be internally 
resolved into heap ID's 0 and 1, not 0 and 7.

So, in *most* cases, socket ID for an internal heap is equal to its 
heap ID, but only by coincidence. Heap ID is an internal identifier used 
by the malloc heap, and it is not visible externally - it is only known 
to malloc itself. Even memzone knows nothing about heap ID's - only 
socket ID's.
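
To make the resolution above concrete, here is a minimal standalone sketch. It is not the EAL code: `detected_nodes`, `socket_to_heap_idx` and the node list {0, 7} are made up for this example, and the real lookup lives inside malloc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list of NUMA nodes found at init time. The position of a
 * node in this array plays the role of the internal heap ID. */
static const int detected_nodes[] = { 0, 7 };
#define N_DETECTED (sizeof(detected_nodes) / sizeof(detected_nodes[0]))

/* Resolve an externally visible socket ID into an internal heap index.
 * Returns -1 if the socket ID does not match any detected node. */
int
socket_to_heap_idx(int socket_id)
{
	size_t i;

	for (i = 0; i < N_DETECTED; i++) {
		if (detected_nodes[i] == socket_id)
			return (int)i;
	}
	return -1;
}
```

With nodes 0 and 7 detected, socket 7 resolves to index 1, and a socket ID that was never detected (e.g. 3) does not resolve at all.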

> So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps 
> (which does not seem so reading your next paragraph), it would be a good 
> sanity check to use RTE_MAX_HEAPS for the socket id.
> 
>     More importantly, socket ID is completely independent from number of
>     heaps. Socket ID is incremented each time a new heap is created, and
>     they are not reused. If you create and destroy a heap 100 times -
>     you'll
>     get 100 different socket ID's, even though max number of heaps is less
>     than that.
> 
> 
> I do not understand this. It is true there is no check regarding 
> RTE_MAX_HEAPS when creating new heaps,

There is one :) RTE_MAX_HEAPS is the length of the malloc heaps array (kept in 
shared memory). If we cannot find a vacant spot in the heaps array, the heap 
will not be created.

However, *socket ID* is indeed limited only to INT_MAX. Socket ID is not 
heap ID - socket ID is an externally visible identifier. Multiple socket 
ID's can resolve to the same heap ID.

For example, if you create and destroy a heap 5 times one after the 
other, you'll get 5 different socket ID's, but all of them would have 
pointed to the same heap ID (but not at the same time).

So, semantically speaking, heap ID isn't really "an ID" as such, it's an 
index into heap array. Unlike socket ID, it has no meaning.
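
A rough sketch of that lifecycle, under stated assumptions: the names (`sketch_heap_create` and friends), the array size of 4 and the starting socket ID of 256 are all invented for illustration and are not taken from the patchset.

```c
#include <assert.h>

#define SKETCH_MAX_HEAPS 4	/* stand-in for RTE_MAX_HEAPS, value is arbitrary */

struct sketch_heap {
	int in_use;
	int socket_id;
};

static struct sketch_heap sketch_heaps[SKETCH_MAX_HEAPS];
/* Next socket ID to hand out; the starting value is an assumption. */
static int sketch_next_socket_id = 256;

/* Take the first vacant slot and give it a fresh socket ID.
 * Returns the new socket ID, or -1 if the heap array is full. */
int
sketch_heap_create(void)
{
	int i;

	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (!sketch_heaps[i].in_use) {
			sketch_heaps[i].in_use = 1;
			sketch_heaps[i].socket_id = sketch_next_socket_id++;
			return sketch_heaps[i].socket_id;
		}
	}
	return -1;
}

/* Translate a socket ID into a heap index, or -1 if no live heap has it. */
int
sketch_socket_to_heap_idx(int socket_id)
{
	int i;

	for (i = 0; i < SKETCH_MAX_HEAPS; i++) {
		if (sketch_heaps[i].in_use &&
				sketch_heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Free the slot. The socket ID itself is never handed out again. */
int
sketch_heap_destroy(int socket_id)
{
	int idx = sketch_socket_to_heap_idx(socket_id);

	if (idx < 0)
		return -1;
	sketch_heaps[idx].in_use = 0;
	return 0;
}
```

Creating and destroying a heap repeatedly yields a new socket ID each time while reusing the same slot, and creation only fails once every slot in the array is occupied.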

> so I am not sure what the limit 
> refers to. And then there is code like dumping heaps info or getting 
> info from the heap based on socket id that will not work.

It is probably unclear because the ordering of this patchset is not 
ideal (and I'm not sure how to make it any better).

The code for dumping or getting heap info accepts a socket ID, but it 
translates it into a heap ID, because that's what malloc uses internally 
to differentiate between the heaps. Heap ID is there to break the dependency 
between NUMA node ID and position in the malloc heap array.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 13:21  0%       ` Burakov, Anatoly
@ 2018-09-27 13:42  0%         ` Alejandro Lucero
  2018-09-27 14:04  0%           ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-09-27 13:42 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <
> anatoly.burakov@intel.com>
> > wrote:
> >
> >> We will be assigning "invalid" socket ID's to external heap, and
> >> malloc will now be able to verify if a supplied socket ID is in
> >> fact a valid one, rendering parameter checks for sockets
> >> obsolete.
> >>
> >> This changes the semantics of what we understand by "socket ID",
> >> so document the change in the release notes.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>   doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
> >>   lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> >>   lib/librte_eal/common/malloc_heap.c        | 2 +-
> >>   lib/librte_eal/common/rte_malloc.c         | 4 ----
> >>   4 files changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/doc/guides/rel_notes/release_18_11.rst
> >> b/doc/guides/rel_notes/release_18_11.rst
> >> index 5fc71e208..6ee236302 100644
> >> --- a/doc/guides/rel_notes/release_18_11.rst
> >> +++ b/doc/guides/rel_notes/release_18_11.rst
> >> @@ -98,6 +98,13 @@ API Changes
> >>       users of memseg-walk-related functions, as they will now have to
> skip
> >>       externally allocated segments in most cases if the intent is to
> only
> >> iterate
> >>       over internal DPDK memory.
> >> +  - ``socket_id`` parameter across the entire DPDK has gained
> additional
> >> +    meaning, as some socket ID's will now be representing externally
> >> allocated
> >> +    memory. No changes will be required for existing code as backwards
> >> +    compatibility will be kept, and those who do not use this feature
> >> will not
> >> +    see these extra socket ID's. Any new API's must not check socket ID
> >> +    parameters themselves, and must instead leave it to the memory
> >> subsystem to
> >> +    decide whether socket ID is a valid one.
> >>
> >>   ABI Changes
> >>   -----------
> >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> >> b/lib/librte_eal/common/eal_common_memzone.c
> >> index 7300fe05d..b7081afbf 100644
> >> --- a/lib/librte_eal/common/eal_common_memzone.c
> >> +++ b/lib/librte_eal/common/eal_common_memzone.c
> >> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> >> *name, size_t len,
> >>                  return NULL;
> >>          }
> >>
> >> -       if ((socket_id != SOCKET_ID_ANY) &&
> >> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> >> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
> >>
> >
> > Would it not be better to use RTE_MAX_HEAPS instead of removing the
> > check?
>
> First of all, maximum number of heaps should not concern the rest of the
> code - this is purely internal detail of rte_malloc.
>
>
In a previous patch you say that:

"Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation."

If I understand this right, heaps linked to physical sockets get a heap ID,
and then external heaps will get IDs starting from the highest socket/heap
ID + 1.
So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps
(which does not seem so reading your next paragraph), it would be a good
sanity check to use RTE_MAX_HEAPS for the socket id.

> More importantly, socket ID is completely independent from number of
> heaps. Socket ID is incremented each time a new heap is created, and
> they are not reused. If you create and destroy a heap 100 times - you'll
> get 100 different socket ID's, even though max number of heaps is less
> than that.
>
>
I do not understand this. It is true there is no check regarding
RTE_MAX_HEAPS when creating new heaps, so I am not sure what the limit
refers to. There is also code, such as dumping heap info or getting info
from a heap based on socket ID, that will not work.


> >
> >
> >
> >>                  rte_errno = EINVAL;
> >>                  return NULL;
> >>          }
> >>
> >> -       if (!rte_eal_has_hugepages())
> >> +       /* only set socket to SOCKET_ID_ANY if we aren't allocating for
> an
> >> +        * external heap.
> >> +        */
> >> +       if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
> >>                  socket_id = SOCKET_ID_ANY;
> >>
> >>          contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> >> diff --git a/lib/librte_eal/common/malloc_heap.c
> >> b/lib/librte_eal/common/malloc_heap.c
> >> index 1d1e35708..73e478076 100644
> >> --- a/lib/librte_eal/common/malloc_heap.c
> >> +++ b/lib/librte_eal/common/malloc_heap.c
> >> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> >> socket_arg,
> >>          if (size == 0 || (align && !rte_is_power_of_2(align)))
> >>                  return NULL;
> >>
> >> -       if (!rte_eal_has_hugepages())
> >> +       if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
> >>                  socket_arg = SOCKET_ID_ANY;
> >>
> >>          if (socket_arg == SOCKET_ID_ANY)
> >> diff --git a/lib/librte_eal/common/rte_malloc.c
> >> b/lib/librte_eal/common/rte_malloc.c
> >> index 73d6df31d..9ba1472c3 100644
> >> --- a/lib/librte_eal/common/rte_malloc.c
> >> +++ b/lib/librte_eal/common/rte_malloc.c
> >> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> >> unsigned int align,
> >>          if (!rte_eal_has_hugepages())
> >>                  socket_arg = SOCKET_ID_ANY;
> >>
> >> -       /* Check socket parameter */
> >> -       if (socket_arg >= RTE_MAX_NUMA_NODES)
> >> -               return NULL;
> >> -
> >>
> >
> > Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.
>
> same as above :)
>
>
> --
> Thanks,
> Anatoly
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 13:14  0%     ` Alejandro Lucero
@ 2018-09-27 13:21  0%       ` Burakov, Anatoly
  2018-09-27 13:42  0%         ` Alejandro Lucero
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-27 13:21 UTC (permalink / raw)
  To: Alejandro Lucero
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
> wrote:
> 
>> We will be assigning "invalid" socket ID's to external heap, and
>> malloc will now be able to verify if a supplied socket ID is in
>> fact a valid one, rendering parameter checks for sockets
>> obsolete.
>>
>> This changes the semantics of what we understand by "socket ID",
>> so document the change in the release notes.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>   doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
>>   lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>>   lib/librte_eal/common/malloc_heap.c        | 2 +-
>>   lib/librte_eal/common/rte_malloc.c         | 4 ----
>>   4 files changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_11.rst
>> b/doc/guides/rel_notes/release_18_11.rst
>> index 5fc71e208..6ee236302 100644
>> --- a/doc/guides/rel_notes/release_18_11.rst
>> +++ b/doc/guides/rel_notes/release_18_11.rst
>> @@ -98,6 +98,13 @@ API Changes
>>       users of memseg-walk-related functions, as they will now have to skip
>>       externally allocated segments in most cases if the intent is to only
>> iterate
>>       over internal DPDK memory.
>> +  - ``socket_id`` parameter across the entire DPDK has gained additional
>> +    meaning, as some socket ID's will now be representing externally
>> allocated
>> +    memory. No changes will be required for existing code as backwards
>> +    compatibility will be kept, and those who do not use this feature
>> will not
>> +    see these extra socket ID's. Any new API's must not check socket ID
>> +    parameters themselves, and must instead leave it to the memory
>> subsystem to
>> +    decide whether socket ID is a valid one.
>>
>>   ABI Changes
>>   -----------
>> diff --git a/lib/librte_eal/common/eal_common_memzone.c
>> b/lib/librte_eal/common/eal_common_memzone.c
>> index 7300fe05d..b7081afbf 100644
>> --- a/lib/librte_eal/common/eal_common_memzone.c
>> +++ b/lib/librte_eal/common/eal_common_memzone.c
>> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
>> *name, size_t len,
>>                  return NULL;
>>          }
>>
>> -       if ((socket_id != SOCKET_ID_ANY) &&
>> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
>> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>>
> 
> Would it not be better to use RTE_MAX_HEAPS instead of removing the check?

First of all, maximum number of heaps should not concern the rest of the 
code - this is purely internal detail of rte_malloc.

More importantly, socket ID is completely independent from number of 
heaps. Socket ID is incremented each time a new heap is created, and 
they are not reused. If you create and destroy a heap 100 times - you'll 
get 100 different socket ID's, even though max number of heaps is less 
than that.
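
As an illustration of why the upper bound had to go, here is a standalone sketch of the check before and after the hunk above. The `SKETCH_` macros and both function names are invented for the example, the value 8 for the NUMA limit is only a placeholder, and 256 is just an arbitrary ID above that limit.

```c
#include <assert.h>

#define SKETCH_SOCKET_ID_ANY (-1)	/* mirrors SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES 8		/* placeholder for RTE_MAX_NUMA_NODES */

/* Check as it was before the patch: anything outside the NUMA node
 * range is rejected. Returns 1 when the socket ID would be rejected. */
int
old_check_rejects(int socket_id)
{
	return socket_id != SKETCH_SOCKET_ID_ANY &&
		(socket_id >= SKETCH_MAX_NUMA_NODES || socket_id < 0);
}

/* Check after the patch: only negative values other than "any" are
 * rejected here; everything else is left for malloc to validate. */
int
new_check_rejects(int socket_id)
{
	return socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0;
}
```

A socket ID such as 256 fails the old check but passes the new one, so it reaches malloc, which is the only place that knows whether a heap with that socket ID exists.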

> 
> 
> 
>>                  rte_errno = EINVAL;
>>                  return NULL;
>>          }
>>
>> -       if (!rte_eal_has_hugepages())
>> +       /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
>> +        * external heap.
>> +        */
>> +       if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
>>                  socket_id = SOCKET_ID_ANY;
>>
>>          contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
>> diff --git a/lib/librte_eal/common/malloc_heap.c
>> b/lib/librte_eal/common/malloc_heap.c
>> index 1d1e35708..73e478076 100644
>> --- a/lib/librte_eal/common/malloc_heap.c
>> +++ b/lib/librte_eal/common/malloc_heap.c
>> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
>> socket_arg,
>>          if (size == 0 || (align && !rte_is_power_of_2(align)))
>>                  return NULL;
>>
>> -       if (!rte_eal_has_hugepages())
>> +       if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
>>                  socket_arg = SOCKET_ID_ANY;
>>
>>          if (socket_arg == SOCKET_ID_ANY)
>> diff --git a/lib/librte_eal/common/rte_malloc.c
>> b/lib/librte_eal/common/rte_malloc.c
>> index 73d6df31d..9ba1472c3 100644
>> --- a/lib/librte_eal/common/rte_malloc.c
>> +++ b/lib/librte_eal/common/rte_malloc.c
>> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
>> unsigned int align,
>>          if (!rte_eal_has_hugepages())
>>                  socket_arg = SOCKET_ID_ANY;
>>
>> -       /* Check socket parameter */
>> -       if (socket_arg >= RTE_MAX_NUMA_NODES)
>> -               return NULL;
>> -
>>
> 
> Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.

same as above :)


-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 13:14  0%     ` Alejandro Lucero
  2018-09-27 13:21  0%       ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-09-27 13:14 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
	Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko

On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:

> We will be assigning "invalid" socket ID's to external heap, and
> malloc will now be able to verify if a supplied socket ID is in
> fact a valid one, rendering parameter checks for sockets
> obsolete.
>
> This changes the semantics of what we understand by "socket ID",
> so document the change in the release notes.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
>  lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>  lib/librte_eal/common/malloc_heap.c        | 2 +-
>  lib/librte_eal/common/rte_malloc.c         | 4 ----
>  4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 5fc71e208..6ee236302 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -98,6 +98,13 @@ API Changes
>      users of memseg-walk-related functions, as they will now have to skip
>      externally allocated segments in most cases if the intent is to only
> iterate
>      over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK has gained additional
> +    meaning, as some socket ID's will now be representing externally
> allocated
> +    memory. No changes will be required for existing code as backwards
> +    compatibility will be kept, and those who do not use this feature
> will not
> +    see these extra socket ID's. Any new API's must not check socket ID
> +    parameters themselves, and must instead leave it to the memory
> subsystem to
> +    decide whether socket ID is a valid one.
>
>  ABI Changes
>  -----------
> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> b/lib/librte_eal/common/eal_common_memzone.c
> index 7300fe05d..b7081afbf 100644
> --- a/lib/librte_eal/common/eal_common_memzone.c
> +++ b/lib/librte_eal/common/eal_common_memzone.c
> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> *name, size_t len,
>                 return NULL;
>         }
>
> -       if ((socket_id != SOCKET_ID_ANY) &&
> -           (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> +       if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>

Would it not be better to use RTE_MAX_HEAPS instead of removing the check?



>                 rte_errno = EINVAL;
>                 return NULL;
>         }
>
> -       if (!rte_eal_has_hugepages())
> +       /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
> +        * external heap.
> +        */
> +       if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
>                 socket_id = SOCKET_ID_ANY;
>
>         contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index 1d1e35708..73e478076 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
>         if (size == 0 || (align && !rte_is_power_of_2(align)))
>                 return NULL;
>
> -       if (!rte_eal_has_hugepages())
> +       if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
>                 socket_arg = SOCKET_ID_ANY;
>
>         if (socket_arg == SOCKET_ID_ANY)
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 73d6df31d..9ba1472c3 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> unsigned int align,
>         if (!rte_eal_has_hugepages())
>                 socket_arg = SOCKET_ID_ANY;
>
> -       /* Check socket parameter */
> -       if (socket_arg >= RTE_MAX_NUMA_NODES)
> -               return NULL;
> -
>

Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.


>         return malloc_heap_alloc(type, size, socket_arg, 0,
>                         align == 0 ? 1 : align, 0, false);
>  }
> --
> 2.17.1
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 11:12  0%         ` Shreyansh Jain
@ 2018-09-27 11:29  0%           ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-27 11:29 UTC (permalink / raw)
  To: Shreyansh Jain, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On 27-Sep-18 12:12 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
>> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>>> When we allocate and use DPDK memory, we need to be able to
>>>> differentiate between DPDK hugepage segments and segments that
>>>> were made part of DPDK but are externally allocated. Add such
>>>> a property to memseg lists.
>>>>
>>>> This breaks the ABI, so bump the EAL library ABI version and
>>>> document the change in release notes. This also breaks a few
>>>> internal assumptions about memory contiguousness, so adjust
>>>> malloc code in a few places.
>>>>
>>>> All current calls for memseg walk functions were adjusted to
>>>> ignore external segments where it made sense.
>>>>
>>>> Mempools is a special case, because we may be asked to allocate
>>>> a mempool on a specific socket, and we need to ignore all page
>>>> sizes on other heaps or other sockets. Previously, this
>>>> assumption of knowing all page sizes was not a problem, but it
>>>> will be now, so we have to match socket ID with page size when
>>>> calculating minimum page size for a mempool.
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> ---
>>>>
>>>
>>> Specifically for bus/fslmc perspective and generically for others:
>>>
>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>>
>>>
>>
>> Actually, this patch may need some further adjustment, since it makes 
>> an assumption about not wanting to map external memory for DMA.
>>
>> Specifically - there's an fslmc dma map function that now skips 
>> external memory segments. Are you sure that's how it's supposed to be?
>>
> 
> I thought over that.
> For now yes. If we need to map external memory, and there is an event 
> that would be called back, it should be handled separately. So, for 
> example, a PMD level API to handle such requests from applications.

Well, technically such an event is already available, now that external 
memory allocations trigger mem events :)

> 
> The point is that how the external memory is handled is use-case 
> specific - the need to have its events reported back is definitely 
> there, but its handling is still a grey area.
> 
> Once the patches make their way in, I can always come back and tune that.
> 

OK, fair enough.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 11:08  0%       ` Burakov, Anatoly
@ 2018-09-27 11:12  0%         ` Shreyansh Jain
  2018-09-27 11:29  0%           ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-27 11:12 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>> When we allocate and use DPDK memory, we need to be able to
>>> differentiate between DPDK hugepage segments and segments that
>>> were made part of DPDK but are externally allocated. Add such
>>> a property to memseg lists.
>>>
>>> This breaks the ABI, so bump the EAL library ABI version and
>>> document the change in release notes. This also breaks a few
>>> internal assumptions about memory contiguousness, so adjust
>>> malloc code in a few places.
>>>
>>> All current calls for memseg walk functions were adjusted to
>>> ignore external segments where it made sense.
>>>
>>> Mempools is a special case, because we may be asked to allocate
>>> a mempool on a specific socket, and we need to ignore all page
>>> sizes on other heaps or other sockets. Previously, this
>>> assumption of knowing all page sizes was not a problem, but it
>>> will be now, so we have to match socket ID with page size when
>>> calculating minimum page size for a mempool.
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>> ---
>>>
>>
>> Specifically for bus/fslmc perspective and generically for others:
>>
>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>
>>
> 
> Actually, this patch may need some further adjustment, since it makes 
> an assumption about not wanting to map external memory for DMA.
> 
> Specifically - there's an fslmc dma map function that now skips external 
> memory segments. Are you sure that's how it's supposed to be?
> 

I thought over that.
For now yes. If we need to map external memory, and there is an event 
that would be called back, it should be handled separately. So, for 
example, a PMD level API to handle such requests from applications.

The point is that how the external memory is handled is use-case 
specific - the need to have its events reported back is definitely 
there, but its handling is still a grey area.

Once the patches make their way in, I can always come back and tune that.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 11:03  0%     ` Shreyansh Jain
@ 2018-09-27 11:08  0%       ` Burakov, Anatoly
  2018-09-27 11:12  0%         ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-27 11:08 UTC (permalink / raw)
  To: Shreyansh Jain, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes. This also breaks a few
>> internal assumptions about memory contiguousness, so adjust
>> malloc code in a few places.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools is a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>>
> 
> Specifically for bus/fslmc perspective and generically for others:
> 
> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> 
> 

Actually, this patch may need some further adjustment, since it makes 
an assumption about not wanting to map external memory for DMA.

Specifically - there's an fslmc dma map function that now skips external 
memory segments. Are you sure that's how it's supposed to be?

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-27 10:40 16%   ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 11:03  0%     ` Shreyansh Jain
  2018-09-27 11:08  0%       ` Burakov, Anatoly
  2018-09-29  0:09  0%     ` Yongseok Koh
  1 sibling, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-27 11:03 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
	Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
	Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
	andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
	geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
	keith.wiles, thomas, alejandro.lucero

On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
> 
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
> 
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
> 
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
> 

Specifically for bus/fslmc perspective and generically for others:

Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

* [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
  2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
  2018-09-27 10:40  2%   ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-09-27 10:40 16%   ` Anatoly Burakov
  2018-09-27 11:03  0%     ` Shreyansh Jain
  2018-09-29  0:09  0%     ` Yongseok Koh
  2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas, alejandro.lucero

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
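
The skip pattern applied to the walk callbacks can be sketched in
isolation as follows. This is only an illustration, not the EAL code:
the sketch_* types and the toy walker are hypothetical stand-ins for
struct rte_memseg_list and rte_memseg_list_walk(), reduced to the
fields needed here, and the callback is modelled on physmem_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_memseg_list (illustration only). */
struct sketch_msl {
	uint64_t page_sz;      /* page size for all segments in this list */
	size_t n_segs;         /* number of segments currently in use */
	unsigned int external; /* 1 if this list points to external memory */
};

typedef int (*sketch_walk_t)(const struct sketch_msl *msl, void *arg);

/* Toy walker: visits every list, stops early on a non-zero return. */
static int
sketch_list_walk(const struct sketch_msl *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that accounts for internal memory only. */
static int
sketch_physmem_size(const struct sketch_msl *msl, void *arg)
{
	uint64_t *total_len = arg;

	assert(total_len != NULL);

	/* returning 0 skips this list but lets the walk continue */
	if (msl->external)
		return 0;

	*total_len += msl->n_segs * msl->page_sz;
	return 0;
}

/* Sum the size of all internal (non-external) lists. */
uint64_t
sketch_internal_size(const struct sketch_msl *lists, size_t n)
{
	uint64_t total = 0;

	sketch_list_walk(lists, n, sketch_physmem_size, &total);
	return total;
}
```

Returning 0 rather than a non-zero value matters here: in this sketch,
as in the callbacks touched by the patch, a non-zero return would stop
the walk instead of skipping one list.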

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
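
The matching rule can be sketched outside of DPDK as below. This is
a simplified illustration, not the mempool code itself: struct pgsz_msl
and the SKETCH_/sketch_ names are hypothetical, the lists are passed as
a plain array instead of being walked, and the fallback argument stands
in for the getpagesize() call used when no list matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* hypothetical mirror of SOCKET_ID_ANY */

/* Hypothetical stand-in for struct rte_memseg_list (illustration only). */
struct pgsz_msl {
	int socket_id;         /* socket ID for all segments in this list */
	uint64_t page_sz;      /* page size for all segments in this list */
	unsigned int external; /* 1 if this list points to external memory */
};

/*
 * Return the smallest page size that is relevant for socket_id, or
 * fallback if no list qualifies.
 */
size_t
sketch_min_page_size(const struct pgsz_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct pgsz_msl *msl = &lists[i];
		int valid;

		/* exact socket match: may be native or external memory */
		valid = (msl->socket_id == socket_id);
		/* "any socket": consider native memory only */
		valid |= (socket_id == SKETCH_SOCKET_ID_ANY &&
				msl->external == 0);

		if (valid && msl->page_sz < min)
			min = (size_t)msl->page_sz;
	}

	return min == SIZE_MAX ? fallback : min;
}
```

With lists for socket 0 (2M pages), socket 1 (1G pages) and an external
list using 4K pages, a request for socket 0 or for any socket yields 2M;
the 4K size is only picked up when the external list's own socket ID is
requested.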

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 24 files changed, 134 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,10 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+       a new flag indicating whether the memseg list refers to external memory.
+
 Removed Items
 -------------
 
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1

* [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps
  2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
                     ` (2 preceding siblings ...)
  2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 10:41  9%   ` Anatoly Burakov
  2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
  4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
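
As a rough illustration of what name-based identification looks like,
here is a standalone sketch. The sketch_* names and the lookup helper
are hypothetical and are not part of this patch; only the "socket_<id>"
default naming and the 32-byte name limit are taken from the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical, reduced stand-in for struct malloc_heap. */
struct sketch_heap {
	unsigned int socket_id;
	char name[SKETCH_HEAP_NAME_MAX_LEN];
};

/* Give a default heap its "socket_<id>" name. */
void
sketch_heap_set_default_name(struct sketch_heap *heap, int socket_id)
{
	assert(heap != NULL);

	/* snprintf() always NUL-terminates within the given size */
	snprintf(heap->name, sizeof(heap->name), "socket_%i", socket_id);
	heap->socket_id = (unsigned int)socket_id;
}

/* Hypothetical lookup: return the index of the named heap, or -1. */
int
sketch_heap_find(const struct sketch_heap *heaps, size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);

	for (i = 0; i < n; i++) {
		if (strncmp(heaps[i].name, name,
				SKETCH_HEAP_NAME_MAX_LEN) == 0)
			return (int)i;
	}
	return -1;
}
```

Looking up "socket_1" in an array of two default heaps returns index 1
in this sketch, and an unknown name returns -1.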

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst          |  1 +
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
 * eal: EAL library ABI version was changed due to previously announced work on
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
+       Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1020,6 +1019,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1

* [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps
  2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
                     ` (3 preceding siblings ...)
  2018-09-27 10:41  9%   ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-27 10:41  4%   ` Anatoly Burakov
  4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

Add API to allow creating new malloc heaps. They will be assigned
socket IDs starting at or above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
        Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+       Structure ``rte_eal_memconfig`` has been extended to contain next socket
+       ID for externally allocated memory segments.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1

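For readers following the slot search and socket ID assignment in
rte_malloc_heap_create() and malloc_heap_create() above, here is a
minimal standalone sketch of the same idea. It is a simplified model,
not the DPDK code: every model_* name and the constant values are
assumptions made for illustration, locking and the socket ID overflow
check are left out, and errors are returned as negative errno values
instead of being stored in rte_errno.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins for RTE_MAX_HEAPS, RTE_HEAP_NAME_MAX_LEN and
 * EXTERNAL_HEAP_MIN_SOCKET_ID; the values are assumptions. */
#define MODEL_MAX_HEAPS 4
#define MODEL_NAME_LEN 32
#define MODEL_MIN_EXT_SOCKET_ID 256

struct model_heap {
	char name[MODEL_NAME_LEN];
	int socket_id;
};

struct model_config {
	struct model_heap heaps[MODEL_MAX_HEAPS];
	int next_socket_id;
};

/* Empty all heap slots and start external socket IDs at a high number. */
static void
model_init(struct model_config *cfg)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->next_socket_id = MODEL_MIN_EXT_SOCKET_ID;
}

/* Returns 0 on success, or a negative errno-style code on failure. */
static int
model_heap_create(struct model_config *cfg, const char *name)
{
	struct model_heap *slot = NULL;
	const char *end;
	size_t len;
	int i;

	if (name == NULL)
		return -EINVAL;
	/* reject empty names and names that do not fit with a terminator */
	end = memchr(name, '\0', MODEL_NAME_LEN);
	if (end == NULL || end == name)
		return -EINVAL;
	len = (size_t)(end - name);

	for (i = 0; i < MODEL_MAX_HEAPS; i++) {
		struct model_heap *tmp = &cfg->heaps[i];

		/* existing heap with the same name */
		if (strncmp(name, tmp->name, MODEL_NAME_LEN) == 0)
			return -EEXIST;
		/* first empty slot is the one we will use */
		if (tmp->name[0] == '\0') {
			slot = tmp;
			break;
		}
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, len + 1);
	slot->socket_id = cfg->next_socket_id++;
	return 0;
}
```

In this model, heaps created one after another receive consecutive
socket IDs starting at MODEL_MIN_EXT_SOCKET_ID, and a second attempt to
create a heap with an existing name is rejected.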

* [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
  2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
  2018-09-27 10:40  2%   ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
  2018-09-27 10:40 16%   ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 10:41  4%   ` Anatoly Burakov
  2018-09-27 13:14  0%     ` Alejandro Lucero
  2018-09-27 10:41  9%   ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
  2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
  4 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko, alejandro.lucero

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1

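The relaxed checks in the memzone and malloc_heap hunks above can be
summarised with a small standalone sketch. It is only a model of the
logic shown in the diff: the sock_* helpers are hypothetical names, and
the value chosen for SOCK_MAX_NUMA_NODES is an assumption, since
RTE_MAX_NUMA_NODES is a build-time setting.

```c
#include <stdbool.h>

#define SOCK_ID_ANY (-1)       /* mirrors SOCKET_ID_ANY */
#define SOCK_MAX_NUMA_NODES 8  /* assumed value for RTE_MAX_NUMA_NODES */

/* Memzone-style parameter check after the patch: only negative values
 * other than "any" are rejected up front; large IDs are left for the
 * memory subsystem to validate. */
static bool
sock_arg_is_acceptable(int socket_id)
{
	return socket_id == SOCK_ID_ANY || socket_id >= 0;
}

/* Without hugepages, requests for regular NUMA sockets fall back to
 * "any", while IDs at or above the NUMA range (external heaps) are
 * kept as they are. */
static int
sock_effective_id(int socket_id, bool has_hugepages)
{
	if (!has_hugepages && socket_id < SOCK_MAX_NUMA_NODES)
		return SOCK_ID_ANY;
	return socket_id;
}
```

With these assumptions, socket ID 256 passes the parameter check and is
preserved even in no-hugepage mode, whereas socket ID 0 collapses to
"any" when hugepages are unavailable.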

* [dpdk-dev] [PATCH v6 00/21] Support externally allocated memory in DPDK
  2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-27 10:40  2%   ` Anatoly Burakov
  2018-09-27 10:40 16%   ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
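
To illustrate the first two points of the list above (heaps indexed by
heap index rather than by NUMA node, and a socket ID used to pick the
heap), here is a minimal standalone sketch. It is not the code from
this series: the lookup_* names are hypothetical, and a linear search
is assumed purely for clarity.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a heap table entry. */
struct lookup_heap {
	const char *name;  /* NULL marks an unused slot */
	int socket_id;     /* NUMA node ID or external heap socket ID */
};

/* Map a socket ID to the index of the heap that serves it.
 * Returns the heap index, or -1 if no heap has that socket ID. */
static int
lookup_socket_to_heap_idx(const struct lookup_heap *heaps, size_t n_heaps,
		int socket_id)
{
	size_t i;

	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].name != NULL &&
				heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

The point of the indirection is that a heap's position in the table no
longer has to equal its socket ID, so an external heap with a large
socket ID can sit right after the heaps of the regular NUMA nodes.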

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better: we don't
want to fail to create an external heap in the primary because
something in the secondary has failed, when in fact we may not even
have wanted this memory to be accessible in the secondary in the first
place.

Using external memory in multiprocess is *hard*, because the memory
space not only needs to be preallocated, but also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 305 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  28 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  14 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |   8 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   9 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 316 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 47 files changed, 1913 insertions(+), 139 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1


* [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0
  2018-09-26 18:04  2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
@ 2018-09-26 18:04  2%   ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-09-26 18:04 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: thomas, Hemant Agrawal

From: Hemant Agrawal <hemant.agrawal@nxp.com>

This patch adds support for the new Management Complex
Firmware version 10.1x.x. One of the main changes in
the APIs is support for ordered queues.

The fslmc bus lib ABI will need to be bumped to reflect
the MC FW API and structure changes.

This will also result in bumping the ABI version of all dependent
libs as they internally use the MC FW APIs and structures.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 drivers/bus/fslmc/Makefile                  |   2 +-
 drivers/bus/fslmc/mc/dpbp.c                 |  10 +
 drivers/bus/fslmc/mc/dpci.c                 | 197 ++++++++++++++++++++
 drivers/bus/fslmc/mc/dpcon.c                |  30 +++
 drivers/bus/fslmc/mc/dpdmai.c               |  14 ++
 drivers/bus/fslmc/mc/dpio.c                 |   9 +
 drivers/bus/fslmc/mc/fsl_dpbp.h             |   1 +
 drivers/bus/fslmc/mc/fsl_dpbp_cmd.h         |  16 +-
 drivers/bus/fslmc/mc/fsl_dpci.h             |  47 ++++-
 drivers/bus/fslmc/mc/fsl_dpci_cmd.h         |  62 +++++-
 drivers/bus/fslmc/mc/fsl_dpcon.h            |  19 ++
 drivers/bus/fslmc/mc/fsl_dpdmai.h           |   5 +
 drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h       |  20 +-
 drivers/bus/fslmc/mc/fsl_dpmng.h            |   2 +-
 drivers/bus/fslmc/mc/fsl_dpopr.h            |  85 +++++++++
 drivers/bus/fslmc/meson.build               |   2 +
 drivers/bus/fslmc/rte_bus_fslmc_version.map |  10 +
 drivers/crypto/dpaa2_sec/Makefile           |   2 +-
 drivers/crypto/dpaa2_sec/meson.build        |   2 +
 drivers/event/dpaa2/Makefile                |   2 +-
 drivers/event/dpaa2/meson.build             |   2 +
 drivers/mempool/dpaa2/Makefile              |   2 +-
 drivers/mempool/dpaa2/meson.build           |   2 +
 drivers/net/dpaa2/Makefile                  |   2 +-
 drivers/net/dpaa2/meson.build               |   2 +
 drivers/raw/dpaa2_cmdif/Makefile            |   2 +-
 drivers/raw/dpaa2_cmdif/meson.build         |   2 +
 drivers/raw/dpaa2_qdma/Makefile             |   2 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c         |  14 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.h         |   6 +-
 drivers/raw/dpaa2_qdma/meson.build          |   2 +
 31 files changed, 541 insertions(+), 34 deletions(-)
 create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h

diff --git a/drivers/bus/fslmc/Makefile b/drivers/bus/fslmc/Makefile
index 515d0f534..e95551980 100644
--- a/drivers/bus/fslmc/Makefile
+++ b/drivers/bus/fslmc/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_ethdev
 EXPORT_MAP := rte_bus_fslmc_version.map
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += \
         qbman/qbman_portal.c \
diff --git a/drivers/bus/fslmc/mc/dpbp.c b/drivers/bus/fslmc/mc/dpbp.c
index 0215d22da..d9103409c 100644
--- a/drivers/bus/fslmc/mc/dpbp.c
+++ b/drivers/bus/fslmc/mc/dpbp.c
@@ -248,6 +248,16 @@ int dpbp_reset(struct fsl_mc_io *mc_io,
 	/* send command to mc*/
 	return mc_send_command(mc_io, &cmd);
 }
+/**
+ * dpbp_get_attributes - Retrieve DPBP attributes.
+ *
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPBP object
+ * @attr:	Returned object's attributes
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
 int dpbp_get_attributes(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
diff --git a/drivers/bus/fslmc/mc/dpci.c b/drivers/bus/fslmc/mc/dpci.c
index ff366bfa9..95edae9d9 100644
--- a/drivers/bus/fslmc/mc/dpci.c
+++ b/drivers/bus/fslmc/mc/dpci.c
@@ -265,6 +265,15 @@ int dpci_reset(struct fsl_mc_io *mc_io,
 	return mc_send_command(mc_io, &cmd);
 }
 
+/**
+ * dpci_get_attributes() - Retrieve DPCI attributes.
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCI object
+ * @attr:	Returned object's attributes
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
 int dpci_get_attributes(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
@@ -292,6 +301,94 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
 	return 0;
 }
 
+/**
+ * dpci_get_peer_attributes() - Retrieve peer DPCI attributes.
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCI object
+ * @attr:	Returned peer attributes
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+			     uint32_t cmd_flags,
+			     uint16_t token,
+			     struct dpci_peer_attr *attr)
+{
+	struct dpci_rsp_get_peer_attr *rsp_params;
+	struct mc_command cmd = { 0 };
+	int err;
+
+	/* prepare command */
+	cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_PEER_ATTR,
+					  cmd_flags,
+					  token);
+
+	/* send command to mc*/
+	err = mc_send_command(mc_io, &cmd);
+	if (err)
+		return err;
+
+	/* retrieve response parameters */
+	rsp_params = (struct dpci_rsp_get_peer_attr *)cmd.params;
+	attr->peer_id = le32_to_cpu(rsp_params->id);
+	attr->num_of_priorities = rsp_params->num_of_priorities;
+
+	return 0;
+}
+
+/**
+ * dpci_get_link_state() - Retrieve the DPCI link state.
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCI object
+ * @up:		Returned link state; returns '1' if link is up, '0' otherwise
+ *
+ * DPCI can be connected to another DPCI, together they
+ * create a 'link'. In order to use the DPCI Tx and Rx queues,
+ * both objects must be enabled.
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+			uint32_t cmd_flags,
+			uint16_t token,
+			int *up)
+{
+	struct dpci_rsp_get_link_state *rsp_params;
+	struct mc_command cmd = { 0 };
+	int err;
+
+	/* prepare command */
+	cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_LINK_STATE,
+					  cmd_flags,
+					  token);
+
+	/* send command to mc*/
+	err = mc_send_command(mc_io, &cmd);
+	if (err)
+		return err;
+
+	/* retrieve response parameters */
+	rsp_params = (struct dpci_rsp_get_link_state *)cmd.params;
+	*up = dpci_get_field(rsp_params->up, UP);
+
+	return 0;
+}
+
+/**
+ * dpci_set_rx_queue() - Set Rx queue configuration
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCI object
+ * @priority:	Select the queue relative to number of
+ *			priorities configured at DPCI creation; use
+ *			DPCI_ALL_QUEUES to configure all Rx queues
+ *			identically.
+ * @cfg:	Rx queue configuration
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
 int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
 		      uint32_t cmd_flags,
 		      uint16_t token,
@@ -314,6 +411,9 @@ int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
 	dpci_set_field(cmd_params->dest_type,
 		       DEST_TYPE,
 		       cfg->dest_cfg.dest_type);
+	dpci_set_field(cmd_params->dest_type,
+		       ORDER_PRESERVATION,
+		       cfg->order_preservation_en);
 
 	/* send command to mc*/
 	return mc_send_command(mc_io, &cmd);
@@ -438,3 +538,100 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
 
 	return 0;
 }
+
+/**
+ * dpci_set_opr() - Set Order Restoration configuration.
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCI object
+ * @index:	The queue index
+ * @options:	Configuration mode options
+ *		can be OPR_OPT_CREATE or OPR_OPT_RETIRE
+ * @cfg:	Configuration options for the OPR
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+		 uint32_t cmd_flags,
+		 uint16_t token,
+		 uint8_t index,
+		 uint8_t options,
+		 struct opr_cfg *cfg)
+{
+	struct dpci_cmd_set_opr *cmd_params;
+	struct mc_command cmd = { 0 };
+
+	/* prepare command */
+	cmd.header = mc_encode_cmd_header(DPCI_CMDID_SET_OPR,
+					  cmd_flags,
+					  token);
+	cmd_params = (struct dpci_cmd_set_opr *)cmd.params;
+	cmd_params->index = index;
+	cmd_params->options = options;
+	cmd_params->oloe = cfg->oloe;
+	cmd_params->oeane = cfg->oeane;
+	cmd_params->olws = cfg->olws;
+	cmd_params->oa = cfg->oa;
+	cmd_params->oprrws = cfg->oprrws;
+
+	/* send command to mc*/
+	return mc_send_command(mc_io, &cmd);
+}
+
+/**
+ * dpci_get_opr() - Retrieve Order Restoration config and query.
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCI object
+ * @index:	The queue index
+ * @cfg:	Returned OPR configuration
+ * @qry:	Returned OPR query
+ *
+ * Return:	'0' on Success; Error code otherwise.
+ */
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+		 uint32_t cmd_flags,
+		 uint16_t token,
+		 uint8_t index,
+		 struct opr_cfg *cfg,
+		 struct opr_qry *qry)
+{
+	struct dpci_rsp_get_opr *rsp_params;
+	struct dpci_cmd_get_opr *cmd_params;
+	struct mc_command cmd = { 0 };
+	int err;
+
+	/* prepare command */
+	cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_OPR,
+					  cmd_flags,
+					  token);
+	cmd_params = (struct dpci_cmd_get_opr *)cmd.params;
+	cmd_params->index = index;
+
+	/* send command to mc*/
+	err = mc_send_command(mc_io, &cmd);
+	if (err)
+		return err;
+
+	/* retrieve response parameters */
+	rsp_params = (struct dpci_rsp_get_opr *)cmd.params;
+	cfg->oloe = rsp_params->oloe;
+	cfg->oeane = rsp_params->oeane;
+	cfg->olws = rsp_params->olws;
+	cfg->oa = rsp_params->oa;
+	cfg->oprrws = rsp_params->oprrws;
+	qry->rip = dpci_get_field(rsp_params->flags, RIP);
+	qry->enable = dpci_get_field(rsp_params->flags, OPR_ENABLE);
+	qry->nesn = le16_to_cpu(rsp_params->nesn);
+	qry->ndsn = le16_to_cpu(rsp_params->ndsn);
+	qry->ea_tseq = le16_to_cpu(rsp_params->ea_tseq);
+	qry->tseq_nlis = dpci_get_field(rsp_params->tseq_nlis, TSEQ_NLIS);
+	qry->ea_hseq = le16_to_cpu(rsp_params->ea_hseq);
+	qry->hseq_nlis = dpci_get_field(rsp_params->hseq_nlis, HSEQ_NLIS);
+	qry->ea_hptr = le16_to_cpu(rsp_params->ea_hptr);
+	qry->ea_tptr = le16_to_cpu(rsp_params->ea_tptr);
+	qry->opr_vid = le16_to_cpu(rsp_params->opr_vid);
+	qry->opr_id = le16_to_cpu(rsp_params->opr_id);
+
+	return 0;
+}
diff --git a/drivers/bus/fslmc/mc/dpcon.c b/drivers/bus/fslmc/mc/dpcon.c
index 3f6e04b97..92bd26512 100644
--- a/drivers/bus/fslmc/mc/dpcon.c
+++ b/drivers/bus/fslmc/mc/dpcon.c
@@ -295,6 +295,36 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
 	return 0;
 }
 
+/**
+ * dpcon_set_notification() - Set DPCON notification destination
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPCON object
+ * @cfg:	Notification parameters
+ *
+ * Return:	'0' on Success; Error code otherwise
+ */
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+			   uint32_t cmd_flags,
+			   uint16_t token,
+			   struct dpcon_notification_cfg *cfg)
+{
+	struct dpcon_cmd_set_notification *dpcon_cmd;
+	struct mc_command cmd = { 0 };
+
+	/* prepare command */
+	cmd.header = mc_encode_cmd_header(DPCON_CMDID_SET_NOTIFICATION,
+					  cmd_flags,
+					  token);
+	dpcon_cmd = (struct dpcon_cmd_set_notification *)cmd.params;
+	dpcon_cmd->dpio_id = cpu_to_le32(cfg->dpio_id);
+	dpcon_cmd->priority = cfg->priority;
+	dpcon_cmd->user_ctx = cpu_to_le64(cfg->user_ctx);
+
+	/* send command to mc*/
+	return mc_send_command(mc_io, &cmd);
+}
+
 /**
  * dpcon_get_api_version - Get Data Path Concentrator API version
  * @mc_io:	Pointer to MC portal's DPCON object
diff --git a/drivers/bus/fslmc/mc/dpdmai.c b/drivers/bus/fslmc/mc/dpdmai.c
index 528889df3..dcb9d516a 100644
--- a/drivers/bus/fslmc/mc/dpdmai.c
+++ b/drivers/bus/fslmc/mc/dpdmai.c
@@ -113,6 +113,7 @@ int dpdmai_create(struct fsl_mc_io *mc_io,
 					  cmd_flags,
 					  dprc_token);
 	cmd_params = (struct dpdmai_cmd_create *)cmd.params;
+	cmd_params->num_queues = cfg->num_queues;
 	cmd_params->priorities[0] = cfg->priorities[0];
 	cmd_params->priorities[1] = cfg->priorities[1];
 
@@ -297,6 +298,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
 	rsp_params = (struct dpdmai_rsp_get_attr *)cmd.params;
 	attr->id = le32_to_cpu(rsp_params->id);
 	attr->num_of_priorities = rsp_params->num_of_priorities;
+	attr->num_of_queues = rsp_params->num_of_queues;
 
 	return 0;
 }
@@ -306,6 +308,8 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
  * @mc_io:	Pointer to MC portal's I/O object
  * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
  * @token:	Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ *		parameter provided in dpdmai_create
  * @priority:	Select the queue relative to number of
  *		priorities configured at DPDMAI creation; use
  *		DPDMAI_ALL_QUEUES to configure all Rx queues
@@ -317,6 +321,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
 int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
+			uint8_t queue_idx,
 			uint8_t priority,
 			const struct dpdmai_rx_queue_cfg *cfg)
 {
@@ -331,6 +336,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
 	cmd_params->dest_id = cpu_to_le32(cfg->dest_cfg.dest_id);
 	cmd_params->dest_priority = cfg->dest_cfg.priority;
 	cmd_params->priority = priority;
+	cmd_params->queue_idx = queue_idx;
 	cmd_params->user_ctx = cpu_to_le64(cfg->user_ctx);
 	cmd_params->options = cpu_to_le32(cfg->options);
 	dpdmai_set_field(cmd_params->dest_type,
@@ -346,6 +352,8 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
  * @mc_io:	Pointer to MC portal's I/O object
  * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
  * @token:	Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ *		parameter provided in dpdmai_create
  * @priority:	Select the queue relative to number of
  *		priorities configured at DPDMAI creation
  * @attr:	Returned Rx queue attributes
@@ -355,6 +363,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
 int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
+			uint8_t queue_idx,
 			uint8_t priority,
 			struct dpdmai_rx_queue_attr *attr)
 {
@@ -369,6 +378,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
 					  token);
 	cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
 	cmd_params->priority = priority;
+	cmd_params->queue_idx = queue_idx;
 
 	/* send command to mc*/
 	err = mc_send_command(mc_io, &cmd);
@@ -392,6 +402,8 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
  * @mc_io:	Pointer to MC portal's I/O object
  * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
  * @token:	Token of DPDMAI object
+ * @queue_idx: Tx queue index. Accepted values are from 0 to num_queues
+ *		parameter provided in dpdmai_create
  * @priority:	Select the queue relative to number of
  *		priorities configured at DPDMAI creation
  * @attr:	Returned Tx queue attributes
@@ -401,6 +413,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
 int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
+			uint8_t queue_idx,
 			uint8_t priority,
 			struct dpdmai_tx_queue_attr *attr)
 {
@@ -415,6 +428,7 @@ int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
 					  token);
 	cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
 	cmd_params->priority = priority;
+	cmd_params->queue_idx = queue_idx;
 
 	/* send command to mc*/
 	err = mc_send_command(mc_io, &cmd);
diff --git a/drivers/bus/fslmc/mc/dpio.c b/drivers/bus/fslmc/mc/dpio.c
index 966277cc6..a3382ed14 100644
--- a/drivers/bus/fslmc/mc/dpio.c
+++ b/drivers/bus/fslmc/mc/dpio.c
@@ -268,6 +268,15 @@ int dpio_reset(struct fsl_mc_io *mc_io,
 	return mc_send_command(mc_io, &cmd);
 }
 
+/**
+ * dpio_get_attributes() - Retrieve DPIO attributes
+ * @mc_io:	Pointer to MC portal's I/O object
+ * @cmd_flags:	Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token:	Token of DPIO object
+ * @attr:	Returned object's attributes
+ *
+ * Return:	'0' on Success; Error code otherwise
+ */
 int dpio_get_attributes(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp.h b/drivers/bus/fslmc/mc/fsl_dpbp.h
index 111836261..9d405b42c 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp.h
@@ -82,6 +82,7 @@ int dpbp_get_attributes(struct fsl_mc_io *mc_io,
 /**
  * BPSCN write will attempt to allocate into a cache (coherent write)
  */
+#define DPBP_NOTIF_OPT_COHERENT_WRITE	0x00000001
 int dpbp_get_api_version(struct fsl_mc_io *mc_io,
 			 uint32_t cmd_flags,
 			 uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
index 18402cedf..55c9fc9b4 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
@@ -9,13 +9,15 @@
 
 /* DPBP Version */
 #define DPBP_VER_MAJOR				3
-#define DPBP_VER_MINOR				3
+#define DPBP_VER_MINOR				4
 
 /* Command versioning */
 #define DPBP_CMD_BASE_VERSION			1
+#define DPBP_CMD_VERSION_2			2
 #define DPBP_CMD_ID_OFFSET			4
 
 #define DPBP_CMD(id)	((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_BASE_VERSION)
+#define DPBP_CMD_V2(id)	((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_VERSION_2)
 
 /* Command IDs */
 #define DPBP_CMDID_CLOSE		DPBP_CMD(0x800)
@@ -37,8 +39,8 @@
 #define DPBP_CMDID_GET_IRQ_STATUS	DPBP_CMD(0x016)
 #define DPBP_CMDID_CLEAR_IRQ_STATUS	DPBP_CMD(0x017)
 
-#define DPBP_CMDID_SET_NOTIFICATIONS	DPBP_CMD(0x1b0)
-#define DPBP_CMDID_GET_NOTIFICATIONS	DPBP_CMD(0x1b1)
+#define DPBP_CMDID_SET_NOTIFICATIONS	DPBP_CMD_V2(0x1b0)
+#define DPBP_CMDID_GET_NOTIFICATIONS	DPBP_CMD_V2(0x1b1)
 
 #define DPBP_CMDID_GET_FREE_BUFFERS_NUM	DPBP_CMD(0x1b2)
 
@@ -68,8 +70,8 @@ struct dpbp_cmd_set_notifications {
 	uint32_t depletion_exit;
 	uint32_t surplus_entry;
 	uint32_t surplus_exit;
-	uint16_t options;
-	uint16_t pad[3];
+	uint32_t options;
+	uint16_t pad[2];
 	uint64_t message_ctx;
 	uint64_t message_iova;
 };
@@ -79,8 +81,8 @@ struct dpbp_rsp_get_notifications {
 	uint32_t depletion_exit;
 	uint32_t surplus_entry;
 	uint32_t surplus_exit;
-	uint16_t options;
-	uint16_t pad[3];
+	uint32_t options;
+	uint16_t pad[2];
 	uint64_t message_ctx;
 	uint64_t message_iova;
 };
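
The 'options' widening above takes two bytes from the padding, so the
fields that follow should stay at the same offset. A minimal sketch to
check that, assuming the command structs are byte-packed (the pragma is
not visible in this hunk) and modelling only the fields shown in the
hunk; the struct and helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
/* Hypothetical model of the fields visible in the hunk, old layout. */
struct notif_tail_v1 {
	uint32_t depletion_exit;
	uint32_t surplus_entry;
	uint32_t surplus_exit;
	uint16_t options;
	uint16_t pad[3];
	uint64_t message_ctx;
	uint64_t message_iova;
};

/* Same fields with the widened 'options' and the shortened padding. */
struct notif_tail_v2 {
	uint32_t depletion_exit;
	uint32_t surplus_entry;
	uint32_t surplus_exit;
	uint32_t options;
	uint16_t pad[2];
	uint64_t message_ctx;
	uint64_t message_iova;
};
#pragma pack(pop)

/* Compile-time checks: total size and trailing offsets are unchanged. */
static_assert(sizeof(struct notif_tail_v1) == sizeof(struct notif_tail_v2),
	      "size must not change");
static_assert(offsetof(struct notif_tail_v1, message_ctx) ==
	      offsetof(struct notif_tail_v2, message_ctx),
	      "message_ctx must not move");

/* Runtime accessors, handy when printing the layout while debugging. */
static inline size_t notif_ctx_offset_v1(void)
{
	return offsetof(struct notif_tail_v1, message_ctx);
}

static inline size_t notif_ctx_offset_v2(void)
{
	return offsetof(struct notif_tail_v2, message_ctx);
}
```

Only the width of 'options' changes in this model, which would be
consistent with the command moving to a V2 command ID.
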
diff --git a/drivers/bus/fslmc/mc/fsl_dpci.h b/drivers/bus/fslmc/mc/fsl_dpci.h
index f69ed3f33..9af9097e5 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci.h
@@ -6,6 +6,8 @@
 #ifndef __FSL_DPCI_H
 #define __FSL_DPCI_H
 
+#include <fsl_dpopr.h>
+
 /* Data Path Communication Interface API
  * Contains initialization APIs and runtime control APIs for DPCI
  */
@@ -17,7 +19,7 @@ struct fsl_mc_io;
 /**
  * Maximum number of Tx/Rx priorities per DPCI object
  */
-#define DPCI_PRIO_NUM		2
+#define DPCI_PRIO_NUM		4
 
 /**
  * Indicates an invalid frame queue
@@ -106,6 +108,27 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
 			uint16_t token,
 			struct dpci_attr *attr);
 
+/**
+ * struct dpci_peer_attr - Structure representing the peer DPCI attributes
+ * @peer_id:		DPCI peer id; if no peer is connected returns (-1)
+ * @num_of_priorities:	The peer's number of receive priorities; determines the
+ *			number of transmit priorities for the local DPCI object
+ */
+struct dpci_peer_attr {
+	int peer_id;
+	uint8_t num_of_priorities;
+};
+
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+			     uint32_t cmd_flags,
+			     uint16_t token,
+			     struct dpci_peer_attr *attr);
+
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+			uint32_t cmd_flags,
+			uint16_t token,
+			int *up);
+
 /**
  * enum dpci_dest - DPCI destination types
  * @DPCI_DEST_NONE:	Unassigned destination; The queue is set in parked mode
@@ -153,6 +176,11 @@ struct dpci_dest_cfg {
  */
 #define DPCI_QUEUE_OPT_DEST		0x00000002
 
+/**
+ * Set the queue to hold active mode.
+ */
+#define DPCI_QUEUE_OPT_HOLD_ACTIVE	0x00000004
+
 /**
  * struct dpci_rx_queue_cfg - Structure representing RX queue configuration
  * @options:	Flags representing the suggested modifications to the queue;
@@ -163,11 +191,14 @@ struct dpci_dest_cfg {
  *		'options'
  * @dest_cfg:	Queue destination parameters;
  *		valid only if 'DPCI_QUEUE_OPT_DEST' is contained in 'options'
+ * @order_preservation_en: order preservation configuration for the rx queue
+ * valid only if 'DPCI_QUEUE_OPT_HOLD_ACTIVE' is contained in 'options'
  */
 struct dpci_rx_queue_cfg {
 	uint32_t options;
 	uint64_t user_ctx;
 	struct dpci_dest_cfg dest_cfg;
+	int order_preservation_en;
 };
 
 int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
@@ -217,4 +248,18 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
 			 uint16_t *major_ver,
 			 uint16_t *minor_ver);
 
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+		 uint32_t cmd_flags,
+		 uint16_t token,
+		 uint8_t index,
+		 uint8_t options,
+		 struct opr_cfg *cfg);
+
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+		 uint32_t cmd_flags,
+		 uint16_t token,
+		 uint8_t index,
+		 struct opr_cfg *cfg,
+		 struct opr_qry *qry);
+
 #endif /* __FSL_DPCI_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
index 634248ac0..92b85a820 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
@@ -8,7 +8,7 @@
 
 /* DPCI Version */
 #define DPCI_VER_MAJOR			3
-#define DPCI_VER_MINOR			3
+#define DPCI_VER_MINOR			4
 
 #define DPCI_CMD_BASE_VERSION		1
 #define DPCI_CMD_BASE_VERSION_V2	2
@@ -35,6 +35,8 @@
 #define DPCI_CMDID_GET_PEER_ATTR	DPCI_CMD_V1(0x0e2)
 #define DPCI_CMDID_GET_RX_QUEUE		DPCI_CMD_V1(0x0e3)
 #define DPCI_CMDID_GET_TX_QUEUE		DPCI_CMD_V1(0x0e4)
+#define DPCI_CMDID_SET_OPR		DPCI_CMD_V1(0x0e5)
+#define DPCI_CMDID_GET_OPR		DPCI_CMD_V1(0x0e6)
 
 /* Macros for accessing command fields smaller than 1byte */
 #define DPCI_MASK(field)        \
@@ -90,6 +92,8 @@ struct dpci_rsp_get_link_state {
 
 #define DPCI_DEST_TYPE_SHIFT	0
 #define DPCI_DEST_TYPE_SIZE	4
+#define DPCI_ORDER_PRESERVATION_SHIFT	4
+#define DPCI_ORDER_PRESERVATION_SIZE	1
 
 struct dpci_cmd_set_rx_queue {
 	uint32_t dest_id;
@@ -128,5 +132,61 @@ struct dpci_rsp_get_api_version {
 	uint16_t minor;
 };
 
+struct dpci_cmd_set_opr {
+	uint16_t pad0;
+	uint8_t index;
+	uint8_t options;
+	uint8_t pad1[7];
+	uint8_t oloe;
+	uint8_t oeane;
+	uint8_t olws;
+	uint8_t oa;
+	uint8_t oprrws;
+};
+
+struct dpci_cmd_get_opr {
+	uint16_t pad;
+	uint8_t index;
+};
+
+#define DPCI_RIP_SHIFT		0
+#define DPCI_RIP_SIZE		1
+#define DPCI_OPR_ENABLE_SHIFT	1
+#define DPCI_OPR_ENABLE_SIZE	1
+#define DPCI_TSEQ_NLIS_SHIFT	0
+#define DPCI_TSEQ_NLIS_SIZE	1
+#define DPCI_HSEQ_NLIS_SHIFT	0
+#define DPCI_HSEQ_NLIS_SIZE	1
+
+struct dpci_rsp_get_opr {
+	uint64_t pad0;
+	/* from LSB: rip:1 enable:1 */
+	uint8_t flags;
+	uint16_t pad1;
+	uint8_t oloe;
+	uint8_t oeane;
+	uint8_t olws;
+	uint8_t oa;
+	uint8_t oprrws;
+	uint16_t nesn;
+	uint16_t pad8;
+	uint16_t ndsn;
+	uint16_t pad2;
+	uint16_t ea_tseq;
+	/* only the LSB */
+	uint8_t tseq_nlis;
+	uint8_t pad3;
+	uint16_t ea_hseq;
+	/* only the LSB */
+	uint8_t hseq_nlis;
+	uint8_t pad4;
+	uint16_t ea_hptr;
+	uint16_t pad5;
+	uint16_t ea_tptr;
+	uint16_t pad6;
+	uint16_t opr_vid;
+	uint16_t pad7;
+	uint16_t opr_id;
+};
 #pragma pack(pop)
 #endif /* _FSL_DPCI_CMD_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpcon.h b/drivers/bus/fslmc/mc/fsl_dpcon.h
index 36dd5f3c1..fc0430dc1 100644
--- a/drivers/bus/fslmc/mc/fsl_dpcon.h
+++ b/drivers/bus/fslmc/mc/fsl_dpcon.h
@@ -81,6 +81,25 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
 			 uint16_t token,
 			 struct dpcon_attr *attr);
 
+/**
+ * struct dpcon_notification_cfg - Structure representing notification params
+ * @dpio_id:	DPIO object ID; must be configured with a notification channel;
+ *		to disable notifications set it to 'DPCON_INVALID_DPIO_ID';
+ * @priority:	Priority selection within the DPIO channel; valid values
+ *		are 0-7, depending on the number of priorities in that channel
+ * @user_ctx:	User context value provided with each CDAN message
+ */
+struct dpcon_notification_cfg {
+	int dpio_id;
+	uint8_t priority;
+	uint64_t user_ctx;
+};
+
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+			   uint32_t cmd_flags,
+			   uint16_t token,
+			   struct dpcon_notification_cfg *cfg);
+
 int dpcon_get_api_version(struct fsl_mc_io *mc_io,
 			  uint32_t cmd_flags,
 			  uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai.h b/drivers/bus/fslmc/mc/fsl_dpdmai.h
index 03e46ec14..40469cc13 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai.h
@@ -39,6 +39,7 @@ int dpdmai_close(struct fsl_mc_io *mc_io,
  *	should be configured with 0
  */
 struct dpdmai_cfg {
+	uint8_t num_queues;
 	uint8_t priorities[DPDMAI_PRIO_NUM];
 };
 
@@ -78,6 +79,7 @@ int dpdmai_reset(struct fsl_mc_io *mc_io,
 struct dpdmai_attr {
 	int id;
 	uint8_t num_of_priorities;
+	uint8_t num_of_queues;
 };
 
 int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
@@ -149,6 +151,7 @@ struct dpdmai_rx_queue_cfg {
 int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
+			uint8_t queue_idx,
 			uint8_t priority,
 			const struct dpdmai_rx_queue_cfg *cfg);
 
@@ -168,6 +171,7 @@ struct dpdmai_rx_queue_attr {
 int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
+			uint8_t queue_idx,
 			uint8_t priority,
 			struct dpdmai_rx_queue_attr *attr);
 
@@ -183,6 +187,7 @@ struct dpdmai_tx_queue_attr {
 int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
 			uint32_t cmd_flags,
 			uint16_t token,
+			uint8_t queue_idx,
 			uint8_t priority,
 			struct dpdmai_tx_queue_attr *attr);
 
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
index 618e19eae..7e122de4e 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
@@ -7,30 +7,32 @@
 
 /* DPDMAI Version */
 #define DPDMAI_VER_MAJOR		3
-#define DPDMAI_VER_MINOR		2
+#define DPDMAI_VER_MINOR		3
 
 /* Command versioning */
 #define DPDMAI_CMD_BASE_VERSION		1
+#define DPDMAI_CMD_VERSION_2		2
 #define DPDMAI_CMD_ID_OFFSET		4
 
 #define DPDMAI_CMD(id)	((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_BASE_VERSION)
+#define DPDMAI_CMD_V2(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_VERSION_2)
 
 /* Command IDs */
 #define DPDMAI_CMDID_CLOSE		DPDMAI_CMD(0x800)
 #define DPDMAI_CMDID_OPEN		DPDMAI_CMD(0x80E)
-#define DPDMAI_CMDID_CREATE		DPDMAI_CMD(0x90E)
+#define DPDMAI_CMDID_CREATE		DPDMAI_CMD_V2(0x90E)
 #define DPDMAI_CMDID_DESTROY		DPDMAI_CMD(0x98E)
 #define DPDMAI_CMDID_GET_API_VERSION	DPDMAI_CMD(0xa0E)
 
 #define DPDMAI_CMDID_ENABLE		DPDMAI_CMD(0x002)
 #define DPDMAI_CMDID_DISABLE		DPDMAI_CMD(0x003)
-#define DPDMAI_CMDID_GET_ATTR		DPDMAI_CMD(0x004)
+#define DPDMAI_CMDID_GET_ATTR		DPDMAI_CMD_V2(0x004)
 #define DPDMAI_CMDID_RESET		DPDMAI_CMD(0x005)
 #define DPDMAI_CMDID_IS_ENABLED		DPDMAI_CMD(0x006)
 
-#define DPDMAI_CMDID_SET_RX_QUEUE	DPDMAI_CMD(0x1A0)
-#define DPDMAI_CMDID_GET_RX_QUEUE	DPDMAI_CMD(0x1A1)
-#define DPDMAI_CMDID_GET_TX_QUEUE	DPDMAI_CMD(0x1A2)
+#define DPDMAI_CMDID_SET_RX_QUEUE	DPDMAI_CMD_V2(0x1A0)
+#define DPDMAI_CMDID_GET_RX_QUEUE	DPDMAI_CMD_V2(0x1A1)
+#define DPDMAI_CMDID_GET_TX_QUEUE	DPDMAI_CMD_V2(0x1A2)
 
 /* Macros for accessing command fields smaller than 1byte */
 #define DPDMAI_MASK(field)        \
@@ -47,7 +49,7 @@ struct dpdmai_cmd_open {
 };
 
 struct dpdmai_cmd_create {
-	uint8_t pad;
+	uint8_t num_queues;
 	uint8_t priorities[2];
 };
 
@@ -66,6 +68,7 @@ struct dpdmai_rsp_is_enabled {
 struct dpdmai_rsp_get_attr {
 	uint32_t id;
 	uint8_t num_of_priorities;
+	uint8_t num_of_queues;
 };
 
 #define DPDMAI_DEST_TYPE_SHIFT	0
@@ -77,7 +80,7 @@ struct dpdmai_cmd_set_rx_queue {
 	uint8_t priority;
 	/* from LSB: dest_type:4 */
 	uint8_t dest_type;
-	uint8_t pad;
+	uint8_t queue_idx;
 	uint64_t user_ctx;
 	uint32_t options;
 };
@@ -85,6 +88,7 @@ struct dpdmai_cmd_set_rx_queue {
 struct dpdmai_cmd_get_queue {
 	uint8_t pad[5];
 	uint8_t priority;
+	uint8_t queue_idx;
 };
 
 struct dpdmai_rsp_get_rx_queue {
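
The DPDMAI_CMD()/DPDMAI_CMD_V2() macros above pack a command ID and a
command version into one value. A small sketch of that encoding, using
generic helper names that are not part of the MC headers; the 4-bit
version mask is an assumption derived from the ID offset of 4:

```c
#include <stdint.h>

#define CMD_ID_OFFSET		4	/* same value as DPDMAI_CMD_ID_OFFSET */
#define CMD_BASE_VERSION	1	/* same value as DPDMAI_CMD_BASE_VERSION */
#define CMD_VERSION_2		2	/* same value as DPDMAI_CMD_VERSION_2 */

/* Hypothetical helper: combine a command ID with a command version. */
static inline uint16_t mc_cmd_encode(uint16_t id, uint8_t version)
{
	return (uint16_t)((id << CMD_ID_OFFSET) | version);
}

/* Hypothetical helper: recover the command ID from an encoded value. */
static inline uint16_t mc_cmd_id(uint16_t cmd)
{
	return (uint16_t)(cmd >> CMD_ID_OFFSET);
}

/* Hypothetical helper: recover the version (assumed to be the low 4 bits). */
static inline uint8_t mc_cmd_version(uint16_t cmd)
{
	return (uint8_t)(cmd & ((1u << CMD_ID_OFFSET) - 1));
}
```

With these helpers, SET_RX_QUEUE (ID 0x1A0) goes from 0x1A01 to 0x1A02
when it moves from the base version to version 2.
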
diff --git a/drivers/bus/fslmc/mc/fsl_dpmng.h b/drivers/bus/fslmc/mc/fsl_dpmng.h
index afaf9b711..8559bef87 100644
--- a/drivers/bus/fslmc/mc/fsl_dpmng.h
+++ b/drivers/bus/fslmc/mc/fsl_dpmng.h
@@ -18,7 +18,7 @@ struct fsl_mc_io;
  * Management Complex firmware version information
  */
 #define MC_VER_MAJOR 10
-#define MC_VER_MINOR 3
+#define MC_VER_MINOR 10
 
 /**
  * struct mc_version
diff --git a/drivers/bus/fslmc/mc/fsl_dpopr.h b/drivers/bus/fslmc/mc/fsl_dpopr.h
new file mode 100644
index 000000000..fd727e011
--- /dev/null
+++ b/drivers/bus/fslmc/mc/fsl_dpopr.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0)
+ *
+ * Copyright 2013-2015 Freescale Semiconductor Inc.
+ * Copyright 2018 NXP
+ *
+ */
+#ifndef __FSL_DPOPR_H_
+#define __FSL_DPOPR_H_
+
+/** @addtogroup dpopr Data Path Order Restoration API
+ * Contains initialization APIs and runtime APIs for the Order Restoration
+ * @{
+ */
+
+/** Order Restoration properties */
+
+/**
+ * Create a new Order Point Record option
+ */
+#define OPR_OPT_CREATE 0x1
+/**
+ * Retire an existing Order Point Record option
+ */
+#define OPR_OPT_RETIRE 0x2
+
+/**
+ * struct opr_cfg - Structure representing OPR configuration
+ * @oprrws: Order point record (OPR) restoration window size (0 to 5)
+ *			0 - Window size is 32 frames.
+ *			1 - Window size is 64 frames.
+ *			2 - Window size is 128 frames.
+ *			3 - Window size is 256 frames.
+ *			4 - Window size is 512 frames.
+ *			5 - Window size is 1024 frames.
+ *@oa: OPR auto advance NESN window size (0 disabled, 1 enabled)
+ *@olws: OPR acceptable late arrival window size (0 to 3)
+ *			0 - Disabled. Late arrivals are always rejected.
+ *			1 - Window size is 32 frames.
+ *			2 - Window size is the same as the OPR restoration
+ *			window size configured in the OPRRWS field.
+ *			3 - Window size is 8192 frames.
+ *			Late arrivals are always accepted.
+ *@oeane: Order restoration list (ORL) resource exhaustion
+ *			advance NESN enable (0 disabled, 1 enabled)
+ *@oloe: OPR loose ordering enable (0 disabled, 1 enabled)
+ */
+struct opr_cfg {
+	uint8_t oprrws;
+	uint8_t oa;
+	uint8_t olws;
+	uint8_t oeane;
+	uint8_t oloe;
+};
+
+/**
+ * struct opr_qry - Structure representing OPR query results
+ * @enable: Enabled state
+ * @rip: Retirement In Progress
+ * @ndsn: Next dispensed sequence number
+ * @nesn: Next expected sequence number
+ * @ea_hseq: Early arrival head sequence number
+ * @hseq_nlis: HSEQ not last in sequence
+ * @ea_tseq: Early arrival tail sequence number
+ * @tseq_nlis: TSEQ not last in sequence
+ * @ea_tptr: Early arrival tail pointer
+ * @ea_hptr: Early arrival head pointer
+ * @opr_id: Order Point Record ID
+ * @opr_vid: Order Point Record Virtual ID
+ */
+struct opr_qry {
+	char enable;
+	char rip;
+	uint16_t ndsn;
+	uint16_t nesn;
+	uint16_t ea_hseq;
+	char hseq_nlis;
+	uint16_t ea_tseq;
+	char tseq_nlis;
+	uint16_t ea_tptr;
+	uint16_t ea_hptr;
+	uint16_t opr_id;
+	uint16_t opr_vid;
+};
+
+#endif /* __FSL_DPOPR_H_ */
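
The window-size encodings documented for struct opr_cfg above can be
turned into frame counts as follows. This is only a sketch of the
mapping described in the comments; the helper names are hypothetical
and returning 0 for out-of-range values is an assumption, not firmware
behaviour:

```c
#include <stdint.h>

/*
 * Hypothetical helper: restoration window size in frames for an
 * 'oprrws' value (0 -> 32, 1 -> 64, ... 5 -> 1024).
 * Returns 0 for values outside the documented 0..5 range.
 */
static inline uint32_t opr_restoration_window_frames(uint8_t oprrws)
{
	if (oprrws > 5)
		return 0;
	return 32u << oprrws;
}

/*
 * Hypothetical helper: acceptable late arrival window in frames for
 * an 'olws' value. 0 means late arrivals are always rejected.
 */
static inline uint32_t opr_late_arrival_window_frames(uint8_t olws,
						      uint8_t oprrws)
{
	switch (olws) {
	case 0:
		return 0;	/* disabled */
	case 1:
		return 32;
	case 2:
		return opr_restoration_window_frames(oprrws);
	case 3:
		return 8192;	/* late arrivals always accepted */
	default:
		return 0;	/* outside the documented 0..3 range */
	}
}
```

For example, oprrws = 3 with olws = 2 gives a 256-frame window for both
restoration and late arrivals.
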
diff --git a/drivers/bus/fslmc/meson.build b/drivers/bus/fslmc/meson.build
index 22a56a6fc..54ca92d0c 100644
--- a/drivers/bus/fslmc/meson.build
+++ b/drivers/bus/fslmc/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 if host_machine.system() != 'linux'
         build = false
 endif
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index b4a881704..8717373dd 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -117,3 +117,13 @@ DPDK_18.05 {
 	rte_dpaa2_memsegs;
 
 } DPDK_18.02;
+
+DPDK_18.11 {
+	global:
+
+	dpci_get_link_state;
+	dpci_get_opr;
+	dpci_get_peer_attributes;
+	dpci_set_opr;
+
+} DPDK_18.05;
diff --git a/drivers/crypto/dpaa2_sec/Makefile b/drivers/crypto/dpaa2_sec/Makefile
index da3d8f84f..a61be49db 100644
--- a/drivers/crypto/dpaa2_sec/Makefile
+++ b/drivers/crypto/dpaa2_sec/Makefile
@@ -41,7 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
 EXPORT_MAP := rte_pmd_dpaa2_sec_version.map
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # library source files
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC) += dpaa2_sec_dpseci.c
diff --git a/drivers/crypto/dpaa2_sec/meson.build b/drivers/crypto/dpaa2_sec/meson.build
index 01afc5877..8fa4827ed 100644
--- a/drivers/crypto/dpaa2_sec/meson.build
+++ b/drivers/crypto/dpaa2_sec/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 if host_machine.system() != 'linux'
         build = false
 endif
diff --git a/drivers/event/dpaa2/Makefile b/drivers/event/dpaa2/Makefile
index 5e1a63200..3f85dd2be 100644
--- a/drivers/event/dpaa2/Makefile
+++ b/drivers/event/dpaa2/Makefile
@@ -27,7 +27,7 @@ CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc
 # versioning export map
 EXPORT_MAP := rte_pmd_dpaa2_event_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # depends on fslmc bus which uses experimental API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/event/dpaa2/meson.build b/drivers/event/dpaa2/meson.build
index de7a46155..c46b39e9d 100644
--- a/drivers/event/dpaa2/meson.build
+++ b/drivers/event/dpaa2/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 if host_machine.system() != 'linux'
 	build = false
 endif
diff --git a/drivers/mempool/dpaa2/Makefile b/drivers/mempool/dpaa2/Makefile
index 9e4c87d79..4996a2cd1 100644
--- a/drivers/mempool/dpaa2/Makefile
+++ b/drivers/mempool/dpaa2/Makefile
@@ -19,7 +19,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
 EXPORT_MAP := rte_mempool_dpaa2_version.map
 
 # Lbrary version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # depends on fslmc bus which uses experimental API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/mempool/dpaa2/meson.build b/drivers/mempool/dpaa2/meson.build
index 90bab6069..6b6ead617 100644
--- a/drivers/mempool/dpaa2/meson.build
+++ b/drivers/mempool/dpaa2/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 if host_machine.system() != 'linux'
         build = false
 endif
diff --git a/drivers/net/dpaa2/Makefile b/drivers/net/dpaa2/Makefile
index 9b0b14331..1d46f7f25 100644
--- a/drivers/net/dpaa2/Makefile
+++ b/drivers/net/dpaa2/Makefile
@@ -25,7 +25,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
 EXPORT_MAP := rte_pmd_dpaa2_version.map
 
 # library version
-LIBABIVER := 1
+LIBABIVER := 2
 
 # depends on fslmc bus which uses experimental API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/net/dpaa2/meson.build b/drivers/net/dpaa2/meson.build
index 213f0d72f..b34595258 100644
--- a/drivers/net/dpaa2/meson.build
+++ b/drivers/net/dpaa2/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 if host_machine.system() != 'linux'
         build = false
 endif
diff --git a/drivers/raw/dpaa2_cmdif/Makefile b/drivers/raw/dpaa2_cmdif/Makefile
index 9b863dda2..0dbe5c821 100644
--- a/drivers/raw/dpaa2_cmdif/Makefile
+++ b/drivers/raw/dpaa2_cmdif/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_rawdev
 
 EXPORT_MAP := rte_pmd_dpaa2_cmdif_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 #
 # all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_cmdif/meson.build b/drivers/raw/dpaa2_cmdif/meson.build
index 1d146872e..37bb24a1b 100644
--- a/drivers/raw/dpaa2_cmdif/meson.build
+++ b/drivers/raw/dpaa2_cmdif/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
 deps += ['rawdev', 'mempool_dpaa2', 'bus_vdev']
 sources = files('dpaa2_cmdif.c')
diff --git a/drivers/raw/dpaa2_qdma/Makefile b/drivers/raw/dpaa2_qdma/Makefile
index d88809ead..645220772 100644
--- a/drivers/raw/dpaa2_qdma/Makefile
+++ b/drivers/raw/dpaa2_qdma/Makefile
@@ -25,7 +25,7 @@ LDLIBS += -lrte_ring
 
 EXPORT_MAP := rte_pmd_dpaa2_qdma_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 #
 # all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index 2787d3028..44503331e 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -805,7 +805,7 @@ dpaa2_dpdmai_dev_uninit(struct rte_rawdev *rawdev)
 		DPAA2_QDMA_ERR("dmdmai disable failed");
 
 	/* Set up the DQRR storage for Rx */
-	for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+	for (i = 0; i < dpdmai_dev->num_queues; i++) {
 		struct dpaa2_queue *rxq = &(dpdmai_dev->rx_queue[i]);
 
 		if (rxq->q_storage) {
@@ -856,17 +856,17 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
 			       ret);
 		goto init_err;
 	}
-	dpdmai_dev->num_queues = attr.num_of_priorities;
+	dpdmai_dev->num_queues = attr.num_of_queues;
 
 	/* Set up Rx Queues */
-	for (i = 0; i < attr.num_of_priorities; i++) {
+	for (i = 0; i < dpdmai_dev->num_queues; i++) {
 		struct dpaa2_queue *rxq;
 
 		memset(&rx_queue_cfg, 0, sizeof(struct dpdmai_rx_queue_cfg));
 		ret = dpdmai_set_rx_queue(&dpdmai_dev->dpdmai,
 					  CMD_PRI_LOW,
 					  dpdmai_dev->token,
-					  i, &rx_queue_cfg);
+					  i, 0, &rx_queue_cfg);
 		if (ret) {
 			DPAA2_QDMA_ERR("Setting Rx queue failed with err: %d",
 				       ret);
@@ -893,9 +893,9 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
 	}
 
 	/* Get Rx and Tx queues FQID's */
-	for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+	for (i = 0; i < dpdmai_dev->num_queues; i++) {
 		ret = dpdmai_get_rx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
-					  dpdmai_dev->token, i, &rx_attr);
+					  dpdmai_dev->token, i, 0, &rx_attr);
 		if (ret) {
 			DPAA2_QDMA_ERR("Reading device failed with err: %d",
 				       ret);
@@ -904,7 +904,7 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
 		dpdmai_dev->rx_queue[i].fqid = rx_attr.fqid;
 
 		ret = dpdmai_get_tx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
-					  dpdmai_dev->token, i, &tx_attr);
+					  dpdmai_dev->token, i, 0, &tx_attr);
 		if (ret) {
 			DPAA2_QDMA_ERR("Reading device failed with err: %d",
 				       ret);
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
index c6a057806..0cbe90255 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
@@ -11,6 +11,8 @@ struct qdma_io_meta;
 #define DPAA2_QDMA_MAX_FLE 3
 #define DPAA2_QDMA_MAX_SDD 2
 
+#define DPAA2_DPDMAI_MAX_QUEUES	8
+
 /** FLE pool size: 3 Frame list + 2 source/destination descriptor */
 #define QDMA_FLE_POOL_SIZE (sizeof(struct qdma_io_meta) + \
 		sizeof(struct qbman_fle) * DPAA2_QDMA_MAX_FLE + \
@@ -142,9 +144,9 @@ struct dpaa2_dpdmai_dev {
 	/** Number of queue in this DPDMAI device */
 	uint8_t num_queues;
 	/** RX queues */
-	struct dpaa2_queue rx_queue[DPDMAI_PRIO_NUM];
+	struct dpaa2_queue rx_queue[DPAA2_DPDMAI_MAX_QUEUES];
 	/** TX queues */
-	struct dpaa2_queue tx_queue[DPDMAI_PRIO_NUM];
+	struct dpaa2_queue tx_queue[DPAA2_DPDMAI_MAX_QUEUES];
 };
 
 #endif /* __DPAA2_QDMA_H__ */
diff --git a/drivers/raw/dpaa2_qdma/meson.build b/drivers/raw/dpaa2_qdma/meson.build
index b6a081f11..2a4b69c16 100644
--- a/drivers/raw/dpaa2_qdma/meson.build
+++ b/drivers/raw/dpaa2_qdma/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright 2018 NXP
 
+version = 2
+
 build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
 deps += ['rawdev', 'mempool_dpaa2', 'ring']
 sources = files('dpaa2_qdma.c')
-- 
2.17.1


* [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
  @ 2018-09-26 18:04  2% ` Shreyansh Jain
  2018-09-26 18:04  2%   ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-26 18:04 UTC (permalink / raw)
  To: dev, ferruh.yigit; +Cc: thomas, Shreyansh Jain

About the series:

This series of patches upgrades the DPAA2 driver firmware to
v10.10.10 (MC Firmware).
Because bus/fslmc is modified and other drivers (net, crypto, qdma)
depend on it, and because the changes are tightly linked, the patches
include the upgrade as well as the follow-on changes to the drivers.
Once applied, the DPAA2 driver won't work with any MC
FW lower than 10.10.10.
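
To illustrate the minimum-firmware requirement, a check along these
lines could compare the version reported by the MC against the
MC_VER_MAJOR/MC_VER_MINOR values from fsl_dpmng.h. This is only a
sketch: the helper name is hypothetical, it looks at major/minor only,
and treating a newer major version as acceptable is an assumption, not
what the driver is known to do:

```c
#include <stdbool.h>
#include <stdint.h>

#define MC_VER_MAJOR 10	/* value from fsl_dpmng.h after this series */
#define MC_VER_MINOR 10	/* value from fsl_dpmng.h after this series */

/*
 * Hypothetical helper: true if the firmware version reported by the
 * MC is at least the version the driver was built against.
 */
static inline bool mc_fw_is_supported(uint32_t major, uint32_t minor)
{
	if (major != MC_VER_MAJOR)
		return major > MC_VER_MAJOR;
	return minor >= MC_VER_MINOR;
}
```

With this check, firmware 10.3 would be rejected and 10.10 accepted.
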

Support for this new firmware is available in the publicly available
LSDK (Layerscape SDK) release [1].

Besides the FW change, there are other subtle changes as well:
- Support reading the MAC address from NIC device, rather than
  using a default MAC
- Adding support for QBMan 5.0 FW APIs
- Some patches for NXP's LX2 platform specific features
- And some bug fixes.

Dependency:

* These patches are based on net-next/master 58c3b609699a8c
* Series [2] is logically related to this, but has no git/patch
  related dependency. It is the series for the upgrade of DPAA.

[1] https://lsdk.github.io/index.html
[2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*

Version History:
v1->v2:
 - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
   first set of patches (MC firmware update) breaks the internal ABI
 - Added support for ordered processing APIs. These APIs are expected
   to be used in subsequent feature updates on DPAA2 ethernet driver.
 - Some internal bug fixes.
 (Patches increased from 11 to 15)

Hemant Agrawal (9):
  net/dpaa2: fix VLAN filter enablement
  bus/fslmc: upgrade mc FW APIs to 10.10.0
  net/dpaa2: upgrade dpni to mc FW APIs to 10.10.0
  crypto/dpaa2_sec: upgrade mc FW APIs to 10.10.0
  net/dpaa2: update RSS value in mbuf for lx2 platform
  net/dpaa2: optimize the fd reset in Tx path
  net/dpaa2: enhance the queue memory cleanup routines
  net/dpaa2: support MBUF VLAN tci population from HW parser
  net/dpaa2: support Rx checksum offload in slow parsing

Nipun Gupta (4):
  net/dpaa2: fix IOVA conversion for congestion memory
  bus/fslmc: support memory backed portals with QBMAN 5.0
  bus/fslmc: support 32 enq and deq for LX2 platform
  bus/fslmc: disable annotation prefetch for LX2

Shreyansh Jain (2):
  net/dpaa2: read hardware provided MAC for DPNI devices
  net/dpaa2: add per queue stats get and reset support

 drivers/bus/fslmc/Makefile                    |   2 +-
 drivers/bus/fslmc/mc/dpbp.c                   |  10 +
 drivers/bus/fslmc/mc/dpci.c                   | 197 +++++
 drivers/bus/fslmc/mc/dpcon.c                  |  30 +
 drivers/bus/fslmc/mc/dpdmai.c                 |  14 +
 drivers/bus/fslmc/mc/dpio.c                   |   9 +
 drivers/bus/fslmc/mc/fsl_dpbp.h               |   1 +
 drivers/bus/fslmc/mc/fsl_dpbp_cmd.h           |  16 +-
 drivers/bus/fslmc/mc/fsl_dpci.h               |  47 +-
 drivers/bus/fslmc/mc/fsl_dpci_cmd.h           |  62 +-
 drivers/bus/fslmc/mc/fsl_dpcon.h              |  19 +
 drivers/bus/fslmc/mc/fsl_dpdmai.h             |   5 +
 drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h         |  20 +-
 drivers/bus/fslmc/mc/fsl_dpmng.h              |   2 +-
 drivers/bus/fslmc/mc/fsl_dpopr.h              |  85 ++
 drivers/bus/fslmc/meson.build                 |   2 +
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c      | 197 +++--
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.h      |   4 +
 drivers/bus/fslmc/portal/dpaa2_hw_pvt.h       |  32 +-
 drivers/bus/fslmc/qbman/include/compat.h      |   3 +-
 .../fslmc/qbman/include/fsl_qbman_portal.h    |  33 +-
 drivers/bus/fslmc/qbman/qbman_portal.c        | 764 +++++++++++++++---
 drivers/bus/fslmc/qbman/qbman_portal.h        |  30 +-
 drivers/bus/fslmc/qbman/qbman_sys.h           | 100 ++-
 drivers/bus/fslmc/qbman/qbman_sys_decl.h      |   4 +
 drivers/bus/fslmc/rte_bus_fslmc_version.map   |  12 +
 drivers/crypto/dpaa2_sec/Makefile             |   2 +-
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c   |   8 +-
 drivers/crypto/dpaa2_sec/mc/dpseci.c          | 128 ++-
 drivers/crypto/dpaa2_sec/mc/fsl_dpseci.h      |  25 +-
 drivers/crypto/dpaa2_sec/mc/fsl_dpseci_cmd.h  |  73 +-
 drivers/crypto/dpaa2_sec/meson.build          |   2 +
 drivers/event/dpaa2/Makefile                  |   2 +-
 drivers/event/dpaa2/dpaa2_eventdev.c          |   4 +-
 drivers/event/dpaa2/meson.build               |   2 +
 drivers/mempool/dpaa2/Makefile                |   2 +-
 drivers/mempool/dpaa2/meson.build             |   2 +
 drivers/net/dpaa2/Makefile                    |   2 +-
 drivers/net/dpaa2/base/dpaa2_hw_dpni_annot.h  |  40 +
 drivers/net/dpaa2/dpaa2_ethdev.c              | 173 +++-
 drivers/net/dpaa2/dpaa2_rxtx.c                |  95 ++-
 drivers/net/dpaa2/mc/dpni.c                   | 134 ++-
 drivers/net/dpaa2/mc/fsl_dpkg.h               |  71 +-
 drivers/net/dpaa2/mc/fsl_dpni.h               | 378 +++++----
 drivers/net/dpaa2/mc/fsl_dpni_cmd.h           |  87 +-
 drivers/net/dpaa2/mc/fsl_net.h                |   2 +-
 drivers/net/dpaa2/meson.build                 |   2 +
 drivers/raw/dpaa2_cmdif/Makefile              |   2 +-
 drivers/raw/dpaa2_cmdif/meson.build           |   2 +
 drivers/raw/dpaa2_qdma/Makefile               |   2 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c           |  14 +-
 drivers/raw/dpaa2_qdma/dpaa2_qdma.h           |   6 +-
 drivers/raw/dpaa2_qdma/meson.build            |   2 +
 53 files changed, 2377 insertions(+), 585 deletions(-)
 create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h

-- 
2.17.1


* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
  @ 2018-09-26 12:22  3%   ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-26 12:22 UTC (permalink / raw)
  To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
	matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
	bernard.iremonger, arybchenko
  Cc: jblunck, shreyansh.jain, dev, helin.zhang

On 17-Aug-18 11:51 AM, Jeff Guo wrote:
> VFIO PCI devices support extended interrupt types beyond the existing
> ones, such as the err and req notifiers, which can be useful for device
> error monitoring. The corresponding interrupt handlers differ from the
> other interrupt handlers registered in PMDs, so a new interrupt handler
> is needed. This patch adds a dedicated req handler to the generic PCI
> device.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>   drivers/bus/pci/rte_bus_pci.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index 0d1955f..c45a820 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -66,6 +66,7 @@ struct rte_pci_device {
>   	uint16_t max_vfs;                   /**< sriov enable if not zero */
>   	enum rte_kernel_driver kdrv;        /**< Kernel driver passthrough */
>   	char name[PCI_PRI_STR_SIZE+1];      /**< PCI location (ASCII) */
> +	struct rte_intr_handle req_notifier_handler;/**< Req notifier handle */
>   };
>   
>   /**
> 

Does this break ABI?

-- 
Thanks,
Anatoly


* [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external
      2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-26 11:22 16% ` Anatoly Burakov
  2018-09-26 11:22  4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
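
As a rough illustration of the pattern applied to each callback, here is
a minimal standalone sketch. It does not use the real EAL headers:
``msl_sketch`` and ``msl_walk_sketch()`` are hypothetical stand-ins for
``struct rte_memseg_list`` and ``rte_memseg_list_walk()``, the ``count``
field is an assumption made for brevity, and the walker's return value is
simplified. Only the early return on ``external`` mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_memseg_list: only the fields this sketch needs. */
struct msl_sketch {
	uint64_t page_sz;      /* page size of every segment in the list */
	unsigned int count;    /* segments in use (hypothetical field) */
	unsigned int external; /* 1 if the list points to external memory */
};

typedef int (*msl_walk_fn)(const struct msl_sketch *msl, void *arg);

/* Simplified walker: visit every list, stop on the first non-zero return. */
static int
msl_walk_sketch(const struct msl_sketch *lists, size_t n, msl_walk_fn fn,
		void *arg)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < n; i++) {
		int ret = fn(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Sum only internal memory: external lists are skipped, not treated as errors. */
static int
internal_size_cb(const struct msl_sketch *msl, void *arg)
{
	uint64_t *total = arg;

	if (msl->external)
		return 0;

	*total += (uint64_t)msl->count * msl->page_sz;
	return 0;
}
```

Returning 0 rather than an error keeps the walk going, so internal lists
that come after an external one are still visited.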

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
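
The matching rule can be sketched in isolation as follows. This is a
simplified model under stated assumptions, not the mempool code itself:
``pgsz_list_sketch`` stands in for ``struct rte_memseg_list``,
``ANY_SOCKET_SKETCH`` for ``SOCKET_ID_ANY``, and the ``fallback``
argument for the system page size used when nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_SOCKET_SKETCH (-1) /* stand-in for SOCKET_ID_ANY */

/* Stand-in for the memseg list fields the page size lookup reads. */
struct pgsz_list_sketch {
	int socket_id;
	uint64_t page_sz;
	unsigned int external;
};

/*
 * A list counts if its socket ID matches exactly (internal or external),
 * or if "any socket" was requested and the list is internal memory.
 */
static size_t
min_page_size_sketch(const struct pgsz_list_sketch *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int valid = lists[i].socket_id == socket_id;

		valid |= socket_id == ANY_SOCKET_SKETCH && !lists[i].external;

		if (valid && lists[i].page_sz < min)
			min = (size_t)lists[i].page_sz;
	}
	/* nothing matched: fall back to the caller-supplied page size */
	return min == SIZE_MAX ? fallback : min;
}
```

With an external list using 4K pages present, a request for "any socket"
still sees only the internal page sizes, while a request for the external
list's own socket ID sees 4K.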

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 13 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 24 files changed, 134 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,10 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+       a new flag indicating whether the memseg list refers to external memory.
+
 Removed Items
 -------------
 
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1


* [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID
                     ` (2 preceding siblings ...)
  2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-26 11:22  4% ` Anatoly Burakov
  2018-09-26 11:22  9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
  2018-09-26 11:22  4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
  5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
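
The resulting handling of socket IDs can be sketched as below. This is
a standalone model, not the EAL code: ``ANY_SOCKET_SKETCH`` and
``MAX_NUMA_NODES_SKETCH`` are stand-ins for ``SOCKET_ID_ANY`` and
``RTE_MAX_NUMA_NODES`` (the value 8 is an assumption), and both helper
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ANY_SOCKET_SKETCH (-1)  /* stand-in for SOCKET_ID_ANY */
#define MAX_NUMA_NODES_SKETCH 8 /* stand-in for RTE_MAX_NUMA_NODES (assumed value) */

/*
 * Up-front check: only negative IDs other than "any" are rejected.
 * The upper bound is no longer checked here, because IDs at or above
 * the NUMA node limit may belong to external heaps.
 */
static bool
socket_id_plausible(int socket_id)
{
	return socket_id == ANY_SOCKET_SKETCH || socket_id >= 0;
}

/*
 * Without hugepages, requests for a NUMA socket collapse to "any",
 * but IDs in the external range are passed through unchanged.
 */
static int
effective_socket_id(int socket_id, bool has_hugepages)
{
	assert(socket_id_plausible(socket_id));

	if (!has_hugepages && socket_id < MAX_NUMA_NODES_SKETCH)
		return ANY_SOCKET_SKETCH;
	return socket_id;
}
```

Whether an ID in the external range actually names an existing heap is
then left to the heap lookup itself.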

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps
                     ` (4 preceding siblings ...)
  2018-09-26 11:22  9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-26 11:22  4% ` Anatoly Burakov
  5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

Add an API to allow creating new malloc heaps. They will be created
with socket IDs starting at or above RTE_MAX_NUMA_NODES, to avoid
clashing with internal heaps.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst        |  2 +
 .../common/include/rte_eal_memconfig.h        |  3 ++
 lib/librte_eal/common/include/rte_malloc.h    | 19 +++++++
 lib/librte_eal/common/malloc_heap.c           | 37 +++++++++++++
 lib/librte_eal/common/malloc_heap.h           |  3 ++
 lib/librte_eal/common/rte_malloc.c            | 52 +++++++++++++++++++
 lib/librte_eal/rte_eal_version.map            |  1 +
 7 files changed, 117 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
        Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+       Structure ``rte_eal_memconfig`` has been extended to contain next socket
+       ID for externally allocated memory segments.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
 	/* Heaps of Malloc */
 	struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
 
+	/* next socket ID for external malloc heap */
+	int next_socket_id;
+
 	/* address of mem_config in primary process. used to map shared config into
 	 * exact same address the primary process maps it.
 	 */
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
 rte_malloc_get_socket_stats(int socket,
 		struct rte_malloc_socket_stats *socket_stats);
 
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ *   socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ *   Name of the heap to create.
+ *
+ * @return
+ *   - 0 on successful creation
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - ``heap_name`` was NULL, empty or too long
+ *     EEXIST - heap by name of ``heap_name`` already exists
+ *     ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
 /**
  * Find socket ID corresponding to a named heap.
  *
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
 #include "malloc_heap.h"
 #include "malloc_mp.h"
 
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
 static unsigned
 check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
 {
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
 	rte_spinlock_unlock(&heap->lock);
 }
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	uint32_t next_socket_id = mcfg->next_socket_id;
+
+	/* prevent overflow. did you really create 2 billion heaps??? */
+	if (next_socket_id > INT32_MAX) {
+		RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+		rte_errno = ENOSPC;
+		return -1;
+	}
+
+	/* initialize empty heap */
+	heap->alloc_count = 0;
+	heap->first = NULL;
+	heap->last = NULL;
+	LIST_INIT(heap->free_head);
+	rte_spinlock_init(&heap->lock);
+	heap->total_size = 0;
+	heap->socket_id = next_socket_id;
+
+	/* we hold a global mem hotplug writelock, so it's safe to increment */
+	mcfg->next_socket_id++;
+
+	/* set up name */
+	strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+	return 0;
+}
+
 int
 rte_eal_malloc_heap_init(void)
 {
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
 	unsigned int i;
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign min socket ID to external heaps */
+		mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
 		/* assign names to default DPDK heaps */
 		for (i = 0; i < rte_socket_count(); i++) {
 			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
 malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
 		size_t align, bool contig);
 
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
 int
 malloc_heap_free(struct malloc_elem *elem);
 
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
 #include <rte_memory.h>
 #include <rte_eal.h>
 #include <rte_eal_memconfig.h>
+#include <rte_errno.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
 
 	return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
 }
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct malloc_heap *heap = NULL;
+	int i, ret;
+
+	if (heap_name == NULL ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+			strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+				RTE_HEAP_NAME_MAX_LEN) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	/* check if there is space in the heap list, or if heap with this name
+	 * already exists.
+	 */
+	rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+	for (i = 0; i < RTE_MAX_HEAPS; i++) {
+		struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+		/* existing heap */
+		if (strncmp(heap_name, tmp->name,
+				RTE_HEAP_NAME_MAX_LEN) == 0) {
+			RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+				heap_name);
+			rte_errno = EEXIST;
+			ret = -1;
+			goto unlock;
+		}
+		/* empty heap */
+		if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+			heap = tmp;
+			break;
+		}
+	}
+	if (heap == NULL) {
+		RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we're sure that we can create a new heap, so do it */
+	ret = malloc_heap_create(heap, heap_name);
+unlock:
+	rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+	return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
 	rte_fbarray_set_used;
 	rte_log_register_type_and_pick_level;
 	rte_malloc_dump_heaps;
+	rte_malloc_heap_create;
 	rte_malloc_heap_get_socket;
 	rte_malloc_heap_socket_is_external;
 	rte_mem_alloc_validator_register;
-- 
2.17.1


* [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK
    @ 2018-09-26 11:21  2% ` Anatoly Burakov
  2018-09-27 10:40  2%   ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
                     ` (4 more replies)
  2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
                   ` (3 subsequent siblings)
  5 siblings, 5 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:21 UTC (permalink / raw)
  To: dev
  Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
	daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
	scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
	thomas, shreyansh.jain, shahafs, arybchenko

This is a proposal to enable using externally allocated memory
in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    allocator to decide from which heap (internal or external) to
    allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
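
To illustrate the socket ID scheme described in the list above, here is
a minimal, self-contained sketch in plain C. It does not use DPDK:
toy_config, toy_heap, toy_init() and toy_heap_create() are hypothetical
stand-ins, and the limits are made-up values rather than the real
RTE_MAX_HEAPS or RTE_MAX_NUMA_NODES. It only models the idea that
default heaps are named after their NUMA socket, while named heaps take
socket IDs from a counter that starts well above any real NUMA node.
Unlike the patch, the sketch scans every slot before picking a free one
and returns the new socket ID directly; both are simplifications.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_HEAPS      8   /* assumed value, stand-in for RTE_MAX_HEAPS */
#define TOY_NAME_MAX_LEN   32  /* same size as RTE_HEAP_NAME_MAX_LEN */
#define TOY_MAX_NUMA_NODES 8   /* assumed value, stand-in for RTE_MAX_NUMA_NODES */
/* external socket IDs start at max(256, number of NUMA nodes) */
#define TOY_EXT_MIN_SOCKET_ID \
	((1 << 8) > TOY_MAX_NUMA_NODES ? (1 << 8) : TOY_MAX_NUMA_NODES)

struct toy_heap {
	char name[TOY_NAME_MAX_LEN]; /* empty name marks a free slot */
	int socket_id;
};

struct toy_config {
	struct toy_heap heaps[TOY_MAX_HEAPS];
	int next_socket_id; /* next ID handed out to a named heap */
};

/* Name the default heaps "socket_<n>" and reset the external ID counter. */
static void
toy_init(struct toy_config *cfg, int n_sockets)
{
	int i;

	memset(cfg, 0, sizeof(*cfg));
	cfg->next_socket_id = TOY_EXT_MIN_SOCKET_ID;
	for (i = 0; i < n_sockets && i < TOY_MAX_HEAPS; i++) {
		snprintf(cfg->heaps[i].name, TOY_NAME_MAX_LEN, "socket_%i", i);
		cfg->heaps[i].socket_id = i;
	}
}

/* Create a named heap; return its socket ID, or a negative errno value. */
static int
toy_heap_create(struct toy_config *cfg, const char *name)
{
	struct toy_heap *slot = NULL;
	size_t len;
	int i;

	if (name == NULL)
		return -EINVAL;
	len = strlen(name);
	if (len == 0 || len >= TOY_NAME_MAX_LEN)
		return -EINVAL;

	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		struct toy_heap *h = &cfg->heaps[i];

		/* a heap with this name already exists */
		if (strncmp(name, h->name, TOY_NAME_MAX_LEN) == 0)
			return -EEXIST;
		/* remember the first free slot, keep scanning for duplicates */
		if (slot == NULL && h->name[0] == '\0')
			slot = h;
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, len + 1);
	slot->socket_id = cfg->next_socket_id++;
	return slot->socket_id;
}
```

With two default heaps, the first named heap created in this model gets
socket ID 256 and the next one 257, so an allocator can tell internal
and external heaps apart from the socket ID alone.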

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. Any
other process wishing to use the same memory must first attach it
(otherwise some things will not work).

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better: we don't
want heap creation in the primary to fail because something in the
secondary has failed, when in fact we may not even have wanted this
memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*: not only does the
memory space need to be preallocated, it also needs to be attached in
each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.

v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes

v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
  IOVA-contiguousness when dealing with externally allocated memory

v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
  comments

v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (21):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: add function to check if socket is external
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  malloc: enable event callbacks for external memory
  test: add unit tests for external memory support
  app/testpmd: add support for external memory
  doc: add external memory feature to the release notes
  doc: add external memory feature to programmer's guide

 app/test-pmd/config.c                         |  21 +-
 app/test-pmd/parameters.c                     |  23 +-
 app/test-pmd/testpmd.c                        | 305 ++++++++++++-
 app/test-pmd/testpmd.h                        |  13 +-
 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 .../prog_guide/env_abstraction_layer.rst      |  37 ++
 doc/guides/rel_notes/deprecation.rst          |  15 -
 doc/guides/rel_notes/release_18_11.rst        |  28 +-
 doc/guides/testpmd_app_ug/run_app.rst         |  12 +
 drivers/bus/fslmc/fslmc_vfio.c                |  14 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 .../net/virtio/virtio_user/virtio_user_dev.c  |   8 +
 lib/librte_eal/bsdapp/eal/Makefile            |   2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   8 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   9 +-
 lib/librte_eal/common/include/rte_malloc.h    | 192 ++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_elem.c           |  10 +-
 lib/librte_eal/common/malloc_heap.c           | 316 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 429 +++++++++++++++++-
 lib/librte_eal/linuxapp/eal/Makefile          |   2 +-
 lib/librte_eal/linuxapp/eal/eal.c             |  10 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  27 +-
 lib/librte_eal/meson.build                    |   2 +-
 lib/librte_eal/rte_eal_version.map            |   8 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  57 ++-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 389 ++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 47 files changed, 1913 insertions(+), 139 deletions(-)
 create mode 100644 test/test/test_external_mem.c

-- 
2.17.1


* [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps
                     ` (3 preceding siblings ...)
  2018-09-26 11:22  4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-26 11:22  9% ` Anatoly Burakov
  2018-09-26 11:22  4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
  5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst          |  1 +
 lib/librte_eal/common/include/rte_malloc_heap.h |  2 ++
 lib/librte_eal/common/malloc_heap.c             | 17 ++++++++++++++++-
 lib/librte_eal/common/rte_malloc.c              |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
 * eal: EAL library ABI version was changed due to previously announced work on
        supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
        a new flag indicating whether the memseg list refers to external memory.
+       Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
 
 Removed Items
 -------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
 
 /* Number of free lists per heap, grouped by size. */
 #define RTE_HEAP_NUM_FREELISTS  13
+#define RTE_HEAP_NAME_MAX_LEN 32
 
 /* dummy definition, for pointers */
 struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
 	unsigned alloc_count;
 	size_t total_size;
 	unsigned int socket_id;
+	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
 #endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	malloc_heap_add_memory(heap, found_msl, ms->addr, len);
 
 	heap->total_size += len;
-	heap->socket_id = msl->socket_id;
 
 	RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
 			msl->socket_id);
@@ -1020,6 +1019,22 @@ int
 rte_eal_malloc_heap_init(void)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int i;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* assign names to default DPDK heaps */
+		for (i = 0; i < rte_socket_count(); i++) {
+			struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+			char heap_name[RTE_HEAP_NAME_MAX_LEN];
+			int socket_id = rte_socket_id_by_idx(i);
+
+			snprintf(heap_name, sizeof(heap_name) - 1,
+					"socket_%i", socket_id);
+			strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+			heap->socket_id = socket_id;
+		}
+	}
+
 
 	if (register_mp_requests()) {
 		RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
 		malloc_heap_get_stats(heap, &sock_stats);
 
 		fprintf(f, "Heap id:%u\n", heap_id);
+		fprintf(f, "\tHeap name:%s\n", heap->name);
 		fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
 		fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
 		fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
-- 
2.17.1


* [dpdk-dev] [PATCH v1] doc: remove unused release note file
@ 2018-09-25 15:25  3% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2018-09-25 15:25 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Remove unused file from the release notes docs. This file was
used to display a hierarchy in older releases, circa 2015, but
doesn't seem useful in the current structure.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/index.rst           |  1 -
 doc/guides/rel_notes/rel_description.rst | 12 ------------
 2 files changed, 13 deletions(-)
 delete mode 100644 doc/guides/rel_notes/rel_description.rst

diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 89fdb4b..1243e98 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,7 +8,6 @@ Release Notes
     :maxdepth: 1
     :numbered:
 
-    rel_description
     release_18_11
     release_18_08
     release_18_05
diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst
deleted file mode 100644
index 8f28556..0000000
--- a/doc/guides/rel_notes/rel_description.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2010-2015 Intel Corporation.
-
-Description of Release
-======================
-
-This document contains the release notes for Data Plane Development Kit (DPDK)
-release version |release| and previous releases.
-
-It lists new features, fixed bugs, API and ABI changes and known issues.
-
-For instructions on compiling and running the release, see the :ref:`DPDK Getting Started Guide <linux_gsg>`.
-- 
2.7.5


* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
  2018-09-25 12:22  3%   ` Luca Boccassi
  2018-09-25 12:57  3%     ` Thomas Monjalon
@ 2018-09-25 14:34  0%     ` Ananyev, Konstantin
  1 sibling, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-09-25 14:34 UTC (permalink / raw)
  To: Luca Boccassi, Thomas Monjalon; +Cc: dev

Hi Luca,

> 
> On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > 24/08/2018 18:47, Konstantin Ananyev:
> > > If user specifies priority=0 for some of ACL rules
> > > that can cause rte_acl_classify to return wrong results.
> > > The reason is that priority zero is used internally for no-match
> > > nodes.
> > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > The simplest way to overcome the issue is just not allow zero
> > > to be a valid priority for the rule.
> > >
> > > Fixes: dc276b5780c2 ("acl: new library")
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Cc: stable@dpdk.org
> >
> > Applied with below title, thanks
> > 	acl: forbid rule with priority zero
> 
> Hi,
> 
> This patch is marked for stable, but it changes an enum in a public header 

Yes it does.

> so it looks like an ABI breakage? Have I got it wrong?

Strictly speaking - yes, but priority=0 is invalid value with current implementation.
I don't think someone uses it - as in that case acl library simply wouldn't work
correctly.
Konstantin


* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
  2018-09-25 12:22  3%   ` Luca Boccassi
@ 2018-09-25 12:57  3%     ` Thomas Monjalon
  2018-09-25 14:34  0%     ` Ananyev, Konstantin
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-25 12:57 UTC (permalink / raw)
  To: Luca Boccassi, Konstantin Ananyev; +Cc: dev

25/09/2018 14:22, Luca Boccassi:
> On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > 24/08/2018 18:47, Konstantin Ananyev:
> > > If user specifies priority=0 for some of ACL rules
> > > that can cause rte_acl_classify to return wrong results.
> > > The reason is that priority zero is used internally for no-match
> > > nodes.
> > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > The simplest way to overcome the issue is just not allow zero
> > > to be a valid priority for the rule.
> > > 
> > > Fixes: dc276b5780c2 ("acl: new library")
> > > 
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > 
> > Cc: stable@dpdk.org
> > 
> > Applied with below title, thanks
> > 	acl: forbid rule with priority zero
> 
> Hi,
> 
> This patch is marked for stable, but it changes an enum in a public
> header so it looks like an ABI breakage? Have I got it wrong?

-	RTE_ACL_MIN_PRIORITY = 0,
+	RTE_ACL_MIN_PRIORITY = 1,

In my understanding, the change is not breaking the ABI because
the old minimal value (0) can still be used, with the same side effect.

The new value is just removing a side effect for newly compiled apps.

Konstantin, am I right?


* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
  @ 2018-09-25 12:22  3%   ` Luca Boccassi
  2018-09-25 12:57  3%     ` Thomas Monjalon
  2018-09-25 14:34  0%     ` Ananyev, Konstantin
  0 siblings, 2 replies; 200+ results
From: Luca Boccassi @ 2018-09-25 12:22 UTC (permalink / raw)
  To: Thomas Monjalon, Konstantin Ananyev; +Cc: dev

On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> 24/08/2018 18:47, Konstantin Ananyev:
> > If user specifies priority=0 for some of ACL rules
> > that can cause rte_acl_classify to return wrong results.
> > The reason is that priority zero is used internally for no-match
> > nodes.
> > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > The simplest way to overcome the issue is just not allow zero
> > to be a valid priority for the rule.
> > 
> > Fixes: dc276b5780c2 ("acl: new library")
> > 
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> Cc: stable@dpdk.org
> 
> Applied with below title, thanks
> 	acl: forbid rule with priority zero

Hi,

This patch is marked for stable, but it changes an enum in a public
header so it looks like an ABI breakage? Have I got it wrong?

-- 
Kind regards,
Luca Boccassi


* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
  2018-09-25  9:50  0%     ` Thomas Monjalon
@ 2018-09-25  9:56  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-09-25  9:56 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Nikhil Rao, dev, stable

-----Original Message-----
> Date: Tue, 25 Sep 2018 11:50:06 +0200
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: Nikhil Rao <nikhil.rao@intel.com>, dev@dpdk.org, stable@dpdk.org
> Subject: Re: [PATCH v2] eventdev: fix port id argument in Rx adapter caps
>  API
> 
> 
> 25/09/2018 11:15, Jerin Jacob:
> > -----Original Message-----
> > > Date: Tue, 25 Sep 2018 14:19:12 +0530
> > > From: Nikhil Rao <nikhil.rao@intel.com>
> > > To: jerin.jacob@caviumnetworks.com
> > > CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> > > Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> > > X-Mailer: git-send-email 1.8.3.1
> > >
> > >
> > > Make the ethernet port id passed into
> > > rte_event_eth_rx_adapter_caps_get() 16 bit.
> > >
> > > Also, update the event rx adapter test to use 16 bit
> > > ethernet port ids.
> > >
> > > Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > >
> > > v2:
> > > * squash changes to autotest and library into a single patch (Jerin Jacob)
> > > * add update to release notes (Jerin Jacob)
> > >
> > >  lib/librte_eventdev/rte_eventdev.h     | 2 +-
> > >  lib/librte_eventdev/rte_eventdev.c     | 2 +-
> > >  test/test/test_event_eth_rx_adapter.c  | 6 +++---
> > >  doc/guides/rel_notes/release_18_11.rst | 4 +++-
> > >  lib/librte_eventdev/Makefile           | 2 +-
> >
> > Missing version update in lib/librte_eventdev/meson.build. See version=
> >
> > >  5 files changed, 9 insertions(+), 7 deletions(-)
> > >
> > >  ABI Changes
> > >  -----------
> > > @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> > >       librte_distributor.so.1
> > >       librte_eal.so.8
> > >       librte_ethdev.so.10
> > > -     librte_eventdev.so.4
> > > +   + librte_eventdev.so.6
> >
> > Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
> > release notes. The version(change to librte_eventdev.so.5) should have been
> > updated in change set in 3810ae4357.
> >
> > +Thomas,
> > In case if he has difference in opinion on updating released release note file.
> 
> I prefer such changes being atomic.

Me too. But the offending change set(3810ae4357) is old
➜ [master][dpdk.org] $ git describe  3810ae4357
v18.05-389-g3810ae435

Do you prefer to have patch to update the release_18_08.rst file or ignore it?


> 
> 


* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
  2018-09-25  9:15  0%   ` Jerin Jacob
@ 2018-09-25  9:50  0%     ` Thomas Monjalon
  2018-09-25  9:56  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-09-25  9:50 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: Nikhil Rao, dev, stable

25/09/2018 11:15, Jerin Jacob:
> -----Original Message-----
> > Date: Tue, 25 Sep 2018 14:19:12 +0530
> > From: Nikhil Rao <nikhil.rao@intel.com>
> > To: jerin.jacob@caviumnetworks.com
> > CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> > Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> > X-Mailer: git-send-email 1.8.3.1
> > 
> > 
> > Make the ethernet port id passed into
> > rte_event_eth_rx_adapter_caps_get() 16 bit.
> > 
> > Also, update the event rx adapter test to use 16 bit
> > ethernet port ids.
> > 
> > Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> > 
> > v2:
> > * squash changes to autotest and library into a single patch (Jerin Jacob)
> > * add update to release notes (Jerin Jacob)
> > 
> >  lib/librte_eventdev/rte_eventdev.h     | 2 +-
> >  lib/librte_eventdev/rte_eventdev.c     | 2 +-
> >  test/test/test_event_eth_rx_adapter.c  | 6 +++---
> >  doc/guides/rel_notes/release_18_11.rst | 4 +++-
> >  lib/librte_eventdev/Makefile           | 2 +-
> 
> Missing version update in lib/librte_eventdev/meson.build. See version=
> 
> >  5 files changed, 9 insertions(+), 7 deletions(-)
> > 
> >  ABI Changes
> >  -----------
> > @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> >       librte_distributor.so.1
> >       librte_eal.so.8
> >       librte_ethdev.so.10
> > -     librte_eventdev.so.4
> > +   + librte_eventdev.so.6
> 
> Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
> release notes. The version(change to librte_eventdev.so.5) should have been 
> updated in change set in 3810ae4357.
> 
> +Thomas,
> In case if he has difference in opinion on updating released release note file.

I prefer such changes being atomic.


* [dpdk-dev] [PATCH v3] eventdev: fix port id argument in Rx adapter caps API
    2018-09-25  8:49  4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
@ 2018-09-25  9:49  4% ` Nikhil Rao
  1 sibling, 0 replies; 200+ results
From: Nikhil Rao @ 2018-09-25  9:49 UTC (permalink / raw)
  To: jerin.jacob; +Cc: dev, Nikhil Rao, stable

Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.

Also, update the event rx adapter test to use 16 bit
ethernet port ids.

Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---

v2:
* squash changes to autotest and library into a single patch (Jerin Jacob)
* add update to release notes (Jerin Jacob)

v3:
* update meson.build (Jerin Jacob)
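
For readers wondering why the parameter width matters, here is a small
self-contained sketch in plain C. It does not call any DPDK function:
port_as_u8() and port_as_u16() are hypothetical helpers that only model
what a callee would observe if the ethernet port id were passed through
an 8-bit or a 16-bit parameter.

```c
#include <stdint.h>

/* Model of an 8-bit port id parameter: the value is reduced modulo 256. */
static uint16_t
port_as_u8(uint16_t eth_port_id)
{
	return (uint8_t)eth_port_id;
}

/* Model of a 16-bit port id parameter: the value arrives unchanged. */
static uint16_t
port_as_u16(uint16_t eth_port_id)
{
	return eth_port_id;
}
```

In this model a port id of 300 is seen as 44 through the 8-bit
parameter, so a capability query would silently refer to a different
port, while ids up to 255 are unaffected.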

 lib/librte_eventdev/rte_eventdev.h     | 2 +-
 lib/librte_eventdev/rte_eventdev.c     | 2 +-
 test/test/test_event_eth_rx_adapter.c  | 6 +++---
 doc/guides/rel_notes/release_18_11.rst | 4 +++-
 lib/librte_eventdev/Makefile           | 2 +-
 lib/librte_eventdev/meson.build        | 2 +-
 6 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index a24213e..8541109 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1112,7 +1112,7 @@ struct rte_event {
  *
  */
 int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
 				uint32_t *caps);
 
 #define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 0a8572b..b1914dc 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -109,7 +109,7 @@
 }
 
 int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
 				uint32_t *caps)
 {
 	struct rte_eventdev *dev;
diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c
index 4641640..592bcaa 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -32,7 +32,7 @@ struct event_eth_rx_adapter_test_params {
 static struct event_eth_rx_adapter_test_params default_params;
 
 static inline int
-port_init_common(uint8_t port, const struct rte_eth_conf *port_conf,
+port_init_common(uint16_t port, const struct rte_eth_conf *port_conf,
 		struct rte_mempool *mp)
 {
 	const uint16_t rx_ring_size = 512, tx_ring_size = 512;
@@ -94,7 +94,7 @@ struct event_eth_rx_adapter_test_params {
 }
 
 static inline int
-port_init_rx_intr(uint8_t port, struct rte_mempool *mp)
+port_init_rx_intr(uint16_t port, struct rte_mempool *mp)
 {
 	static const struct rte_eth_conf port_conf_default = {
 		.rxmode = {
@@ -110,7 +110,7 @@ struct event_eth_rx_adapter_test_params {
 }
 
 static inline int
-port_init(uint8_t port, struct rte_mempool *mp)
+port_init(uint16_t port, struct rte_mempool *mp)
 {
 	static const struct rte_eth_conf port_conf_default = {
 		.rxmode = {
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 97daad1..842b46b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -99,6 +99,8 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
+  has been changed from uint8_t to uint16_t.
 
 ABI Changes
 -----------
@@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_distributor.so.1
      librte_eal.so.8
      librte_ethdev.so.10
-     librte_eventdev.so.4
+   + librte_eventdev.so.6
      librte_flow_classify.so.1
      librte_gro.so.1
      librte_gso.so.1
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..ce800ea 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_eventdev.a
 
 # library version
-LIBABIVER := 5
+LIBABIVER := 6
 
 # build flags
 CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/lib/librte_eventdev/meson.build b/lib/librte_eventdev/meson.build
index 3cbaf29..3c4e510 100644
--- a/lib/librte_eventdev/meson.build
+++ b/lib/librte_eventdev/meson.build
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 5
+version = 6
 allow_experimental_apis = true
 
 if host_machine.system() == 'linux'
-- 
1.8.3.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
  2018-09-25  8:49  4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
@ 2018-09-25  9:15  0%   ` Jerin Jacob
  2018-09-25  9:50  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-25  9:15 UTC (permalink / raw)
  To: Nikhil Rao; +Cc: dev, stable, thomas

-----Original Message-----
> Date: Tue, 25 Sep 2018 14:19:12 +0530
> From: Nikhil Rao <nikhil.rao@intel.com>
> To: jerin.jacob@caviumnetworks.com
> CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> X-Mailer: git-send-email 1.8.3.1
> 
> 
> Make the ethernet port id passed into
> rte_event_eth_rx_adapter_caps_get() 16 bit.
> 
> Also, update the event rx adapter test to use 16 bit
> ethernet port ids.
> 
> Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
> 
> v2:
> * squash changes to autotest and library into a single patch (Jerin Jacob)
> * add update to release notes (Jerin Jacob)
> 
>  lib/librte_eventdev/rte_eventdev.h     | 2 +-
>  lib/librte_eventdev/rte_eventdev.c     | 2 +-
>  test/test/test_event_eth_rx_adapter.c  | 6 +++---
>  doc/guides/rel_notes/release_18_11.rst | 4 +++-
>  lib/librte_eventdev/Makefile           | 2 +-

The version update in lib/librte_eventdev/meson.build is missing. See the "version =" line there.

>  5 files changed, 9 insertions(+), 7 deletions(-)
> 
>  ABI Changes
>  -----------
> @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
>       librte_distributor.so.1
>       librte_eal.so.8
>       librte_ethdev.so.10
> -     librte_eventdev.so.4
> +   + librte_eventdev.so.6

Can you send a separate standalone patch to fix up the
doc/guides/rel_notes/release_18_08.rst release notes? The version (change to
librte_eventdev.so.5) should have been updated in changeset 3810ae4357.

+Thomas,
in case he has a different opinion on updating the release notes file of an already released version.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
  @ 2018-09-25  8:49  4% ` Nikhil Rao
  2018-09-25  9:15  0%   ` Jerin Jacob
  2018-09-25  9:49  4% ` [dpdk-dev] [PATCH v3] " Nikhil Rao
  1 sibling, 1 reply; 200+ results
From: Nikhil Rao @ 2018-09-25  8:49 UTC (permalink / raw)
  To: jerin.jacob; +Cc: dev, Nikhil Rao, stable

Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.

Also, update the event rx adapter test to use 16 bit
ethernet port ids.
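
To illustrate why the parameter width matters, here is a minimal sketch.
The two functions are hypothetical stand-ins, not the eventdev API: they
only echo the port id they receive, so the effect of the implicit
conversion at the call site is visible.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old prototype: an 8-bit port id parameter.
 * A 16-bit port id passed here is converted modulo 256 by the caller. */
static uint16_t
sketch_caps_get_u8(uint8_t dev_id, uint8_t eth_port_id)
{
	(void)dev_id;
	return eth_port_id;
}

/* Hypothetical stand-in for the fixed prototype: the full 16-bit port id
 * reaches the callee unchanged. */
static uint16_t
sketch_caps_get_u16(uint8_t dev_id, uint16_t eth_port_id)
{
	(void)dev_id;
	return eth_port_id;
}
```

With a port id of 300 held in a uint16_t variable, the 8-bit variant sees
44 (300 modulo 256), while the 16-bit variant sees 300.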

Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---

v2:
* squash changes to autotest and library into a single patch (Jerin Jacob)
* add update to release notes (Jerin Jacob)

 lib/librte_eventdev/rte_eventdev.h     | 2 +-
 lib/librte_eventdev/rte_eventdev.c     | 2 +-
 test/test/test_event_eth_rx_adapter.c  | 6 +++---
 doc/guides/rel_notes/release_18_11.rst | 4 +++-
 lib/librte_eventdev/Makefile           | 2 +-
 5 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index a24213e..8541109 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1112,7 +1112,7 @@ struct rte_event {
  *
  */
 int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
 				uint32_t *caps);
 
 #define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 0a8572b..b1914dc 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -109,7 +109,7 @@
 }
 
 int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
 				uint32_t *caps)
 {
 	struct rte_eventdev *dev;
diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c
index 4641640..592bcaa 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -32,7 +32,7 @@ struct event_eth_rx_adapter_test_params {
 static struct event_eth_rx_adapter_test_params default_params;
 
 static inline int
-port_init_common(uint8_t port, const struct rte_eth_conf *port_conf,
+port_init_common(uint16_t port, const struct rte_eth_conf *port_conf,
 		struct rte_mempool *mp)
 {
 	const uint16_t rx_ring_size = 512, tx_ring_size = 512;
@@ -94,7 +94,7 @@ struct event_eth_rx_adapter_test_params {
 }
 
 static inline int
-port_init_rx_intr(uint8_t port, struct rte_mempool *mp)
+port_init_rx_intr(uint16_t port, struct rte_mempool *mp)
 {
 	static const struct rte_eth_conf port_conf_default = {
 		.rxmode = {
@@ -110,7 +110,7 @@ struct event_eth_rx_adapter_test_params {
 }
 
 static inline int
-port_init(uint8_t port, struct rte_mempool *mp)
+port_init(uint16_t port, struct rte_mempool *mp)
 {
 	static const struct rte_eth_conf port_conf_default = {
 		.rxmode = {
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 97daad1..842b46b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -99,6 +99,8 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
+  has been changed from uint8_t to uint16_t.
 
 ABI Changes
 -----------
@@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_distributor.so.1
      librte_eal.so.8
      librte_ethdev.so.10
-     librte_eventdev.so.4
+   + librte_eventdev.so.6
      librte_flow_classify.so.1
      librte_gro.so.1
      librte_gso.so.1
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..ce800ea 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_eventdev.a
 
 # library version
-LIBABIVER := 5
+LIBABIVER := 6
 
 # build flags
 CFLAGS += -DALLOW_EXPERIMENTAL_API
-- 
1.8.3.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes
  2018-09-24 17:31  4% ` [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes Ferruh Yigit
@ 2018-09-24 17:12  0%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2018-09-24 17:12 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, dev, Thomas Monjalon

On Mon, Sep 24, 2018 at 7:31 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> Document changes done in
> commit 323e7b667f18 ("ethdev: make default behavior CRC strip on Rx")
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>  doc/guides/rel_notes/release_18_11.rst | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index 2f53564a9..41b9cd8d5 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -112,6 +112,12 @@ API Changes
>    flag the MAC can be properly configured in any case. This is particularly
>    important for bonding.
>
> +* The default behaviour of CRC strip offload changed. Without any specific Rx
> +  offload flag, default behavior by PMD is now to strip CRC.
> +  DEV_RX_OFFLOAD_CRC_STRIP offload flag has been removed.
> +  To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
> +  offload.
> +
>
>  ABI Changes
>  -----------

Reviewed-by: David Marchand <david.marchand@6wind.com>


-- 
David Marchand

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes
  @ 2018-09-24 17:31  4% ` Ferruh Yigit
  2018-09-24 17:12  0%   ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-09-24 17:31 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic
  Cc: dev, Ferruh Yigit, Thomas Monjalon, david.marchand

Document changes done in
commit 323e7b667f18 ("ethdev: make default behavior CRC strip on Rx")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2f53564a9..41b9cd8d5 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -112,6 +112,12 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* The default behaviour of CRC strip offload changed. Without any specific Rx
+  offload flag, default behavior by PMD is now to strip CRC.
+  DEV_RX_OFFLOAD_CRC_STRIP offload flag has been removed.
+  To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
+  offload.
+
 
 ABI Changes
 -----------
-- 
2.17.1
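
A minimal sketch of the new default described in the release note above.
Everything prefixed with "sketch_" or "SKETCH_" is a stand-in invented
for illustration, and the flag value is an assumption; real applications
take the flag and the rxmode structure from rte_ethdev.h.

```c
#include <stdint.h>

/* Stand-in flag value, for illustration only. */
#define SKETCH_RX_OFFLOAD_KEEP_CRC UINT64_C(0x00010000)

/* Stand-in for the Rx mode part of a port configuration. */
struct sketch_rxmode {
	uint64_t offloads;
};

/* Ask for the CRC to be kept in received frames. */
static void
sketch_request_keep_crc(struct sketch_rxmode *rxmode)
{
	rxmode->offloads |= SKETCH_RX_OFFLOAD_KEEP_CRC;
}

/* With no flag set, the default is now to strip the CRC. */
static int
sketch_crc_is_stripped(const struct sketch_rxmode *rxmode)
{
	return (rxmode->offloads & SKETCH_RX_OFFLOAD_KEEP_CRC) == 0;
}
```

In an application this would presumably correspond to setting
DEV_RX_OFFLOAD_KEEP_CRC in the Rx offloads of the port configuration
before configuring the device, after checking that the PMD reports the
capability.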

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external
    @ 2018-09-21 16:13 16%   ` Anatoly Burakov
  2018-09-21 16:13  4%   ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
  2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
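
The pattern applied to the walk callbacks can be sketched as follows.
The types and the walk helper below are simplified stand-ins (the
"sketch_" names are not part of DPDK); only the early return on the
external flag mirrors what this patch adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a memseg list. */
struct sketch_msl {
	uint64_t page_sz;      /* page size of every segment in the list */
	size_t n_pages;        /* number of pages currently in the list */
	unsigned int external; /* 1 if the list points to external memory */
};

typedef int (*sketch_walk_t)(const struct sketch_msl *msl, void *arg);

/* Simplified walk: call func on each list, stop on a non-zero return. */
static int
sketch_list_walk(const struct sketch_msl *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that only accounts for internal DPDK memory. */
static int
sketch_internal_size(const struct sketch_msl *msl, void *arg)
{
	uint64_t *total_len = arg;

	if (msl->external)
		return 0; /* skip external lists, keep walking */

	*total_len += msl->n_pages * msl->page_sz;
	return 0;
}
```

Returning 0 rather than an error keeps the walk going, so external lists
are skipped without affecting the lists that follow them.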

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
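
A self-contained sketch of that matching rule, using stand-in types
rather than the real memseg list structure. The "sketch_" names, the
fallback parameter and the socket ID used for external memory in the
usage note are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1)

/* Simplified stand-in for a memseg list. */
struct sketch_pg_list {
	int socket_id;
	size_t page_sz;
	unsigned int external; /* 1 if the list points to external memory */
};

/*
 * Smallest page size usable for the given socket. An exact socket ID match
 * accepts both native and external lists; SKETCH_SOCKET_ID_ANY accepts
 * native lists only. Returns fallback if nothing matches.
 */
static size_t
sketch_min_page_size(const struct sketch_pg_list *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		int valid = lists[i].socket_id == socket_id;

		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				!lists[i].external;

		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

For example, with a native 1G list on socket 0, a native 2M list on
socket 1 and an external 4K list on socket 256, asking for socket 0
yields 1G, while asking for any socket yields 2M because the external 4K
list is ignored.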

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 12 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_elem.c           | 10 ++++--
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/common/rte_malloc.c            |  2 +-
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 24 files changed, 133 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,9 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK.
+
 Removed Items
 -------------
 
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
 	contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
 
 	/* if we're in IOVA as VA mode, or if we're in legacy mode with
-	 * hugepages, all elements are IOVA-contiguous.
+	 * hugepages, all elements are IOVA-contiguous. however, we can only
+	 * make these assumptions about internal memory - externally allocated
+	 * segments have to be checked.
 	 */
-	if (rte_eal_iova_mode() == RTE_IOVA_VA ||
-			(internal_config.legacy_mem && rte_eal_has_hugepages()))
+	if (!elem->msl->external &&
+			(rte_eal_iova_mode() == RTE_IOVA_VA ||
+				(internal_config.legacy_mem &&
+					rte_eal_has_hugepages())))
 		return RTE_PTR_DIFF(data_end, contig_seg_start);
 
 	cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
 	if (elem == NULL)
 		return RTE_BAD_IOVA;
 
-	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+	if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
 		return (uintptr_t) addr;
 
 	ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1
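
For readers following the mempool hunk, the page-size selection rule described in the find_min_pagesz() comment can be sketched in isolation. This is only an illustration under stated assumptions: struct sketch_msl, SKETCH_SOCKET_ID_ANY and sketch_min_page_size() are made-up stand-ins rather than DPDK definitions, and a plain loop takes the place of the rte_memseg_list_walk() callback mechanism used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* stand-in for DPDK's SOCKET_ID_ANY */

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct sketch_msl {
	int socket_id;
	uint64_t page_sz;
	unsigned int external;
};

/*
 * Return the smallest page size eligible for the requested socket, or
 * 'fallback' (e.g. the system page size) if no list qualifies.
 */
static size_t
sketch_min_page_size(const struct sketch_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct sketch_msl *msl = &lists[i];
		/* an exact socket match accepts native and external memory */
		int valid = msl->socket_id == socket_id;

		/* "any socket" only considers native memory */
		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				msl->external == 0;

		if (valid && msl->page_sz < min)
			min = (size_t)msl->page_sz;
	}

	return min == SIZE_MAX ? fallback : min;
}
```

With one 1G native list on socket 0, one 2M native list on socket 1 and one 4K external list on a made-up socket 256, a request for "any socket" would pick 2M, because the smaller external page size is ignored unless socket 256 is asked for explicitly.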


* [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID
      2018-09-21 16:13 16%   ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-21 16:13  4%   ` Anatoly Burakov
  2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1
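
The socket ID handling that remains in the memzone path after this patch can be summarised as a small stand-alone function. This is a sketch, not the EAL code: sketch_normalize_socket() and both SKETCH_ constants are hypothetical, and the value 8 for the NUMA node limit is an assumption chosen for illustration. Whether an ID at or above that limit really names an external heap is left to the heap code, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SOCKET_ID_ANY (-1)  /* stand-in for SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES 8    /* assumed value of RTE_MAX_NUMA_NODES */

/*
 * Return 0 and store the socket ID to hand to the heap code in '*out',
 * or return -1 if the ID can be rejected up front.
 */
static int
sketch_normalize_socket(int socket_id, bool has_hugepages, int *out)
{
	assert(out != NULL);

	/* negative IDs other than "any" are never valid */
	if (socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0)
		return -1;

	/*
	 * Without hugepages, NUMA placement is dropped, but an ID beyond the
	 * NUMA range may refer to an external heap and must be preserved.
	 */
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		socket_id = SKETCH_SOCKET_ID_ANY;

	*out = socket_id;
	return 0;
}
```

The upper-bound check is deliberately absent: an ID such as 256 passes through unchanged, and it is the allocator's job to report failure if no heap carries that ID.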


* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  2018-09-12 20:37  2%         ` Honnappa Nagarahalli
@ 2018-09-20 19:50  0%           ` Michel Machado
  0 siblings, 0 replies; 200+ results
From: Michel Machado @ 2018-09-20 19:50 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
	nd, yipeng1.wang

On 09/12/2018 04:37 PM, Honnappa Nagarahalli wrote:
>>> +int32_t
>>> +rte_hash_iterator_init(const struct rte_hash *h,
>>> +	struct rte_hash_iterator_state *state) {
>>> +	struct rte_hash_iterator_istate *__state;
>>> '__state' can be replaced by 's'.
>>>
>>> +
>>> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>>> +
>>> +	__state = (struct rte_hash_iterator_istate *)state;
>>> +	__state->h = h;
>>> +	__state->next = 0;
>>> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>>> +
>>> +	return 0;
>>> +}
>>> IMO, creating this API can be avoided if the initialization is handled in the 'rte_hash_iterate' function. The cost of doing this is trivial (one extra 'if' statement in 'rte_hash_iterate'). It will help keep the number of APIs to a minimum.
>>
>>       Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initialize the fields of a state only once?
>>
>> My concern is about creating another API for every iterator API. You have a valid point on saving cycles, as this API applies to the data plane. Have you done any performance benchmarking with and without this API? Maybe we can guide our decision based on that.
> 
>      It's not just about creating one init function for each iterator because an iterator may have a couple of init functions. For example, someone may eventually find it useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. A possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state, similar to what is already done in rte_hash_lookup*() functions. In spite of possibly having multiple init functions, there will be a single iterator function.
> 
>      About the performance benchmarking, the current API only requires applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.
> 
> As my tests showed, I do not see any impact of this.

    Ok, we are going to eliminate the init functions in v4.
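
    For reference, folding the initialization into the iterate call could look roughly like the sketch below. It is not the v4 code: the toy table, struct toy_iter_state and toy_iterate() are hypothetical, and it assumes the caller zero-fills the state before the first call so that a NULL table pointer can serve as the "not yet initialized" marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table; a slot holding 0 is treated as empty. */
struct toy_table {
	const uint32_t *slots;
	uint32_t n_slots;
};

/* The caller zero-initializes this; no separate init call is needed. */
struct toy_iter_state {
	const struct toy_table *t; /* NULL until the first iterate call */
	uint32_t next;
	uint32_t total;
};

/* Return the slot index of the next entry, or -1 when exhausted. */
static int32_t
toy_iterate(const struct toy_table *t, struct toy_iter_state *s,
		uint32_t *val)
{
	assert(t != NULL && s != NULL && val != NULL);

	/* the "one extra if": set up the state on first use */
	if (s->t == NULL) {
		s->t = t;
		s->next = 0;
		s->total = t->n_slots;
	}

	while (s->next < s->total) {
		uint32_t idx = s->next++;

		if (s->t->slots[idx] != 0) {
			*val = s->t->slots[idx];
			return (int32_t)idx;
		}
	}

	return -1;
}
```

    The branch is taken once per traversal, which is consistent with the observation above that no measurable impact was seen.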

>>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>>> index 9e7d9315f..fdb01023e 100644
>>> --- a/lib/librte_hash/rte_hash.h
>>> +++ b/lib/librte_hash/rte_hash.h
>>> @@ -14,6 +14,8 @@
>>>     #include <stdint.h>
>>>     #include <stddef.h>
>>>     
>>> +#include <rte_compat.h>
>>> +
>>>     #ifdef __cplusplus
>>>     extern "C" {
>>>     #endif
>>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>>     /** @internal A hash table structure. */  struct rte_hash;
>>>     
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * @internal A hash table iterator state structure.
>>> + */
>>> +struct rte_hash_iterator_state {
>>> +	uint8_t space[64];
>>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
>>
>>       Okay.
> 
>      I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.
> 
> Ok. Maybe have a #define for 64?

    Ok.
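
    One way the fixed-size define could be arranged is sketched below. All names here (SKETCH_ITER_STATE_SIZE, struct sketch_iter_state, struct sketch_iter_istate and the two helpers) are hypothetical, and the private fields are only illustrative. The sketch copies the private layout in and out with memcpy() instead of casting the byte array, which is a choice made here to sidestep alignment questions, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical name; a fixed value keeps the struct size identical on every
 * architecture, unlike RTE_CACHE_LINE_SIZE.
 */
#define SKETCH_ITER_STATE_SIZE 64

/* Public, opaque state: applications only see a block of bytes. */
struct sketch_iter_state {
	uint8_t state[SKETCH_ITER_STATE_SIZE];
};

/* Private layout, known only to the library (fields are illustrative). */
struct sketch_iter_istate {
	const void *h;
	uint32_t next;
	uint32_t total_entries;
};

/* Break the build if the private layout outgrows the public reservation. */
_Static_assert(sizeof(struct sketch_iter_istate) <= SKETCH_ITER_STATE_SIZE,
	"private iterator state must fit in the public state");

/* Save the private view into the caller-owned opaque state. */
static void
sketch_state_store(struct sketch_iter_state *pub,
		const struct sketch_iter_istate *priv)
{
	assert(pub != NULL && priv != NULL);
	memcpy(pub->state, priv, sizeof(*priv));
}

/* Recover the private view from the caller-owned opaque state. */
static void
sketch_state_load(const struct sketch_iter_state *pub,
		struct sketch_iter_istate *priv)
{
	assert(pub != NULL && priv != NULL);
	memcpy(priv, pub->state, sizeof(*priv));
}
```

    The compile-time check is the part that matters for the ABI discussion: the public size stays at 64 bytes everywhere, and growing the private struct past it fails the build instead of silently overrunning the caller's buffer.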

[ ]'s
Michel Machado


* [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
  2018-08-27 12:24  3% [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries Anatoly Burakov
  2018-09-19  8:56  3% ` Thomas Monjalon
@ 2018-09-20 15:41 17% ` Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-20 15:41 UTC (permalink / raw)
  To: dev; +Cc: John McNamara, Marko Kovacevic, thomas

Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.

Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.

This bumps the EAL ABI; however, there is already an EAL deprecation
notice in place[1] for a different feature, so that's OK.

[1] http://patches.dpdk.org/patch/43502/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Added documentation on ABI break

 doc/guides/rel_notes/rel_description.rst      |  5 +++++
 doc/guides/rel_notes/release_18_11.rst        |  6 +++++-
 .../common/include/rte_eal_memconfig.h        |  4 ++++
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 20 +++++++++++++++++++
 lib/librte_eal/meson.build                    |  2 +-
 6 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst
index 8f285566f..3fd289939 100644
--- a/doc/guides/rel_notes/rel_description.rst
+++ b/doc/guides/rel_notes/rel_description.rst
@@ -10,3 +10,8 @@ release version |release| and previous releases.
 It lists new features, fixed bugs, API and ABI changes and known issues.
 
 For instructions on compiling and running the release, see the :ref:`DPDK Getting Started Guide <linux_gsg>`.
+
+* eal: new ABI version for EAL library due to adding ``legacy_mem`` and
+       ``single_file_segments`` values to ``rte_config`` structure on account of
+       improving DPDK usability when using either ``--legacy-mem`` or
+       ``--single-file-segments`` flags.
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..34acf01d9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -83,6 +83,10 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: added ``legacy_mem`` and ``single_file_segments`` values to
+       ``rte_config`` structure on account of improving DPDK usability when
+       using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+
 
 Removed Items
 -------------
@@ -129,7 +133,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..62a21c2dc 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -77,6 +77,10 @@ struct rte_mem_config {
 	 * exact same address the primary process maps it.
 	 */
 	uint64_t mem_cfg_addr;
+
+	/* legacy mem and single file segments options are shared */
+	uint32_t legacy_mem;
+	uint32_t single_file_segments;
 } __attribute__((__packed__));
 
 
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..4a55d3b69 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -352,6 +352,24 @@ eal_proc_type_detect(void)
 	return ptype;
 }
 
+/* copies data from internal config to shared config */
+static void
+eal_update_mem_config(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	mcfg->legacy_mem = internal_config.legacy_mem;
+	mcfg->single_file_segments = internal_config.single_file_segments;
+}
+
+/* copies data from shared config to internal config */
+static void
+eal_update_internal_config(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	internal_config.legacy_mem = mcfg->legacy_mem;
+	internal_config.single_file_segments = mcfg->single_file_segments;
+}
+
 /* Sets up rte_config structure with the pointer to shared memory config.*/
 static void
 rte_config_init(void)
@@ -361,11 +379,13 @@ rte_config_init(void)
 	switch (rte_config.process_type){
 	case RTE_PROC_PRIMARY:
 		rte_eal_config_create();
+		eal_update_mem_config();
 		break;
 	case RTE_PROC_SECONDARY:
 		rte_eal_config_attach();
 		rte_eal_mcfg_wait_complete(rte_config.mem_config);
 		rte_eal_config_reattach();
+		eal_update_internal_config();
 		break;
 	case RTE_PROC_AUTO:
 	case RTE_PROC_INVALID:
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
-- 
2.17.1
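
The direction of the copy in each process type is the core of this patch, and it can be shown with a stand-alone sketch. The types and the function below are hypothetical stand-ins: struct sketch_shared_cfg plays the role of the shared rte_mem_config fields, struct sketch_internal_cfg plays the role of the per-process internal_config, and the real code additionally has to create or attach the shared mapping before copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_proc_type { SKETCH_PROC_PRIMARY, SKETCH_PROC_SECONDARY };

/* Stand-in for the flags stored in shared memory. */
struct sketch_shared_cfg {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

/* Stand-in for the per-process flags parsed from the command line. */
struct sketch_internal_cfg {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

static void
sketch_sync_mem_flags(enum sketch_proc_type type,
		struct sketch_internal_cfg *internal,
		struct sketch_shared_cfg *shared)
{
	assert(internal != NULL && shared != NULL);

	if (type == SKETCH_PROC_PRIMARY) {
		/* the primary publishes its command-line choices */
		shared->legacy_mem = internal->legacy_mem;
		shared->single_file_segments =
				internal->single_file_segments;
	} else {
		/* a secondary adopts whatever the primary published */
		internal->legacy_mem = shared->legacy_mem;
		internal->single_file_segments =
				shared->single_file_segments;
	}
}
```

In this model a secondary started with different flags than the primary simply ends up with the primary's values, which is the usability improvement the commit message describes.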


* [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID
      2018-09-20 11:36 16%     ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-20 11:36  4%     ` Anatoly Burakov
  2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs, arybchenko

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket ID's will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket ID's. Any new API's must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1


* [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external
    @ 2018-09-20 11:36 16%     ` Anatoly Burakov
  2018-09-20 11:36  4%     ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
  2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---

Notes:
    v3:
    - Add comment to explain the process of picking up minimum
      page sizes for mempool
    
    v2:
    - Add documentation changes and ABI break
    
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so
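
The pattern applied to most walk callbacks in this patch, returning early for external lists, can be tried out with a toy walker. Everything below is a hypothetical stand-in: struct sketch_msl is a trimmed-down substitute for struct rte_memseg_list, and sketch_list_walk() is a simple loop that stops on the first non-zero callback result, which only approximates the real walk functions (they also take the memory hotplug lock). The callback mirrors the shape of the physmem_size() change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct sketch_msl {
	uint64_t page_sz;
	unsigned int count;    /* number of pages in use */
	unsigned int external; /* 1 if the list describes external memory */
};

typedef int (*sketch_walk_t)(const struct sketch_msl *msl, void *arg);

/* Toy walker: visits every list, stops early on a non-zero result. */
static int
sketch_list_walk(const struct sketch_msl *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	assert(func != NULL);

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}

	return 0;
}

/* Callback: add up native memory only, skipping external lists. */
static int
sketch_native_size(const struct sketch_msl *msl, void *arg)
{
	uint64_t *total = arg;

	if (msl->external)
		return 0; /* skip, but keep walking */

	*total += msl->count * msl->page_sz;

	return 0;
}
```

Returning 0 for an external list skips it without ending the walk, so callbacks that only care about native DPDK memory keep their previous results even after external lists appear.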

 doc/guides/rel_notes/deprecation.rst          | 15 --------
 doc/guides/rel_notes/release_18_11.rst        | 12 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 ++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 ++--
 lib/librte_eal/common/eal_common_memory.c     |  3 ++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 +++++
 lib/librte_eal/common/malloc_heap.c           |  9 +++--
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 +++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 ++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 35 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 22 files changed, 125 insertions(+), 40 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
   flag the MAC can be properly configured in any case. This is particularly
   important for bonding.
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -107,6 +114,9 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK.
+
 Removed Items
 -------------
 
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
 	unsigned int len;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	/*
+	 * we need to only look at page sizes available for a particular socket
+	 * ID.  so, we either need an exact match on socket ID (can match both
+	 * native and external memory), or, if SOCKET_ID_ANY was specified as a
+	 * socket ID argument, we must only look at native memory and ignore any
+	 * page sizes associated with external memory.
+	 */
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (valid && msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1

* Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
  2018-09-20  9:30  0%         ` Andrew Rybchenko
@ 2018-09-20  9:54  0%           ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-20  9:54 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	thomas

On 20-Sep-18 10:30 AM, Andrew Rybchenko wrote:
> On 9/19/18 4:56 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools is a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> A couple of minor questions/suggestions below, but it is OK to
> go as is even if rejected.
> 
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> <...>
> 
>> diff --git a/lib/librte_mempool/rte_mempool.c 
>> b/lib/librte_mempool/rte_mempool.c
>> index 03e6b5f73..d61c77da3 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned 
>> obj_size)
>>       return new_obj_size * RTE_MEMPOOL_ALIGN;
>>   }
>> +struct pagesz_walk_arg {
>> +    int socket_id;
>> +    size_t min;
>> +};
>> +
>>   static int
>>   find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
>>   {
>> -    size_t *min = arg;
>> +    struct pagesz_walk_arg *wa = arg;
>> +    bool valid;
>> -    if (msl->page_sz < *min)
>> -        *min = msl->page_sz;
>> +    valid = msl->socket_id == wa->socket_id;
> 
> Is it intended that we accept an externally allocated segment
> if it is on the requested socket? If so, it would be good to add
> a comment to explain why.

Accepting externally allocated segments is precisely the point here - we 
want to find the page size of the underlying memory, regardless of whether 
it's internal or external. We use the socket ID to identify valid page 
sizes for a particular heap (since a socket ID is technically a heap 
identifier, as far as external code is concerned), but within that heap 
there can be multiple segment lists corresponding to that socket ID, each 
with its own page size.
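
To make the rule concrete, here is a minimal standalone sketch of the 
selection logic. It does not use the real EAL structures: struct 
sketch_msl, sketch_min_page_size(), sketch_self_check() and the external 
heap socket ID of 256 are all hypothetical stand-ins chosen for 
illustration, and only the two "valid" lines mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* assumed to mirror SOCKET_ID_ANY */

/* Hypothetical stand-in for the rte_memseg_list fields used by the walk. */
struct sketch_msl {
	int socket_id;         /* NUMA socket, or heap ID for external memory */
	uint64_t page_sz;      /* page size of every segment in this list */
	unsigned int external; /* 1 if the list points to external memory */
};

/* Smallest page size usable for socket_id, or fallback if nothing matches. */
static size_t
sketch_min_page_size(const struct sketch_msl *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sketch_msl *msl = &lists[i];
		bool valid;

		/* exact match: may be native or external memory */
		valid = msl->socket_id == socket_id;
		/* "any socket": consider native memory only */
		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				msl->external == 0;

		if (valid && msl->page_sz < min)
			min = (size_t)msl->page_sz;
	}

	return min == SIZE_MAX ? fallback : min;
}

/* Walk a made-up set of lists and check each case; returns 0 on success. */
int
sketch_self_check(void)
{
	const struct sketch_msl lists[] = {
		{ .socket_id = 0, .page_sz = 1ULL << 30, .external = 0 },
		{ .socket_id = 0, .page_sz = 1ULL << 21, .external = 0 },
		{ .socket_id = 1, .page_sz = 1ULL << 30, .external = 0 },
		/* 256 is a made-up heap ID for an external 4K-page area */
		{ .socket_id = 256, .page_sz = 4096, .external = 1 },
	};
	const size_t n = sizeof(lists) / sizeof(lists[0]);
	const size_t fb = 8192; /* arbitrary fallback, distinct from 4096 */

	assert(sketch_min_page_size(lists, n, 0, fb) == (1U << 21));
	assert(sketch_min_page_size(lists, n, 1, fb) == (1U << 30));
	/* exact match on the external heap picks up its own page size */
	assert(sketch_min_page_size(lists, n, 256, fb) == 4096);
	/* "any socket" skips the external 4K list */
	assert(sketch_min_page_size(lists, n, SKETCH_SOCKET_ID_ANY, fb) ==
			(1U << 21));
	/* no list on socket 7, so the fallback is returned */
	assert(sketch_min_page_size(lists, n, 7, fb) == fb);
	return 0;
}
```

In the sketch, a request for the external heap sees its 4K page size, 
while a SOCKET_ID_ANY request only ever sees native page sizes.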

> 
>> +    valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
>> +
>> +    if (!valid)
>> +        return 0;
>> +
>> +    if (msl->page_sz < wa->min)
>> +        wa->min = msl->page_sz;
> 
> I'd suggest keeping a single return (it is just a bit shorter)
> if (valid && msl->page_sz < wa->min)
>           wa->min = msl->page_sz;

Sure. If there are other comments that warrant a v3 respin, I'll 
incorporate this feedback :)

Thanks for the review!

> 
> <...>
> 


-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
  2018-09-19 13:56 16%       ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-20  9:30  0%         ` Andrew Rybchenko
  2018-09-20  9:54  0%           ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-09-20  9:30 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
	winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
	srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
	thomas

On 9/19/18 4:56 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>

A couple of minor questions/suggestions below, but it is OK to
go as is even if rejected.

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

<...>

> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 03e6b5f73..d61c77da3 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
>   	return new_obj_size * RTE_MEMPOOL_ALIGN;
>   }
>   
> +struct pagesz_walk_arg {
> +	int socket_id;
> +	size_t min;
> +};
> +
>   static int
>   find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
>   {
> -	size_t *min = arg;
> +	struct pagesz_walk_arg *wa = arg;
> +	bool valid;
>   
> -	if (msl->page_sz < *min)
> -		*min = msl->page_sz;
> +	valid = msl->socket_id == wa->socket_id;

Is it intended that we accept an externally allocated segment
if it is on the requested socket? If so, it would be good to add
a comment to explain why.

> +	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
> +
> +	if (!valid)
> +		return 0;
> +
> +	if (msl->page_sz < wa->min)
> +		wa->min = msl->page_sz;

I'd suggest keeping a single return (it is just a bit shorter)
if (valid && msl->page_sz < wa->min)
          wa->min = msl->page_sz;

<...>

* [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
  @ 2018-09-19 13:56 16%       ` Anatoly Burakov
  2018-09-20  9:30  0%         ` Andrew Rybchenko
  2018-09-19 13:56  4%       ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
  1 sibling, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
	Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
	Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
	Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, thomas

When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.

This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.

All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.

Mempools is a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v1:
    - Adjust all calls to memseg walk functions to ignore external
      segments where it made sense to do so

 doc/guides/rel_notes/deprecation.rst          | 15 ---------
 doc/guides/rel_notes/release_18_11.rst        | 12 ++++++-
 drivers/bus/fslmc/fslmc_vfio.c                |  7 +++--
 drivers/net/mlx4/mlx4_mr.c                    |  3 ++
 drivers/net/mlx5/mlx5.c                       |  5 ++-
 drivers/net/mlx5/mlx5_mr.c                    |  3 ++
 drivers/net/virtio/virtio_user/vhost_kernel.c |  5 ++-
 lib/librte_eal/bsdapp/eal/Makefile            |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c               |  3 ++
 lib/librte_eal/bsdapp/eal/eal_memory.c        |  7 +++--
 lib/librte_eal/common/eal_common_memory.c     |  4 +++
 .../common/include/rte_eal_memconfig.h        |  1 +
 lib/librte_eal/common/include/rte_memory.h    |  9 ++++++
 lib/librte_eal/common/malloc_heap.c           |  9 ++++--
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 10 +++++-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  9 ++++++
 lib/librte_eal/linuxapp/eal/eal_vfio.c        | 17 +++++++---
 lib/librte_eal/meson.build                    |  2 +-
 lib/librte_mempool/rte_mempool.c              | 31 ++++++++++++++-----
 test/test/test_malloc.c                       |  3 ++
 test/test/test_memzone.c                      |  3 ++
 22 files changed, 122 insertions(+), 40 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e2dbee317..12122cb55 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: certain structures will change in EAL on account of upcoming external
-  memory support. Aside from internal changes leading to an ABI break, the
-  following externally visible changes will also be implemented:
-
-  - ``rte_memseg_list`` will change to include a boolean flag indicating
-    whether a particular memseg list is externally allocated. This will have
-    implications for any users of memseg-walk-related functions, as they will
-    now have to skip externally allocated segments in most cases if the intent
-    is to only iterate over internal DPDK memory.
-  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
-    as some socket ID's will now be representing externally allocated memory. No
-    changes will be required for existing code as backwards compatibility will
-    be kept, and those who do not use this feature will not see these extra
-    socket ID's.
-
 * eal: both declaring and identifying devices will be streamlined in v18.11.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..e2cbc82da 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -68,6 +68,13 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: The following API changes were made in 18.11:
+
+  - ``rte_memseg_list`` structure now has an additional flag indicating whether
+    the memseg list is externally allocated. This will have implications for any
+    users of memseg-walk-related functions, as they will now have to skip
+    externally allocated segments in most cases if the intent is to only iterate
+    over internal DPDK memory.
 
 ABI Changes
 -----------
@@ -84,6 +91,9 @@ ABI Changes
    =========================================================
 
 
+* eal: EAL library ABI version was changed due to previously announced work on
+       supporting external memory in DPDK.
+
 Removed Items
 -------------
 
@@ -129,7 +139,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
      librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.8
+   + librte_eal.so.9
      librte_ethdev.so.10
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
 }
 
 static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
-		 const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *n_segs = arg;
 	int ret;
 
+	if (msl->external)
+		return 0;
+
 	ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
 	if (ret)
 		DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6e2..d9ed15880 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
 static void *uar_base;
 
 static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	void **addr = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (*addr == NULL)
 		*addr = ms->addr;
 	else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
 {
 	struct mr_find_contig_memsegs_data *data = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
 		return 0;
 	/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b2444096c..885c59c8a 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
 	uint32_t region_nr;
 };
 static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, size_t len, void *arg)
 {
 	struct walk_arg *wa = arg;
 	struct vhost_memory_region *mr;
 	void *start_addr;
 
+	if (msl->external)
+		return 0;
+
 	if (wa->region_nr >= max_regions)
 		return -1;
 
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
 		return 1;
 
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
 	int seg_idx;
 };
 static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	struct attach_walk_args *wa = arg;
 	void *addr;
 
+	if (msl->external)
+		return 0;
+
 	addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
 			MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
 			wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0868bf681..55a11bf4d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
 {
 	uint64_t *total_len = arg;
 
+	if (msl->external)
+		return 0;
+
 	*total_len += msl->memseg_arr.count * msl->page_sz;
 
 	return 0;
@@ -547,6 +550,7 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
 	return ret;
 }
 
+
 /* init memory subsystem */
 int
 rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
 	size_t len; /**< Length of memory area covered by this memseg list. */
 	int socket_id; /**< Socket ID for all memsegs in this list. */
 	uint64_t page_sz; /**< Page size for all memsegs in this list. */
+	unsigned int external; /**< 1 if this list points to external memory */
 	volatile uint32_t version; /**< version number for multiprocess sync. */
 	struct rte_fbarray memseg_arr;
 };
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..b381d1cb6 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
  *       be used within memory-related callback functions.
  *
+ * @note This function will also walk through externally allocated segments. It
+ *       is up to the user to decide whether to skip through these segments.
+ *
  * @param func
  *   Iterator function
  * @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 12aaf2d72..8c37b9d7c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
 	struct malloc_heap *heap;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	heap = &mcfg->malloc_heaps[msl->socket_id];
 
 	/* msl is const, so find it */
@@ -756,8 +759,10 @@ malloc_heap_free(struct malloc_elem *elem)
 	/* anything after this is a bonus */
 	ret = 0;
 
-	/* ...of which we can't avail if we are in legacy mode */
-	if (internal_config.legacy_mem)
+	/* ...of which we can't avail if we are in legacy mode, or if this is an
+	 * externally allocated segment.
+	 */
+	if (internal_config.legacy_mem || msl->external)
 		goto free_unlock;
 
 	/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
 {
 	int *socket_id = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket_id == msl->socket_id;
 }
 
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
 		void *arg __rte_unused)
 {
 	/* ms is const, so find this memseg */
-	struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+	struct rte_memseg *found;
+
+	if (msl->external)
+		return 0;
+
+	found = rte_mem_virt2memseg(ms->addr, msl);
 
 	found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d040a2f71..8b0bbe43f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1250,6 +1250,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
 	unsigned int i;
 	int msl_idx;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1298,6 +1301,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
 	char name[PATH_MAX];
 	int msl_idx, ret;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	primary_msl = &mcfg->memsegs[msl_idx];
 	local_msl = &local_memsegs[msl_idx];
@@ -1328,6 +1334,9 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
 	int msl_idx;
 	int *data;
 
+	if (msl->external)
+		return 0;
+
 	msl_idx = msl - mcfg->memsegs;
 	len = msl->memseg_arr.len;
 
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
 }
 
 static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
-		const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+		void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 }
 
 static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	int *vfio_container_fd = arg;
 
+	if (msl->external)
+		return 0;
+
 	return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
 	uint64_t hugepage_sz;
 };
 static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
 	struct spapr_walk_param *param = arg;
 	uint64_t max = ms->iova + ms->len;
 
+	if (msl->external)
+		return 0;
+
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..d61c77da3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+struct pagesz_walk_arg {
+	int socket_id;
+	size_t min;
+};
+
 static int
 find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
-	size_t *min = arg;
+	struct pagesz_walk_arg *wa = arg;
+	bool valid;
 
-	if (msl->page_sz < *min)
-		*min = msl->page_sz;
+	valid = msl->socket_id == wa->socket_id;
+	valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+	if (!valid)
+		return 0;
+
+	if (msl->page_sz < wa->min)
+		wa->min = msl->page_sz;
 
 	return 0;
 }
 
 static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
 {
-	size_t min_pagesz = SIZE_MAX;
+	struct pagesz_walk_arg wa;
 
-	rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+	wa.min = SIZE_MAX;
+	wa.socket_id = socket_id;
 
-	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+	rte_memseg_list_walk(find_min_pagesz, &wa);
+
+	return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
 }
 
 
@@ -470,7 +485,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 		pg_sz = 0;
 		pg_shift = 0;
 	} else if (try_contig) {
-		pg_sz = get_min_page_size();
+		pg_sz = get_min_page_size(mp->socket_id);
 		pg_shift = rte_bsf32(pg_sz);
 	} else {
 		pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
 {
 	int32_t *socket = arg;
 
+	if (msl->external)
+		return 0;
+
 	return *socket == msl->socket_id;
 }
 
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
 {
 	struct walk_arg *wa = arg;
 
+	if (msl->external)
+		return 0;
+
 	if (msl->page_sz == RTE_PGSIZE_2M)
 		wa->hugepage_2MB_avail = 1;
 	if (msl->page_sz == RTE_PGSIZE_1G)
-- 
2.17.1
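
For readers following the mempool hunk above, here is a minimal standalone
sketch of the socket-aware minimum page size selection. It is not the DPDK
code: struct sketch_msl, SKETCH_SOCKET_ID_ANY and sketch_min_page_size() are
hypothetical stand-ins for struct rte_memseg_list, SOCKET_ID_ANY and the
find_min_pagesz()/get_min_page_size() pair, and the caller-supplied fallback
stands in for getpagesize().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1)	/* stand-in for SOCKET_ID_ANY */

/* Hypothetical, trimmed-down stand-in for struct rte_memseg_list. */
struct sketch_msl {
	size_t page_sz;
	int socket_id;
	bool external;
};

/*
 * Return the smallest page size among the lists relevant to socket_id,
 * or fallback if no list matches.
 */
static size_t
sketch_min_page_size(const struct sketch_msl *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sketch_msl *msl = &lists[i];
		/* an exact socket match always counts, external or not */
		bool valid = msl->socket_id == socket_id;

		/* "any socket" only considers internal (non-external) lists */
		valid |= socket_id == SKETCH_SOCKET_ID_ANY && !msl->external;

		if (valid && msl->page_sz < min)
			min = msl->page_sz;
	}

	return min == SIZE_MAX ? fallback : min;
}
```

With one 2M list on socket 0, one 1G list on socket 1 and one external 4K
list on socket 256, a request for socket 1 sees only the 1G page size, a
request for "any socket" sees 2M because the external list is skipped, and a
request for socket 256 sees the external list's own 4K page size.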

^ permalink raw reply	[relevance 16%]

* [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID
    2018-09-19 13:56 16%       ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-19 13:56  4%       ` Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
	laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
	janos.kobor, geza.koblo, srinath.mannam, scott.branden,
	ajit.khaparde, keith.wiles, bruce.richardson, thomas,
	shreyansh.jain, shahafs

We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.

This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
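
The socket ID handling this patch introduces in the memzone path can be
sketched as a small standalone function. This is an illustration only:
sketch_effective_socket(), SKETCH_SOCKET_ID_ANY and SKETCH_MAX_NUMA_NODES are
hypothetical names standing in for the inline checks, SOCKET_ID_ANY and
RTE_MAX_NUMA_NODES, and the value 8 is an arbitrary assumption.

```c
#include <stdbool.h>

#define SKETCH_SOCKET_ID_ANY	(-1)	/* stand-in for SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES	8	/* stand-in for RTE_MAX_NUMA_NODES (value assumed) */

/*
 * Returns false if the socket ID is rejected outright; otherwise stores
 * the socket ID that would be handed on to the allocator in *out.
 */
static bool
sketch_effective_socket(int socket_id, bool has_hugepages, int *out)
{
	/* only negative IDs other than "any" are rejected up front */
	if (socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0)
		return false;

	/*
	 * Without hugepages, regular NUMA socket IDs collapse to "any",
	 * but IDs at or above the NUMA limit (potential external heaps)
	 * are passed through untouched.
	 */
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		socket_id = SKETCH_SOCKET_ID_ANY;

	*out = socket_id;
	return true;
}
```

Whether a passed-through ID actually names an existing heap is left to the
allocator, which is the point of the change: callers no longer range-check
the socket ID themselves.
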
 doc/guides/rel_notes/release_18_11.rst     | 7 +++++++
 lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
 lib/librte_eal/common/malloc_heap.c        | 2 +-
 lib/librte_eal/common/rte_malloc.c         | 4 ----
 4 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e2cbc82da..c04685d17 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -75,6 +75,13 @@ API Changes
     users of memseg-walk-related functions, as they will now have to skip
     externally allocated segments in most cases if the intent is to only iterate
     over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK has gained additional
+    meaning, as some socket IDs will now be representing externally allocated
+    memory. No changes will be required for existing code as backwards
+    compatibility will be kept, and those who do not use this feature will not
+    see these extra socket IDs. Any new APIs must not check socket ID
+    parameters themselves, and must instead leave it to the memory subsystem to
+    decide whether socket ID is a valid one.
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 		return NULL;
 	}
 
-	if ((socket_id != SOCKET_ID_ANY) &&
-	    (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+	if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
 		rte_errno = EINVAL;
 		return NULL;
 	}
 
-	if (!rte_eal_has_hugepages())
+	/* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+	 * external heap.
+	 */
+	if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
 		socket_id = SOCKET_ID_ANY;
 
 	contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c4d303533..1dcb1de8f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -649,7 +649,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 	if (size == 0 || (align && !rte_is_power_of_2(align)))
 		return NULL;
 
-	if (!rte_eal_has_hugepages())
+	if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
 		socket_arg = SOCKET_ID_ANY;
 
 	if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
 	if (!rte_eal_has_hugepages())
 		socket_arg = SOCKET_ID_ANY;
 
-	/* Check socket parameter */
-	if (socket_arg >= RTE_MAX_NUMA_NODES)
-		return NULL;
-
 	return malloc_heap_alloc(type, size, socket_arg, 0,
 			align == 0 ? 1 : align, 0, false);
 }
-- 
2.17.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries
  2018-08-27 12:24  3% [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries Anatoly Burakov
@ 2018-09-19  8:56  3% ` Thomas Monjalon
  2018-09-20 15:41 17% ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-19  8:56 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

27/08/2018 14:24, Anatoly Burakov:
> Currently, command-line switches for legacy mem mode or single-file
> segments mode are only stored in internal config. This leads to a
> situation where these flags have to always match between primary
> and secondary, which is bad for usability.
> 
> Fix this by storing these flags in the shared config as well, so
> that secondary process can know if the primary was launched in
> single-file segments or legacy mem mode.
> 
> This bumps the EAL ABI, however there's an EAL deprecation notice
> already in place[1] for a different feature, so that's OK.

You need to update the release notes:
	- ABI change section
	- library version section

Thanks

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] mbuf: remove deprecated segment free functions
  2018-09-17 12:45  8% ` [dpdk-dev] [PATCH v2] " David Marchand
@ 2018-09-19  8:34  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-19  8:34 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, olivier.matz, arybchenko

17/09/2018 14:45, David Marchand:
> __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> a long time now (early 17.05), are not part of the abi and are easily
> replaced with existing api.
> 
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

Applied, thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-09-18 15:53  0%                               ` Ferruh Yigit
@ 2018-09-19  5:37  0%                                 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-09-19  5:37 UTC (permalink / raw)
  To: Ferruh Yigit, Jerin Jacob, Kokkilagadda, Kiran
  Cc: Ola Liljedahl, Gavin Hu (Arm Technology China),
	Jacob, Jerin, dev, nd, Steve Capper,
	Phil Yang (Arm Technology China),
	Bruce Richardson, Konstantin Ananyev



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Tuesday, September 18, 2018 10:54 AM
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Kokkilagadda, Kiran
> <Kiran.Kokkilagadda@cavium.com>
> Cc: Ola Liljedahl <Ola.Liljedahl@arm.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>;
> dev@dpdk.org; nd <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>;
> Phil Yang (Arm Technology China) <Phil.Yang@arm.com>; Bruce Richardson
> <bruce.richardson@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
> 
> On 9/14/2018 3:45 AM, Jerin Jacob wrote:
> > -----Original Message-----
> >> Date: Thu, 13 Sep 2018 23:45:31 +0000
> >> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>, "Kokkilagadda, Kiran"
> >>  <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm Technology China)"
> >>  <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,
> Jerin"
> >>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> >> <dev@dpdk.org>, nd  <nd@arm.com>, Steve Capper
> >> <Steve.Capper@arm.com>, "Phil Yang (Arm  Technology China)"
> >> <Phil.Yang@arm.com>
> >> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >> synchronization
> >>
> >> External Email
> >>
> >> -----Original Message-----
> >>> Date: Thu, 13 Sep 2018 17:40:53 +0000
> >>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
> >>> <Ola.Liljedahl@arm.com>
> >>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
> >>> (Arm  Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
> >>> <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >>>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> >>> <dev@dpdk.org>, nd  <nd@arm.com>, Steve Capper
> >>> <Steve.Capper@arm.com>, "Phil Yang (Arm Technology China)"
> >>> <Phil.Yang@arm.com>
> >>> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>> synchronization
> >>>
> >>>
> >>> Hi Jerin,
> >>>         Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL',
> which is specific to rte_ring? I do not see a need for choosing only some
> algorithms to work with C11 model. I suggest that we change this to
> 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
> >>
> >>
> >> Yes. Makes sense to me to keep only single config option.
> >>
> >> rte_ring has 2 sets of algorithms for Arm architecture, one with C11
> memory model and the other with barriers. Going forward (for ex: for KNI), I
> think we should support C11 memory model only and skip the barriers.
> >
> > IMO, Both should be supported and set N as in the config/common_base.
> > Based on architecture or micro architecture the performance can vary.
> > So keeping both options and allowing to override to arch/micro arch
> > specific config file makes sense to me.(like existing model, as smp_*
> > ops are compiler NOP for x86)
> 
> Hi Jerin, Honnappa,  Kiran,
> 
> Will there be a new version for this release?
> 
> I can see two options:
> 1- Add read/write barriers for both library and kernel parts.
> 2- Use c11 atomics
>   2a- change existing RTE_RING_USE_C11_MEM_MODEL to
> RTE_USE_C11_MEM_MODEL
>   2b- Use RTE_USE_C11_MEM_MODEL to implement c11 atomic for arm and
> ppc
> 
> 2) seems agreed on, but is it clear who will work on it?

Sorry for the late reply. We have implemented 2), and it is currently undergoing internal review. We will get this out today and will work through the community reviews quickly after that.

> 
> And 1) looks easier to implement, if 2) won't make time for release can we
> fallback to this one?
> 
> Thanks,
> ferruh
> 
> >
> >> Also, do you see any issues in making C11 memory model default for Arm
> architecture?
> >
> > It is already set default Y to arm64. see config/common_armv8a_linuxapp.
> >
> > And it is possible for micro architecture to override, see
> > config/defconfig_arm64-thunderx-linuxapp-gcc
> >
> >
> >>
> >>>
> >>> Thank you,
> >>> Honnappa
> >>>
> >>> -----Original Message-----
> >>> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>> Sent: Wednesday, August 29, 2018 3:58 AM
> >>> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
> >>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
> >>> <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
> >>> Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
> >>> <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>> synchronization
> >>>
> >>> -----Original Message-----
> >>>> Date: Wed, 29 Aug 2018 08:47:56 +0000
> >>>> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> >>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> >>>> <Gavin.Hu@arm.com>,  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,
> Jerin"
> >>>>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> >>>> <dev@dpdk.org>, nd  <nd@arm.com>, Steve Capper
> >>>> <Steve.Capper@arm.com>
> >>>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>>> synchronization
> >>>> user-agent: Microsoft-MacOutlook/10.10.0.180812
> >>>>
> >>>>
> >>>> There was a mention of rte_ring which is a different data structure. But
> perhaps I misunderstood why this was mentioned and the idea was only to
> use the C11 memory model as is also used in rte_ring nowadays.
> >>>>
> >>>> But why would we have different code for x86 and for other
> architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC
> __atomic builtins), the code generated for x86 will be the same.
> __atomic_load(__ATOMIC_ACQUIRE) and
> __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and
> stores on x86?
> >>>
> >>> # One reason was that the __atomic builtins were implemented in gcc 4.7,
> >>>   and x86 would like to support gcc < 4.7 and the ICC compiler.
> >>> # The theme was no change in the existing code for x86. I am not sure
> >>>   about the code generation for x86 with __atomic builtins; I will let
> >>>   the x86 maintainers comment on this.
> >>>
> >>>
> >>>>
> >>>> -- Ola
> >>>>
> >>>> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com>
> wrote:
> >>>>
> >>>>     -----Original Message-----
> >>>>     > Date: Wed, 29 Aug 2018 07:34:34 +0000
> >>>>     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>>>     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>,
> Honnappa
> >>>>     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> <Gavin.Hu@arm.com>,
> >>>>     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >>>>     >  <Jerin.JacobKollanukkaran@cavium.com>
> >>>>     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve
> Capper
> >>>>     >  <Steve.Capper@arm.com>
> >>>>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>>>     >  synchronization
> >>>>     > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >>>>     >
> >>>>     > Is the rte_kni kernel/user binary interface subject to backwards
> compatibility requirements? Or can we change it for a new DPDK release?
> >>>>
> >>>>     What would be the change in interface? Is it removing the volatile for
> >>>>     C11 case, Then you can use anonymous union OR #define to keep the
> size
> >>>>     and offset of the element intact.
> >>>>
> >>>>     struct rte_kni_fifo {
> >>>>     #ifndef RTE_C11...
> >>>>             volatile unsigned write;     /**< Next position to be written*/
> >>>>             volatile unsigned read;      /**< Next position to be read */
> >>>>     #else
> >>>>             unsigned write;     /**< Next position to be written*/
> >>>>             unsigned read;      /**< Next position to be read */
> >>>>     #endif
> >>>>             unsigned len;                /**< Circular buffer length */
> >>>>             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
> >>>>             void *volatile buffer[];     /**< The buffer contains mbuf
> >>>>     pointers */
> >>>>     };
> >>>>
> >>>>     Anonymous union example:
> >>>>     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> >>>>
> >>>>     You can check the ABI breakage by devtools/validate-abi.sh
> >>>>
> >>>>     >
> >>>>     > -- Ola
> >>>>     >
> >>>>     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> >>>>     > Date: Wednesday, 29 August 2018 at 07:50
> >>>>     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
> Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,
> Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> >>>>     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola
> Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> >>>>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
> >>>>     >
> >>>>     >
> >>>>     > Agreed. Please go ahead and make the changes. You need to make
> >>>>     > the same change on the kernel side as well. And please use the C11
> >>>>     > ring (see rte_ring) mechanism so that it won't impact other
> >>>>     > platforms like Intel. We need this change just for Arm and ppc.
> >>>>     >
> >>>>     > ________________________________
> >>>>     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>>>     > Sent: Wednesday, August 29, 2018 10:29 AM
> >>>>     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> >>>>     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> >>>>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
> >>>>     >
> >>>>     >
> >>>>     > External Email
> >>>>     >
> >>>>     > I agree with Gavin here. Store to fifo->write and fifo->read can get
> hoisted resulting in accessing invalid buffer array entries or over writing of the
> buffer array entries.
> >>>>     >
> >>>>     > IMO, we should solve this using c11 atomics. This will also help
> remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > If you want us to put together a patch with this idea, please let us
> know.
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > Thank you,
> >>>>     >
> >>>>     > Honnappa
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > From: Gavin Hu
> >>>>     > Sent: Tuesday, August 28, 2018 2:31 PM
> >>>>     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh
> Yigit <ferruh.yigit@intel.com>; Jacob, Jerin
> <Jerin.JacobKollanukkaran@cavium.com>
> >>>>     > Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl
> <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >>>>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > Assuming the reader and writer may execute on different CPUs, this
> >>>>     > becomes standard multithreaded programming.
> >>>>     >
> >>>>     > We are concerned that the read pointer may be updated too early
> >>>>     > (weak ordering may reorder the update before the reads from the
> >>>>     > slots). The slots are then released and may immediately be
> >>>>     > overwritten by the writer, so you get "too new" data and lose the
> >>>>     > old data.
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>
> >>>>     > Sent: Tuesday, August 28, 2018 6:44 PM
> >>>>     > To: Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>;
> >>>>     > Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> >>>>     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>>>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > In this instance there won't be any problem, as until the value of
> fifo->write changes, this loop won't get executed. As of now we didn't see any
> issue with it and for performance reasons, we don't want to keep read barrier.
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > ________________________________
> >>>>     >
> >>>>     > From: Gavin Hu <Gavin.Hu@arm.com>
> >>>>     > Sent: Monday, August 27, 2018 9:10 PM
> >>>>     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> >>>>     > Cc: dev@dpdk.org; Honnappa Nagarahalli
> >>>>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>>     >
> >>>>     >
> >>>>     >
> >>>>     > External Email
> >>>>     >
> >>>>     > This fix is not complete, kni_fifo_get requires a read fence also,
> otherwise it probably gets stale data on a weak ordering platform.
> >>>>     >
> >>>>     > > -----Original Message-----
> >>>>     > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
> >>>>     > > Sent: Monday, August 27, 2018 10:08 PM
> >>>>     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com>;
> >>>>     > > jerin.jacob@caviumnetworks.com
> >>>>     > > Cc: dev@dpdk.org
> >>>>     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>>>     > > synchronization
> >>>>     > >
> >>>>     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> >>>>     > > > With existing code in kni_fifo_put, rx_q values are not being
> updated
> >>>>     > > > before updating fifo_write. While reading rx_q in
> kni_net_rx_normal,
> >>>>     > > > This is causing the sync issue on other core. So adding a write
> >>>>     > > > barrier to make sure the values being synced before updating
> fifo_write.
> >>>>     > > >
> >>>>     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> >>>>     > > >
> >>>>     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
> >>>>     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>>>     > >
> >>>>     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>>
> >>>>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-09-14  2:45  0%                             ` Jerin Jacob
@ 2018-09-18 15:53  0%                               ` Ferruh Yigit
  2018-09-19  5:37  0%                                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-09-18 15:53 UTC (permalink / raw)
  To: Jerin Jacob, Honnappa Nagarahalli, Kokkilagadda, Kiran
  Cc: Ola Liljedahl, Gavin Hu (Arm Technology China),
	Jacob, Jerin, dev, nd, Steve Capper,
	Phil Yang (Arm Technology China),
	Bruce Richardson, Konstantin Ananyev

On 9/14/2018 3:45 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Thu, 13 Sep 2018 23:45:31 +0000
>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>, "Kokkilagadda, Kiran"
>>  <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm Technology China)"
>>  <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
>>  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
>>  Technology China)" <Phil.Yang@arm.com>
>> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>  synchronization
>>
>> External Email
>>
>> -----Original Message-----
>>> Date: Thu, 13 Sep 2018 17:40:53 +0000
>>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
>>> <Ola.Liljedahl@arm.com>
>>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
>>> (Arm  Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
>>> <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>>>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
>>> nd  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
>>> Technology China)" <Phil.Yang@arm.com>
>>> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>> synchronization
>>>
>>>
>>> Hi Jerin,
>>>         Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
>>
>>
>> Yes. Makes sense to me to keep only single config option.
>>
>> rte_ring has 2 sets of algorithms for Arm architecture, one with C11 memory model and the other with barriers. Going forward (for ex: for KNI), I think we should support C11 memory model only and skip the barriers.
> 
> IMO, Both should be supported and set N as in the config/common_base.
> Based on architecture or micro architecture the performance can vary.
> So keeping both options and allowing to override to arch/micro arch
> specific config file makes sense to me.(like existing model, as smp_*
> ops are compiler NOP for x86)

Hi Jerin, Honnappa,  Kiran,

Will there be a new version for this release?

I can see two options:
1- Add read/write barriers for both library and kernel parts.
2- Use C11 atomics
  2a- change existing RTE_RING_USE_C11_MEM_MODEL to RTE_USE_C11_MEM_MODEL
  2b- Use RTE_USE_C11_MEM_MODEL to implement C11 atomics for arm and ppc

2) seems agreed on, but is it clear who will work on it?

And 1) looks easier to implement; if 2) won't make it in time for the release,
can we fall back to this one?

Thanks,
ferruh
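
To make option 2 concrete, here is a minimal sketch of a single-producer,
single-consumer FIFO that uses the GCC __atomic builtins with acquire/release
ordering in place of volatile and explicit barriers. It is not the KNI
implementation: struct sketch_fifo, sketch_fifo_put(), sketch_fifo_get() and
the fixed SKETCH_FIFO_LEN are hypothetical, and the layout only loosely
follows the rte_kni_fifo structure quoted in this thread.

```c
#include <stddef.h>

#define SKETCH_FIFO_LEN 8u	/* slot count; must be a power of two */

_Static_assert((SKETCH_FIFO_LEN & (SKETCH_FIFO_LEN - 1)) == 0,
		"FIFO length must be a power of two");

/* Hypothetical stand-in for struct rte_kni_fifo, without volatile. */
struct sketch_fifo {
	unsigned int write;	/* next position to be written */
	unsigned int read;	/* next position to be read */
	void *buffer[SKETCH_FIFO_LEN];
};

/* Producer side: store up to num pointers, return how many were stored. */
static unsigned int
sketch_fifo_put(struct sketch_fifo *f, void **data, unsigned int num)
{
	unsigned int i;
	/* only the producer modifies 'write', so a relaxed load is enough */
	unsigned int w = __atomic_load_n(&f->write, __ATOMIC_RELAXED);
	/* acquire: the consumer is done with every slot before 'read' */
	unsigned int r = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		unsigned int next = (w + 1) & (SKETCH_FIFO_LEN - 1);

		if (next == r)
			break;	/* full: one slot is always kept empty */
		f->buffer[w] = data[i];
		w = next;
	}
	/* release: slot contents are visible before the new 'write' index */
	__atomic_store_n(&f->write, w, __ATOMIC_RELEASE);
	return i;
}

/* Consumer side: fetch up to num pointers, return how many were fetched. */
static unsigned int
sketch_fifo_get(struct sketch_fifo *f, void **data, unsigned int num)
{
	unsigned int i;
	unsigned int r = __atomic_load_n(&f->read, __ATOMIC_RELAXED);
	/* acquire: slots written before 'write' was published are visible */
	unsigned int w = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);

	for (i = 0; i < num && r != w; i++) {
		data[i] = f->buffer[r];
		r = (r + 1) & (SKETCH_FIFO_LEN - 1);
	}
	/* release: slot reads complete before the slots are handed back */
	__atomic_store_n(&f->read, r, __ATOMIC_RELEASE);
	return i;
}
```

The release store of 'read' in the consumer is what addresses the concern
raised earlier in the thread about the read index being updated before the
slot reads have completed. The sketch assumes exactly one producer thread and
one consumer thread.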

>  
>> Also, do you see any issues in making C11 memory model default for Arm architecture?
> 
> It is already set default Y to arm64. see config/common_armv8a_linuxapp.
> 
> And it is possible for micro architecture to override, see
> config/defconfig_arm64-thunderx-linuxapp-gcc
> 
> 
>>
>>>
>>> Thank you,
>>> Honnappa
>>>
>>> -----Original Message-----
>>> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> Sent: Wednesday, August 29, 2018 3:58 AM
>>> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
>>> <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
>>> Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
>>> <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
>>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>> synchronization
>>>
>>> -----Original Message-----
>>>> Date: Wed, 29 Aug 2018 08:47:56 +0000
>>>> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
>>>> <Gavin.Hu@arm.com>,  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>>>>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
>>>> <dev@dpdk.org>, nd  <nd@arm.com>, Steve Capper
>>>> <Steve.Capper@arm.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>>> synchronization
>>>> user-agent: Microsoft-MacOutlook/10.10.0.180812
>>>>
>>>>
>>>> There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
>>>>
>>>> But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
>>>
>>> # One reason was that the __atomic builtins were implemented in gcc 4.7, and x86 would like to support gcc < 4.7 and the ICC compiler.
>>> # The theme was no change in the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
>>>
>>>
>>>>
>>>> -- Ola
>>>>
>>>> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
>>>>
>>>>     -----Original Message-----
>>>>     > Date: Wed, 29 Aug 2018 07:34:34 +0000
>>>>     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>>>     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>>>>     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
>>>>     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>>>>     >  <Jerin.JacobKollanukkaran@cavium.com>
>>>>     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
>>>>     >  <Steve.Capper@arm.com>
>>>>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>>>     >  synchronization
>>>>     > user-agent: Microsoft-MacOutlook/10.10.0.180812
>>>>     >
>>>>     > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
>>>>
>>>>     What would be the change in interface? Is it removing the volatile for
>>>>     C11 case, Then you can use anonymous union OR #define to keep the size
>>>>     and offset of the element intact.
>>>>
>>>>     struct rte_kni_fifo {
>>>>     #ifndef RTE_C11...
>>>>             volatile unsigned write;     /**< Next position to be written*/
>>>>             volatile unsigned read;      /**< Next position to be read */
>>>>     #else
>>>>             unsigned write;     /**< Next position to be written*/
>>>>             unsigned read;      /**< Next position to be read */
>>>>     #endif
>>>>             unsigned len;                /**< Circular buffer length */
>>>>             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
>>>>             void *volatile buffer[];     /**< The buffer contains mbuf
>>>>     pointers */
>>>>     };
>>>>
>>>>     Anonymous union example:
>>>>     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
>>>>
>>>>     You can check the ABI breakage by devtools/validate-abi.sh
>>>>
>>>>     >
>>>>     > -- Ola
>>>>     >
>>>>     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
>>>>     > Date: Wednesday, 29 August 2018 at 07:50
>>>>     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
>>>>     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
>>>>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>>     >
>>>>     >
>>>>     > Agreed. Please go a head and make the changes. You need to make same change in kernel side also. And please use c11 ring (see rte_ring) mechanism so that it won't impact other platforms like intel. We need this change just for arm and ppc.
>>>>     >
>>>>     > ________________________________
>>>>     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>>     > Sent: Wednesday, August 29, 2018 10:29 AM
>>>>     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
>>>>     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
>>>>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>>     >
>>>>     >
>>>>     > I agree with Gavin here. Store to fifo->write and fifo->read can get hoisted resulting in accessing invalid buffer array entries or over writing of the buffer array entries.
>>>>     >
>>>>     > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
>>>>     >
>>>>     >
>>>>     >
>>>>     > If you want us to put together a patch with this idea, please let us know.
>>>>     >
>>>>     >
>>>>     >
>>>>     > Thank you,
>>>>     >
>>>>     > Honnappa
>>>>     >
>>>>     >
>>>>     >
>>>>     > From: Gavin Hu
>>>>     > Sent: Tuesday, August 28, 2018 2:31 PM
>>>>     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
>>>>     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
>>>>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>>     >
>>>>     >
>>>>     >
>>>>     > Assuming reader and writer may execute on different CPU's, this become standard multithreaded programming.
>>>>     >
>>>>     > We are concerned about that update the reader pointer too early(weak ordering may reorder it before reading from the slots), that means the slots are released and may immediately overwritten by the writer then you get “too new” data and get lost of the old data.
>>>>     >
>>>>     >
>>>>     >
>>>>     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
>>>>     > Sent: Tuesday, August 28, 2018 6:44 PM
>>>>     > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
>>>>     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
>>>>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>>     >
>>>>     >
>>>>     >
>>>>     > In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
>>>>     >
>>>>     >
>>>>     >
>>>>     >
>>>>     >
>>>>     > ________________________________
>>>>     >
>>>>     > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
>>>>     > Sent: Monday, August 27, 2018 9:10 PM
>>>>     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
>>>>     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
>>>>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>>     >
>>>>     >
>>>>     >
>>>>     > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
>>>>     >
>>>>     > > -----Original Message-----
>>>>     > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
>>>>     > > Sent: Monday, August 27, 2018 10:08 PM
>>>>     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
>>>>     > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
>>>>     > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
>>>>     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>>>     > > synchronization
>>>>     > >
>>>>     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
>>>>     > > > With existing code in kni_fifo_put, rx_q values are not being updated
>>>>     > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
>>>>     > > > This is causing the sync issue on other core. So adding a write
>>>>     > > > barrier to make sure the values being synced before updating fifo_write.
>>>>     > > >
>>>>     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
>>>>     > > >
>>>>     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
>>>>     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
>>>>     > >
>>>>     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
>>>>
>>>>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] mbuf: remove deprecated segment free functions
  2018-09-10  5:18  3% [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions David Marchand
  2018-09-10  8:06  0% ` Andrew Rybchenko
@ 2018-09-17 12:45  8% ` David Marchand
  2018-09-19  8:34  0%   ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2018-09-17 12:45 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz, arybchenko, thomas

__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
 doc/guides/rel_notes/release_18_11.rst |  5 +++++
 lib/librte_mbuf/rte_mbuf.h             | 16 ----------------
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..d98573072 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -68,6 +68,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
+  functions were deprecated since 17.05 and are removed:
+
+  Those functions were kept for compatibility and are replaced by
+  ``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
 
 ABI Changes
 -----------
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9ce5d76d7..a50b05c64 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1038,14 +1038,6 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
 	rte_mempool_put(m->pool, m);
 }
 
-/* compat with older versions */
-__rte_deprecated
-static inline void
-__rte_mbuf_raw_free(struct rte_mbuf *m)
-{
-	rte_mbuf_raw_free(m);
-}
-
 /**
  * The packet mbuf constructor.
  *
@@ -1658,14 +1650,6 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 	return NULL;
 }
 
-/* deprecated, replaced by rte_pktmbuf_prefree_seg() */
-__rte_deprecated
-static inline struct rte_mbuf *
-__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
-{
-	return rte_pktmbuf_prefree_seg(m);
-}
-
 /**
  * Free a segment of a packet mbuf into its original mempool.
  *
-- 
2.17.1

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
  2018-09-16  9:39  0%   ` Thomas Monjalon
@ 2018-09-17  7:07  0%     ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-09-17  7:07 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: David Marchand, dev, Andrew Rybchenko

Hi Thomas,

On Sun, Sep 16, 2018 at 11:39:29AM +0200, Thomas Monjalon wrote:
> 10/09/2018 10:06, Andrew Rybchenko:
> > On 09/10/2018 08:18 AM, David Marchand wrote:
> > > __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> > > a long time now (early 17.05), are not part of the abi and are easily
> > > replaced with existing api.
> > >
> > > Signed-off-by: David Marchand <david.marchand@6wind.com>
> > 
> > Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> I think we need to bump the library version and update the API section
> in the release notes.

I don't think bumping the lib version is required here, since the patch
removes two functions that are static inline.

But updating the API section would be nice, yes.

Thanks,
Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
  2018-09-10  8:06  0% ` Andrew Rybchenko
@ 2018-09-16  9:39  0%   ` Thomas Monjalon
  2018-09-17  7:07  0%     ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-09-16  9:39 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Andrew Rybchenko, olivier.matz

10/09/2018 10:06, Andrew Rybchenko:
> On 09/10/2018 08:18 AM, David Marchand wrote:
> > __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> > a long time now (early 17.05), are not part of the abi and are easily
> > replaced with existing api.
> >
> > Signed-off-by: David Marchand <david.marchand@6wind.com>
> 
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

I think we need to bump the library version and update the API section
in the release notes.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-09-13 23:45  0%                           ` Honnappa Nagarahalli
@ 2018-09-14  2:45  0%                             ` Jerin Jacob
  2018-09-18 15:53  0%                               ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-14  2:45 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ola Liljedahl, Kokkilagadda, Kiran,
	Gavin Hu (Arm Technology China),
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper,
	Phil Yang (Arm Technology China)

-----Original Message-----
> Date: Thu, 13 Sep 2018 23:45:31 +0000
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>, "Kokkilagadda, Kiran"
>  <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm Technology China)"
>  <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
>  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
>  Technology China)" <Phil.Yang@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>  synchronization
> 
> -----Original Message-----
> > Date: Thu, 13 Sep 2018 17:40:53 +0000
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
> > <Ola.Liljedahl@arm.com>
> > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
> > (Arm  Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
> > <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
> > nd  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
> > Technology China)" <Phil.Yang@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> >
> >
> > Hi Jerin,
> >         Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
> 
> 
> Yes. Makes sense to me to keep only single config option.
> 
> rte_ring has 2 sets of algorithms for Arm architecture, one with C11 memory model and the other with barriers. Going forward (for ex: for KNI), I think we should support C11 memory model only and skip the barriers.

IMO, both should be supported, with the option set to N in config/common_base.
Performance can vary with the architecture or micro-architecture, so
keeping both options and allowing an arch- or micro-arch-specific config
file to override the default makes sense to me (like the existing model,
where the smp_* ops are a compiler NOP on x86).
 
> Also, do you see any issues in making C11 memory model default for Arm architecture?

It is already set to Y by default for arm64; see config/common_armv8a_linuxapp.

A micro-architecture can also override it; see
config/defconfig_arm64-thunderx-linuxapp-gcc
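
To illustrate the "keep both options" model described above, here is a minimal
sketch of one helper pair that compiles either to C11-style acquire/release
operations or to a barrier-based path, depending on a build-time switch. The
macro EXAMPLE_USE_C11_MEM_MODEL and the example_* names are hypothetical
stand-ins, not DPDK symbols, and __sync_synchronize() is used here only as a
portable full barrier in place of a write/read barrier such as rte_smp_wmb().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch; an arch-specific config could define it to 1. */
#ifndef EXAMPLE_USE_C11_MEM_MODEL
#define EXAMPLE_USE_C11_MEM_MODEL 0
#endif

/* Publish a new index value after earlier stores to the data it guards. */
static inline void example_publish(volatile unsigned *idx, unsigned val)
{
	assert(idx != NULL);
#if EXAMPLE_USE_C11_MEM_MODEL
	/* C11 path: release store orders all earlier writes before it. */
	__atomic_store_n(idx, val, __ATOMIC_RELEASE);
#else
	/* Barrier path: full barrier, then a plain volatile store. */
	__sync_synchronize();
	*idx = val;
#endif
}

/* Read an index value before later loads of the data it guards. */
static inline unsigned example_observe(volatile unsigned *idx)
{
	assert(idx != NULL);
#if EXAMPLE_USE_C11_MEM_MODEL
	/* C11 path: acquire load orders all later reads after it. */
	return __atomic_load_n(idx, __ATOMIC_ACQUIRE);
#else
	/* Barrier path: plain volatile load, then a full barrier. */
	unsigned v = *idx;
	__sync_synchronize();
	return v;
#endif
}
```

Building the same file with -DEXAMPLE_USE_C11_MEM_MODEL=1 selects the C11
path, which is roughly how a per-arch or per-micro-arch config override
would pick one implementation without touching the callers.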


> 
> >
> > Thank you,
> > Honnappa
> >
> > -----Original Message-----
> > From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Sent: Wednesday, August 29, 2018 3:58 AM
> > To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
> > <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
> > Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
> > <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> >
> > -----Original Message-----
> > > Date: Wed, 29 Aug 2018 08:47:56 +0000
> > > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> > > <Gavin.Hu@arm.com>,  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> > >  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> > > <dev@dpdk.org>, nd  <nd@arm.com>, Steve Capper
> > > <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > synchronization
> > > user-agent: Microsoft-MacOutlook/10.10.0.180812
> > >
> > >
> > > There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> > >
> > > But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
> >
> > # One reason was __atomic builtins  primitives were implemented in gcc 4.7 and x86 would like to support < gcc 4.7 and ICC compiler.
> > # The theme was no change in the existing code for x86.I am not sure about the code generation for x86 with __atomic builtins, I let x86 maintainers to comments on this.
> >
> >
> > >
> > > -- Ola
> > >
> > > On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> > >
> > >     -----Original Message-----
> > >     > Date: Wed, 29 Aug 2018 07:34:34 +0000
> > >     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > >     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > >     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> > >     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> > >     >  <Jerin.JacobKollanukkaran@cavium.com>
> > >     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > >     >  <Steve.Capper@arm.com>
> > >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > >     >  synchronization
> > >     > user-agent: Microsoft-MacOutlook/10.10.0.180812
> > >     >
> > >     > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> > >
> > >     What would be the change in interface? Is it removing the volatile for
> > >     C11 case, Then you can use anonymous union OR #define to keep the size
> > >     and offset of the element intact.
> > >
> > >     struct rte_kni_fifo {
> > >     #ifndef RTE_C11...
> > >             volatile unsigned write;     /**< Next position to be written*/
> > >             volatile unsigned read;      /**< Next position to be read */
> > >     #else
> > >             unsigned write;     /**< Next position to be written*/
> > >             unsigned read;      /**< Next position to be read */
> > >     #endif
> > >             unsigned len;                /**< Circular buffer length */
> > >             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
> > >             void *volatile buffer[];     /**< The buffer contains mbuf
> > >     pointers */
> > >     };
> > >
> > >     Anonymous union example:
> > >     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> > >
> > >     You can check the ABI breakage by devtools/validate-abi.sh
> > >
> > >     >
> > >     > -- Ola
> > >     >
> > >     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> > >     > Date: Wednesday, 29 August 2018 at 07:50
> > >     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> > >     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >     >
> > >     >
> > >     > Agreed. Please go a head and make the changes. You need to make same change in kernel side also. And please use c11 ring (see rte_ring) mechanism so that it won't impact other platforms like intel. We need this change just for arm and ppc.
> > >     >
> > >     > ________________________________
> > >     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > >     > Sent: Wednesday, August 29, 2018 10:29 AM
> > >     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> > >     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> > >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >     >
> > >     >
> > >     > I agree with Gavin here. Store to fifo->write and fifo->read can get hoisted resulting in accessing invalid buffer array entries or over writing of the buffer array entries.
> > >     >
> > >     > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
> > >     >
> > >     >
> > >     >
> > >     > If you want us to put together a patch with this idea, please let us know.
> > >     >
> > >     >
> > >     >
> > >     > Thank you,
> > >     >
> > >     > Honnappa
> > >     >
> > >     >
> > >     >
> > >     > From: Gavin Hu
> > >     > Sent: Tuesday, August 28, 2018 2:31 PM
> > >     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> > >     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >     >
> > >     >
> > >     >
> > >     > Assuming reader and writer may execute on different CPU's, this become standard multithreaded programming.
> > >     >
> > >     > We are concerned about that update the reader pointer too early(weak ordering may reorder it before reading from the slots), that means the slots are released and may immediately overwritten by the writer then you get “too new” data and get lost of the old data.
> > >     >
> > >     >
> > >     >
> > >     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> > >     > Sent: Tuesday, August 28, 2018 6:44 PM
> > >     > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> > >     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> > >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >     >
> > >     >
> > >     >
> > >     > In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
> > >     >
> > >     >
> > >     >
> > >     >
> > >     >
> > >     > ________________________________
> > >     >
> > >     > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> > >     > Sent: Monday, August 27, 2018 9:10 PM
> > >     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> > >     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> > >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >     >
> > >     >
> > >     >
> > >     > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
> > >     >
> > >     > > -----Original Message-----
> > >     > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> > >     > > Sent: Monday, August 27, 2018 10:08 PM
> > >     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> > >     > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> > >     > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> > >     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > >     > > synchronization
> > >     > >
> > >     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> > >     > > > With existing code in kni_fifo_put, rx_q values are not being updated
> > >     > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
> > >     > > > This is causing the sync issue on other core. So adding a write
> > >     > > > barrier to make sure the values being synced before updating fifo_write.
> > >     > > >
> > >     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > >     > > >
> > >     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> > >     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> > >     > >
> > >     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> > >
> > >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-09-13 17:51  0%                         ` Jerin Jacob
@ 2018-09-13 23:45  0%                           ` Honnappa Nagarahalli
  2018-09-14  2:45  0%                             ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-13 23:45 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ola Liljedahl, Kokkilagadda, Kiran,
	Gavin Hu (Arm Technology China),
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper,
	Phil Yang (Arm Technology China)


-----Original Message-----
> Date: Thu, 13 Sep 2018 17:40:53 +0000
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl  
> <Ola.Liljedahl@arm.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu 
> (Arm  Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit  
> <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, 
> nd  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm  
> Technology China)" <Phil.Yang@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer  
> synchronization
> 
> 
> Hi Jerin,
>         Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.


Yes. It makes sense to me to keep only a single config option.

rte_ring has two sets of algorithms for the Arm architecture: one using the C11 memory model and the other using barriers. Going forward (for example, for KNI), I think we should support only the C11 memory model and skip the barriers.

Also, do you see any issues with making the C11 memory model the default for the Arm architecture?
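
For reference, a C11-only single-producer/single-consumer FIFO of the kind
discussed in this thread might look roughly like the sketch below. It is not
the KNI code: the struct, the example_* function names and EXAMPLE_FIFO_LEN
are hypothetical, and it assumes exactly one producer and one consumer. The
only ordering it relies on is the GCC __atomic builtins with acquire/release
semantics, with no explicit barriers and no volatile.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_FIFO_LEN 8u	/* hypothetical size; must be a power of two */

struct example_fifo {
	unsigned write;			/* next position to be written */
	unsigned read;			/* next position to be read */
	void *buffer[EXAMPLE_FIFO_LEN];	/* slots holding element pointers */
};

/* Producer side: returns 1 on success, 0 if the FIFO is full. */
static int example_fifo_put(struct example_fifo *f, void *item)
{
	assert(f != NULL);
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_RELAXED);
	/* Acquire: the consumer's read of a slot happens before we reuse it. */
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);
	unsigned next = (w + 1) & (EXAMPLE_FIFO_LEN - 1);

	if (next == r)
		return 0;
	f->buffer[w] = item;
	/* Release: the slot store above is visible before the new index. */
	__atomic_store_n(&f->write, next, __ATOMIC_RELEASE);
	return 1;
}

/* Consumer side: returns 1 and sets *item on success, 0 if empty. */
static int example_fifo_get(struct example_fifo *f, void **item)
{
	assert(f != NULL);
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_RELAXED);
	/* Acquire: pairs with the producer's release store of write. */
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);

	if (r == w)
		return 0;
	*item = f->buffer[r];
	/* Release: the slot load above completes before the slot is freed. */
	__atomic_store_n(&f->read, (r + 1) & (EXAMPLE_FIFO_LEN - 1),
			 __ATOMIC_RELEASE);
	return 1;
}
```

The release store of read in the consumer is what addresses the concern
raised earlier in the thread about the read index being updated before the
slot contents have actually been read. One slot is kept empty to tell a full
FIFO from an empty one, so this sketch holds at most EXAMPLE_FIFO_LEN - 1
elements.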

> 
> Thank you,
> Honnappa
> 
> -----Original Message-----
> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Sent: Wednesday, August 29, 2018 3:58 AM
> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa 
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu 
> <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, 
> Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd 
> <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer 
> synchronization
> 
> -----Original Message-----
> > Date: Wed, 29 Aug 2018 08:47:56 +0000
> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa 
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu 
> > <Gavin.Hu@arm.com>,  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" 
> > <dev@dpdk.org>, nd  <nd@arm.com>, Steve Capper 
> > <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer 
> > synchronization
> > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >
> >
> > There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> >
> > But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
> 
> # One reason was __atomic builtins  primitives were implemented in gcc 4.7 and x86 would like to support < gcc 4.7 and ICC compiler.
> # The theme was no change in the existing code for x86.I am not sure about the code generation for x86 with __atomic builtins, I let x86 maintainers to comments on this.
> 
> 
> >
> > -- Ola
> >
> > On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> >
> >     -----Original Message-----
> >     > Date: Wed, 29 Aug 2018 07:34:34 +0000
> >     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> >     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> >     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >     >  <Jerin.JacobKollanukkaran@cavium.com>
> >     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> >     >  <Steve.Capper@arm.com>
> >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >     >  synchronization
> >     > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >     >
> >     > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> >
> >     What would be the change in interface? Is it removing the volatile for
> >     C11 case, Then you can use anonymous union OR #define to keep the size
> >     and offset of the element intact.
> >
> >     struct rte_kni_fifo {
> >     #ifndef RTE_C11...
> >             volatile unsigned write;     /**< Next position to be written*/
> >             volatile unsigned read;      /**< Next position to be read */
> >     #else
> >             unsigned write;     /**< Next position to be written*/
> >             unsigned read;      /**< Next position to be read */
> >     #endif
> >             unsigned len;                /**< Circular buffer length */
> >             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
> >             void *volatile buffer[];     /**< The buffer contains mbuf
> >     pointers */
> >     };
> >
> >     Anonymous union example:
> >     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> >
> >     You can check the ABI breakage by devtools/validate-abi.sh
> >
> >     >
> >     > -- Ola
> >     >
> >     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> >     > Date: Wednesday, 29 August 2018 at 07:50
> >     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> >     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     > Agreed. Please go a head and make the changes. You need to make same change in kernel side also. And please use c11 ring (see rte_ring) mechanism so that it won't impact other platforms like intel. We need this change just for arm and ppc.
> >     >
> >     > ________________________________
> >     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >     > Sent: Wednesday, August 29, 2018 10:29 AM
> >     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> >     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     > External Email
> >     >
> >     > I agree with Gavin here. Store to fifo->write and fifo->read can get hoisted resulting in accessing invalid buffer array entries or over writing of the buffer array entries.
> >     >
> >     > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
> >     >
> >     >
> >     >
> >     > If you want us to put together a patch with this idea, please let us know.
> >     >
> >     >
> >     >
> >     > Thank you,
> >     >
> >     > Honnappa
> >     >
> >     >
> >     >
> >     > From: Gavin Hu
> >     > Sent: Tuesday, August 28, 2018 2:31 PM
> >     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> >     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     >
> >     > Assuming reader and writer may execute on different CPU's, this become standard multithreaded programming.
> >     >
> >     > We are concerned about that update the reader pointer too early(weak ordering may reorder it before reading from the slots), that means the slots are released and may immediately overwritten by the writer then you get “too new” data and get lost of the old data.
> >     >
> >     >
> >     >
> >     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> >     > Sent: Tuesday, August 28, 2018 6:44 PM
> >     > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> >     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     >
> >     > In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > ________________________________
> >     >
> >     > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> >     > Sent: Monday, August 27, 2018 9:10 PM
> >     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> >     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     >
> >     > External Email
> >     >
> >     > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
> >     >
> >     > > -----Original Message-----
> >     > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> >     > > Sent: Monday, August 27, 2018 10:08 PM
> >     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> >     > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> >     > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> >     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >     > > synchronization
> >     > >
> >     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> >     > > > With existing code in kni_fifo_put, rx_q values are not being updated
> >     > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
> >     > > > This is causing the sync issue on other core. So adding a write
> >     > > > barrier to make sure the values being synced before updating fifo_write.
> >     > > >
> >     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> >     > > >
> >     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> >     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> >     > >
> >     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> >     > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> >
> >


* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-09-13 17:40  0%                       ` Honnappa Nagarahalli
@ 2018-09-13 17:51  0%                         ` Jerin Jacob
  2018-09-13 23:45  0%                           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-13 17:51 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: Ola Liljedahl, Kokkilagadda, Kiran,
	Gavin Hu (Arm Technology China),
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper,
	Phil Yang (Arm Technology China)

-----Original Message-----
> Date: Thu, 13 Sep 2018 17:40:53 +0000
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
>  <Ola.Liljedahl@arm.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm
>  Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
>  <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
>  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
>  Technology China)" <Phil.Yang@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>  synchronization
> 
> 
> Hi Jerin,
>         Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.


Yes. It makes sense to me to keep only a single config option.
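
For illustration only, here is a minimal sketch of how one compile-time switch could select the synchronization primitives for a single-producer/single-consumer fifo. It is not taken from any patch in this thread. The macro is defined locally and merely stands in for the proposed 'RTE_USE_C11_MEM_MODEL'; 'struct fifo', 'load_acquire', 'store_release', 'fifo_put', 'fifo_get' and 'fifo_demo' are hypothetical names, and the layout is not that of 'rte_kni_fifo'. The fallback branch uses the older __sync_synchronize() full barrier, on the assumption that this is what a pre-4.7 GCC would have to rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the proposed single config option (assumption). */
#define RTE_USE_C11_MEM_MODEL 1

#define FIFO_LEN 8u	/* hypothetical size; must be a power of two */

struct fifo {
	unsigned write;			/* next slot the producer fills */
	unsigned read;			/* next slot the consumer drains */
	void *buffer[FIFO_LEN];
};

/* Load an index so that later accesses cannot be reordered before it. */
static inline unsigned load_acquire(unsigned *p)
{
#if RTE_USE_C11_MEM_MODEL
	return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#else
	unsigned v = *(volatile unsigned *)p;
	__sync_synchronize();	/* full barrier after the load */
	return v;
#endif
}

/* Publish an index so that earlier accesses cannot be reordered after it. */
static inline void store_release(unsigned *p, unsigned v)
{
#if RTE_USE_C11_MEM_MODEL
	__atomic_store_n(p, v, __ATOMIC_RELEASE);
#else
	__sync_synchronize();	/* full barrier before the store */
	*(volatile unsigned *)p = v;
#endif
}

/* Producer side: write the slot first, then publish the new write index. */
static int fifo_put(struct fifo *f, void *obj)
{
	unsigned w = f->write;	/* only the producer modifies this */
	unsigned r = load_acquire(&f->read);
	unsigned next = (w + 1) & (FIFO_LEN - 1);

	if (next == r)
		return -1;	/* full: one slot is always kept free */
	f->buffer[w] = obj;
	store_release(&f->write, next);
	return 0;
}

/* Consumer side: read the slot first, then release it to the producer. */
static int fifo_get(struct fifo *f, void **obj)
{
	unsigned r = f->read;	/* only the consumer modifies this */
	unsigned w = load_acquire(&f->write);

	if (r == w)
		return -1;	/* empty */
	*obj = f->buffer[r];
	store_release(&f->read, (r + 1) & (FIFO_LEN - 1));
	return 0;
}

/* Single-threaded smoke test of the ordering of put/get calls. */
static int fifo_demo(void)
{
	struct fifo f = { 0, 0, { NULL } };
	int a = 1, b = 2;
	void *out = NULL;

	assert(fifo_get(&f, &out) == -1);
	assert(fifo_put(&f, &a) == 0);
	assert(fifo_put(&f, &b) == 0);
	assert(fifo_get(&f, &out) == 0 && out == &a);
	assert(fifo_get(&f, &out) == 0 && out == &b);
	assert(fifo_get(&f, &out) == -1);
	return 0;
}
```

With a single option, every library that needs this pattern would test the same macro instead of each one introducing its own. Whether the C11 branch generates the same code as the existing one on x86 is the question raised earlier in the thread, and this sketch does not answer it.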

> 
> Thank you,
> Honnappa
> 
> -----Original Message-----
> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Sent: Wednesday, August 29, 2018 3:58 AM
> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> 
> -----Original Message-----
> > Date: Wed, 29 Aug 2018 08:47:56 +0000
> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> > <Gavin.Hu@arm.com>,  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
> > nd  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >
> >
> > There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> >
> > But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
> 
> # One reason was that the __atomic builtins were introduced in GCC 4.7, and x86 would like to keep supporting GCC versions older than 4.7 as well as the ICC compiler.
> # The theme was no change to the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
> 
> 
> >
> > -- Ola
> >
> > On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> >
> >     -----Original Message-----
> >     > Date: Wed, 29 Aug 2018 07:34:34 +0000
> >     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> >     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> >     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
> >     >  <Jerin.JacobKollanukkaran@cavium.com>
> >     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> >     >  <Steve.Capper@arm.com>
> >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >     >  synchronization
> >     > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >     >
> >     > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> >
> >     What would be the change in interface? Is it removing the volatile for
> >     C11 case, Then you can use anonymous union OR #define to keep the size
> >     and offset of the element intact.
> >
> >     struct rte_kni_fifo {
> >     #ifndef RTE_C11...
> >             volatile unsigned write;     /**< Next position to be written*/
> >             volatile unsigned read;      /**< Next position to be read */
> >     #else
> >             unsigned write;     /**< Next position to be written*/
> >             unsigned read;      /**< Next position to be read */
> >     #endif
> >             unsigned len;                /**< Circular buffer length */
> >             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
> >             void *volatile buffer[];     /**< The buffer contains mbuf
> >     pointers */
> >     };
> >
> >     Anonymous union example:
> >     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> >
> >     You can check the ABI breakage by devtools/validate-abi.sh
> >
> >     >
> >     > -- Ola
> >     >
> >     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> >     > Date: Wednesday, 29 August 2018 at 07:50
> >     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> >     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     > Agreed. Please go a head and make the changes. You need to make same change in kernel side also. And please use c11 ring (see rte_ring) mechanism so that it won't impact other platforms like intel. We need this change just for arm and ppc.
> >     >
> >     > ________________________________
> >     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >     > Sent: Wednesday, August 29, 2018 10:29 AM
> >     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> >     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     > External Email
> >     >
> >     > I agree with Gavin here. Store to fifo->write and fifo->read can get hoisted resulting in accessing invalid buffer array entries or over writing of the buffer array entries.
> >     >
> >     > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
> >     >
> >     >
> >     >
> >     > If you want us to put together a patch with this idea, please let us know.
> >     >
> >     >
> >     >
> >     > Thank you,
> >     >
> >     > Honnappa
> >     >
> >     >
> >     >
> >     > From: Gavin Hu
> >     > Sent: Tuesday, August 28, 2018 2:31 PM
> >     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> >     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     >
> >     > Assuming reader and writer may execute on different CPU's, this become standard multithreaded programming.
> >     >
> >     > We are concerned about that update the reader pointer too early(weak ordering may reorder it before reading from the slots), that means the slots are released and may immediately overwritten by the writer then you get “too new” data and get lost of the old data.
> >     >
> >     >
> >     >
> >     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> >     > Sent: Tuesday, August 28, 2018 6:44 PM
> >     > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> >     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> >     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     >
> >     > In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > ________________________________
> >     >
> >     > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> >     > Sent: Monday, August 27, 2018 9:10 PM
> >     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> >     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> >     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >     >
> >     >
> >     >
> >     > External Email
> >     >
> >     > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
> >     >
> >     > > -----Original Message-----
> >     > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> >     > > Sent: Monday, August 27, 2018 10:08 PM
> >     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> >     > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> >     > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> >     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >     > > synchronization
> >     > >
> >     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> >     > > > With existing code in kni_fifo_put, rx_q values are not being updated
> >     > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
> >     > > > This is causing the sync issue on other core. So adding a write
> >     > > > barrier to make sure the values being synced before updating fifo_write.
> >     > > >
> >     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> >     > > >
> >     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> >     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> >     > >
> >     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> >
> >


* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-08-29  8:57  0%                     ` Jerin Jacob
@ 2018-09-13 17:40  0%                       ` Honnappa Nagarahalli
  2018-09-13 17:51  0%                         ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-13 17:40 UTC (permalink / raw)
  To: Jerin Jacob, Ola Liljedahl
  Cc: Kokkilagadda, Kiran, Gavin Hu (Arm Technology China),
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper,
	Phil Yang (Arm Technology China)

Hi Jerin,
	Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.

Thank you,
Honnappa

-----Original Message-----
From: Jerin Jacob <jerin.jacob@caviumnetworks.com> 
Sent: Wednesday, August 29, 2018 3:58 AM
To: Ola Liljedahl <Ola.Liljedahl@arm.com>
Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization

-----Original Message-----
> Date: Wed, 29 Aug 2018 08:47:56 +0000
> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa  
> Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu 
> <Gavin.Hu@arm.com>,  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, 
> nd  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer  
> synchronization
> user-agent: Microsoft-MacOutlook/10.10.0.180812
> 
> 
> There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> 
> But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?

# One reason was that the __atomic builtins were introduced in GCC 4.7, and x86 would like to keep supporting GCC versions older than 4.7 as well as the ICC compiler.
# The theme was no change to the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.


> 
> -- Ola
> 
> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> 
>     -----Original Message-----
>     > Date: Wed, 29 Aug 2018 07:34:34 +0000
>     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
>     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
>     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>     >  <Jerin.JacobKollanukkaran@cavium.com>
>     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
>     >  <Steve.Capper@arm.com>
>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>     >  synchronization
>     > user-agent: Microsoft-MacOutlook/10.10.0.180812
>     >
>     > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> 
>     What would be the change in interface? Is it removing the volatile for
>     C11 case, Then you can use anonymous union OR #define to keep the size
>     and offset of the element intact.
> 
>     struct rte_kni_fifo {
>     #ifndef RTE_C11...
>             volatile unsigned write;     /**< Next position to be written*/
>             volatile unsigned read;      /**< Next position to be read */
>     #else
>             unsigned write;     /**< Next position to be written*/
>             unsigned read;      /**< Next position to be read */
>     #endif
>             unsigned len;                /**< Circular buffer length */
>             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
>             void *volatile buffer[];     /**< The buffer contains mbuf
>     pointers */
>     };
> 
>     Anonymous union example:
>     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> 
>     You can check the ABI breakage by devtools/validate-abi.sh
> 
>     >
>     > -- Ola
>     >
>     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
>     > Date: Wednesday, 29 August 2018 at 07:50
>     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
>     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     > Agreed. Please go a head and make the changes. You need to make same change in kernel side also. And please use c11 ring (see rte_ring) mechanism so that it won't impact other platforms like intel. We need this change just for arm and ppc.
>     >
>     > ________________________________
>     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>     > Sent: Wednesday, August 29, 2018 10:29 AM
>     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
>     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     > External Email
>     >
>     > I agree with Gavin here. Store to fifo->write and fifo->read can get hoisted resulting in accessing invalid buffer array entries or over writing of the buffer array entries.
>     >
>     > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
>     >
>     >
>     >
>     > If you want us to put together a patch with this idea, please let us know.
>     >
>     >
>     >
>     > Thank you,
>     >
>     > Honnappa
>     >
>     >
>     >
>     > From: Gavin Hu
>     > Sent: Tuesday, August 28, 2018 2:31 PM
>     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
>     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     >
>     > Assuming reader and writer may execute on different CPU's, this become standard multithreaded programming.
>     >
>     > We are concerned about that update the reader pointer too early(weak ordering may reorder it before reading from the slots), that means the slots are released and may immediately overwritten by the writer then you get “too new” data and get lost of the old data.
>     >
>     >
>     >
>     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
>     > Sent: Tuesday, August 28, 2018 6:44 PM
>     > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
>     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     >
>     > In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
>     >
>     >
>     >
>     >
>     >
>     > ________________________________
>     >
>     > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
>     > Sent: Monday, August 27, 2018 9:10 PM
>     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
>     > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     >
>     > External Email
>     >
>     > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
>     >
>     > > -----Original Message-----
>     > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
>     > > Sent: Monday, August 27, 2018 10:08 PM
>     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
>     > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
>     > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
>     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>     > > synchronization
>     > >
>     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
>     > > > With existing code in kni_fifo_put, rx_q values are not being updated
>     > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
>     > > > This is causing the sync issue on other core. So adding a write
>     > > > barrier to make sure the values being synced before updating fifo_write.
>     > > >
>     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
>     > > >
>     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
>     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
>     > >
>     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> 
> 


* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  2018-09-06 14:28  3%       ` Michel Machado
@ 2018-09-12 20:37  2%         ` Honnappa Nagarahalli
  2018-09-20 19:50  0%           ` Michel Machado
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-12 20:37 UTC (permalink / raw)
  To: Michel Machado, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
	nd, yipeng1.wang

Hi Michel,
	I applied your patch and tweaked the code to run a few performance tests on Arm (Cortex-A72, 1.3GHz) and x86 (Intel Xeon CPU E5-2660 v4 @ 2.00GHz). The perf code looks as follows:

        /* Timestamp taken before the iterator is initialized. */
        count_b = rte_rdtsc_precise();
        int k = 0;
        rte_hash_iterator_init(tbl_rw_test_param.h, &state);

        while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
                /* Search for the key in the list of keys added. */
                i = *(const uint32_t *)next_key;
                tbl_rw_test_param.found[i]++;
                k++;
        }

        /* Elapsed cycles, averaged over the number of entries returned. */
        count_a = rte_rdtsc_precise() - count_b;
        if (k > 0)
                printf("*****Cycles2 per iterate call: %lu\n", count_a/k);

Further, I changed the signature of rte_hash_iterate as follows and ran the same test:
int32_t rte_hash_iterate(const struct rte_hash *h, struct rte_hash_iterator_state *state, const void **key, void **data);

Finally, I used memset in place of rte_hash_iterator_init, with the required changes to rte_hash_iterate.

All these tests show little variation in 'cycles per iterate call' for both architectures.
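
To make the two API shapes being compared concrete, here is a minimal stand-alone sketch. It does not use the rte_hash library at all: 'struct toy_table' and every 'toy_' name are hypothetical, the table is a flat array instead of a cuckoo hash, and the end of iteration is signalled with a plain -1. The first shape binds the table to the state in an init call, as in the patch; the second passes the table on every call and only needs a zeroed state, as in my memset variant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ENTRIES 8u

/* Toy table: a flat array of slots, some of them in use. */
struct toy_table {
	uint32_t keys[TOY_ENTRIES];
	uint8_t used[TOY_ENTRIES];
};

/* Shape 1: the state remembers its table, so an init call is required. */
struct toy_iter_bound {
	const struct toy_table *t;
	uint32_t next;
};

static void toy_iter_init(const struct toy_table *t, struct toy_iter_bound *s)
{
	s->t = t;
	s->next = 0;
}

static int32_t toy_iterate_bound(struct toy_iter_bound *s, const uint32_t **key)
{
	while (s->next < TOY_ENTRIES) {
		uint32_t i = s->next++;

		if (s->t->used[i]) {
			*key = &s->t->keys[i];
			return (int32_t)i;	/* position of the entry */
		}
	}
	return -1;	/* no more entries */
}

/* Shape 2: the table is passed on every call; a zeroed state is enough. */
struct toy_iter_plain {
	uint32_t next;
};

static int32_t toy_iterate_plain(const struct toy_table *t,
		struct toy_iter_plain *s, const uint32_t **key)
{
	while (s->next < TOY_ENTRIES) {
		uint32_t i = s->next++;

		if (t->used[i]) {
			*key = &t->keys[i];
			return (int32_t)i;
		}
	}
	return -1;
}

/* Count entries using shape 1. */
static uint32_t toy_count_bound(const struct toy_table *t)
{
	struct toy_iter_bound s;
	const uint32_t *key;
	uint32_t n = 0;

	toy_iter_init(t, &s);
	while (toy_iterate_bound(&s, &key) >= 0)
		n++;
	return n;
}

/* Count entries using shape 2, with memset standing in for the init call. */
static uint32_t toy_count_plain(const struct toy_table *t)
{
	struct toy_iter_plain s;
	const uint32_t *key;
	uint32_t n = 0;

	memset(&s, 0, sizeof(s));
	while (toy_iterate_plain(t, &s, &key) >= 0)
		n++;
	return n;
}
```

Both loops do the same work per call; the only difference is whether the table pointer travels in the state or in the argument list, which is consistent with the small variation I measured.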


-----Original Message-----
From: Michel Machado <michel@digirati.com.br> 
Sent: Thursday, September 6, 2018 9:29 AM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Qiaobin Fu <qiaobinf@bu.edu>; bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; yipeng1.wang@intel.com
Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries

On 09/05/2018 06:13 PM, Honnappa Nagarahalli wrote:
>> +	uint32_t              next;
>> +	uint32_t              total_entries;
>> +};
>> This structure can be moved to rte_cuckoo_hash.h file.
> 
>      What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
> 
> This is to maintain consistency. For ex: 'struct queue_node', which is 
> an internal structure, is kept in rte_cuckoo_hash.h

    Okay. We'll move it there.

>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> +	struct rte_hash_iterator_state *state) {
>> +	struct rte_hash_iterator_istate *__state;
>> '__state' can be replaced by 's'.
>>
>> +
>> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>> +
>> +	__state = (struct rte_hash_iterator_istate *)state;
>> +	__state->h = h;
>> +	__state->next = 0;
>> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>> +
>> +	return 0;
>> +}
>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
> 
>      Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
> 
> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.

    It's not just about creating one init function for each iterator because an iterator may have a couple of init functions. For example, someone may eventually find useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. A possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state. Similar to what is already done in rte_hash_lookup*() functions. In spite of possibly having multiple init functions, there will be a single iterator function.

    About the performance benchmarking, the current API only requires applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.

As my tests showed, I do not see any impact of this.

>>    int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
>> +rte_hash_iterate(
>> +	struct rte_hash_iterator_state *state, const void **key, void **data)
>>
>> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)
>>
>> This will help keep the API signature consistent with existing APIs.
>>
>> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
> 
>      The ABI will change in a way or another, so why not going for a single state instead of requiring parameters that are already needed for the initialization of the state?
> 
> Are there any cost savings we can achieve by keeping the 'h' in the iterator state?

    There's a tiny cost saving: we avoid pushing that parameter onto the execution stack every time the iterator fetches another entry. However, the reason I find more important is that it makes a class of bug impossible to introduce. Consider a function that is dealing with two hash tables and two iterators. If the API does not ask for the hash table in order to make progress in an iterator, it's impossible to mix up hash tables and iterator states.

IMO, similar arguments can be applied for other APIs too. It is more difficult to use the APIs if they are not consistent. I also do not see the benefit of the savings in my tests. 

    There's even the possibility that an iterator doesn't need the hash table after its initialization. This would be an *unlikely* case, but consider an iterator that only returns a couple of entries. It could cache those entries during initialization.
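
To make the two-table argument above concrete, here is a minimal sketch of an iterator whose state remembers the table it walks. Everything named toy_* is hypothetical and exists only for illustration; it is not the rte_hash API and it omits locking entirely.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EMPTY 0u	/* hypothetical marker for an unused slot */

/* Hypothetical stand-in for a hash table: a flat array of slots. */
struct toy_table {
	const uint32_t *slots;
	uint32_t num_slots;
};

/* Iterator state that records which table it belongs to. */
struct toy_iter {
	const struct toy_table *t;
	uint32_t next;
};

/* Bind the iterator to a table once, at initialization time. */
static int
toy_iter_init(const struct toy_table *t, struct toy_iter *it)
{
	if (t == NULL || it == NULL)
		return -EINVAL;
	it->t = t;
	it->next = 0;
	return 0;
}

/* Return the next non-empty slot, or -ENOENT at the end of the table. */
static int
toy_iter_next(struct toy_iter *it, uint32_t *value)
{
	if (it == NULL || value == NULL)
		return -EINVAL;
	while (it->next < it->t->num_slots) {
		uint32_t v = it->t->slots[it->next++];

		if (v != TOY_EMPTY) {
			*value = v;
			return 0;
		}
	}
	return -ENOENT;
}
```

Because toy_iter_next() takes only the state, a caller juggling two tables cannot pass the wrong table for a given iterator. The trade-off raised in the thread still applies: the signature differs from the other calls that take the table as first argument.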

>>    	/* Calculate bucket and index of current iterator */
>> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    
>>    	/* If current position is empty, go to the next one */
>> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> -		(*next)++;
>> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> +		__state->next++;
>>    		/* End of table */
>> -		if (*next == total_entries)
>> +		if (__state->next == __state->total_entries)
>>    			return -ENOENT;
>> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    	}
>> -	__hash_rw_reader_lock(h);
>> +	__hash_rw_reader_lock(__state->h);
>>    	/* Get position of entry in key table */
>> -	position = h->buckets[bucket_idx].key_idx[idx];
>> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> -				position * h->key_entry_size);
>> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
>> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>> +				position * __state->h->key_entry_size);
>>    	/* Return key and data */
>>    	*key = next_key->key;
>>    	*data = next_key->pdata;
>>    
>> -	__hash_rw_reader_unlock(h);
>> +	__hash_rw_reader_unlock(__state->h);
>>    
>>    	/* Increment iterator */
>> -	(*next)++;
>> +	__state->next++;
>> This comment is not related to this change, it is better to place this inside the lock.
> 
>      Even though __state->next does not depend on the lock?
> 
> It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?

    If an application needs to share an iterator state between threads, it has to have a synchronization mechanism for that, as it would for any other shared variable. The lock above allows applications to share a hash table between threads; it has no bearing on anything else.

Agree, the lock is for protecting the hash table, not the iterator state.

>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h 
>> index 9e7d9315f..fdb01023e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -14,6 +14,8 @@
>>    #include <stdint.h>
>>    #include <stddef.h>
>>    
>> +#include <rte_compat.h>
>> +
>>    #ifdef __cplusplus
>>    extern "C" {
>>    #endif
>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>    /** @internal A hash table structure. */
>>    struct rte_hash;
>>    
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @internal A hash table iterator state structure.
>> + */
>> +struct rte_hash_iterator_state {
>> +	uint8_t space[64];
>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
> 
>      Okay.

    I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.

Ok. Maybe have a #define for 64?
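
A minimal sketch of what such a #define could look like, paired with a compile-time check that the internal state fits in the public storage. The toy_* and TOY_* names are hypothetical, and the internal fields only mirror the three shown in the patch above; this is not the layout that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; fixed at 64 so the size is identical on every target. */
#define TOY_ITERATOR_STATE_SIZE 64

/* Opaque state as seen by applications. */
struct toy_iterator_state {
	uint8_t space[TOY_ITERATOR_STATE_SIZE];
};

/* Internal view of the same storage (fields mirror the patch under review). */
struct toy_iterator_istate {
	const void *h;
	uint32_t next;
	uint32_t total_entries;
};

/* Fail the build if the internal view ever outgrows the public storage. */
static_assert(sizeof(struct toy_iterator_istate) <=
	      sizeof(struct toy_iterator_state),
	      "internal iterator state does not fit in the public struct");
```

A fixed constant keeps the public struct the same size regardless of the cache line size of the build target, which is the concern raised above. The sketch checks size only; a plain byte array carries no alignment guarantee, so that would need separate attention.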

[ ]'s
Michel Machado

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  2018-09-11 23:14  3%                                   ` Stephen Hemminger
@ 2018-09-12  4:02  0%                                     ` Jason Wang
  0 siblings, 0 replies; 200+ results
From: Jason Wang @ 2018-09-12  4:02 UTC (permalink / raw)
  To: Stephen Hemminger, Dan Gora; +Cc: Igor Ryzhov, Ferruh Yigit, dev



On 2018-09-12 07:14, Stephen Hemminger wrote:
> On Tue, 11 Sep 2018 19:07:47 -0300
> Dan Gora <dg@adax.com> wrote:
>
>> On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> The carrier state has no meaning when device is down, at least for physical
>>> devices. Because often the PHY is powered off when the device is marked down.
>> The thing that caught my attention is that when you mark a kernel
>> ethernet device 'down', you get a message that the link is down in the
>> syslog.
>>
>> snappy:root:bash 2645 => ip link set down dev eth0
>> Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down
>>
>> With this method, that's not possible because you cannot change the
>> link state from the callback from kni_net_release.
>>
>> The carrier state doesn't have any meaning from a data transfer point
>> of view, but it's often useful for being able to diagnose connectivity
>> issues (is my cable plugged in or not).
>>
>> I'm still not really clear what the objection really is to the ioctl
>> method.  Is it just the number of changes?  That the kernel driver has
>> to change as well?  Just that there is another way to do it?
>>
>> thanks
>> dan
> I want to see KNI as part of the standard Linux kernel at some future date.
> Having KNI as an out of tree driver means it is doomed to chasing tail lights
> for the Linux kernel ABI instability and also problems with Linux distributions.

Why not use vhost_net instead? KNI duplicates its function.

Thanks

>
> One of the barriers to entry for Linux drivers is introducing new ioctl's.
> Ioctl's have issues with being device specific and also 32/64 compatibility.
> If KNI has ioctl's it makes it harder to get merged some day.
>
> I freely admit that this is forcing KNI to respond to something that is not
> there yet, so if it is too hard, then doing it with ioctl is going to be
> necessary.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  @ 2018-09-11 23:14  3%                                   ` Stephen Hemminger
  2018-09-12  4:02  0%                                     ` Jason Wang
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-09-11 23:14 UTC (permalink / raw)
  To: Dan Gora; +Cc: Igor Ryzhov, Ferruh Yigit, dev

On Tue, 11 Sep 2018 19:07:47 -0300
Dan Gora <dg@adax.com> wrote:

> On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > The carrier state has no meaning when device is down, at least for physical
> > devices. Because often the PHY is powered off when the device is marked down.  
> 
> The thing that caught my attention is that when you mark a kernel
> ethernet device 'down', you get a message that the link is down in the
> syslog.
> 
> snappy:root:bash 2645 => ip link set down dev eth0
> Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down
> 
> With this method, that's not possible because you cannot change the
> link state from the callback from kni_net_release.
> 
> The carrier state doesn't have any meaning from a data transfer point
> of view, but it's often useful for being able to diagnose connectivity
> issues (is my cable plugged in or not).
> 
> I'm still not really clear what the objection really is to the ioctl
> method.  Is it just the number of changes?  That the kernel driver has
> to change as well?  Just that there is another way to do it?
> 
> thanks
> dan

I want to see KNI as part of the standard Linux kernel at some future date.
Having KNI as an out of tree driver means it is doomed to chasing tail lights
for the Linux kernel ABI instability and also problems with Linux distributions.

One of the barriers to entry for Linux drivers is introducing new ioctl's.
Ioctl's have issues with being device specific and also 32/64 compatibility.
If KNI has ioctl's it makes it harder to get merged some day.

I freely admit that this is forcing KNI to respond to something that is not
there yet, so if it is too hard, then doing it with ioctl is going to be
necessary.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
  2018-09-11 14:06  0%           ` Bruce Richardson
@ 2018-09-11 16:05  0%             ` Luca Boccassi
  0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-09-11 16:05 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, 2018-09-11 at 15:06 +0100, Bruce Richardson wrote:
> On Tue, Sep 11, 2018 at 02:41:36PM +0100, Luca Boccassi wrote:
> > On Tue, 2018-09-11 at 14:32 +0100, Bruce Richardson wrote:
> > > On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > > > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi
> > > > > wrote:
> > > > > > The library is called librte_pmd_lio, so rename the map
> > > > > > file
> > > > > > and
> > > > > > set
> > > > > > the name in the meson file so that the built library names
> > > > > > with
> > > > > > meson
> > > > > > and legacy makefiles are the same
> > > > > > 
> > > > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > > > Cc: stable@dpdk.org
> > > > > > 
> > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > 
> > > > > Rather than doing this renaming, can we instead add a symlink
> > > > > in
> > > > > the
> > > > > install phase to map the old name to the new one? I'd like to
> > > > > see
> > > > > the
> > > > > consistency of directory name, map filename and driver name
> > > > > enforced
> > > > > strictly in the build system. Having exceptions is a pain.
> > > > > 
> > > > > /Bruce
> > > > 
> > > > We could, but the pain gets shifted on packagers then - what
> > > > about
> > > > renaming the directory entirely to net/lio?
> > > 
> > > For packagers, what sort of ABI compatibility guarantees do you
> > > try
> > > and
> > > keep between releases. Is this something that just needs a one-
> > > release ABI
> > > announcement, as with other ABI changes?
> > > 
> > > /Bruce
> > 
> > Currently in Debian/Ubuntu we are using the ABI override (because
> > of
> > the sticky ABI breakage issue) so the filenames and package names
> > are
> > different on every release anyway.
> > 
> > So in theory we could change the name of the libs and packages, but
> > what I'm mostly worried about is keeping consistency and some level
> > of
> > compatibility between old and new build systems, isn't that an
> > issue?
> > 
> 
> It's a good question, and I suspect everyone will have their own
> opinion.
> 
> Personally, I take the view that moving build system involves quite a
> number of changes anyway, so we should take the opportunity to clean
> up a
> few other things at the same time. This is why I'm so keen on trying
> to
> keep everything consistent as far as possible throughout the system
> and not
> put in special cases. For many of these a) if we put in lots of name
> overrides now we'll probably never get rid of them, and b) it's more
> likely
> that future drivers will adopt the same technique to have different
> naming
> of drivers and directories.
> 
> However, if keeping sonames consistent is a major concern, then
> perhaps we
> should look to rename some directories, like you suggested before.
> 
> /Bruce

Actually I tend to agree, it would be better to make the libraries
consistent, so I'm fine with having to deal with it once in packaging.
I'll send a v2 without most of the renames.

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
  2018-09-11 13:41  4%         ` Luca Boccassi
@ 2018-09-11 14:06  0%           ` Bruce Richardson
  2018-09-11 16:05  0%             ` Luca Boccassi
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-11 14:06 UTC (permalink / raw)
  To: Luca Boccassi; +Cc: dev

On Tue, Sep 11, 2018 at 02:41:36PM +0100, Luca Boccassi wrote:
> On Tue, 2018-09-11 at 14:32 +0100, Bruce Richardson wrote:
> > On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > > > The library is called librte_pmd_lio, so rename the map file
> > > > > and
> > > > > set
> > > > > the name in the meson file so that the built library names with
> > > > > meson
> > > > > and legacy makefiles are the same
> > > > > 
> > > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > > Cc: stable@dpdk.org
> > > > > 
> > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > 
> > > > Rather than doing this renaming, can we instead add a symlink in
> > > > the
> > > > install phase to map the old name to the new one? I'd like to see
> > > > the
> > > > consistency of directory name, map filename and driver name
> > > > enforced
> > > > strictly in the build system. Having exceptions is a pain.
> > > > 
> > > > /Bruce
> > > 
> > > We could, but the pain gets shifted on packagers then - what about
> > > renaming the directory entirely to net/lio?
> > 
> > For packagers, what sort of ABI compatibility guarantees do you try
> > and
> > keep between releases. Is this something that just needs a one-
> > release ABI
> > announcement, as with other ABI changes?
> > 
> > /Bruce
> 
> Currently in Debian/Ubuntu we are using the ABI override (because of
> the sticky ABI breakage issue) so the filenames and package names are
> different on every release anyway.
> 
> So in theory we could change the name of the libs and packages, but
> what I'm mostly worried about is keeping consistency and some level of
> compatibility between old and new build systems, isn't that an issue?
> 

It's a good question, and I suspect everyone will have their own opinion.

Personally, I take the view that moving build system involves quite a
number of changes anyway, so we should take the opportunity to clean up a
few other things at the same time. This is why I'm so keen on trying to
keep everything consistent as far as possible throughout the system and not
put in special cases. For many of these a) if we put in lots of name
overrides now we'll probably never get rid of them, and b) it's more likely
that future drivers will adopt the same technique to have different naming
of drivers and directories.

However, if keeping sonames consistent is a major concern, then perhaps we
should look to rename some directories, like you suggested before.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
  2018-09-11 13:32  4%       ` Bruce Richardson
@ 2018-09-11 13:41  4%         ` Luca Boccassi
  2018-09-11 14:06  0%           ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Luca Boccassi @ 2018-09-11 13:41 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, 2018-09-11 at 14:32 +0100, Bruce Richardson wrote:
> On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > > The library is called librte_pmd_lio, so rename the map file
> > > > and
> > > > set
> > > > the name in the meson file so that the built library names with
> > > > meson
> > > > and legacy makefiles are the same
> > > > 
> > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > Cc: stable@dpdk.org
> > > > 
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > 
> > > Rather than doing this renaming, can we instead add a symlink in
> > > the
> > > install phase to map the old name to the new one? I'd like to see
> > > the
> > > consistency of directory name, map filename and driver name
> > > enforced
> > > strictly in the build system. Having exceptions is a pain.
> > > 
> > > /Bruce
> > 
> > We could, but the pain gets shifted on packagers then - what about
> > renaming the directory entirely to net/lio?
> 
> For packagers, what sort of ABI compatibility guarantees do you try
> and
> keep between releases. Is this something that just needs a one-
> release ABI
> announcement, as with other ABI changes?
> 
> /Bruce

Currently in Debian/Ubuntu we are using the ABI override (because of
the sticky ABI breakage issue) so the filenames and package names are
different on every release anyway.

So in theory we could change the name of the libs and packages, but
what I'm mostly worried about is keeping consistency and some level of
compatibility between old and new build systems, isn't that an issue?

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
  @ 2018-09-11 13:38  3%         ` Luca Boccassi
  0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-09-11 13:38 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On Tue, 2018-09-11 at 14:30 +0100, Bruce Richardson wrote:
> On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > > The library is called librte_pmd_lio, so rename the map file
> > > > and
> > > > set
> > > > the name in the meson file so that the built library names with
> > > > meson
> > > > and legacy makefiles are the same
> > > > 
> > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > Cc: stable@dpdk.org
> > > > 
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > 
> > > Rather than doing this renaming, can we instead add a symlink in
> > > the
> > > install phase to map the old name to the new one? I'd like to see
> > > the
> > > consistency of directory name, map filename and driver name
> > > enforced
> > > strictly in the build system. Having exceptions is a pain.
> > > 
> > > /Bruce
> > 
> > We could, but the pain gets shifted on packagers then - what about
> > renaming the directory entirely to net/lio?
> > 
> 
> Is it still an issue for packagers if the symlinks are created as
> part of
> the install step of DPDK itself (which is what I was intending)? I
> was
> thinking of adding a new post-install script for the backward
> compatible
> renames.

At least for Debian/Ubuntu, if I tell the tools that package libfoo1
needs to have libfoo.so.1.2.3, that's what it will do, without
following symlinks. So a broken link will be installed in the system,
unless I start tracking what symlinks are there and adding them
manually to the package they belong to.
There's also the fact that by policy the library package names should
match the file name of the library and its ABI revision, so
libfoo.so.1.2.3 should be in libfoo1 pkg by policy - if they mismatch,
some linter tools are going to yell at me at the very least.
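
As a rough illustration of the naming rule described above, the sketch below derives a package name from a library name and its ABI version. The helper pkg_name_from_soname() is hypothetical and covers only the simple cases: the ABI version is appended directly, with a hyphen inserted when the library name already ends in a digit. It does not handle lowercasing or other character substitutions.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a Debian-style package name from a library name and its ABI
 * (SONAME) version, e.g. ("libfoo", "1") gives "libfoo1".
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int
pkg_name_from_soname(const char *lib, const char *abi, char *out, size_t len)
{
	size_t n = strlen(lib);
	/* A hyphen separates the parts when the library name ends in a digit. */
	const char *sep =
		(n > 0 && isdigit((unsigned char)lib[n - 1])) ? "-" : "";
	int written = snprintf(out, len, "%s%s%s", lib, sep, abi);

	return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

With this rule, renaming a library file changes the expected package name as well, which is why a symlink from the old file name does not remove the work for packagers.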

> As for renaming the directory, I don't mind, but I'll let the driver
> maintainers comment on their thoughts on it.
> 
> /Bruce

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
    @ 2018-09-11 13:32  4%       ` Bruce Richardson
  2018-09-11 13:41  4%         ` Luca Boccassi
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-11 13:32 UTC (permalink / raw)
  To: Luca Boccassi
  Cc: t, dev, keith.wiles, roy.fan.zhang, jingjing.wu, wenzhuo.lu,
	rasesh.mody, harish.patil, shahed.shaikh, amr.mokhtar,
	shijith.thotton, ssrinivasan, liang.j.ma, peter.mccarthy,
	jerin.jacob, maciej.czekaj, arybchenko, ashish.gupta, yongwang,
	thomas

On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > The library is called librte_pmd_lio, so rename the map file and
> > > set
> > > the name in the meson file so that the built library names with
> > > meson
> > > and legacy makefiles are the same
> > > 
> > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > Cc: stable@dpdk.org
> > > 
> > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > 
> > Rather than doing this renaming, can we instead add a symlink in the
> > install phase to map the old name to the new one? I'd like to see the
> > consistency of directory name, map filename and driver name enforced
> > strictly in the build system. Having exceptions is a pain.
> > 
> > /Bruce
> 
> We could, but the pain gets shifted on packagers then - what about
> renaming the directory entirely to net/lio?

For packagers, what sort of ABI compatibility guarantees do you try and
keep between releases. Is this something that just needs a one-release ABI
announcement, as with other ABI changes?

/Bruce

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
  2018-09-10  5:18  3% [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions David Marchand
@ 2018-09-10  8:06  0% ` Andrew Rybchenko
  2018-09-16  9:39  0%   ` Thomas Monjalon
  2018-09-17 12:45  8% ` [dpdk-dev] [PATCH v2] " David Marchand
  1 sibling, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-09-10  8:06 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: olivier.matz

On 09/10/2018 08:18 AM, David Marchand wrote:
> __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> a long time now (early 17.05), are not part of the abi and are easily
> replaced with existing api.
>
> Signed-off-by: David Marchand <david.marchand@6wind.com>

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
@ 2018-09-10  5:18  3% David Marchand
  2018-09-10  8:06  0% ` Andrew Rybchenko
  2018-09-17 12:45  8% ` [dpdk-dev] [PATCH v2] " David Marchand
  0 siblings, 2 replies; 200+ results
From: David Marchand @ 2018-09-10  5:18 UTC (permalink / raw)
  To: dev; +Cc: olivier.matz

__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.

Signed-off-by: David Marchand <david.marchand@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9ce5d76d7..a50b05c64 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1038,14 +1038,6 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
 	rte_mempool_put(m->pool, m);
 }
 
-/* compat with older versions */
-__rte_deprecated
-static inline void
-__rte_mbuf_raw_free(struct rte_mbuf *m)
-{
-	rte_mbuf_raw_free(m);
-}
-
 /**
  * The packet mbuf constructor.
  *
@@ -1658,14 +1650,6 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 	return NULL;
 }
 
-/* deprecated, replaced by rte_pktmbuf_prefree_seg() */
-__rte_deprecated
-static inline struct rte_mbuf *
-__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
-{
-	return rte_pktmbuf_prefree_seg(m);
-}
-
 /**
  * Free a segment of a packet mbuf into its original mempool.
  *
-- 
2.17.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [dpdk-announce] DPDK 18.05.1 released
@ 2018-09-05 14:44  1% Christian Ehrhardt
  0 siblings, 0 replies; 200+ results
From: Christian Ehrhardt @ 2018-09-05 14:44 UTC (permalink / raw)
  To: announce

Hi all,

Here is a new stable release:
	https://fast.dpdk.org/rel/dpdk-18.05.1.tar.xz

The git tree is at:
	https://dpdk.org/browse/dpdk-stable/?h=18.05

Christian Ehrhardt <christian.ehrhardt@canonical.com>

---
 MAINTAINERS                                       |  12 +-
 app/test-crypto-perf/cperf_ops.c                  |   3 +
 app/test-eventdev/test_order_atq.c                |  12 +-
 app/test-eventdev/test_order_queue.c              |  12 +-
 app/test-pmd/cmdline.c                            |   8 +-
 app/test-pmd/cmdline_flow.c                       |  29 ++-
 app/test-pmd/cmdline_tm.c                         |  37 ++-
 app/test-pmd/testpmd.c                            |  46 +++-
 buildtools/pmdinfogen/Makefile                    |   2 +-
 config/meson.build                                |   3 +-
 devtools/test-build.sh                            |   1 -
 devtools/test-meson-builds.sh                     |  13 +-
 doc/guides/cryptodevs/dpaa2_sec.rst               |   1 -
 doc/guides/cryptodevs/dpaa_sec.rst                |   1 -
 doc/guides/eventdevs/octeontx.rst                 |   2 +-
 doc/guides/nics/qede.rst                          |  13 +-
 doc/guides/nics/vdev_netvsc.rst                   |   2 +-
 doc/guides/rel_notes/release_18_05.rst            | 234 +++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst       |   6 +-
 drivers/bus/dpaa/base/fman/fman_hw.c              |  20 +-
 drivers/bus/dpaa/base/fman/of.c                   |   5 +
 drivers/bus/dpaa/dpaa_bus.c                       |  14 +-
 drivers/bus/dpaa/include/compat.h                 |   6 +
 drivers/bus/pci/linux/pci_vfio.c                  |   2 +-
 drivers/compress/isal/isal_compress_pmd.c         |  68 ++++--
 drivers/compress/isal/isal_compress_pmd_ops.c     |   7 +-
 drivers/crypto/virtio/virtio_cryptodev.c          |   6 +
 drivers/crypto/virtio/virtio_cryptodev.h          |   3 +
 drivers/crypto/virtio/virtio_rxtx.c               |  14 +-
 drivers/event/octeontx/ssovf_evdev.c              |  14 +-
 drivers/event/octeontx/ssovf_worker.c             |  17 +-
 drivers/event/octeontx/timvf_evdev.c              |   2 +-
 drivers/mempool/octeontx/octeontx_fpavf.c         |  45 ++--
 drivers/mempool/octeontx/octeontx_fpavf.h         |   9 +
 drivers/meson.build                               |   3 +
 drivers/net/af_packet/rte_eth_af_packet.c         |   1 +
 drivers/net/avf/avf_ethdev.c                      |  17 +-
 drivers/net/bnx2x/bnx2x.c                         |  22 +-
 drivers/net/bnx2x/bnx2x.h                         |   1 +
 drivers/net/bnx2x/bnx2x_ethdev.c                  | 105 ++++++---
 drivers/net/bnx2x/bnx2x_ethdev.h                  |   3 +-
 drivers/net/bnxt/bnxt.h                           |   4 +
 drivers/net/bnxt/bnxt_ethdev.c                    |  56 +++--
 drivers/net/bnxt/bnxt_filter.c                    |  27 ++-
 drivers/net/bnxt/bnxt_hwrm.c                      |  57 +++--
 drivers/net/bnxt/bnxt_stats.c                     |   3 +
 drivers/net/bnxt/bnxt_txr.c                       |  59 ++++-
 drivers/net/bnxt/bnxt_txr.h                       |  10 +
 drivers/net/bnxt/bnxt_vnic.c                      |   5 +-
 drivers/net/bnxt/bnxt_vnic.h                      |   6 +-
 drivers/net/bonding/rte_eth_bond_api.c            |  14 +-
 drivers/net/bonding/rte_eth_bond_pmd.c            |  27 +--
 drivers/net/cxgbe/base/t4_hw.c                    |  97 ++++++--
 drivers/net/cxgbe/base/t4_regs.h                  |   3 +
 drivers/net/cxgbe/base/t4fw_interface.h           |   8 +
 drivers/net/cxgbe/base/t4vf_hw.c                  |   6 +
 drivers/net/cxgbe/cxgbe_compat.h                  |   9 -
 drivers/net/cxgbe/cxgbe_ethdev.c                  |   3 +-
 drivers/net/cxgbe/cxgbevf_ethdev.c                |   1 +
 drivers/net/cxgbe/sge.c                           |  10 +-
 drivers/net/dpaa/dpaa_ethdev.c                    |  36 ++-
 drivers/net/dpaa2/dpaa2_rxtx.c                    |  16 +-
 drivers/net/dpaa2/mc/dpni.c                       |   2 +-
 drivers/net/ena/base/ena_plat_dpdk.h              |  35 +--
 drivers/net/ena/ena_ethdev.c                      |   4 +-
 drivers/net/enic/base/vnic_dev.c                  |  16 ++
 drivers/net/enic/base/vnic_dev.h                  |   4 +
 drivers/net/enic/base/vnic_devcmd.h               |  23 +-
 drivers/net/enic/base/vnic_enet.h                 |   5 +-
 drivers/net/enic/base/vnic_nic.h                  |   4 +-
 drivers/net/enic/enic.h                           |   2 +
 drivers/net/enic/enic_ethdev.c                    |   5 +-
 drivers/net/enic/enic_main.c                      |  42 ++--
 drivers/net/enic/enic_res.c                       |  11 +-
 drivers/net/enic/enic_rxtx.c                      |  42 +++-
 drivers/net/failsafe/failsafe.c                   |   1 +
 drivers/net/fm10k/fm10k.h                         |   3 -
 drivers/net/i40e/i40e_ethdev.c                    | 197 ++++++++++++----
 drivers/net/i40e/i40e_ethdev_vf.c                 |   1 -
 drivers/net/i40e/i40e_rxtx.c                      |   2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c             |   2 +-
 drivers/net/i40e/rte_pmd_i40e.c                   |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.h                  |   5 +
 drivers/net/ixgbe/ixgbe_fdir.c                    |  30 ++-
 drivers/net/ixgbe/ixgbe_flow.c                    |  12 +-
 drivers/net/ixgbe/ixgbe_pf.c                      |  14 +-
 drivers/net/kni/rte_eth_kni.c                     |   1 +
 drivers/net/mlx4/Makefile                         |   2 +-
 drivers/net/mlx4/mlx4.c                           |  40 ++--
 drivers/net/mlx4/mlx4.h                           |   1 +
 drivers/net/mlx4/mlx4_rxq.c                       |   9 +-
 drivers/net/mlx5/Makefile                         |   4 +-
 drivers/net/mlx5/mlx5.c                           |  26 ++-
 drivers/net/mlx5/mlx5.h                           |   2 +-
 drivers/net/mlx5/mlx5_defs.h                      |   5 +-
 drivers/net/mlx5/mlx5_ethdev.c                    |  20 +-
 drivers/net/mlx5/mlx5_flow.c                      |   6 +-
 drivers/net/mlx5/mlx5_glue.c                      |   4 +
 drivers/net/mlx5/mlx5_mr.c                        | 119 +++++-----
 drivers/net/mlx5/mlx5_mr.h                        |   5 +-
 drivers/net/mlx5/mlx5_nl.c                        |   6 +-
 drivers/net/mlx5/mlx5_rxmode.c                    |  26 ++-
 drivers/net/mlx5/mlx5_rxq.c                       |  56 +----
 drivers/net/mlx5/mlx5_rxtx.c                      |  28 +--
 drivers/net/mlx5/mlx5_rxtx_vec.h                  |   4 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h             |  16 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h              |  16 +-
 drivers/net/mlx5/mlx5_socket.c                    |   6 +
 drivers/net/mlx5/mlx5_trigger.c                   |  45 ++--
 drivers/net/mlx5/mlx5_txq.c                       |  31 +--
 drivers/net/mvpp2/mrvl_ethdev.c                   |   5 +-
 drivers/net/nfp/nfp_net.c                         |  12 +-
 drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h |   1 -
 drivers/net/null/rte_eth_null.c                   |   1 +
 drivers/net/octeontx/octeontx_ethdev.c            |  15 +-
 drivers/net/octeontx/octeontx_rxtx.c              |   2 +-
 drivers/net/pcap/rte_eth_pcap.c                   |  87 +++-----
 drivers/net/qede/base/bcm_osal.c                  |   5 +
 drivers/net/qede/base/ecore_dev.c                 |  10 +-
 drivers/net/qede/base/ecore_int.c                 |  14 +-
 drivers/net/qede/base/ecore_sriov.c               |  44 ++++
 drivers/net/qede/base/ecore_vf.c                  |  33 +++
 drivers/net/qede/base/ecore_vf.h                  |   9 +
 drivers/net/qede/base/ecore_vfpf_if.h             |  16 ++
 drivers/net/qede/qede_ethdev.c                    | 261 ++++++++++++++--------
 drivers/net/qede/qede_ethdev.h                    |   1 +
 drivers/net/qede/qede_fdir.c                      |   3 +
 drivers/net/qede/qede_main.c                      |   7 +-
 drivers/net/qede/qede_rxtx.c                      |  23 +-
 drivers/net/qede/qede_rxtx.h                      |   1 -
 drivers/net/sfc/sfc_ef10_essb_rx.c                |  28 ++-
 drivers/net/sfc/sfc_ef10_rx_ev.h                  |   8 +-
 drivers/net/sfc/sfc_ethdev.c                      |   6 +-
 drivers/net/sfc/sfc_filter.c                      |  14 ++
 drivers/net/sfc/sfc_filter.h                      |  10 +
 drivers/net/sfc/sfc_flow.c                        |  20 +-
 drivers/net/sfc/sfc_rx.c                          |  26 ++-
 drivers/net/tap/rte_eth_tap.c                     |   2 +
 drivers/net/tap/tap_flow.c                        |  18 +-
 drivers/net/thunderx/nicvf_ethdev.c               |   5 +-
 drivers/net/thunderx/nicvf_rxtx.c                 |  24 +-
 drivers/net/vhost/rte_eth_vhost.c                 |   1 +
 drivers/raw/dpaa2_qdma/dpaa2_qdma.c               |   1 +
 examples/exception_path/main.c                    |   3 +
 examples/flow_filtering/main.c                    |  16 ++
 examples/ipsec-secgw/ipsec-secgw.c                |   7 +-
 examples/l2fwd-crypto/main.c                      |  37 +--
 examples/l3fwd/l3fwd_em.c                         |   1 -
 examples/l3fwd/l3fwd_lpm.c                        |   1 -
 examples/meson.build                              |   4 +
 kernel/linux/kni/ethtool/igb/igb_ethtool.c        |   7 +-
 kernel/linux/kni/ethtool/igb/kcompat.h            |   5 +
 lib/librte_bitratestats/rte_bitrate.c             |   6 +
 lib/librte_cryptodev/rte_crypto.h                 |  51 +++--
 lib/librte_eal/bsdapp/eal/eal_memory.c            |   2 +-
 lib/librte_eal/common/eal_common_dev.c            |  26 +--
 lib/librte_eal/common/eal_common_memory.c         |  33 ++-
 lib/librte_eal/common/eal_common_proc.c           |   6 +-
 lib/librte_eal/common/eal_common_thread.c         |   6 +-
 lib/librte_eal/common/include/rte_bitmap.h        |   8 +-
 lib/librte_eal/common/include/rte_version.h       |   2 +-
 lib/librte_eal/common/malloc_elem.c               |  14 +-
 lib/librte_eal/linuxapp/eal/eal_interrupts.c      |   2 +-
 lib/librte_eal/linuxapp/eal/eal_memalloc.c        |  23 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c          |   2 +-
 lib/librte_eal/linuxapp/eal/eal_thread.c          |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c            |  49 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.h            |   1 -
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c    |   8 -
 lib/librte_eal/meson.build                        |   2 +-
 lib/librte_ethdev/rte_ethdev.c                    |  10 +
 lib/librte_ethdev/rte_ethdev.h                    |   4 +-
 lib/librte_ethdev/rte_ethdev_driver.h             |   1 -
 lib/librte_ethdev/rte_flow.c                      |   2 +-
 lib/librte_eventdev/rte_event_eth_rx_adapter.c    |  38 +++-
 lib/librte_eventdev/rte_event_ring.c              |  15 +-
 lib/librte_hash/rte_cuckoo_hash.c                 |  21 +-
 lib/librte_hash/rte_cuckoo_hash_x86.h             |   3 +
 lib/librte_hash/rte_hash.h                        |  20 +-
 lib/librte_kni/rte_kni.c                          |   3 +
 lib/librte_latencystats/rte_latencystats.c        |   8 +-
 lib/librte_metrics/rte_metrics.c                  |  15 +-
 lib/librte_net/rte_ip.h                           |  28 +--
 lib/librte_ring/rte_ring.h                        |   2 +-
 lib/librte_ring/rte_ring_c11_mem.h                |   8 +-
 lib/librte_security/rte_security.c                |   3 +-
 lib/librte_vhost/iotlb.c                          |  10 +-
 lib/librte_vhost/iotlb.h                          |   2 +-
 lib/librte_vhost/vhost.h                          |   1 +
 lib/librte_vhost/vhost_user.c                     |   5 +
 lib/librte_vhost/virtio_net.c                     |   3 +-
 lib/meson.build                                   |   4 +
 mk/rte.sdkinstall.mk                              |  36 +--
 mk/rte.sdkroot.mk                                 |   4 +-
 mk/rte.sdktest.mk                                 |  32 ++-
 mk/toolchain/gcc/rte.toolchain-compat.mk          |   5 +
 mk/toolchain/gcc/rte.vars.mk                      |   9 +
 pkg/dpdk.spec                                     |   2 +-
 test/test/autotest_runner.py                      | 145 ++++++------
 test/test/meson.build                             |   7 +-
 test/test/test_cryptodev.c                        |   2 +-
 test/test/test_eal_flags.c                        |  33 +--
 test/test/test_flow_classify.c                    |  20 +-
 test/test/test_hash_multiwriter.c                 |  50 ++++-
 test/test/test_pmd_ring.c                         |   2 +
 205 files changed, 2565 insertions(+), 1246 deletions(-)
Adrien Mazarguil (8):
      app/testpmd: fix crash when attaching a device
      net/mlx4: fix minor resource leak during init
      net/mlx5: fix errno object in probe function
      net/mlx5: fix missing errno in probe function
      net/mlx5: fix error message in probe function
      net/mlx5: fix invalid error check
      maintainers: update for Mellanox PMDs
      net/mlx5: fix invalid network interface index

Ajit Khaparde (11):
      net/bnxt: fix clear port stats
      net/bnxt: fix close operation
      net/bnxt: fix HW Tx checksum offload check
      net/bnxt: check filter type before clearing it
      net/bnxt: fix set MTU
      net/bnxt: fix incorrect IO address handling in Tx
      net/bnxt: fix Rx ring count limitation
      net/bnxt: fix memory leaks in NVM commands
      net/bnxt: fix lock release on NVM write failure
      net/bnxt: check access denied for HWRM commands
      net/bnxt: fix RETA size

Alejandro Lucero (2):
      net/nfp: fix unused header reference
      net/nfp: fix field initialization in Tx descriptor

Alok Makhariya (1):
      bus/dpaa: fix phandle support for Linux 4.16

Anatoly Burakov (14):
      ipc: fix locking while sending messages
      mem: fix alignment of requested virtual areas
      eal/bsd: fix memory segment index display
      malloc: fix pad erasing
      eal/linux: fix invalid syntax in interrupts
      eal/linux: fix uninitialized value
      vfio: fix uninitialized variable
      malloc: do not skip pad on free
      test: fix EAL flags autotest on FreeBSD
      test: fix result printing
      test: fix code on report
      test: make autotest runner python 2/3 compliant
      test: print autotest categories
      test: improve filtering

Andrew Rybchenko (7):
      net/sfc: cut non VLAN ID bits from TCI
      net/sfc: discard packets with bad CRC on EF10 ESSB Rx
      net/sfc: fix double-free in EF10 ESSB Rx queue purge
      net/sfc: move Rx checksum offload check to device level
      net/sfc: fix Rx queue offloads reporting in queue info
      net/sfc: fix assert in set multicast address list
      net/sfc: handle unknown L3 packet class in EF10 event parser

Andy Green (2):
      ring: fix declaration after statement
      ring: fix sign conversion warning

Beilei Xing (5):
      net/i40e: fix shifts of 32-bit value
      net/i40e: fix PPPoL2TP packet type parsing
      net/i40e: fix packet type parsing with DDP
      net/i40e: fix setting TPID with AQ command
      net/i40e: fix device parameter parsing

Bruce Richardson (3):
      eal: fix error message for unsupported platforms
      examples/exception_path: fix out-of-bounds read
      mk: fix permissions when using make install

Chas Williams (2):
      net/bonding: always update bonding link status
      net/bonding: do not clear active slave count

Christian Ehrhardt (3):
      FIXUP: net/mlx5: fix invalid network interface index
      version: 18.05.1-rc1
      version: 18.05.1

Damjan Marion (1):
      net/i40e: do not reset device info data

Dan Gora (1):
      kni: fix crash with null name

Daria Kolistratova (1):
      net/ena: fix SIGFPE with 0 Rx queue

Dariusz Stojaczyk (7):
      mem: do not leave unmapped holes in EAL memory area
      mem: do not unmap overlapping region on mmap failure
      mem: avoid crash on memseg query with invalid address
      mem: fix alignment requested with --base-virtaddr
      mem: do not use --base-virtaddr in secondary processes
      eal: fix return codes on thread naming failure
      eal: fix return codes on control thread failure

David Marchand (1):
      net/bnxt: add missing ids in xstats

Drocula Lambda (1):
      kni: fix build on RHEL 7.5

Fan Zhang (1):
      crypto/virtio: fix IV physical address

Ferruh Yigit (4):
      kni: fix build with gcc 8.1
      net/thunderx: fix build with gcc optimization on
      app/testpmd: fix typo in setting Tx offload command
      drivers/net: fix crash in secondary process

Gage Eads (1):
      net: rename u16 to fix shadowed declaration

Gavin Hu (5):
      mk: fix cross build
      devtools: fix ninja command in build test
      build: fix for host clang and cross gcc
      net/dpaa2: remove loop for unused pool entries
      maintainers: claim maintainership for ARM v7 and v8

Haiyue Wang (1):
      net/i40e: workaround performance degradation

Harry van Haaren (2):
      net/i40e: fix rearm check in AVX2 Rx
      event: fix ring init failure handling

Hemant Agrawal (8):
      doc: fix limitations for dpaa crypto
      doc: fix limitations for dpaa2 crypto
      test/crypto: fix device id when stopping port
      bus/dpaa: fix SVR id fetch location
      bus/dpaa: fix buffer offset setting in FMAN
      net/dpaa: fix queue error handling and logs
      net/dpaa2: fix prefetch Rx to honor number of packets
      raw/dpaa2_qdma: fix IOVA as VA flag

Hyong Youb Kim (4):
      net/enic: fix receive packet types
      net/enic: update the UDP RSS detection mechanism
      net/enic: do not overwrite admin Tx queue limit
      net/enic: initialize RQ fetch index before enabling RQ

Ido Goshen (1):
      net/pcap: fix multiple queues

Igor Romanov (1):
      net/sfc: fix filter exceptions logic

Jananee Parthasarathy (1):
      mk: update targets for classified tests

Jay Ding (1):
      net/bnxt: check for invalid vNIC id

Jerin Jacob (3):
      doc: fix octeontx eventdev selftest argument
      ethdev: fix queue statistics mapping documentation
      eal: fix bitmap documentation

Kiran Kumar (3):
      net/bonding: fix MAC address reset
      ethdev: check queue stats mapping input arguments
      net/thunderx: avoid sq door bell write on zero packet

Konstantin Ananyev (3):
      examples/ipsec-secgw: fix IPv4 checksum at Tx
      examples/ipsec-secgw: fix bypass rule processing
      app/testpmd: fix DCB config

Krzysztof Kanas (2):
      app/testpmd: fix crash on TM command error
      app/testpmd: fix help for TM commit command

Lee Daly (1):
      compress/isal: fix offset usage

Matan Azrad (1):
      net/tap: fix zeroed flow mask configurations

Maxime Coquelin (2):
      vhost: fix missing increment of log cache count
      vhost: flush IOTLB cache on new mem table handling

Moti Haimovsky (2):
      net/mlx4: check RSS queues number limitation
      net/mlx4: advertise Rx jumbo frame support

Nelio Laranjeiro (3):
      net/mlx5: clean-up developer logs
      app/testpmd: fix missing count action fields
      net/mlx5: fix TCI mask filter

Nikhil Rao (5):
      eventdev: fix port in Rx adapter internal function
      eventdev: fix missing update to Rx adaper WRR position
      eventdev: add event buffer flush in Rx adapter
      eventdev: fix internal port logic in Rx adapter
      eventdev: fix Rx SW adapter stop

Nithin Dabilpuram (1):
      app/testpmd: fix buffer leak in TM command

Ophir Munk (1):
      net/mlx5: fix secondary process resource leakage

Pablo de Lara (13):
      cryptodev: fix ABI breakage
      net/ixgbe: fix crash on detach
      compress/isal: fix log type name
      compress/isal: set null pointer after freeing
      compress/isal: fix memory leak
      examples/l2fwd-crypto: fix digest with AEAD algo
      examples/l2fwd-crypto: check return value on IV size check
      examples/l2fwd-crypto: skip device not supporting operation
      devtools: remove already enabled nfp from build test
      test/hash: fix multiwriter with non consecutive cores
      test/hash: fix potential memory leak
      app/crypto-perf: fix auth IV offset
      hash: fix doxygen of return values

Pavan Nikhilesh (5):
      event/octeontx: fix flush callback
      mempool/octeontx: fix pool to aura mapping
      app/eventdev: fix order test service init
      event/octeontx: remove unnecessary port start and stop
      net/octeontx: fix stop clearing Rx/Tx functions

Qi Zhang (4):
      eal: fix hotplug add and remove
      vfio: fix PCI address comparison
      vfio: remove uneccessary IPC for group fd clear
      net/ixgbe: fix missing null check on detach

Radu Nicolau (4):
      security: fix crash on destroy null session
      net/bonding: fix invalid port id
      test: fix uninitialized port configuration
      net/bonding: fix race condition

Rafal Kozik (4):
      net/ena: check pointer before memset
      net/ena: change memory type
      net/ena: fix GENMASK_ULL macro
      net/ena: set link speed as none

Rahul Lakkireddy (4):
      net/cxgbe: report configured link auto-negotiation
      net/cxgbe: fix Rx channel map and queue type
      net/cxgbevf: add missing Tx byte counters
      net/cxgbe: fix init failure due to new flash parts

Rami Rosen (2):
      examples/l3fwd: remove useless include
      ethdev: fix a doxygen comment for port allocation

Rasesh Mody (11):
      net/qede: fix VF MTU update
      net/qede: fix for devargs
      net/qede: fix L2-handles used for RSS hash update
      net/qede: fix memory alloc for multiple port reconfig
      net/qede: remove primary MAC removal
      doc: update qede management firmware guide
      net/qede: fix default extended VLAN offload config
      net/qede/base: fix to clear HW indication
      net/qede/base: fix GRC attention callback
      net/bnx2x: fix FW command timeout during stop
      net/bnx2x: fix poll link status

Remy Horton (4):
      bitrate: add sanity check on parameters
      metrics: add check for invalid key
      metrics: do not fail silently when uninitialised
      metrics: disallow null as metric name

Reshma Pattan (3):
      test/flow_classify: fix return types
      mk: remove unnecessary test rules
      latency: free up the memzone

Rosen Xu (1):
      examples/flow_filtering: add flow director config for i40e

Shahaf Shuler (2):
      net/mlx5: separate generic tunnel TSO from the standard one
      net/mlx5: fix build with rdma-core v19

Shahed Shaikh (8):
      net/qede: fix incorrect link status update
      net/qede: fix link change event notification
      net/qede: fix unicast MAC address handling in VF
      net/qede: fix legacy interrupt mode
      net/qede: fix Rx/Tx offload flags
      net/qede: fix interrupt handler unregister
      net/qede: fix MAC address removal failure message
      net/qede: fix ntuple filter configuration

Shaopeng He (1):
      net/i40e: fix Tx queue setup after stop

Shreyansh Jain (1):
      doc: fix bonding command in testpmd

Somnath Kotur (4):
      net/bnxt: revert reset of L2 filter id
      net/bnxt: fix to move a flow to a different queue
      net/bnxt: use correct flags during VLAN configuration
      net/bnxt: fix filter freeing

Stephen Hemminger (2):
      net/mlx5: fix log initialization
      doc: fix typo in vdev_netvsc guide

Thomas Monjalon (2):
      bus/dpaa: fix build
      net/fm10k: remove unused constant

Timothy Redaelli (2):
      net/mlx4: avoid stripping the glue library
      net/mlx5: avoid stripping the glue library

Tiwei Bie (1):
      vhost: release locks on RARP packet failure

Tomasz Duszynski (1):
      net/mvpp2: check pointer before using it

Wei Zhao (7):
      net/ixgbe: add support for VLAN in IP mode FDIR
      net/ixgbe: fix tunnel id format error for FDIR
      net/ixgbe: fix tunnel type set error for FDIR
      net/ixgbe: fix mask bits register set error for FDIR
      app/testpmd: fix VLAN TCI mask set error for FDIR
      net/i40e: fix check of flow director programming status
      net/i40e: revert fix of flow director check

Xiaoxin Peng (1):
      net/bnxt: fix Tx with multiple mbuf

Xiaoyun Li (3):
      net/i40e: fix link speed
      app/testpmd: fix little performance drop
      net/avf: fix offload capabilities

Xueming Li (1):
      net/mlx5: fix crash in device probe

Yaroslav Brustinov (1):
      net/mlx5: fix linkage of glue lib with gcc 4.7.2

Yipeng Wang (3):
      hash: fix multiwriter lock memory allocation
      hash: fix a multi-writer race condition
      hash: fix key slot size accuracy

Yongseok Koh (6):
      net/mlx5: fix error number handling
      net/mlx5: fix Rx buffer replenishment threshold
      net/mlx5: fix assert for Tx completion queue count
      net/mlx5: fix queue rollback when starting device
      net/mlx5: preserve promiscuous flag for flow isolation mode
      net/mlx5: preserve allmulticast flag for flow isolation mode


* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  2018-09-05 22:13  4%     ` Honnappa Nagarahalli
@ 2018-09-06 14:28  3%       ` Michel Machado
  2018-09-12 20:37  2%         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Michel Machado @ 2018-09-06 14:28 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
	nd, yipeng1.wang

On 09/05/2018 06:13 PM, Honnappa Nagarahalli wrote:
>> +	uint32_t              next;
>> +	uint32_t              total_entries;
>> +};
>> This structure can be moved to rte_cuckoo_hash.h file.
> 
>      What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
> 
> This is to maintain consistency. For ex: 'struct queue_node', which is an internal structure, is kept in rte_cuckoo_hash.h

    Okay. We'll move it there.

>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> +	struct rte_hash_iterator_state *state) {
>> +	struct rte_hash_iterator_istate *__state;
>> '__state' can be replaced by 's'.
>>
>> +
>> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>> +
>> +	__state = (struct rte_hash_iterator_istate *)state;
>> +	__state->h = h;
>> +	__state->next = 0;
>> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>> +
>> +	return 0;
>> +}
>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
> 
>      Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
> 
> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.

    It's not just about creating one init function for each iterator
because an iterator may have a couple of init functions. For example,
someone may eventually find it useful to add another init function for
the conflicting-entry iterator that we are advocating in this patch. A
possibility would be for this new init function to use the key of the
new entry instead of its signature to initialize the state, similar to
what is already done in the rte_hash_lookup*() functions. In spite of
possibly having multiple init functions, there will be a single iterator
function.

    About the performance benchmarking, the current API only requires
applications to initialize a single 32-bit integer. But with the
adoption of a struct for the state, the initialization will grow to 64
bytes.

>>    int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, uint32_t *next)
>> +rte_hash_iterate(
>> +	struct rte_hash_iterator_state *state, const void **key, void
>> +**data)
>>
>> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, struct rte_hash_iterator_state *state)
>>
>> This will help keep the API signature consistent with existing APIs.
>>
>> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
> 
>      The ABI will change in a way or another, so why not going for a single state instead of requiring parameters that are already needed for the initialization of the state?
> 
> Are there any cost savings we can achieve by keeping the 'h' in the iterator state?

    There's a tiny cost saving: avoiding pushing that parameter onto
the execution stack every time the iterator gets another entry. However,
the reason I find more important is to make it impossible to introduce
a bug in the code. Consider a function that is dealing with two hash
tables and two iterators. Because the iterator does not ask for the hash
table in order to make progress, it's impossible to mix up hash tables
and iterator states.

    There's even the possibility that an iterator doesn't need the hash 
table after its initialization. This would be an *unlikely* case, but 
consider an iterator that only returns a couple of entries. It could 
cache those entries during initialization.
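
    To make the argument concrete, here is a minimal, self-contained
sketch of the pattern. It is not the patch itself: toy_table,
toy_iter_state, toy_iter_istate, toy_iter_init() and toy_iter_next() are
hypothetical names made up for illustration, and the "table" is just a
flat array in which 0 marks an empty slot. The patch casts the opaque
state to the internal struct; the sketch copies with memcpy() instead,
only to stay clear of alignment and aliasing questions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hash table: flat array, 0 means empty. */
struct toy_table {
	const uint32_t *slots;
	uint32_t num_slots;
};

/* Public, opaque iterator state with a fixed size of 64 bytes. */
struct toy_iter_state {
	uint8_t space[64];
};

/* Private layout kept inside the opaque state (hypothetical). */
struct toy_iter_istate {
	const struct toy_table *t;
	uint32_t next;
	uint32_t total_entries;
};

/* Build-time guard: the private layout must fit in the public one. */
static_assert(sizeof(struct toy_iter_istate) <=
	sizeof(struct toy_iter_state), "iterator state too small");

/* Bind the state to one table; the table is never passed again. */
static int
toy_iter_init(const struct toy_table *t, struct toy_iter_state *state)
{
	struct toy_iter_istate s;

	if (t == NULL || state == NULL)
		return -EINVAL;
	s.t = t;
	s.next = 0;
	s.total_entries = t->num_slots;
	memcpy(state->space, &s, sizeof(s));
	return 0;
}

/* Return the next non-empty slot, or -ENOENT at the end of the table. */
static int
toy_iter_next(struct toy_iter_state *state, uint32_t *value)
{
	struct toy_iter_istate s;
	int ret = -ENOENT;

	if (state == NULL || value == NULL)
		return -EINVAL;
	memcpy(&s, state->space, sizeof(s));
	/* Skip empty slots. */
	while (s.next < s.total_entries && s.t->slots[s.next] == 0)
		s.next++;
	if (s.next < s.total_entries) {
		*value = s.t->slots[s.next];
		s.next++;
		ret = 0;
	}
	memcpy(state->space, &s, sizeof(s));
	return ret;
}
```

    With two tables and two states, each call to toy_iter_next() takes
only the state, so there is no second argument that could be paired with
the wrong table.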

>>    	/* Calculate bucket and index of current iterator */
>> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    
>>    	/* If current position is empty, go to the next one */
>> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> -		(*next)++;
>> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> +		__state->next++;
>>    		/* End of table */
>> -		if (*next == total_entries)
>> +		if (__state->next == __state->total_entries)
>>    			return -ENOENT;
>> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>    	}
>> -	__hash_rw_reader_lock(h);
>> +	__hash_rw_reader_lock(__state->h);
>>    	/* Get position of entry in key table */
>> -	position = h->buckets[bucket_idx].key_idx[idx];
>> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> -				position * h->key_entry_size);
>> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
>> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>> +				position * __state->h->key_entry_size);
>>    	/* Return key and data */
>>    	*key = next_key->key;
>>    	*data = next_key->pdata;
>>    
>> -	__hash_rw_reader_unlock(h);
>> +	__hash_rw_reader_unlock(__state->h);
>>    
>>    	/* Increment iterator */
>> -	(*next)++;
>> +	__state->next++;
>> This comment is not related to this change, it is better to place this inside the lock.
> 
>      Even though __state->next does not depend on the lock?
> 
> It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?

    If an application needs to share an iterator state between threads,
it has to have a synchronization mechanism for that, as it would for any
other shared variable. The lock above allows applications to share
a hash table between threads; it has no bearing on anything else.

>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>> index 9e7d9315f..fdb01023e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -14,6 +14,8 @@
>>    #include <stdint.h>
>>    #include <stddef.h>
>>    
>> +#include <rte_compat.h>
>> +
>>    #ifdef __cplusplus
>>    extern "C" {
>>    #endif
>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>    /** @internal A hash table structure. */
>>    struct rte_hash;
>>    
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @internal A hash table iterator state structure.
>> + */
>> +struct rte_hash_iterator_state {
>> +	uint8_t space[64];
>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
> 
>      Okay.

    I think we should not replace 64 with RTE_CACHE_LINE_SIZE because 
the ABI would change based on the architecture for which it's compiled.
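
    A small sketch of why the literal matters. The two cache-line
values below are assumptions picked for illustration (they stand in for
whatever RTE_CACHE_LINE_SIZE expands to on two different targets), and
all toy_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line sizes of two different build targets. */
#define TOY_CACHE_LINE_A 64
#define TOY_CACHE_LINE_B 128

/* State sized with a literal: identical on every target. */
struct toy_state_fixed {
	uint8_t space[64];
};

/* State sized with the cache line: differs from target to target. */
struct toy_state_line_a {
	uint8_t space[TOY_CACHE_LINE_A];
};
struct toy_state_line_b {
	uint8_t space[TOY_CACHE_LINE_B];
};

static_assert(sizeof(struct toy_state_fixed) == 64,
	"fixed-size state must stay at 64 bytes");

/*
 * A caller-allocated state is only safe to hand to the library if it
 * is at least as large as the layout the library was built with.
 */
static int
toy_caller_state_fits(size_t caller_size, size_t lib_size)
{
	return caller_size >= lib_size;
}
```

    With the literal, the size of the public struct is the same
wherever the header is compiled; with the macro, the size becomes a
property of the target configuration.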

[ ]'s
Michel Machado


* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  2018-09-05 20:27  4%             ` Wang, Yipeng1
@ 2018-09-06 13:34  4%               ` Michel Machado
  0 siblings, 0 replies; 200+ results
From: Michel Machado @ 2018-09-06 13:34 UTC (permalink / raw)
  To: Wang, Yipeng1, Qiaobin Fu, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, doucette, Wiles, Keith, Gobriel, Sameh, Tai, Charlie,
	stephen, nd, honnappa.nagarahalli

On 09/05/2018 04:27 PM, Wang, Yipeng1 wrote:
> Hmm I see, it falls back to my original thought to have malloc inside the init function..
> Thanks for the explanation. :)
> 
> So I guess with your implementation, in future if we change the internal state to be larger,
> the ABI will be broken.

    If that happens, yes, the ABI would need to change again. But this 
concern is overblown for two reasons. First, this event is unlikely to 
happen because struct rte_hash_iterator_state is already allocating 64 
bytes while struct rte_hash_iterator_istate and struct 
rte_hash_iterator_conflict_entries_istate consume 16 and 20 bytes, 
respectively. Thus, the complexity of the underlying hash algorithm 
would need to grow substantially to force the necessary state of these 
iterators to grow more than 4x and 3x, respectively. This is unlikely to 
happen, and, if it does, it would likely break the ABI somewhere else 
and have a high impact on applications anyway.

    Second, even if the unlikely event happens, all one would need to do
is to increase the size of struct rte_hash_iterator_state, mark the new
API as a new version, and applications would be ready for the new ABI
just by recompiling.

> BTW, this patch set also changes API so proper notice is needed.
> People more familiar with API/ABI change policies may be able to help here.

    We'd be happy to get feedback on this aspect.

> Just to confirm, is there anyway like I said for your application to have some long-live states
> and reuse them throughout the application so that you don’t have to have short-lived ones in stack?

    Two things would need to happen for this to be possible. The init
functions would need to accept previously allocated iterator states,
that is, the init function would act as a reset of the state when acting
on a previously allocated state. And applications would now need to carry
these pre-allocated states to avoid a malloc. In other words, we'll
increase the complexity of the API.

    To emphasize that the cost of a malloc is not negligible, 
rte_malloc() needs to get a spinlock (see heap_alloc_on_socket()), do 
its thing to allocate memory, and, if the first attempt fails, try to 
allocate the memory on other sockets (see end of malloc_heap_alloc()). 
For an iterator that goes through the whole hash table, this cost may be 
okay, but for an iterator that goes through a couple entries, this cost 
is a lot to add.

    This memory allocation concern is not new. Function
rte_pktmbuf_read(), for example, lets applications pass buffers, which
are often allocated in the execution stack, to avoid the malloc cost.
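
    The caller-supplied-buffer idea can be sketched as follows. This is
not the rte_pktmbuf_read() code: toy_seg and toy_read() are hypothetical
names, and the segment chain is a simplified stand-in for a chained
buffer. The function returns a pointer straight into the data when the
requested range is contiguous and only copies into the caller's buffer
(typically on the stack) when the range spans segments, so no allocation
happens on either path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chained buffer: each segment holds part of the data. */
struct toy_seg {
	const uint8_t *data;
	uint32_t len;
	const struct toy_seg *next;
};

/*
 * Read len bytes starting at offset off. buf must hold at least len
 * bytes; it is only written when the range crosses a segment boundary.
 * Returns NULL when the range does not fit in the chain.
 */
static const void *
toy_read(const struct toy_seg *seg, uint32_t off, uint32_t len, void *buf)
{
	uint8_t *out = buf;
	uint32_t copied = 0;

	/* Skip whole segments that lie before the offset. */
	while (seg != NULL && off >= seg->len) {
		off -= seg->len;
		seg = seg->next;
	}
	if (seg == NULL)
		return NULL;

	/* Fast path: the range is contiguous, no copy needed. */
	if (len <= seg->len - off)
		return seg->data + off;

	if (buf == NULL)
		return NULL;

	/* Slow path: gather the range into the caller's buffer. */
	while (seg != NULL && copied < len) {
		uint32_t n = seg->len - off;

		if (n > len - copied)
			n = len - copied;
		memcpy(out + copied, seg->data + off, n);
		copied += n;
		off = 0;
		seg = seg->next;
	}
	return copied == len ? buf : NULL;
}
```

    An iterator state that lives in the caller's stack frame follows the
same reasoning: the caller provides the storage, and the library only
fills it in.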

[ ]'s
Michel Machado


* Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
  2018-09-06  6:29  3% ` Andrew Rybchenko
@ 2018-09-06 10:52  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-09-06 10:52 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev

On Thu, 6 Sep 2018 09:29:32 +0300
Andrew Rybchenko <arybchenko@solarflare.com> wrote:

> On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> > This addresses the usability issue raised by OVS at DPDK Userspace
> > summit. It adds general min/max mtu into device info. For compatibility,
> > and to save space, it fits in a hole in existing structure.  
> 
> It is true for amd64, but it looks like it is false on 32-bit. So, ABI 
> breakage.

Yes, it is an ABI change on 32-bit, but 18.11 is a major release where
this is allowed/expected.


* Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
  @ 2018-09-06  6:29  3% ` Andrew Rybchenko
  2018-09-06 10:52  3%   ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-09-06  6:29 UTC (permalink / raw)
  To: Stephen Hemminger, dev

On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> This addresses the usability issue raised by OVS at DPDK Userspace
> summit. It adds general min/max mtu into device info. For compatibility,
> and to save space, it fits in a hole in existing structure.

It is true for amd64, but it looks like it is false on 32-bit. So, ABI 
breakage.
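
A quick way to check this on a given target is to compare the two
layouts directly. The structs below are hypothetical excerpts that only
mirror the field order around the insertion point (they are not the full
struct rte_eth_dev_info), and hole_before() / layout_changed() are
helper names made up for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the layout before the patch. */
struct info_before {
	const char *driver_name;
	unsigned int if_index;
	const uint32_t *dev_flags;
	uint32_t min_rx_bufsize;
};

/* Same excerpt with the two new 16-bit fields added. */
struct info_after {
	const char *driver_name;
	unsigned int if_index;
	uint16_t min_mtu;
	uint16_t max_mtu;
	const uint32_t *dev_flags;
	uint32_t min_rx_bufsize;
};

/* Padding bytes between if_index and dev_flags in the old layout. */
static size_t
hole_before(void)
{
	return offsetof(struct info_before, dev_flags) -
		(offsetof(struct info_before, if_index) +
		 sizeof(unsigned int));
}

/* Nonzero when adding the fields moved dev_flags or grew the struct. */
static int
layout_changed(void)
{
	return offsetof(struct info_after, dev_flags) !=
			offsetof(struct info_before, dev_flags) ||
		sizeof(struct info_after) != sizeof(struct info_before);
}
```

Where pointers are 8-byte aligned and unsigned int is 4 bytes, the old
layout has a padding hole after if_index that the two uint16_t fields
can occupy. Where pointers are 4-byte aligned there is no such hole, so
every field after if_index moves.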

> The initial version sets max mtu to normal Ethernet, it is up to
> PMD to set larger value if it supports Jumbo frames.
>
> Fixing the drivers to use this is trivial and can be done by 18.11.
> Already have some of the patches done.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   lib/librte_ethdev/rte_ethdev.c | 7 +++++++
>   lib/librte_ethdev/rte_ethdev.h | 2 ++
>   2 files changed, 9 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 4c320250589a..df0c7536a7c4 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -2408,6 +2408,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
>   	dev_info->rx_desc_lim = lim;
>   	dev_info->tx_desc_lim = lim;
>   	dev_info->device = dev->device;
> +	dev_info->min_mtu = ETHER_MIN_MTU;
> +	dev_info->max_mtu = ETHER_MTU;
>   
>   	RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>   	(*dev->dev_ops->dev_infos_get)(dev, dev_info);
> @@ -2471,12 +2473,17 @@ int
>   rte_eth_dev_set_mtu(uint16_t port_id, uint16_t mtu)
>   {
>   	int ret;
> +	struct rte_eth_dev_info dev_info;
>   	struct rte_eth_dev *dev;
>   
>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>   	dev = &rte_eth_devices[port_id];
>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mtu_set, -ENOTSUP);
>   
> +	rte_eth_dev_info_get(port_id, &dev_info);
> +	if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
> +		return -EINVAL;
> +

The check breaks setting the MTU to a value larger than ETHER_MTU for
drivers that have not been updated. So, IMHO, it should be pushed only
together with the appropriate updates in all drivers that support a
bigger MTU.
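
To illustrate the concern, here is a rough standalone model of the proposed
flow. None of these names are ethdev APIs: the model_* identifiers, the
callback and the constant values (68 and 1500, with 9000 for jumbo frames)
are assumptions made for the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Model constants; hypothetical stand-ins for the generic defaults. */
#define MODEL_MIN_MTU 68
#define MODEL_DEFAULT_MAX_MTU 1500

struct model_dev_info {
	uint16_t min_mtu;
	uint16_t max_mtu;
};

/* Generic layer fills in defaults; a driver callback may override them. */
static void model_info_get(struct model_dev_info *info,
			   void (*driver_cb)(struct model_dev_info *))
{
	info->min_mtu = MODEL_MIN_MTU;
	info->max_mtu = MODEL_DEFAULT_MAX_MTU;
	if (driver_cb != NULL)
		driver_cb(info);
}

/* An updated driver that advertises jumbo frame support. */
static void jumbo_driver_cb(struct model_dev_info *info)
{
	info->max_mtu = 9000;
}

/* Same range check as in the quoted hunk, applied before the driver hook. */
static int model_set_mtu(uint16_t mtu,
			 void (*driver_cb)(struct model_dev_info *))
{
	struct model_dev_info info;

	model_info_get(&info, driver_cb);
	if (mtu < info.min_mtu || mtu > info.max_mtu)
		return -EINVAL;
	return 0;
}
```

With no callback (a driver that was not updated), a request for 9000 is
rejected by the generic check even if the hardware could handle it.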

>   	ret = (*dev->dev_ops->mtu_set)(dev, mtu);
>   	if (!ret)
>   		dev->data->mtu = mtu;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 7070e9ab408f..5171a9083288 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1015,6 +1015,8 @@ struct rte_eth_dev_info {
>   	const char *driver_name; /**< Device Driver name. */
>   	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>   		Use if_indextoname() to translate into an interface name. */
> +	uint16_t min_mtu;	/**< Minimum MTU allowed */
> +	uint16_t max_mtu;	/**< Maximum MTU allowed */
>   	const uint32_t *dev_flags; /**< Device flags */
>   	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>   	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  2018-09-04 19:36  4%   ` Michel Machado
@ 2018-09-05 22:13  4%     ` Honnappa Nagarahalli
  2018-09-06 14:28  3%       ` Michel Machado
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-05 22:13 UTC (permalink / raw)
  To: Michel Machado, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
	nd, yipeng1.wang



-----Original Message-----
From: Michel Machado <michel@digirati.com.br> 
Sent: Tuesday, September 4, 2018 2:37 PM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Qiaobin Fu <qiaobinf@bu.edu>; bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; yipeng1.wang@intel.com
Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries

Hi Honnappa,

On 09/02/2018 06:05 PM, Honnappa Nagarahalli wrote:
> +/* istate stands for internal state. */
> +struct rte_hash_iterator_istate {
> +	const struct rte_hash *h;
> This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.

    Discussed later.

> +	uint32_t              next;
> +	uint32_t              total_entries;
> +};
> This structure can be moved to rte_cuckoo_hash.h file.

    What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?

This is to maintain consistency. For example, 'struct queue_node', which is an internal structure, is kept in rte_cuckoo_hash.h.

> +int32_t
> +rte_hash_iterator_init(const struct rte_hash *h,
> +	struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced by 's'.
> 
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
> +	__state->h = h;
> +	__state->next = 0;
> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +
> +	return 0;
> +}
> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.

    Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initialize the fields of a state only once?

My concern is about creating another API for every iterator API. You have a valid point about saving cycles, as this API is used in the data plane. Have you done any performance benchmarking with and without this API? Maybe we can base our decision on that.
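
The two options under discussion can be sketched on a toy table. Everything below (toy_table, toy_iter and the three functions) is hypothetical and only mirrors the shape of the patch, not the rte_hash internals:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy table standing in for struct rte_hash; 0 marks an empty slot. */
struct toy_table {
	uint32_t num_slots;
	const int *slots;
};

struct toy_iter {
	const struct toy_table *t;
	uint32_t next;
	uint32_t total;
};

/* Option A: explicit init, paid once per traversal. */
static void toy_iter_init(struct toy_iter *it, const struct toy_table *t)
{
	it->t = t;
	it->next = 0;
	it->total = t->num_slots;
}

/* Return the next non-empty slot, or -ENOENT at the end of the table. */
static int toy_iterate(struct toy_iter *it, int *value)
{
	while (it->next < it->total) {
		int v = it->t->slots[it->next++];

		if (v != 0) {
			*value = v;
			return 0;
		}
	}
	return -ENOENT;
}

/* Option B: no init API; the table is passed on every call and the
 * remaining fields are filled in lazily while next is still 0. */
static int toy_iterate_lazy(const struct toy_table *t, struct toy_iter *it,
			    int *value)
{
	if (it->next == 0) {	/* extra branch, evaluated on every call */
		it->t = t;
		it->total = t->num_slots;
	}
	return toy_iterate(it, value);
}
```

Note that option B still requires the caller to zero the state before the first call, which is the point made in the reply above; a benchmark would be needed to tell whether the extra branch matters.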

>   int32_t
> -rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
> +rte_hash_iterate(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> 
> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)
> 
> This will help keep the API signature consistent with existing APIs.
> 
> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.

    The ABI will change one way or another, so why not go for a single state instead of requiring parameters that are already needed to initialize the state?

Are there any cost savings we can achieve by keeping the 'h' in the iterator state?

    Thank you for the link. We'll check how to proceed with the ABI change.

>   {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced with 's'.

    Gaëtan Rivet has already pointed this out in his review of this version of our patch.

>   	uint32_t bucket_idx, idx, position;
>   	struct rte_hash_key *next_key;
>   
> -	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
>   
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>   	/* Out of bounds */
> -	if (*next >= total_entries)
> +	if (__state->next >= __state->total_entries)
>   		return -ENOENT;
>   
> 'if (__state->next == 0)' is required to avoid creating 'rte_hash_iterator_init' API.

    The argument to keep _init() is presented above in this email.

>   	/* Calculate bucket and index of current iterator */
> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   
>   	/* If current position is empty, go to the next one */
> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> -		(*next)++;
> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		__state->next++;
>   		/* End of table */
> -		if (*next == total_entries)
> +		if (__state->next == __state->total_entries)
>   			return -ENOENT;
> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   	}
> -	__hash_rw_reader_lock(h);
> +	__hash_rw_reader_lock(__state->h);
>   	/* Get position of entry in key table */
> -	position = h->buckets[bucket_idx].key_idx[idx];
> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> -				position * h->key_entry_size);
> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
> +				position * __state->h->key_entry_size);
>   	/* Return key and data */
>   	*key = next_key->key;
>   	*data = next_key->pdata;
>   
> -	__hash_rw_reader_unlock(h);
> +	__hash_rw_reader_unlock(__state->h);
>   
>   	/* Increment iterator */
> -	(*next)++;
> +	__state->next++;
> This comment is not related to this change, it is better to place this inside the lock.

    Even though __state->next does not depend on the lock?

It depends on whether this API needs to be multi-thread safe. Interestingly, the documentation does not say that it is. If it has to be multi-thread safe, then the state also needs to be protected. For example, what happens if the user uses a global variable for the state?

>   	return position - 1;
>   }
> +
> +/* istate stands for internal state. */
> +struct rte_hash_iterator_conflict_entries_istate {
> +	const struct rte_hash *h;
> This can be moved outside of this structure.

    Discussed earlier.

> +	uint32_t              vnext;
> +	uint32_t              primary_bidx;
> +	uint32_t              secondary_bidx;
> +};
> +
> +int32_t __rte_experimental
> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
> +	hash_sig_t sig, struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +	__state->h = h;
> +	__state->vnext = 0;
> +
> +	/* Get the primary bucket index given the precomputed hash value. */
> +	__state->primary_bidx = sig & h->bucket_bitmask;
> +	/* Get the secondary bucket index given the precomputed hash value. */
> +	__state->secondary_bidx =
> +		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +
> +	return 0;
> +}
> IMO, as mentioned above, it is possible to avoid creating this API.

    Discussed earlier.

> +
> +int32_t __rte_experimental
> +rte_hash_iterate_conflict_entries(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> Signature of this API can be changed as follows:
> rte_hash_iterate_conflict_entries(
> 	struct rte_hash *h, const void **key, void **data,
> 	struct rte_hash_iterator_state *state)

    Discussed earlier.

> +{
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +
> +	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
> +		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
> +			__state->primary_bidx : __state->secondary_bidx;
> +		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
> 
> take the reader lock before reading bucket entry

    Thanks for pointing this out. We are going to do so. The lock was introduced while we were going through the versions of this patch.

> +		uint32_t position = __state->h->buckets[bidx].key_idx[next];
> +		struct rte_hash_key *next_key;
> +
> +		/* Increment iterator. */
> +		__state->vnext++;
> +
> +		/*
> +		 * The test below is unlikely because this iterator is meant
> +		 * to be used after a failed insert.
> +		 */
> +		if (unlikely(position == EMPTY_SLOT))
> +			continue;
> +
> +		/* Get the entry in key table. */
> +		next_key = (struct rte_hash_key *) (
> +			(char *)__state->h->key_store +
> +			position * __state->h->key_entry_size);
> +		/* Return key and data. */
> +		*key = next_key->key;
> +		*data = next_key->pdata;
> give the reader lock

    We'll do so.

> +
> +		return position - 1;
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h 
> index 9e7d9315f..fdb01023e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -14,6 +14,8 @@
>   #include <stdint.h>
>   #include <stddef.h>
>   
> +#include <rte_compat.h>
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>   /** @internal A hash table structure. */
>   struct rte_hash;
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal A hash table iterator state structure.
> + */
> +struct rte_hash_iterator_state {
> +	uint8_t space[64];
> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.

    Okay.

[ ]'s
Michel Machado

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  @ 2018-09-05 20:27  4%             ` Wang, Yipeng1
  2018-09-06 13:34  4%               ` Michel Machado
  0 siblings, 1 reply; 200+ results
From: Wang, Yipeng1 @ 2018-09-05 20:27 UTC (permalink / raw)
  To: Michel Machado, Qiaobin Fu, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, doucette, Wiles, Keith, Gobriel, Sameh, Tai, Charlie,
	stephen, nd, honnappa.nagarahalli

Hmm I see, it falls back to my original thought to have malloc inside the init function..
Thanks for the explanation. :) 

So I guess with your implementation, if we change the internal state to be larger
in the future, the ABI will be broken. BTW, this patch set also changes the API, so
proper notice is needed.
People more familiar with API/ABI change policies may be able to help here.
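
A small sketch of that constraint, under assumptions: the pub_/priv_ names are
hypothetical, the 64-byte size follows the quoted header, and memcpy is used here
instead of the pointer cast in the patch, simply to sidestep alignment questions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Public, opaque view: applications only see a fixed-size byte buffer. */
struct pub_iter_state {
	uint8_t space[64];
};

/* Private layout, loosely following the internal state in the patch. */
struct priv_iter_state {
	const void *h;
	uint32_t next;
	uint32_t total_entries;
};

/* The build fails if the private state ever outgrows the public buffer. */
static_assert(sizeof(struct priv_iter_state) <= sizeof(struct pub_iter_state),
	      "internal iterator state no longer fits in the public buffer");

/* Store a fresh private state inside the opaque buffer. */
static void pub_iter_reset(struct pub_iter_state *state, const void *h,
			   uint32_t total)
{
	struct priv_iter_state s = { .h = h, .next = 0, .total_entries = total };

	memcpy(state->space, &s, sizeof(s));
}

/* Read one private field back out of the opaque buffer. */
static uint32_t pub_iter_total(const struct pub_iter_state *state)
{
	struct priv_iter_state s;

	memcpy(&s, state->space, sizeof(s));
	return s.total_entries;
}
```

As long as the private struct fits, the public size stays at 64 bytes and callers
that allocate it on the stack are unaffected; growing past it would change the
size applications were compiled against.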

Just to confirm, is there any way, as I said, for your application to keep some long-lived states
and reuse them throughout the application, so that you don’t need short-lived ones on the stack?

Thanks
Yipeng

>
>    The fact that struct rte_hash does not expose its private fields but
>only its type to applications means that a compiler cannot find out the
>byte length of struct rte_hash using only the header rte_hash.h. Thus,
>an application cannot allocate memory on its own (e.g. as a local
>variable) for a struct rte_hash. An application can, however, have a
>pointer to a struct rte_hash since the byte length of a pointer only
>depends on the architecture of the machine. This is the motivation
>behind having struct rte_hash_iterator_state in rte_hash.h only holding
>an array of bytes.
>
>    There are good reasons to implement struct rte_hash as it is. For
>example, struct rte_hash can change its byte length between versions of
>DPDK even if applications are dynamically linked to DPDK and not
>recompiled. Moreover, a hash table is unlikely to be as short-lived as an
>iterator.
>
>[ ]'s
>Michel Machado

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] 18.05.1 patches review and test
  2018-08-27  9:29  0% ` Christian Ehrhardt
  2018-08-27 10:30  0%   ` [dpdk-dev] [dpdk-stable] " Marco Varlese
  2018-08-27 10:31  0%   ` Marco Varlese
@ 2018-09-05 14:41  0%   ` Christian Ehrhardt
  2 siblings, 0 replies; 200+ results
From: Christian Ehrhardt @ 2018-09-05 14:41 UTC (permalink / raw)
  To: stable; +Cc: dev

On Mon, Aug 27, 2018 at 11:29 AM Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:

>
>
> On Wed, Aug 22, 2018 at 9:26 AM Christian Ehrhardt <
> christian.ehrhardt@canonical.com> wrote:
>
>> Hi all,
>>
>> Here is a list of patches targeted for stable release 18.05.1. Please
>> help review and test. The planned date for the final release is August,
>> 29th. Before that, please shout if anyone has objections with these
>> patches being applied.
>>
>
> There was neither positive nor negative feedback on 18.05.1-rc1 so far.
> Maybe 17.11.x priorities and general PTO time just beats 18.05 - which is
> fine to some extent.
> The only private message I got was about one party needing some extra time.
> For all of the above I will do two things:
> 1. the deadline to get back with results on 18.05.1-rc1 is extended to
> Tuesday the 4th of September
> 2. I'd highly appreciate feedback of people involved that intend to test
> it so I know what to wait for (or not)
>

I received positive off-list feedback from multiple parties and the deadline has passed.
The TL;DR was always "no errors" or "only errors that already existed with
18.05(.0)"
So I'm gonna release 18.05.1 as-is today.

Thanks everybody who was running tests on this!


> Also for the companies committed to running regression tests,
>> please run the tests and report any issue before the release date.
>>
>> A release candidate tarball can be found at:
>>
>>     https://dpdk.org/browse/dpdk-stable/tag/?id=v18.05.1-rc1
>>
>> These patches are located at branch 18.05 of dpdk-stable repo:
>>
>>     https://git.dpdk.org/dpdk-stable/log/?h=18.05
>>
>> Thanks.
>>
>> Christian Ehrhardt <christian.ehrhardt@canonical.com>
>>
>> ---
>> Adrien Mazarguil (8):
>>       app/testpmd: fix crash when attaching a device
>>       net/mlx4: fix minor resource leak during init
>>       net/mlx5: fix errno object in probe function
>>       net/mlx5: fix missing errno in probe function
>>       net/mlx5: fix error message in probe function
>>       net/mlx5: fix invalid error check
>>       maintainers: update for Mellanox PMDs
>>       net/mlx5: fix invalid network interface index
>>
>> Ajit Khaparde (11):
>>       net/bnxt: fix clear port stats
>>       net/bnxt: fix close operation
>>       net/bnxt: fix HW Tx checksum offload check
>>       net/bnxt: check filter type before clearing it
>>       net/bnxt: fix set MTU
>>       net/bnxt: fix incorrect IO address handling in Tx
>>       net/bnxt: fix Rx ring count limitation
>>       net/bnxt: fix memory leaks in NVM commands
>>       net/bnxt: fix lock release on NVM write failure
>>       net/bnxt: check access denied for HWRM commands
>>       net/bnxt: fix RETA size
>>
>> Alejandro Lucero (2):
>>       net/nfp: fix unused header reference
>>       net/nfp: fix field initialization in Tx descriptor
>>
>> Alok Makhariya (1):
>>       bus/dpaa: fix phandle support for Linux 4.16
>>
>> Anatoly Burakov (14):
>>       ipc: fix locking while sending messages
>>       mem: fix alignment of requested virtual areas
>>       eal/bsd: fix memory segment index display
>>       malloc: fix pad erasing
>>       eal/linux: fix invalid syntax in interrupts
>>       eal/linux: fix uninitialized value
>>       vfio: fix uninitialized variable
>>       malloc: do not skip pad on free
>>       test: fix EAL flags autotest on FreeBSD
>>       test: fix result printing
>>       test: fix code on report
>>       test: make autotest runner python 2/3 compliant
>>       test: print autotest categories
>>       test: improve filtering
>>
>> Andrew Rybchenko (7):
>>       net/sfc: cut non VLAN ID bits from TCI
>>       net/sfc: discard packets with bad CRC on EF10 ESSB Rx
>>       net/sfc: fix double-free in EF10 ESSB Rx queue purge
>>       net/sfc: move Rx checksum offload check to device level
>>       net/sfc: fix Rx queue offloads reporting in queue info
>>       net/sfc: fix assert in set multicast address list
>>       net/sfc: handle unknown L3 packet class in EF10 event parser
>>
>> Andy Green (2):
>>       ring: fix declaration after statement
>>       ring: fix sign conversion warning
>>
>> Beilei Xing (5):
>>       net/i40e: fix shifts of 32-bit value
>>       net/i40e: fix PPPoL2TP packet type parsing
>>       net/i40e: fix packet type parsing with DDP
>>       net/i40e: fix setting TPID with AQ command
>>       net/i40e: fix device parameter parsing
>>
>> Bruce Richardson (3):
>>       eal: fix error message for unsupported platforms
>>       examples/exception_path: fix out-of-bounds read
>>       mk: fix permissions when using make install
>>
>> Chas Williams (2):
>>       net/bonding: always update bonding link status
>>       net/bonding: do not clear active slave count
>>
>> Christian Ehrhardt (2):
>>       FIXUP: net/mlx5: fix invalid network interface index
>>       version: 18.05.1-rc1
>>
>> Damjan Marion (1):
>>       net/i40e: do not reset device info data
>>
>> Dan Gora (1):
>>       kni: fix crash with null name
>>
>> Daria Kolistratova (1):
>>       net/ena: fix SIGFPE with 0 Rx queue
>>
>> Dariusz Stojaczyk (7):
>>       mem: do not leave unmapped holes in EAL memory area
>>       mem: do not unmap overlapping region on mmap failure
>>       mem: avoid crash on memseg query with invalid address
>>       mem: fix alignment requested with --base-virtaddr
>>       mem: do not use --base-virtaddr in secondary processes
>>       eal: fix return codes on thread naming failure
>>       eal: fix return codes on control thread failure
>>
>> David Marchand (1):
>>       net/bnxt: add missing ids in xstats
>>
>> Drocula Lambda (1):
>>       kni: fix build on RHEL 7.5
>>
>> Fan Zhang (1):
>>       crypto/virtio: fix IV physical address
>>
>> Ferruh Yigit (4):
>>       kni: fix build with gcc 8.1
>>       net/thunderx: fix build with gcc optimization on
>>       app/testpmd: fix typo in setting Tx offload command
>>       drivers/net: fix crash in secondary process
>>
>> Gage Eads (1):
>>       net: rename u16 to fix shadowed declaration
>>
>> Gavin Hu (5):
>>       mk: fix cross build
>>       devtools: fix ninja command in build test
>>       build: fix for host clang and cross gcc
>>       net/dpaa2: remove loop for unused pool entries
>>       maintainers: claim maintainership for ARM v7 and v8
>>
>> Haiyue Wang (1):
>>       net/i40e: workaround performance degradation
>>
>> Harry van Haaren (2):
>>       net/i40e: fix rearm check in AVX2 Rx
>>       event: fix ring init failure handling
>>
>> Hemant Agrawal (8):
>>       doc: fix limitations for dpaa crypto
>>       doc: fix limitations for dpaa2 crypto
>>       test/crypto: fix device id when stopping port
>>       bus/dpaa: fix SVR id fetch location
>>       bus/dpaa: fix buffer offset setting in FMAN
>>       net/dpaa: fix queue error handling and logs
>>       net/dpaa2: fix prefetch Rx to honor number of packets
>>       raw/dpaa2_qdma: fix IOVA as VA flag
>>
>> Hyong Youb Kim (4):
>>       net/enic: fix receive packet types
>>       net/enic: update the UDP RSS detection mechanism
>>       net/enic: do not overwrite admin Tx queue limit
>>       net/enic: initialize RQ fetch index before enabling RQ
>>
>> Ido Goshen (1):
>>       net/pcap: fix multiple queues
>>
>> Igor Romanov (1):
>>       net/sfc: fix filter exceptions logic
>>
>> Jananee Parthasarathy (1):
>>       mk: update targets for classified tests
>>
>> Jay Ding (1):
>>       net/bnxt: check for invalid vNIC id
>>
>> Jerin Jacob (3):
>>       doc: fix octeontx eventdev selftest argument
>>       ethdev: fix queue statistics mapping documentation
>>       eal: fix bitmap documentation
>>
>> Kiran Kumar (3):
>>       net/bonding: fix MAC address reset
>>       ethdev: check queue stats mapping input arguments
>>       net/thunderx: avoid sq door bell write on zero packet
>>
>> Konstantin Ananyev (3):
>>       examples/ipsec-secgw: fix IPv4 checksum at Tx
>>       examples/ipsec-secgw: fix bypass rule processing
>>       app/testpmd: fix DCB config
>>
>> Krzysztof Kanas (2):
>>       app/testpmd: fix crash on TM command error
>>       app/testpmd: fix help for TM commit command
>>
>> Lee Daly (1):
>>       compress/isal: fix offset usage
>>
>> Matan Azrad (1):
>>       net/tap: fix zeroed flow mask configurations
>>
>> Maxime Coquelin (2):
>>       vhost: fix missing increment of log cache count
>>       vhost: flush IOTLB cache on new mem table handling
>>
>> Moti Haimovsky (2):
>>       net/mlx4: check RSS queues number limitation
>>       net/mlx4: advertise Rx jumbo frame support
>>
>> Nelio Laranjeiro (3):
>>       net/mlx5: clean-up developer logs
>>       app/testpmd: fix missing count action fields
>>       net/mlx5: fix TCI mask filter
>>
>> Nikhil Rao (5):
>>       eventdev: fix port in Rx adapter internal function
>>       eventdev: fix missing update to Rx adaper WRR position
>>       eventdev: add event buffer flush in Rx adapter
>>       eventdev: fix internal port logic in Rx adapter
>>       eventdev: fix Rx SW adapter stop
>>
>> Nithin Dabilpuram (1):
>>       app/testpmd: fix buffer leak in TM command
>>
>> Ophir Munk (1):
>>       net/mlx5: fix secondary process resource leakage
>>
>> Pablo de Lara (13):
>>       cryptodev: fix ABI breakage
>>       net/ixgbe: fix crash on detach
>>       compress/isal: fix log type name
>>       compress/isal: set null pointer after freeing
>>       compress/isal: fix memory leak
>>       examples/l2fwd-crypto: fix digest with AEAD algo
>>       examples/l2fwd-crypto: check return value on IV size check
>>       examples/l2fwd-crypto: skip device not supporting operation
>>       devtools: remove already enabled nfp from build test
>>       test/hash: fix multiwriter with non consecutive cores
>>       test/hash: fix potential memory leak
>>       app/crypto-perf: fix auth IV offset
>>       hash: fix doxygen of return values
>>
>> Pavan Nikhilesh (5):
>>       event/octeontx: fix flush callback
>>       mempool/octeontx: fix pool to aura mapping
>>       app/eventdev: fix order test service init
>>       event/octeontx: remove unnecessary port start and stop
>>       net/octeontx: fix stop clearing Rx/Tx functions
>>
>> Qi Zhang (4):
>>       eal: fix hotplug add and remove
>>       vfio: fix PCI address comparison
>>       vfio: remove uneccessary IPC for group fd clear
>>       net/ixgbe: fix missing null check on detach
>>
>> Radu Nicolau (4):
>>       security: fix crash on destroy null session
>>       net/bonding: fix invalid port id
>>       test: fix uninitialized port configuration
>>       net/bonding: fix race condition
>>
>> Rafal Kozik (4):
>>       net/ena: check pointer before memset
>>       net/ena: change memory type
>>       net/ena: fix GENMASK_ULL macro
>>       net/ena: set link speed as none
>>
>> Rahul Lakkireddy (4):
>>       net/cxgbe: report configured link auto-negotiation
>>       net/cxgbe: fix Rx channel map and queue type
>>       net/cxgbevf: add missing Tx byte counters
>>       net/cxgbe: fix init failure due to new flash parts
>>
>> Rami Rosen (2):
>>       examples/l3fwd: remove useless include
>>       ethdev: fix a doxygen comment for port allocation
>>
>> Rasesh Mody (11):
>>       net/qede: fix VF MTU update
>>       net/qede: fix for devargs
>>       net/qede: fix L2-handles used for RSS hash update
>>       net/qede: fix memory alloc for multiple port reconfig
>>       net/qede: remove primary MAC removal
>>       doc: update qede management firmware guide
>>       net/qede: fix default extended VLAN offload config
>>       net/qede/base: fix to clear HW indication
>>       net/qede/base: fix GRC attention callback
>>       net/bnx2x: fix FW command timeout during stop
>>       net/bnx2x: fix poll link status
>>
>> Remy Horton (4):
>>       bitrate: add sanity check on parameters
>>       metrics: add check for invalid key
>>       metrics: do not fail silently when uninitialised
>>       metrics: disallow null as metric name
>>
>> Reshma Pattan (3):
>>       test/flow_classify: fix return types
>>       mk: remove unnecessary test rules
>>       latency: free up the memzone
>>
>> Rosen Xu (1):
>>       examples/flow_filtering: add flow director config for i40e
>>
>> Shahaf Shuler (2):
>>       net/mlx5: separate generic tunnel TSO from the standard one
>>       net/mlx5: fix build with rdma-core v19
>>
>> Shahed Shaikh (8):
>>       net/qede: fix incorrect link status update
>>       net/qede: fix link change event notification
>>       net/qede: fix unicast MAC address handling in VF
>>       net/qede: fix legacy interrupt mode
>>       net/qede: fix Rx/Tx offload flags
>>       net/qede: fix interrupt handler unregister
>>       net/qede: fix MAC address removal failure message
>>       net/qede: fix ntuple filter configuration
>>
>> Shaopeng He (1):
>>       net/i40e: fix Tx queue setup after stop
>>
>> Shreyansh Jain (1):
>>       doc: fix bonding command in testpmd
>>
>> Somnath Kotur (4):
>>       net/bnxt: revert reset of L2 filter id
>>       net/bnxt: fix to move a flow to a different queue
>>       net/bnxt: use correct flags during VLAN configuration
>>       net/bnxt: fix filter freeing
>>
>> Stephen Hemminger (2):
>>       net/mlx5: fix log initialization
>>       doc: fix typo in vdev_netvsc guide
>>
>> Thomas Monjalon (2):
>>       bus/dpaa: fix build
>>       net/fm10k: remove unused constant
>>
>> Timothy Redaelli (2):
>>       net/mlx4: avoid stripping the glue library
>>       net/mlx5: avoid stripping the glue library
>>
>> Tiwei Bie (1):
>>       vhost: release locks on RARP packet failure
>>
>> Tomasz Duszynski (1):
>>       net/mvpp2: check pointer before using it
>>
>> Wei Zhao (7):
>>       net/ixgbe: add support for VLAN in IP mode FDIR
>>       net/ixgbe: fix tunnel id format error for FDIR
>>       net/ixgbe: fix tunnel type set error for FDIR
>>       net/ixgbe: fix mask bits register set error for FDIR
>>       app/testpmd: fix VLAN TCI mask set error for FDIR
>>       net/i40e: fix check of flow director programming status
>>       net/i40e: revert fix of flow director check
>>
>> Xiaoxin Peng (1):
>>       net/bnxt: fix Tx with multiple mbuf
>>
>> Xiaoyun Li (3):
>>       net/i40e: fix link speed
>>       app/testpmd: fix little performance drop
>>       net/avf: fix offload capabilities
>>
>> Xueming Li (1):
>>       net/mlx5: fix crash in device probe
>>
>> Yaroslav Brustinov (1):
>>       net/mlx5: fix linkage of glue lib with gcc 4.7.2
>>
>> Yipeng Wang (3):
>>       hash: fix multiwriter lock memory allocation
>>       hash: fix a multi-writer race condition
>>       hash: fix key slot size accuracy
>>
>> Yongseok Koh (6):
>>       net/mlx5: fix error number handling
>>       net/mlx5: fix Rx buffer replenishment threshold
>>       net/mlx5: fix assert for Tx completion queue count
>>       net/mlx5: fix queue rollback when starting device
>>       net/mlx5: preserve promiscuous flag for flow isolation mode
>>       net/mlx5: preserve allmulticast flag for flow isolation mode
>>
>
>
> --
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
>


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  2018-09-04  0:47  0%                         ` Dan Gora
@ 2018-09-05 12:57  0%                           ` Stephen Hemminger
    0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-09-05 12:57 UTC (permalink / raw)
  To: Dan Gora; +Cc: Igor Ryzhov, Ferruh Yigit, dev

On Mon, 3 Sep 2018 21:47:22 -0300
Dan Gora <dg@adax.com> wrote:

> Hi All,
> 
> One other problem with using the sysfs method to change the link state
> rather than this ioctl method.  The sysfs/netdev method to change the
> carrier was only introduced in kernel 3.9.  For older kernels, we
> would just be out of luck.  The ioctl method will work with any kernel
> version (2.6+).  It's not clear if this is a problem for DPDK apps or
> not.
> 
> thanks
> dan
> 
> 
> On Thu, Aug 30, 2018 at 7:11 PM, Dan Gora <dg@adax.com> wrote:
> > On Thu, Aug 30, 2018 at 7:09 PM, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:  
> >> On Thu, 30 Aug 2018 18:41:14 -0300
> >> Dan Gora <dg@adax.com> wrote:
> >>  
> >>> On the other hand, the "write to /sys" method is a bit more simple and
> >>> confines the changes to the user space library.  If we're confident
> >>> that the /sys ABI is stable and not going to be changed going forward
> >>> it seems like a valid alternative.  
> >>
> >> See Documentation/ABI/testing/sysfs-class-net  
> >
> > yeah, but it's in the 'testing' directory :)
> >
> > From Documentation/ABI/README:
> >
> > testing/
> >
> > This directory documents interfaces that are felt to be stable,
> > as the main development of this interface has been completed.
> > The interface can be changed to add new features, but the
> > current interface will not break by doing this, unless grave
> > errors or security problems are found in them.  Userspace
> > programs can start to rely on these interfaces, but they must be
> > aware of changes that can occur before these interfaces move to
> > be marked stable.  Programs that use these interfaces are
> > strongly encouraged to add their name to the description of
> > these interfaces, so that the kernel developers can easily
> > notify them if any changes occur (see the description of the
> > layout of the files below for details on how to do this.)
> >
> > Like I said, I'm ok with using this if that's what everyone wants to do.
> >
> > d  
> 
> 
> 

Linux 3.9 is no longer supported. Currently, the upstream Linux kernel is
supported from 3.16 on. Anyone on a kernel that old is not going to get
any security fixes.
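
For reference, the "write to /sys" approach discussed above could look roughly
like the sketch below. The helper names are hypothetical; the path is the
carrier attribute described in Documentation/ABI/testing/sysfs-class-net, and
whether the write is accepted depends on the kernel version and on the driver
behind the interface.

```c
#include <stdio.h>

/* Build "/sys/class/net/<ifname>/carrier"; 0 on success, -1 if it does not fit. */
static int carrier_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/carrier", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" (link up) or "0" (link down); 0 on success, -1 on any failure. */
static int set_carrier(const char *ifname, int up)
{
	char path[128];
	FILE *f;
	int ok;

	if (carrier_path(path, sizeof(path), ifname) != 0)
		return -1;
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	ok = (fputs(up ? "1" : "0", f) >= 0);
	/* A rejected write may only be reported when the stream is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

This keeps the change in user space, at the cost of depending on the sysfs
attribute being present and writable.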

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  2018-09-02 22:05  2% ` Honnappa Nagarahalli
@ 2018-09-04 19:36  4%   ` Michel Machado
  2018-09-05 22:13  4%     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Michel Machado @ 2018-09-04 19:36 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
	nd, yipeng1.wang

Hi Honnappa,

On 09/02/2018 06:05 PM, Honnappa Nagarahalli wrote:
> +/* istate stands for internal state. */
> +struct rte_hash_iterator_istate {
> +	const struct rte_hash *h;
> This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.

    Discussed later.

> +	uint32_t              next;
> +	uint32_t              total_entries;
> +};
> This structure can be moved to rte_cuckoo_hash.h file.

    What's the purpose of moving this struct to a header file since it's 
only used in the C file rte_cuckoo_hash.c?

> +int32_t
> +rte_hash_iterator_init(const struct rte_hash *h,
> +	struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced by 's'.
> 
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
> +	__state->h = h;
> +	__state->next = 0;
> +	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +
> +	return 0;
> +}
> IMO, creating this API can be avoided if the initialization is handled in the 'rte_hash_iterate' function. The cost of doing so is trivial (one extra 'if' statement in 'rte_hash_iterate'), and it helps keep the number of APIs to a minimum.

    Applications would have to initialize struct rte_hash_iterator_state 
*state before calling rte_hash_iterate() anyway. Why not initialize 
the fields of the state only once?
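
To make the trade-off concrete, here is a minimal, self-contained sketch of the pattern being discussed: an opaque, fixed-size public state whose private layout is known only to the library, set up once by an explicit _init() call. It uses a toy table instead of struct rte_hash, and every toy_* name is hypothetical rather than part of DPDK. As a further assumption, it copies the private struct in and out with memcpy() to stay within strict-aliasing rules, whereas the patch casts the state pointer directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8
#define TOY_EMPTY 0	/* 0 marks an unused slot, like EMPTY_SLOT */

/* Hypothetical toy table: each slot holds a non-zero value or TOY_EMPTY. */
struct toy_table {
	uint32_t slot[TOY_SLOTS];
};

/* Public, opaque state: the caller only provides storage. */
struct toy_iterator_state {
	uint8_t space[64];
};

/* Private layout, known only to the library side. */
struct toy_iterator_istate {
	const struct toy_table *t;
	uint32_t next;
	uint32_t total_entries;
};

/* Bind the table and cache the bound once, before the first iteration. */
int32_t
toy_iterator_init(const struct toy_table *t, struct toy_iterator_state *state)
{
	struct toy_iterator_istate s;

	if (t == NULL || state == NULL)
		return -EINVAL;
	/* The private layout has to fit in the public storage. */
	assert(sizeof(s) <= sizeof(*state));

	s.t = t;
	s.next = 0;
	s.total_entries = TOY_SLOTS;
	memcpy(state, &s, sizeof(s));
	return 0;
}

/* Return the index of the next used slot, or -ENOENT at the end. */
int32_t
toy_iterate(struct toy_iterator_state *state, uint32_t *value)
{
	struct toy_iterator_istate s;
	int32_t ret = -ENOENT;

	if (state == NULL || value == NULL)
		return -EINVAL;
	memcpy(&s, state, sizeof(s));

	/* Skip empty slots until an entry or the end of the table. */
	while (s.next < s.total_entries) {
		uint32_t idx = s.next++;

		if (s.t->slot[idx] != TOY_EMPTY) {
			*value = s.t->slot[idx];
			ret = (int32_t)idx;
			break;
		}
	}
	memcpy(state, &s, sizeof(s));
	return ret;
}
```

With this shape the iterate call takes no table argument, because the state already carries it. A lazy-init variant would still need the caller to zero the state before the first call, which is the point made above.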

>   int32_t
> -rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
> +rte_hash_iterate(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> 
> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
> rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)
> 
> This will help keep the API signature consistent with existing APIs.
> 
> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.

    The ABI will change one way or another, so why not go for a 
single state instead of requiring parameters that are already needed for 
the initialization of the state?

    Thank you for the link. We'll check how to proceed with the ABI change.

>   {
> +	struct rte_hash_iterator_istate *__state;
> '__state' can be replaced with 's'.

    Gaëtan Rivet has already pointed this out in his review of this 
version of our patch.

>   	uint32_t bucket_idx, idx, position;
>   	struct rte_hash_key *next_key;
>   
> -	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_istate *)state;
>   
> -	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>   	/* Out of bounds */
> -	if (*next >= total_entries)
> +	if (__state->next >= __state->total_entries)
>   		return -ENOENT;
>   
> An 'if (__state->next == 0)' check is required here to avoid creating the 'rte_hash_iterator_init' API.

    The argument to keep _init() is presented above in this email.

>   	/* Calculate bucket and index of current iterator */
> -	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -	idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   
>   	/* If current position is empty, go to the next one */
> -	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> -		(*next)++;
> +	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> +		__state->next++;
>   		/* End of table */
> -		if (*next == total_entries)
> +		if (__state->next == __state->total_entries)
>   			return -ENOENT;
> -		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> -		idx = *next % RTE_HASH_BUCKET_ENTRIES;
> +		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> +		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>   	}
> -	__hash_rw_reader_lock(h);
> +	__hash_rw_reader_lock(__state->h);
>   	/* Get position of entry in key table */
> -	position = h->buckets[bucket_idx].key_idx[idx];
> -	next_key = (struct rte_hash_key *) ((char *)h->key_store +
> -				position * h->key_entry_size);
> +	position = __state->h->buckets[bucket_idx].key_idx[idx];
> +	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
> +				position * __state->h->key_entry_size);
>   	/* Return key and data */
>   	*key = next_key->key;
>   	*data = next_key->pdata;
>   
> -	__hash_rw_reader_unlock(h);
> +	__hash_rw_reader_unlock(__state->h);
>   
>   	/* Increment iterator */
> -	(*next)++;
> +	__state->next++;
> This comment is not related to this change: it would be better to place this increment inside the lock.

    Even though __state->next does not depend on the lock?

>   	return position - 1;
>   }
> +
> +/* istate stands for internal state. */
> +struct rte_hash_iterator_conflict_entries_istate {
> +	const struct rte_hash *h;
> This can be moved outside of this structure.

    Discussed earlier.

> +	uint32_t              vnext;
> +	uint32_t              primary_bidx;
> +	uint32_t              secondary_bidx;
> +};
> +
> +int32_t __rte_experimental
> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
> +	hash_sig_t sig, struct rte_hash_iterator_state *state) {
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +	__state->h = h;
> +	__state->vnext = 0;
> +
> +	/* Get the primary bucket index given the precomputed hash value. */
> +	__state->primary_bidx = sig & h->bucket_bitmask;
> +	/* Get the secondary bucket index given the precomputed hash value. */
> +	__state->secondary_bidx =
> +		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +
> +	return 0;
> +}
> IMO, as mentioned above, it is possible to avoid creating this API.

    Discussed earlier.

> +
> +int32_t __rte_experimental
> +rte_hash_iterate_conflict_entries(
> +	struct rte_hash_iterator_state *state, const void **key, void **data)
> Signature of this API can be changed as follows:
> rte_hash_iterate_conflict_entries(
> 	struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)

    Discussed earlier.

> +{
> +	struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> +	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> +		(data == NULL)), -EINVAL);
> +
> +	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +
> +	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
> +		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
> +			__state->primary_bidx : __state->secondary_bidx;
> +		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
> 
> Take the reader lock before reading the bucket entry.

    Thanks for pointing this out. We are going to do so. The lock came 
in while we were going through the versions of this patch.

> +		uint32_t position = __state->h->buckets[bidx].key_idx[next];
> +		struct rte_hash_key *next_key;
> +
> +		/* Increment iterator. */
> +		__state->vnext++;
> +
> +		/*
> +		 * The test below is unlikely because this iterator is meant
> +		 * to be used after a failed insert.
> +		 */
> +		if (unlikely(position == EMPTY_SLOT))
> +			continue;
> +
> +		/* Get the entry in key table. */
> +		next_key = (struct rte_hash_key *) (
> +			(char *)__state->h->key_store +
> +			position * __state->h->key_entry_size);
> +		/* Return key and data. */
> +		*key = next_key->key;
> +		*data = next_key->pdata;
> Release the reader lock here.

    We'll do so.

> +
> +		return position - 1;
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index 9e7d9315f..fdb01023e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -14,6 +14,8 @@
>   #include <stdint.h>
>   #include <stddef.h>
>   
> +#include <rte_compat.h>
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>   /** @internal A hash table structure. */
>   struct rte_hash;
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal A hash table iterator state structure.
> + */
> +struct rte_hash_iterator_state {
> +	uint8_t space[64];
> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.

    Okay.

[ ]'s
Michel Machado


* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  2018-08-30 22:11  0%                       ` Dan Gora
@ 2018-09-04  0:47  0%                         ` Dan Gora
  2018-09-05 12:57  0%                           ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Dan Gora @ 2018-09-04  0:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Igor Ryzhov, Ferruh Yigit, dev

Hi All,

There is one other problem with using the sysfs method to change the link
state rather than this ioctl method: the sysfs/netdev method to change the
carrier was only introduced in kernel 3.9.  For older kernels, we
would just be out of luck.  The ioctl method will work with any kernel
version (2.6+).  It's not clear if this is a problem for DPDK apps or
not.
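
For reference, the sysfs approach amounts to writing "1" or "0" to the interface's carrier attribute. The following is only a rough sketch under stated assumptions: the helper names carrier_path() and set_carrier() are hypothetical, and whether the write is accepted depends on the kernel version and on the driver supporting carrier changes, which is not established here for KNI.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "/sys/class/net/<ifname>/carrier"; returns 0 or a negative errno. */
int
carrier_path(char *buf, size_t len, const char *ifname)
{
	int n;

	if (buf == NULL || ifname == NULL || strlen(ifname) == 0)
		return -EINVAL;
	n = snprintf(buf, len, "/sys/class/net/%s/carrier", ifname);
	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write "1" (up) or "0" (down); returns 0 on success, negative errno on failure. */
int
set_carrier(const char *ifname, int up)
{
	char path[128];
	ssize_t n;
	int fd, ret;

	ret = carrier_path(path, sizeof(path), ifname);
	if (ret != 0)
		return ret;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;	/* e.g. no such interface, or no permission */

	n = write(fd, up ? "1" : "0", 1);
	if (n == 1)
		ret = 0;
	else
		ret = (n < 0) ? -errno : -EIO;
	close(fd);
	return ret;
}
```

A caller would use something like set_carrier("vEth0", 1), where the interface name is again only an example; on a kernel or driver without carrier-change support the call is expected to fail with a negative errno rather than change the link state.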

thanks
dan


On Thu, Aug 30, 2018 at 7:11 PM, Dan Gora <dg@adax.com> wrote:
> On Thu, Aug 30, 2018 at 7:09 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Thu, 30 Aug 2018 18:41:14 -0300
>> Dan Gora <dg@adax.com> wrote:
>>
>>> On the other hand, the "write to /sys" method is a bit more simple and
>>> confines the changes to the user space library.  If we're confident
>>> that the /sys ABI is stable and not going to be changed going forward
>>> it seems like a valid alternative.
>>
>> See Documentation/ABI/testing/sysfs-class-net
>
> yeah, but it's in the 'testing' directory :)
>
> From Documentation/ABI/README:
>
> testing/
>
> This directory documents interfaces that are felt to be stable,
> as the main development of this interface has been completed.
> The interface can be changed to add new features, but the
> current interface will not break by doing this, unless grave
> errors or security problems are found in them.  Userspace
> programs can start to rely on these interfaces, but they must be
> aware of changes that can occur before these interfaces move to
> be marked stable.  Programs that use these interfaces are
> strongly encouraged to add their name to the description of
> these interfaces, so that the kernel developers can easily
> notify them if any changes occur (see the description of the
> layout of the files below for details on how to do this.)
>
> Like I said, I'm ok with using this if that's what everyone wants to do.
>
> d



-- 
Dan Gora
Software Engineer

Adax, Inc.
Rua Dona Maria Alves, 1070 Casa 5
Centro
Ubatuba, SP
CEP 11680-000
Brasil

Tel: +55 (12) 3833-1021  (Brazil and outside of US)
    : +1 (510) 859-4801  (Inside of US)
    : dan_gora (Skype)

email: dg@adax.com


* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
  @ 2018-09-02 22:05  2% ` Honnappa Nagarahalli
  2018-09-04 19:36  4%   ` Michel Machado
    1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-02 22:05 UTC (permalink / raw)
  To: Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
	nd, yipeng1.wang, michel

Hi Qiaobin,
	Thank you for the patch. Please see a few comments inline.

-----Original Message-----
From: Qiaobin Fu <qiaobinf@bu.edu> 
Sent: Friday, August 31, 2018 11:51 AM
To: bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; yipeng1.wang@intel.com; michel@digirati.com.br; qiaobinf@bu.edu
Subject: [PATCH v3] hash table: add an iterator over conflicting entries

Function rte_hash_iterate_conflict_entries() iterates over the entries that conflict with an incoming entry.

Iterating over conflicting entries enables one to decide if the incoming entry is more valuable than the entries already in the hash table. This is particularly useful after an insertion failure.
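
As a rough illustration of the idea (a toy, not the patch's code): in a cuckoo table each signature maps to a primary and a secondary bucket, so the entries that conflict with an incoming key are the occupants of those two buckets. The sketch below walks both buckets with one virtual index, in the spirit of the vnext field in the patch. The table layout, the stand-in secondary hash and all toy_* names are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKET_ENTRIES 4	/* power of two, so '&' can replace '%' */
#define TOY_NUM_BUCKETS    8	/* power of two, so a bitmask picks a bucket */
#define TOY_EMPTY_SLOT     0

struct toy_bucket {
	uint32_t key_idx[TOY_BUCKET_ENTRIES];
};

struct toy_hash {
	struct toy_bucket buckets[TOY_NUM_BUCKETS];
};

struct toy_conflict_iter {
	const struct toy_hash *h;
	uint32_t vnext;		/* virtual index over both buckets */
	uint32_t primary_bidx;
	uint32_t secondary_bidx;
};

/* Hypothetical stand-in for a secondary hash; the constant's low bits are
 * non-zero, so the secondary bucket always differs from the primary one. */
static uint32_t
toy_secondary_hash(uint32_t sig)
{
	return sig ^ 0x5bd1e995u;
}

int32_t
toy_conflict_iter_init(const struct toy_hash *h, uint32_t sig,
	struct toy_conflict_iter *it)
{
	if (h == NULL || it == NULL)
		return -EINVAL;
	/* The '&' trick in the iterator relies on a power-of-two bucket size. */
	assert((TOY_BUCKET_ENTRIES & (TOY_BUCKET_ENTRIES - 1)) == 0);

	it->h = h;
	it->vnext = 0;
	it->primary_bidx = sig & (TOY_NUM_BUCKETS - 1);
	it->secondary_bidx = toy_secondary_hash(sig) & (TOY_NUM_BUCKETS - 1);
	return 0;
}

/* Return the position of the next conflicting entry, or -ENOENT when done. */
int32_t
toy_conflict_iter_next(struct toy_conflict_iter *it)
{
	if (it == NULL)
		return -EINVAL;

	while (it->vnext < TOY_BUCKET_ENTRIES * 2) {
		/* First half of the range is the primary bucket, second half
		 * the secondary bucket. */
		uint32_t bidx = it->vnext < TOY_BUCKET_ENTRIES ?
			it->primary_bidx : it->secondary_bidx;
		uint32_t slot = it->vnext & (TOY_BUCKET_ENTRIES - 1);
		uint32_t position = it->h->buckets[bidx].key_idx[slot];

		it->vnext++;
		if (position == TOY_EMPTY_SLOT)
			continue;
		return (int32_t)(position - 1);
	}
	return -ENOENT;
}
```

An application could call toy_conflict_iter_next() in a loop after a failed insert and compare each returned entry with the incoming one before deciding which to keep. The sketch does no locking; a real multi-reader table would need to take the reader lock around the bucket read.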

v3:
* Make the rte_hash_iterate() API similar to
  rte_hash_iterate_conflict_entries()

v2:
* Fix the style issue

* Make the API more universal

Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 lib/librte_hash/Makefile             |   1 +
 lib/librte_hash/rte_cuckoo_hash.c    | 132 +++++++++++++++++++++++----
 lib/librte_hash/rte_hash.h           |  80 ++++++++++++++--
 lib/librte_hash/rte_hash_version.map |   8 ++
 test/test/test_hash.c                |   7 +-
 test/test/test_hash_multiwriter.c    |   8 +-
 test/test/test_hash_readwrite.c      |  16 ++--
 7 files changed, 219 insertions(+), 33 deletions(-)

diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
index c8c435dfd..9be58a205 100644
--- a/lib/librte_hash/Makefile
+++ b/lib/librte_hash/Makefile
@@ -6,6 +6,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 # library name
 LIB = librte_hash.a
 
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 LDLIBS += -lrte_eal -lrte_ring
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index f7b86c8c9..cf5b28196 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1300,45 +1300,143 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
 	return __builtin_popcountl(*hit_mask);
 }
 
+/* istate stands for internal state. */
+struct rte_hash_iterator_istate {
+	const struct rte_hash *h;
This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.

+	uint32_t              next;
+	uint32_t              total_entries;
+};
This structure can be moved to rte_cuckoo_hash.h file.

+

+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+	struct rte_hash_iterator_state *state) {
+	struct rte_hash_iterator_istate *__state;
'__state' can be replaced by 's'.

+
+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_istate *)state;
+	__state->h = h;
+	__state->next = 0;
+	__state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+	return 0;
+}
IMO, creating this API can be avoided if the initialization is handled in the 'rte_hash_iterate' function. The cost of doing so is trivial (one extra 'if' statement in 'rte_hash_iterate'), and it helps keep the number of APIs to a minimum.

+
 int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
+rte_hash_iterate(
+	struct rte_hash_iterator_state *state, const void **key, void **data)

IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)

This will help keep the API signature consistent with existing APIs.

This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.

 {
+	struct rte_hash_iterator_istate *__state;
'__state' can be replaced with 's'.

 	uint32_t bucket_idx, idx, position;
 	struct rte_hash_key *next_key;
 
-	RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+		(data == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_istate *)state;
 
-	const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
 	/* Out of bounds */
-	if (*next >= total_entries)
+	if (__state->next >= __state->total_entries)
 		return -ENOENT;
 
An 'if (__state->next == 0)' check is required here to avoid creating the 'rte_hash_iterator_init' API.

 	/* Calculate bucket and index of current iterator */
-	bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-	idx = *next % RTE_HASH_BUCKET_ENTRIES;
+	bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+	idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
-		(*next)++;
+	while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+		__state->next++;
 		/* End of table */
-		if (*next == total_entries)
+		if (__state->next == __state->total_entries)
 			return -ENOENT;
-		bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
-		idx = *next % RTE_HASH_BUCKET_ENTRIES;
+		bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
+		idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
 	}
-	__hash_rw_reader_lock(h);
+	__hash_rw_reader_lock(__state->h);
 	/* Get position of entry in key table */
-	position = h->buckets[bucket_idx].key_idx[idx];
-	next_key = (struct rte_hash_key *) ((char *)h->key_store +
-				position * h->key_entry_size);
+	position = __state->h->buckets[bucket_idx].key_idx[idx];
+	next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
+				position * __state->h->key_entry_size);
 	/* Return key and data */
 	*key = next_key->key;
 	*data = next_key->pdata;
 
-	__hash_rw_reader_unlock(h);
+	__hash_rw_reader_unlock(__state->h);
 
 	/* Increment iterator */
-	(*next)++;
+	__state->next++;
This comment is not related to this change: it would be better to place this increment inside the lock.
 
 	return position - 1;
 }
+
+/* istate stands for internal state. */
+struct rte_hash_iterator_conflict_entries_istate {
+	const struct rte_hash *h;
This can be moved outside of this structure. 

+	uint32_t              vnext;
+	uint32_t              primary_bidx;
+	uint32_t              secondary_bidx;
+};
+
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
+	hash_sig_t sig, struct rte_hash_iterator_state *state) {
+	struct rte_hash_iterator_conflict_entries_istate *__state;
+
+	RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
+	__state->h = h;
+	__state->vnext = 0;
+
+	/* Get the primary bucket index given the precomputed hash value. */
+	__state->primary_bidx = sig & h->bucket_bitmask;
+	/* Get the secondary bucket index given the precomputed hash value. */
+	__state->secondary_bidx =
+		rte_hash_secondary_hash(sig) & h->bucket_bitmask;
+
+	return 0;
+}
IMO, as mentioned above, it is possible to avoid creating this API.

+
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+	struct rte_hash_iterator_state *state, const void **key, void **data) 
Signature of this API can be changed as follows:
rte_hash_iterate_conflict_entries(
	struct rte_hash *h, const void **key, void **data, struct rte_hash_iterator_state *state)

+{
+	struct rte_hash_iterator_conflict_entries_istate *__state;
+
+	RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
+		(data == NULL)), -EINVAL);
+
+	__state = (struct rte_hash_iterator_conflict_entries_istate *)state;
+
+	while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
+		uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
+			__state->primary_bidx : __state->secondary_bidx;
+		uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);

Take the reader lock before reading the bucket entry.

+		uint32_t position = __state->h->buckets[bidx].key_idx[next];
+		struct rte_hash_key *next_key;
+
+		/* Increment iterator. */
+		__state->vnext++;
+
+		/*
+		 * The test below is unlikely because this iterator is meant
+		 * to be used after a failed insert.
+		 */
+		if (unlikely(position == EMPTY_SLOT))
+			continue;
+
+		/* Get the entry in key table. */
+		next_key = (struct rte_hash_key *) (
+			(char *)__state->h->key_store +
+			position * __state->h->key_entry_size);
+		/* Return key and data. */
+		*key = next_key->key;
+		*data = next_key->pdata;
Release the reader lock here.

+
+		return position - 1;
+	}
+
+	return -ENOENT;
+}
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d9315f..fdb01023e 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -14,6 +14,8 @@
 #include <stdint.h>
 #include <stddef.h>
 
+#include <rte_compat.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
@@ -64,6 +66,16 @@ struct rte_hash_parameters {
 /** @internal A hash table structure. */
 struct rte_hash;
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal A hash table iterator state structure.
+ */
+struct rte_hash_iterator_state {
+	uint8_t space[64];
I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
+} __rte_cache_aligned;
+
 /**
  * Create a new hash table.
  *
@@ -443,26 +455,82 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
 		      uint32_t num_keys, int32_t *positions);
 
 /**
- * Iterate through the hash table, returning key-value pairs.
+ * Initialize the iterator over the hash table.
  *
  * @param h
- *   Hash table to iterate
+ *   Hash table to iterate.
+ * @param state
+ *   Pointer to the iterator state.
+ * @return
+ *   - 0 if successful.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int32_t
+rte_hash_iterator_init(const struct rte_hash *h,
+	struct rte_hash_iterator_state *state);
+
+/**
+ * Iterate through the hash table, returning key-value pairs.
+ *
+ * @param state
+ *   Pointer to the iterator state.
  * @param key
  *   Output containing the key where current iterator
  *   was pointing at
  * @param data
  *   Output containing the data associated with key.
  *   Returns NULL if data was not stored.
- * @param next
- *   Pointer to iterator. Should be 0 to start iterating the hash table.
- *   Iterator is incremented after each call of this function.
  * @return
  *   Position where key was stored, if successful.
  *   - -EINVAL if the parameters are invalid.
  *   - -ENOENT if end of the hash table.
  */
 int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
+rte_hash_iterate(
+	struct rte_hash_iterator_state *state, const void **key, void **data);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Initialize the iterator over entries that conflict with a given hash.
+ *
+ * @param h
+ *   Hash table to iterate.
+ * @param sig
+ *   Precomputed hash value with which the returning entries conflict.
+ * @param state
+ *   Pointer to the iterator state.
+ * @return
+ *   - 0 if successful.
+ *   - -EINVAL if the parameters are invalid.
+ */
+int32_t __rte_experimental
+rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
+	hash_sig_t sig, struct rte_hash_iterator_state *state);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Iterate over entries that conflict with a given hash.
+ *
+ * @param state
+ *   Pointer to the iterator state.
+ * @param key
+ *   Output containing the key at where the iterator is currently pointing.
+ * @param data
+ *   Output containing the data associated with key.
+ *   Returns NULL if data was not stored.
+ * @return
+ *   Position where key was stored, if successful.
+ *   - -EINVAL if the parameters are invalid.
+ *   - -ENOENT if there is no more conflicting entries.
+ */
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries(
+	struct rte_hash_iterator_state *state, const void **key, void **data);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index e216ac8e2..301d4638c 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -24,6 +24,7 @@ DPDK_2.1 {
 
 	rte_hash_add_key_data;
 	rte_hash_add_key_with_hash_data;
+	rte_hash_iterator_init;
 	rte_hash_iterate;
 	rte_hash_lookup_bulk_data;
 	rte_hash_lookup_data;
@@ -53,3 +54,10 @@ DPDK_18.08 {
 	rte_hash_count;
 
 } DPDK_16.07;
+
+EXPERIMENTAL {
+	global:
+
+	rte_hash_iterator_conflict_entries_init_with_hash;
+	rte_hash_iterate_conflict_entries;
+};
diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd10..bf57004c3 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -1170,8 +1170,8 @@ static int test_hash_iteration(void)
 	void *next_data;
 	void *data[NUM_ENTRIES];
 	unsigned added_keys;
-	uint32_t iter = 0;
 	int ret = 0;
+	struct rte_hash_iterator_state state;
 
 	ut_params.entries = NUM_ENTRIES;
 	ut_params.name = "test_hash_iteration";
@@ -1180,6 +1180,9 @@ static int test_hash_iteration(void)
 	handle = rte_hash_create(&ut_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
+			"initialization of the hash iterator failed");
+
 	/* Add random entries until key cannot be added */
 	for (added_keys = 0; added_keys < NUM_ENTRIES; added_keys++) {
 		data[added_keys] = (void *) ((uintptr_t) rte_rand());
@@ -1191,7 +1194,7 @@ static int test_hash_iteration(void)
 	}
 
 	/* Iterate through the hash table */
-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added */
 		for (i = 0; i < NUM_ENTRIES; i++) {
 			if (memcmp(next_key, keys[i], ut_params.key_len) == 0) {
diff --git a/test/test/test_hash_multiwriter.c b/test/test/test_hash_multiwriter.c
index 6a3eb10bd..48db8007d 100644
--- a/test/test/test_hash_multiwriter.c
+++ b/test/test/test_hash_multiwriter.c
@@ -125,18 +125,22 @@ test_hash_multiwriter(void)
 
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	uint32_t count;
 
+	struct rte_hash_iterator_state state;
+
 	snprintf(name, 32, "test%u", calledCount++);
 	hash_params.name = name;
 
 	handle = rte_hash_create(&hash_params);
 	RETURN_IF_ERROR(handle == NULL, "hash creation failed");
 
+	RETURN_IF_ERROR(rte_hash_iterator_init(handle, &state) != 0,
+			"initialization of the hash iterator failed");
+
 	tbl_multiwriter_test_params.h = handle;
 	tbl_multiwriter_test_params.nb_tsx_insertion =
 		nb_total_tsx_insertion / rte_lcore_count();
@@ -203,7 +207,7 @@ test_hash_multiwriter(void)
 		goto err3;
 	}
 
-	while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added .*/
 		i = *(const uint32_t *)next_key;
 		tbl_multiwriter_test_params.found[i]++;
diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d80..9cdab9992 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -166,12 +166,13 @@ test_hash_readwrite_functional(int use_htm)
 	unsigned int i;
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 	int use_jhash = 1;
 
+	struct rte_hash_iterator_state state;
+
 	rte_atomic64_init(&gcycles);
 	rte_atomic64_clear(&gcycles);
 
@@ -188,6 +189,8 @@ test_hash_readwrite_functional(int use_htm)
 		tbl_rw_test_param.num_insert
 		* rte_lcore_count();
 
+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
+
 	printf("++++++++Start function tests:+++++++++\n");
 
 	/* Fire all threads. */
@@ -195,8 +198,7 @@ test_hash_readwrite_functional(int use_htm)
 				 NULL, CALL_MASTER);
 	rte_eal_mp_wait_lcore();
 
-	while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
-			&next_data, &iter) >= 0) {
+	while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 		/* Search for the key in the list of keys added .*/
 		i = *(const uint32_t *)next_key;
 		tbl_rw_test_param.found[i]++;
@@ -315,9 +317,10 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 
 	const void *next_key;
 	void *next_data;
-	uint32_t iter = 0;
 	int use_jhash = 0;
 
+	struct rte_hash_iterator_state state;
+
 	uint32_t duplicated_keys = 0;
 	uint32_t lost_keys = 0;
 
@@ -336,6 +339,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 	if (init_params(use_htm, use_jhash) != 0)
 		goto err;
 
+	rte_hash_iterator_init(tbl_rw_test_param.h, &state);
+
 	/*
 	 * Do a readers finish faster or writers finish faster test.
 	 * When readers finish faster, we timing the readers, and when writers
@@ -484,8 +489,7 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
 
 		rte_eal_mp_wait_lcore();
 
-		while (rte_hash_iterate(tbl_rw_test_param.h,
-				&next_key, &next_data, &iter) >= 0) {
+		while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
 			/* Search for the key in the list of keys added .*/
 			i = *(const uint32_t *)next_key;
 			tbl_rw_test_param.found[i]++;
--
2.17.1


* Re: [dpdk-dev] [PATCH v3 0/7] ethdev: add flow API object converter
  2018-08-31  9:00  3%   ` [dpdk-dev] [PATCH v3 " Adrien Mazarguil
@ 2018-08-31 11:32  0%     ` Nélio Laranjeiro
  0 siblings, 0 replies; 200+ results
From: Nélio Laranjeiro @ 2018-08-31 11:32 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev

On Fri, Aug 31, 2018 at 11:00:57AM +0200, Adrien Mazarguil wrote:
> This is a follow up to the "Flow API helpers enhancements" series submitted
> almost a year ago [1]. The new title is due to the reduced scope of this
> version.
> 
> rte_flow_conv() is a flexible replacement for rte_flow_copy(), itself a
> temporary solution pending something better [2]. It replaces a lot of
> duplicated code found in testpmd and removes some of the maintenance burden
> that developers tend to forget (me included) when modifying pattern
> items or actions (updating app/test-pmd/config.c to be clear).
> 
> This series was unearthed in order to complete the implementation of
> RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> duplicate existing code once again.
> 
> See individual patches for specific changes in this version.
> 
> v3 changes:
> 
> - Marked rte_flow_conv() as experimental, modified net/bonding accordingly.
> - Fixed compilation issue on ARM.
> - Removed deprecation notice.
> 
> v2 changes:
> 
> - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> - Updated bonding PMD.
> - No more automatic generation of rte_flow_conv.h.
> 
> [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> [3] Currently the command-line parser (cmdline_flow.c) is aware of these
>     actions, however config.c isn't. Flow rules with such actions cannot
>     be created and cannot be validated with PMDs that implement them.
> 
> Adrien Mazarguil (7):
>   ethdev: add flow API object converter
>   ethdev: add flow API item/action name conversion
>   app/testpmd: rely on flow API conversion function
>   net/failsafe: switch to flow API object conversion function
>   net/bonding: switch to flow API object conversion function
>   ethdev: add missing items/actions to flow object converter
>   ethdev: deprecate rte_flow_copy function
> 
>  app/test-pmd/config.c                      | 407 +++------------
>  app/test-pmd/testpmd.h                     |   7 +-
>  doc/guides/prog_guide/rte_flow.rst         |  20 +
>  doc/guides/rel_notes/deprecation.rst       |   7 -
>  drivers/net/bonding/Makefile               |   1 +
>  drivers/net/bonding/meson.build            |   1 +
>  drivers/net/bonding/rte_eth_bond_api.c     |   6 +-
>  drivers/net/bonding/rte_eth_bond_flow.c    |  31 +-
>  drivers/net/bonding/rte_eth_bond_private.h |   5 +-
>  drivers/net/failsafe/failsafe_ether.c      |   6 +-
>  drivers/net/failsafe/failsafe_flow.c       |  31 +-
>  drivers/net/failsafe/failsafe_private.h    |   5 +-
>  lib/librte_ethdev/rte_ethdev_version.map   |   1 +
>  lib/librte_ethdev/rte_flow.c               | 666 ++++++++++++++++++------
>  lib/librte_ethdev/rte_flow.h               | 231 +++++++-
>  15 files changed, 886 insertions(+), 539 deletions(-)
> 
> -- 
> 2.11.0

Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 0/7] ethdev: add flow API object converter
  2018-08-03 13:36  3% ` [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter Adrien Mazarguil
  2018-08-23 13:48  0%   ` Ferruh Yigit
  2018-08-24 10:58  0%   ` Ferruh Yigit
@ 2018-08-31  9:00  3%   ` Adrien Mazarguil
  2018-08-31 11:32  0%     ` Nélio Laranjeiro
  2 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-08-31  9:00 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

This is a follow up to the "Flow API helpers enhancements" series submitted
almost a year ago [1]. The new title is due to the reduced scope of this
version.

rte_flow_conv() is a flexible replacement to rte_flow_copy(), itself a
temporary solution pending something better [2]. It replaces a lot of
duplicated code found in testpmd and removes some of the maintenance burden
that developers tend to forget (me included) when modifying pattern
items or actions (updating app/test-pmd/config.c to be clear).

This series was unearthed in order to complete the implementation of
RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
duplicate existing code once again.

See individual patches for specific changes in this version.

v3 changes:

- Marked rte_flow_conv() as experimental, modified net/bonding accordingly.
- Fixed compilation issue on ARM.
- Removed deprecation notice.

v2 changes:

- rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
- Updated bonding PMD.
- No more automatic generation of rte_flow_conv.h.

[1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
[2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
[3] Currently the command-line parser (cmdline_flow.c) is aware of these
    actions, however config.c isn't. Flow rules with such actions cannot
    be created and cannot be validated with PMDs that implement them.

Adrien Mazarguil (7):
  ethdev: add flow API object converter
  ethdev: add flow API item/action name conversion
  app/testpmd: rely on flow API conversion function
  net/failsafe: switch to flow API object conversion function
  net/bonding: switch to flow API object conversion function
  ethdev: add missing items/actions to flow object converter
  ethdev: deprecate rte_flow_copy function

 app/test-pmd/config.c                      | 407 +++------------
 app/test-pmd/testpmd.h                     |   7 +-
 doc/guides/prog_guide/rte_flow.rst         |  20 +
 doc/guides/rel_notes/deprecation.rst       |   7 -
 drivers/net/bonding/Makefile               |   1 +
 drivers/net/bonding/meson.build            |   1 +
 drivers/net/bonding/rte_eth_bond_api.c     |   6 +-
 drivers/net/bonding/rte_eth_bond_flow.c    |  31 +-
 drivers/net/bonding/rte_eth_bond_private.h |   5 +-
 drivers/net/failsafe/failsafe_ether.c      |   6 +-
 drivers/net/failsafe/failsafe_flow.c       |  31 +-
 drivers/net/failsafe/failsafe_private.h    |   5 +-
 lib/librte_ethdev/rte_ethdev_version.map   |   1 +
 lib/librte_ethdev/rte_flow.c               | 666 ++++++++++++++++++------
 lib/librte_ethdev/rte_flow.h               | 231 +++++++-
 15 files changed, 886 insertions(+), 539 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  2018-08-30 22:09  3%                     ` Stephen Hemminger
@ 2018-08-30 22:11  0%                       ` Dan Gora
  2018-09-04  0:47  0%                         ` Dan Gora
  0 siblings, 1 reply; 200+ results
From: Dan Gora @ 2018-08-30 22:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Igor Ryzhov, Ferruh Yigit, dev

On Thu, Aug 30, 2018 at 7:09 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 30 Aug 2018 18:41:14 -0300
> Dan Gora <dg@adax.com> wrote:
>
>> On the other hand, the "write to /sys" method is a bit more simple and
>> confines the changes to the user space library.  If we're confident
>> that the /sys ABI is stable and not going to be changed going forward
>> it seems like a valid alternative.
>
> See Documentation/ABI/testing/sysfs-class-net

yeah, but it's in the 'testing' directory :)

From Documentation/ABI/README:

testing/

This directory documents interfaces that are felt to be stable,
as the main development of this interface has been completed.
The interface can be changed to add new features, but the
current interface will not break by doing this, unless grave
errors or security problems are found in them.  Userspace
programs can start to rely on these interfaces, but they must be
aware of changes that can occur before these interfaces move to
be marked stable.  Programs that use these interfaces are
strongly encouraged to add their name to the description of
these interfaces, so that the kernel developers can easily
notify them if any changes occur (see the description of the
layout of the files below for details on how to do this.)

Like I said, I'm ok with using this if that's what everyone wants to do.

d

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  2018-08-30 21:41  3%                   ` Dan Gora
@ 2018-08-30 22:09  3%                     ` Stephen Hemminger
  2018-08-30 22:11  0%                       ` Dan Gora
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-08-30 22:09 UTC (permalink / raw)
  To: Dan Gora; +Cc: Igor Ryzhov, Ferruh Yigit, dev

On Thu, 30 Aug 2018 18:41:14 -0300
Dan Gora <dg@adax.com> wrote:

> On the other hand, the "write to /sys" method is a bit more simple and
> confines the changes to the user space library.  If we're confident
> that the /sys ABI is stable and not going to be changed going forward
> it seems like a valid alternative.

See Documentation/ABI/testing/sysfs-class-net
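
For reference, that file describes /sys/class/net/<iface>/carrier as
reporting the physical link state (0 for down, 1 for up). A minimal sketch
of reading it back follows. read_carrier() and its sysfs_root parameter are
hypothetical names used only for illustration; the root directory is a
parameter so that the helper can be tried against a scratch directory
instead of the real /sys/class/net.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read <sysfs_root>/<iface>/carrier.
 * Returns 1 (link up), 0 (link down) or a negative errno value.
 */
static int
read_carrier(const char *sysfs_root, const char *iface)
{
	char path[256];
	FILE *f;
	int c, n;

	n = snprintf(path, sizeof(path), "%s/%s/carrier", sysfs_root, iface);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "r");
	if (f == NULL)
		return -errno;

	/* The attribute holds a single digit followed by a newline. */
	c = fgetc(f);
	fclose(f);

	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -EINVAL;	/* empty read or unexpected content */
}
```

A caller would typically pass "/sys/class/net" as sysfs_root and treat any
negative return as "state unknown".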

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
  @ 2018-08-30 21:41  3%                   ` Dan Gora
  2018-08-30 22:09  3%                     ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Dan Gora @ 2018-08-30 21:41 UTC (permalink / raw)
  To: Igor Ryzhov; +Cc: Stephen Hemminger, Ferruh Yigit, dev

Hi All,

So I'm a little unclear as to what to do here moving forward.

Do we all agree at least that there should be some function in DPDK in
order to set the link state for KNI interfaces?

I'm a strong yes on this point.  I don't think that every KNI user
should have to implement something like this themselves if we can
provide a simple interface to do it.  It seems like a natural
extension of the KNI functionality IMHO.

If so, should we use a new ioctl as in my patch, or just use the
"write to /sys.../carrier" method as shown below?

I'm kind of ambivalent about this point.  The ioctl method has two
minor advantages to my mind:

1) It's already done :)

2) You can set the link speed and duplex as well and get a pretty
message in the syslog when the link status changes:

[ 2100.016079] rte_kni: adax0 NIC Link is Up 10000 Mbps Full Duplex.
[ 2100.016094] IPv6: ADDRCONF(NETDEV_CHANGE): adax0: link becomes ready
[ 2262.532126] IPv6: ADDRCONF(NETDEV_UP): adax1: link is not ready
[ 2263.432148] rte_kni: adax1 NIC Link is Up 10000 Mbps Full Duplex.
[ 2263.432170] IPv6: ADDRCONF(NETDEV_CHANGE): adax1: link becomes ready

On the other hand, the "write to /sys" method is a bit simpler and
confines the changes to the user space library.  If we're confident
that the /sys ABI is stable and not going to be changed going forward
it seems like a valid alternative.

I'm willing to do it either way.  I'll defer to the judgement of the
community...

thanks
dan


On Thu, Aug 30, 2018 at 6:49 AM, Igor Ryzhov <iryzhov@nfware.com> wrote:
> Hi Dan,
>
> We use KNI device exactly the same way you described – with IP addresses,
> routing, etc.
> And we also faced the same problem of having the actual link status in Linux
> kernel.
>
> There is a special callback for link state management in net_device_ops for
> soft-devices like KNI called ndo_change_carrier.
> Current KNI driver implements it already, you just need to write to
> /sys/class/net/<iface>/carrier to change link status.
>
> Right now we implement it on application side, but I think it'll be good to
> have this in rte_kni API.
>
> Here is our implementation:
>
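> static int
> linux_set_carrier(const char *name, int status)
> {
> 	char path[64];
> 	const char *carrier = status ? "1" : "0";
> 	int fd, ret;
>
> 	/* Needs <errno.h>, <fcntl.h>, <stdio.h> and <unistd.h>. */
> 	ret = snprintf(path, sizeof(path),
> 		       "/sys/devices/virtual/net/%s/carrier", name);
> 	if (ret < 0 || (size_t)ret >= sizeof(path))
> 		return -ENAMETOOLONG;
>
> 	fd = open(path, O_WRONLY);
> 	if (fd == -1)
> 		return -errno;
>
> 	/* Write the single digit only, not the terminating NUL. */
> 	if (write(fd, carrier, 1) == -1) {
> 		ret = -errno;	/* save errno before close() can clobber it */
> 		close(fd);
> 		return ret;
> 	}
>
> 	close(fd);
>
> 	return 0;
> }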
>
> Best regards,
> Igor

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] kni: dynamically allocate memory for each KNI
  2018-08-29  9:52  0%   ` Igor Ryzhov
@ 2018-08-30 10:55  0%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-08-30 10:55 UTC (permalink / raw)
  To: Igor Ryzhov; +Cc: dev

On 8/29/2018 10:52 AM, Igor Ryzhov wrote:
> Hello Ferruh,
> 
> Thanks for the review, comments inline.
> 
> On Mon, Aug 27, 2018 at 8:06 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
>     On 8/2/2018 3:25 PM, Igor Ryzhov wrote:
>     > Long time ago preallocation of memory for KNI was introduced in commit
>     > 0c6bc8e. It was done because of lack of ability to free previously
>     > allocated memzones, which led to memzone exhaustion. Currently memzones
>     > can be freed and this patch uses this ability for dynamic KNI memory
>     > allocation.
> 
>     Hi Igor,
> 
>     It is good to be able to allocate memory dynamically and get rid of the
>     "max_kni_ifaces" and "kni_memzone_pool", thanks for the patch.
> 
>     Overall looks good, a few comments below.
> 
>     > 
>     > Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
>     > ---
>     >  lib/librte_kni/rte_kni.c | 392 ++++++++++++---------------------------
>     >  lib/librte_kni/rte_kni.h |   6 +-
>     >  test/test/test_kni.c     |   6 -
>     >  3 files changed, 128 insertions(+), 276 deletions(-)
>     > 
>     > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
>     > index 8a8f6c1cc..028b44bfd 100644
>     > --- a/lib/librte_kni/rte_kni.c
>     > +++ b/lib/librte_kni/rte_kni.c
>     > @@ -36,24 +36,33 @@
>     >   * KNI context
>     >   */
>     >  struct rte_kni {
>     > +     const struct rte_memzone *mz;       /**< KNI context memzone */
> 
>     I was thinking of removing the context memzone and using rte_zmalloc() to
>     create kni objects, but the updated rte_kni_get() API seems to rely on it.
>     If you see any other way to get the kni object from its name in rte_kni_get(),
>     I am for removing the above *mz variable from the rte_kni struct.
> 
> 
> I had absolutely the same thought but didn't find a way to save rte_kni_get() API.
> Maybe someone has any ideas?
> Or maybe this API can be marked deprecated and deleted in future?

I suggest keeping the API; there may be some users, we don't know. And it doesn't
sound right to remove an API because of internal implementation details, so it
looks like we will need to keep the memzone.

>  
> 
> 
>     <...>
> 
>     > +static void
>     > +kni_ctx_release_mz(struct rte_kni *ctx)
>     > +{
>     > +     rte_memzone_free(ctx->m_tx_q);
>     > +     rte_memzone_free(ctx->m_rx_q);
>     > +     rte_memzone_free(ctx->m_alloc_q);
>     > +     rte_memzone_free(ctx->m_free_q);
>     > +     rte_memzone_free(ctx->m_req_q);
>     > +     rte_memzone_free(ctx->m_resp_q);
>     > +     rte_memzone_free(ctx->m_sync_addr);
> 
> 
>     "ctx" sounds confusing to me, isn't this "rte_kni" object instance, why not just
>     call it "kni" or if it is too generic "kni_obj" or similar? For other APIs
>     as well.
> 
> 
> "ctx" was already used in the code, so I didn't change it.
> I also think that it's better to use "kni" – will change it in v2.
>  
> 
> 
>     And this is just a detail, but regarding the order of the APIs, would you
>     mind having the reserve() one first and the release() one after it?
> 
> 
> Ok.
>  
> 
> 
>     <...>
> 
>     > -/* Shall be called before any allocation happens */
>     > -void
>     > -rte_kni_init(unsigned int max_kni_ifaces)
>     > +static struct rte_kni *
>     > +kni_ctx_reserve(const char *name)
>     >  {
>     > -     uint32_t i;
>     > -     struct rte_kni_memzone_slot *it;
>     > +     struct rte_kni *ctx;
>     >       const struct rte_memzone *mz;
>     > -#define OBJNAMSIZ 32
>     > -     char obj_name[OBJNAMSIZ];
>     >       char mz_name[RTE_MEMZONE_NAMESIZE];
>     >  
>     > -     /* Immediately return if KNI is already initialized */
>     > -     if (kni_memzone_pool.initialized) {
>     > -             RTE_LOG(WARNING, KNI, "Double call to rte_kni_init()");
>     > -             return;
>     > -     }
>     > +     snprintf(mz_name, RTE_MEMZONE_NAMESIZE, "kni_info_%s", name);
> 
>     Can you please convert memzone names, like "kni_info" to defines, for all of
>     them?
> 
> 
> Ok.
>  
> 
> 
>     <...>
> 
>     > @@ -81,8 +81,12 @@ struct rte_kni_conf {
>     >   *
>     >   * @param max_kni_ifaces
>     >   *  The maximum number of KNI interfaces that can coexist concurrently
>     > + *
>     > + * @return
>     > + *  - 0 indicates success.
>     > + *  - negative value indicates failure.
>     >   */
>     > -void rte_kni_init(unsigned int max_kni_ifaces);
>     > +int rte_kni_init(unsigned int max_kni_ifaces);
> 
>     This changes the API. The return type changes from "void" to "int". I agree
>     "int" makes more sense since the API can fail, but this changes the ABI/API.
> 
>     Since existing binaries don't check the return value at all, there may be no
>     issue from an ABI point of view, but from an API point of view some apps may
>     get "return value not checked" warnings; not sure though.
> 
>     And the need for the API is questionable at this stage; it may be possible to
>     move it into rte_kni_alloc(), which already has the "kni_fd" check.
> 
>     What do you think about keeping the API signature the same for now, but adding
>     a deprecation notice to remove the API, and then removing rte_kni_init()
>     completely in the next release (v19.02)?
> 
> 
> As far as I know, such warnings are only emitted if the warn_unused_result
> attribute is used, which is not the case here.
> So I think that changing from void to int should not break anything. I can
> change it back in v2 if I'm wrong.
> 
> Regarding the API removal – I think it's better to save that function, to have a
> more clear API.

OK, fair enough. Let's keep it with the same name.

> As we have rte_kni_close to close KNI device, we should have a function to open it.
> Maybe it should be renamed to rte_kni_open :)
>  
> 
> 
>     <...>
> 
>     >  /**
>     > diff --git a/test/test/test_kni.c b/test/test/test_kni.c
>     > index 1b876719a..56c98513a 100644
>     > --- a/test/test/test_kni.c
>     > +++ b/test/test/test_kni.c
>     > @@ -429,12 +429,6 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
>     >       }
>     >       test_kni_ctx = NULL;
>     >  
>     > -     /* test of releasing a released kni device */
>     > -     if (rte_kni_release(kni) == 0) {
>     > -             printf("should not release a released kni device\n");
>     > -             return -1;
>     > -     }
> 
>     Why need to remove this?
> 
> 
> Previously, rte_kni_release didn't free any memory, and the second call with the
> same argument just checked "in_use" flag and returned.
> After my changes, memory is actually freed, and rte_kni_release must not be
> called twice with the same argument.

OK.

> Will send v2 with approved changes in a couple of days.
> At the same time, I'll think what can we do with context memzone.

Thanks.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] kni: dynamically allocate memory for each KNI
  2018-08-27 17:06  4% ` Ferruh Yigit
@ 2018-08-29  9:52  0%   ` Igor Ryzhov
  2018-08-30 10:55  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Igor Ryzhov @ 2018-08-29  9:52 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

Hello Ferruh,

Thanks for the review, comments inline.

On Mon, Aug 27, 2018 at 8:06 PM, Ferruh Yigit <ferruh.yigit@intel.com>
wrote:

> On 8/2/2018 3:25 PM, Igor Ryzhov wrote:
> > Long time ago preallocation of memory for KNI was introduced in commit
> > 0c6bc8e. It was done because of lack of ability to free previously
> > allocated memzones, which led to memzone exhaustion. Currently memzones
> > can be freed and this patch uses this ability for dynamic KNI memory
> > allocation.
>
> Hi Igor,
>
> It is good to be able to allocate memory dynamically and get rid of the
> "max_kni_ifaces" and "kni_memzone_pool", thanks for the patch.
>
> Overall looks good, a few comments below.
>
> >
> > Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
> > ---
> >  lib/librte_kni/rte_kni.c | 392 ++++++++++++---------------------------
> >  lib/librte_kni/rte_kni.h |   6 +-
> >  test/test/test_kni.c     |   6 -
> >  3 files changed, 128 insertions(+), 276 deletions(-)
> >
> > diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> > index 8a8f6c1cc..028b44bfd 100644
> > --- a/lib/librte_kni/rte_kni.c
> > +++ b/lib/librte_kni/rte_kni.c
> > @@ -36,24 +36,33 @@
> >   * KNI context
> >   */
> >  struct rte_kni {
> > +     const struct rte_memzone *mz;       /**< KNI context memzone */
>
> I was thinking of removing the context memzone and using rte_zmalloc() to
> create kni objects, but the updated rte_kni_get() API seems to rely on it.
> If you see any other way to get the kni object from its name in
> rte_kni_get(), I am for removing the above *mz variable from the rte_kni
> struct.
>

I had absolutely the same thought but didn't find a way to save
rte_kni_get() API.
Maybe someone has any ideas?
Or maybe this API can be marked deprecated and deleted in future?


>
> <...>
>
> > +static void
> > +kni_ctx_release_mz(struct rte_kni *ctx)
> > +{
> > +     rte_memzone_free(ctx->m_tx_q);
> > +     rte_memzone_free(ctx->m_rx_q);
> > +     rte_memzone_free(ctx->m_alloc_q);
> > +     rte_memzone_free(ctx->m_free_q);
> > +     rte_memzone_free(ctx->m_req_q);
> > +     rte_memzone_free(ctx->m_resp_q);
> > +     rte_memzone_free(ctx->m_sync_addr);
>
>
> "ctx" sounds confusing to me, isn't this "rte_kni" object instance, why
> not just
> call it "kni" or if it is too generic "kni_obj" or similar? For other APIs
> as well.
>

"ctx" was already used in the code, so I didn't change it.
I also think that it's better to use "kni" – will change it in v2.


>
> And this is just a detail, but regarding the order of the APIs, would you
> mind having the reserve() one first and the release() one after it?
>

Ok.


>
> <...>
>
> > -/* Shall be called before any allocation happens */
> > -void
> > -rte_kni_init(unsigned int max_kni_ifaces)
> > +static struct rte_kni *
> > +kni_ctx_reserve(const char *name)
> >  {
> > -     uint32_t i;
> > -     struct rte_kni_memzone_slot *it;
> > +     struct rte_kni *ctx;
> >       const struct rte_memzone *mz;
> > -#define OBJNAMSIZ 32
> > -     char obj_name[OBJNAMSIZ];
> >       char mz_name[RTE_MEMZONE_NAMESIZE];
> >
> > -     /* Immediately return if KNI is already initialized */
> > -     if (kni_memzone_pool.initialized) {
> > -             RTE_LOG(WARNING, KNI, "Double call to rte_kni_init()");
> > -             return;
> > -     }
> > +     snprintf(mz_name, RTE_MEMZONE_NAMESIZE, "kni_info_%s", name);
>
> Can you please convert memzone names, like "kni_info" to defines, for all
> of them?
>

Ok.


>
> <...>
>
> > @@ -81,8 +81,12 @@ struct rte_kni_conf {
> >   *
> >   * @param max_kni_ifaces
> >   *  The maximum number of KNI interfaces that can coexist concurrently
> > + *
> > + * @return
> > + *  - 0 indicates success.
> > + *  - negative value indicates failure.
> >   */
> > -void rte_kni_init(unsigned int max_kni_ifaces);
> > +int rte_kni_init(unsigned int max_kni_ifaces);
>
> This changes the API. The return type changes from "void" to "int". I agree
> "int" makes more sense since the API can fail, but this changes the ABI/API.
>
> Since existing binaries don't check the return value at all, there may be no
> issue from an ABI point of view, but from an API point of view some apps may
> get "return value not checked" warnings; not sure though.
>
> And the need for the API is questionable at this stage; it may be possible to
> move it into rte_kni_alloc(), which already has the "kni_fd" check.
>
> What do you think about keeping the API signature the same for now, but adding
> a deprecation notice to remove the API, and then removing rte_kni_init()
> completely in the next release (v19.02)?
>

As far as I know, such warnings are only emitted if the warn_unused_result
attribute is used, which is not the case here.
So I think that changing from void to int should not break anything. I can
change it back in v2 if I'm wrong.
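
The point about warn_unused_result can be illustrated with a small sketch.
With GCC and Clang, ignoring the result of an int function is silent unless
its declaration carries that attribute. demo_init(), demo_init_checked() and
demo_caller() are hypothetical stand-ins and not part of the rte_kni API.

```c
/* Hypothetical stand-in for an init function whose return type changed
 * from void to int. No attribute, so old callers that ignore the result
 * still compile without a diagnostic. */
static int
demo_init(unsigned int max_ifaces)
{
	return max_ifaces > 0 ? 0 : -1;
}

/* Same behaviour, but declared so that GCC/Clang report
 * -Wunused-result when a caller drops the return value. */
static int demo_init_checked(unsigned int max_ifaces)
	__attribute__((warn_unused_result));

static int
demo_init_checked(unsigned int max_ifaces)
{
	return demo_init(max_ifaces);
}

static int
demo_caller(void)
{
	demo_init(4);	/* result ignored: accepted silently */

	/* A bare "demo_init_checked(4);" here would be flagged. */
	return demo_init_checked(4);
}
```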

Regarding the API removal – I think it's better to save that function, to
have a more clear API.
As we have rte_kni_close to close KNI device, we should have a function to
open it.
Maybe it should be renamed to rte_kni_open :)


>
> <...>
>
> >  /**
> > diff --git a/test/test/test_kni.c b/test/test/test_kni.c
> > index 1b876719a..56c98513a 100644
> > --- a/test/test/test_kni.c
> > +++ b/test/test/test_kni.c
> > @@ -429,12 +429,6 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
> >       }
> >       test_kni_ctx = NULL;
> >
> > -     /* test of releasing a released kni device */
> > -     if (rte_kni_release(kni) == 0) {
> > -             printf("should not release a released kni device\n");
> > -             return -1;
> > -     }
>
> Why need to remove this?
>

Previously, rte_kni_release didn't free any memory, and the second call
with the same argument just checked "in_use" flag and returned.
After my changes, memory is actually freed, and rte_kni_release must not be
called twice with the same argument.
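
A small sketch of why the double-release test no longer makes sense: once
release really frees the object, a second call on the same pointer would
touch freed memory, so the caller has to drop its pointer after the first
call. struct demo_kni, demo_alloc() and demo_release() are hypothetical and
use calloc()/free() in place of memzones.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a KNI object backed by dynamic memory. */
struct demo_kni {
	char name[32];
};

static struct demo_kni *
demo_alloc(const char *name)
{
	struct demo_kni *kni = calloc(1, sizeof(*kni));

	if (kni != NULL)
		snprintf(kni->name, sizeof(kni->name), "%s", name);
	return kni;
}

/*
 * Returns 0 on success, -1 on a NULL argument. After a successful call
 * the pointer is dangling and must not be passed in again; there is no
 * "in_use" flag left to consult because the memory itself is gone.
 */
static int
demo_release(struct demo_kni *kni)
{
	if (kni == NULL)
		return -1;
	free(kni);
	return 0;
}
```

A caller that sets its pointer to NULL right after demo_release() turns an
accidental second call into a harmless -1 instead of a double free.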

Will send v2 with approved changes in a couple of days.
At the same time, I'll think what can we do with context memzone.

Best regards,
Igor

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-08-29  8:47  3%                   ` Ola Liljedahl
@ 2018-08-29  8:57  0%                     ` Jerin Jacob
  2018-09-13 17:40  0%                       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-08-29  8:57 UTC (permalink / raw)
  To: Ola Liljedahl
  Cc: Kokkilagadda, Kiran, Honnappa Nagarahalli, Gavin Hu,
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper

-----Original Message-----
> Date: Wed, 29 Aug 2018 08:47:56 +0000
> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
>  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>  <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
>  <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>  synchronization
> user-agent: Microsoft-MacOutlook/10.10.0.180812
> 
> 
> There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> 
> But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?

# One reason was that the __atomic builtins were introduced in GCC 4.7, and x86
would like to keep supporting GCC older than 4.7 as well as the ICC compiler.
# The theme was no change in the existing code for x86. I am not sure about the
code generation for x86 with the __atomic builtins; I will let the x86
maintainers comment on this.


> 
> -- Ola
> 
> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> 
>     -----Original Message-----
>     > Date: Wed, 29 Aug 2018 07:34:34 +0000
>     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
>     > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>     >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
>     >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>     >  <Jerin.JacobKollanukkaran@cavium.com>
>     > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
>     >  <Steve.Capper@arm.com>
>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>     >  synchronization
>     > user-agent: Microsoft-MacOutlook/10.10.0.180812
>     >
>     > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> 
>     What would be the change in interface? Is it removing the volatile for
>     C11 case, Then you can use anonymous union OR #define to keep the size
>     and offset of the element intact.
> 
>     struct rte_kni_fifo {
>     #ifndef RTE_C11...
>             volatile unsigned write;     /**< Next position to be written*/
>             volatile unsigned read;      /**< Next position to be read */
>     #else
>             unsigned write;     /**< Next position to be written*/
>             unsigned read;      /**< Next position to be read */
>     #endif
>             unsigned len;                /**< Circular buffer length */
>             unsigned elem_size;          /**< Pointer size - for 32/64 bitOS */
>             void *volatile buffer[];     /**< The buffer contains mbuf pointers */
>     };
> 
>     Anonymous union example:
>     https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> 
>     You can check the ABI breakage by devtools/validate-abi.sh
> 
>     >
>     > -- Ola
>     >
>     > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
>     > Date: Wednesday, 29 August 2018 at 07:50
>     > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
>     > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. And please use the C11 ring mechanism (see rte_ring) so that it won't impact other platforms like Intel. We need this change just for ARM and PPC.
>     >
>     > ________________________________
>     > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>     > Sent: Wednesday, August 29, 2018 10:29 AM
>     > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
>     > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     > I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in accesses to invalid buffer array entries or overwriting of buffer array entries.
>     >
>     > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
>     >
>     >
>     >
>     > If you want us to put together a patch with this idea, please let us know.
>     >
>     >
>     >
>     > Thank you,
>     >
>     > Honnappa
>     >
>     >
>     >
>     > From: Gavin Hu
>     > Sent: Tuesday, August 28, 2018 2:31 PM
>     > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
>     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     >
>     > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
>     >
>     > We are concerned that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). That would release the slots, which may then be immediately overwritten by the writer, so you get “too new” data and lose the old data.
>     >
>     >
>     >
>     > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>
>     > Sent: Tuesday, August 28, 2018 6:44 PM
>     > To: Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
>     > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>     > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     >
>     > In this instance there won't be any problem, because this loop won't get executed until the value of fifo->write changes. So far we haven't seen any issue with it, and for performance reasons we don't want to have a read barrier there.
>     >
>     >
>     >
>     >
>     >
>     > ________________________________
>     >
>     > From: Gavin Hu <Gavin.Hu@arm.com>
>     > Sent: Monday, August 27, 2018 9:10 PM
>     > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
>     > Cc: dev@dpdk.org; Honnappa Nagarahalli
>     > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>     >
>     >
>     >
>     > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
>     >
>     > > -----Original Message-----
>     > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
>     > > Sent: Monday, August 27, 2018 10:08 PM
>     > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com>;
>     > > jerin.jacob@caviumnetworks.com
>     > > Cc: dev@dpdk.org
>     > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>     > > synchronization
>     > >
>     > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
>     > > > With the existing code in kni_fifo_put, the rx_q values are not guaranteed
>     > > > to be updated before fifo_write is updated. This causes a sync issue on the
>     > > > other core when it reads rx_q in kni_net_rx_normal. So add a write barrier
>     > > > to make sure the values are synced before updating fifo_write.
>     > > >
>     > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
>     > > >
>     > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
>     > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>     > >
>     > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>     > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  2018-08-29  8:28  4%                 ` Jerin Jacob
@ 2018-08-29  8:47  3%                   ` Ola Liljedahl
  2018-08-29  8:57  0%                     ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Ola Liljedahl @ 2018-08-29  8:47 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Kokkilagadda, Kiran, Honnappa Nagarahalli, Gavin Hu,
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper

There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.

But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
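
As a rough illustration of the question above, here is a minimal single-producer/single-consumer FIFO sketch using the GCC __atomic builtins. The struct layout and the names demo_fifo, demo_fifo_put and demo_fifo_get are hypothetical and are not the rte_kni_fifo code; the sketch only shows where the acquire and release orderings would sit on each side.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_LEN 8u /* hypothetical size, must be a power of two */

struct demo_fifo {
	unsigned write;              /* next position to be written */
	unsigned read;               /* next position to be read */
	void *buffer[DEMO_FIFO_LEN]; /* slots holding opaque pointers */
};

/* Producer side: fill the slot first, then publish it with a store-release. */
static int
demo_fifo_put(struct demo_fifo *f, void *obj)
{
	assert(f != NULL);

	unsigned w = f->write; /* only the producer ever stores to 'write' */
	unsigned next = (w + 1) & (DEMO_FIFO_LEN - 1);

	/* Acquire pairs with the consumer's release of 'read', so a slot is
	 * not reused before the consumer has finished reading it. */
	if (next == __atomic_load_n(&f->read, __ATOMIC_ACQUIRE))
		return -1; /* full */

	f->buffer[w] = obj;
	__atomic_store_n(&f->write, next, __ATOMIC_RELEASE);
	return 0;
}

/* Consumer side: observe the index with a load-acquire, then read the slot. */
static int
demo_fifo_get(struct demo_fifo *f, void **obj)
{
	assert(f != NULL && obj != NULL);

	unsigned r = f->read; /* only the consumer ever stores to 'read' */

	/* Acquire pairs with the producer's release of 'write', so the slot
	 * contents are visible before they are read below. */
	if (r == __atomic_load_n(&f->write, __ATOMIC_ACQUIRE))
		return -1; /* empty */

	*obj = f->buffer[r];
	/* Release keeps the slot read above ordered before the slot is freed. */
	__atomic_store_n(&f->read, (r + 1) & (DEMO_FIFO_LEN - 1),
			 __ATOMIC_RELEASE);
	return 0;
}
```

The same source would be compiled for every architecture; how cheap the acquire and release operations end up being is left to the compiler for each target.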

-- Ola

On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:

    -----Original Message-----
    > Date: Wed, 29 Aug 2018 07:34:34 +0000
    > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
    > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
    >  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
    >  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
    >  <Jerin.JacobKollanukkaran@cavium.com>
    > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
    >  <Steve.Capper@arm.com>
    > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
    >  synchronization
    > user-agent: Microsoft-MacOutlook/10.10.0.180812
    > 
    > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
    
    What would be the change in the interface? If it is removing the volatile
    for the C11 case, then you can use an anonymous union or a #define to keep
    the size and offset of the elements intact.
    
    struct rte_kni_fifo { 
    #ifndef RTE_C11...
            volatile unsigned write;     /**< Next position to be written*/
            volatile unsigned read;      /**< Next position to be read */
    #else
            unsigned write;     /**< Next position to be written*/
            unsigned read;      /**< Next position to be read */
    #endif
            unsigned len;                /**< Circular buffer length */
            unsigned elem_size;          /**< Pointer size - for 32/64 bit OS */
            void *volatile buffer[];     /**< The buffer contains mbuf pointers */
    };
    
    Anonymous union example:
    https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
    
    You can check for ABI breakage with devtools/validate-abi.sh
    
    > 
    > -- Ola
    > 
    > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
    > Date: Wednesday, 29 August 2018 at 07:50
    > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
    > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
    > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
    > 
    > 
    > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side also. And please use the C11 ring (see rte_ring) mechanism so that it won't impact other platforms like Intel. We need this change just for ARM and PPC.
    > 
    > ________________________________
    > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    > Sent: Wednesday, August 29, 2018 10:29 AM
    > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
    > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
    > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
    > 
    > 
    > External Email
    > 
    > I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in accesses to invalid buffer array entries or overwriting of buffer array entries.
    > 
    > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
    > 
    > 
    > 
    > If you want us to put together a patch with this idea, please let us know.
    > 
    > 
    > 
    > Thank you,
    > 
    > Honnappa
    > 
    > 
    > 
    > From: Gavin Hu
    > Sent: Tuesday, August 28, 2018 2:31 PM
    > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
    > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
    > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
    > 
    > 
    > 
    > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
    > 
    > We are concerned that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). That would release the slots, which the writer may then immediately overwrite, so you get “too new” data and lose the old data.
    > 
    > 
    > 
    > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>
    > Sent: Tuesday, August 28, 2018 6:44 PM
    > To: Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
    > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
    > 
    > 
    > 
    > In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
    > 
    > 
    > 
    > 
    > 
    > ________________________________
    > 
    > From: Gavin Hu <Gavin.Hu@arm.com>
    > Sent: Monday, August 27, 2018 9:10 PM
    > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
    > Cc: dev@dpdk.org; Honnappa Nagarahalli
    > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
    > 
    > 
    > 
    > External Email
    > 
    > This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
    > 
    > > -----Original Message-----
    > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
    > > Sent: Monday, August 27, 2018 10:08 PM
    > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com>;
    > > jerin.jacob@caviumnetworks.com
    > > Cc: dev@dpdk.org
    > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
    > > synchronization
    > >
    > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
    > > > With existing code in kni_fifo_put, rx_q values are not being updated
    > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
    > > > This is causing the sync issue on other core. So adding a write
    > > > barrier to make sure the values being synced before updating fifo_write.
    > > >
    > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
    > > >
    > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
    > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    > >
    > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
    


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
  @ 2018-08-29  8:28  4%                 ` Jerin Jacob
  2018-08-29  8:47  3%                   ` Ola Liljedahl
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-08-29  8:28 UTC (permalink / raw)
  To: Ola Liljedahl
  Cc: Kokkilagadda, Kiran, Honnappa Nagarahalli, Gavin Hu,
	Ferruh Yigit, Jacob,  Jerin, dev, nd, Steve Capper

-----Original Message-----
> Date: Wed, 29 Aug 2018 07:34:34 +0000
> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>  Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
>  Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob,  Jerin"
>  <Jerin.JacobKollanukkaran@cavium.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
>  <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>  synchronization
> user-agent: Microsoft-MacOutlook/10.10.0.180812
> 
> Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?

What would be the change in the interface? If it is removing the volatile
for the C11 case, then you can use an anonymous union or a #define to keep
the size and offset of the elements intact.

struct rte_kni_fifo { 
#ifndef RTE_C11...
        volatile unsigned write;     /**< Next position to be written*/
        volatile unsigned read;      /**< Next position to be read */
#else
        unsigned write;     /**< Next position to be written*/
        unsigned read;      /**< Next position to be read */
#endif
        unsigned len;                /**< Circular buffer length */
        unsigned elem_size;          /**< Pointer size - for 32/64 bit OS */
        void *volatile buffer[];     /**< The buffer contains mbuf pointers */
};

Anonymous union example:
https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461

You can check for ABI breakage with devtools/validate-abi.sh
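
A small compile-time check can make the "size and offset stay intact" point concrete. The two struct names below are hypothetical stand-ins rather than the real rte_kni_fifo definition, and the sketch assumes a C11 compiler. It only shows that dropping the volatile qualifier does not by itself move any member; it is not a replacement for the ABI validation script.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical copy of the layout with volatile indices. */
struct demo_fifo_volatile {
	volatile unsigned write;
	volatile unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

/* Hypothetical copy of the layout with plain indices (C11 atomics case). */
struct demo_fifo_plain {
	unsigned write;
	unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

/* True when member 'm' sits at the same offset in both layouts. */
#define DEMO_SAME_OFFSET(m) \
	(offsetof(struct demo_fifo_volatile, m) == \
	 offsetof(struct demo_fifo_plain, m))

/* The build fails here if the two layouts ever diverge. */
_Static_assert(sizeof(struct demo_fifo_volatile) ==
	       sizeof(struct demo_fifo_plain), "struct size changed");
_Static_assert(DEMO_SAME_OFFSET(write), "write moved");
_Static_assert(DEMO_SAME_OFFSET(read), "read moved");
_Static_assert(DEMO_SAME_OFFSET(len), "len moved");
_Static_assert(DEMO_SAME_OFFSET(elem_size), "elem_size moved");
_Static_assert(DEMO_SAME_OFFSET(buffer), "buffer moved");

/* Run-time view of the same comparison, returns 1 when layouts match. */
static int
demo_layout_matches(void)
{
	return sizeof(struct demo_fifo_volatile) ==
	       sizeof(struct demo_fifo_plain) &&
	       DEMO_SAME_OFFSET(write) && DEMO_SAME_OFFSET(read) &&
	       DEMO_SAME_OFFSET(len) && DEMO_SAME_OFFSET(elem_size) &&
	       DEMO_SAME_OFFSET(buffer);
}
```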

> 
> -- Ola
> 
> From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> Date: Wednesday, 29 August 2018 at 07:50
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> 
> 
> Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side also. And please use the C11 ring (see rte_ring) mechanism so that it won't impact other platforms like Intel. We need this change just for ARM and PPC.
> 
> ________________________________
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, August 29, 2018 10:29 AM
> To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> 
> 
> External Email
> 
> I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in accesses to invalid buffer array entries or overwriting of buffer array entries.
> 
> IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
> 
> 
> 
> If you want us to put together a patch with this idea, please let us know.
> 
> 
> 
> Thank you,
> 
> Honnappa
> 
> 
> 
> From: Gavin Hu
> Sent: Tuesday, August 28, 2018 2:31 PM
> To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> 
> 
> 
> Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
> 
> We are concerned that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). That would release the slots, which the writer may then immediately overwrite, so you get “too new” data and lose the old data.
> 
> 
> 
> From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>
> Sent: Tuesday, August 28, 2018 6:44 PM
> To: Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> 
> 
> 
> In this instance there won't be any problem, as until the value of fifo->write changes, this loop won't get executed. As of now we didn't see any issue with it and for performance reasons, we don't want to keep read barrier.
> 
> 
> 
> 
> 
> ________________________________
> 
> From: Gavin Hu <Gavin.Hu@arm.com>
> Sent: Monday, August 27, 2018 9:10 PM
> To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> 
> 
> 
> External Email
> 
> This fix is not complete, kni_fifo_get requires a read fence also, otherwise it probably gets stale data on a weak ordering platform.
> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
> > Sent: Monday, August 27, 2018 10:08 PM
> > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com>;
> > jerin.jacob@caviumnetworks.com
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> >
> > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> > > With existing code in kni_fifo_put, rx_q values are not being updated
> > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
> > > This is causing the sync issue on other core. So adding a write
> > > barrier to make sure the values being synced before updating fifo_write.
> > >
> > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > >
> > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
> > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] kni: dynamically allocate memory for each KNI
  @ 2018-08-27 17:06  4% ` Ferruh Yigit
  2018-08-29  9:52  0%   ` Igor Ryzhov
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-08-27 17:06 UTC (permalink / raw)
  To: Igor Ryzhov, dev

On 8/2/2018 3:25 PM, Igor Ryzhov wrote:
> Long time ago preallocation of memory for KNI was introduced in commit
> 0c6bc8e. It was done because of lack of ability to free previously
> allocated memzones, which led to memzone exhaustion. Currently memzones
> can be freed and this patch uses this ability for dynamic KNI memory
> allocation.

Hi Igor,

It is good to be able to allocate memory dynamically and get rid of the
"max_kni_ifaces" and "kni_memzone_pool", thanks for the patch.

Overall looks good, a few comments below.

> 
> Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
> ---
>  lib/librte_kni/rte_kni.c | 392 ++++++++++++---------------------------
>  lib/librte_kni/rte_kni.h |   6 +-
>  test/test/test_kni.c     |   6 -
>  3 files changed, 128 insertions(+), 276 deletions(-)
> 
> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> index 8a8f6c1cc..028b44bfd 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -36,24 +36,33 @@
>   * KNI context
>   */
>  struct rte_kni {
> +	const struct rte_memzone *mz;       /**< KNI context memzone */

I was thinking of removing the context memzone and using rte_zmalloc() to create
kni objects, but the updated rte_kni_get() API seems to rely on it.
If you see any other way to get the kni object from its name in rte_kni_get(), I am
for removing the above *mz variable from the rte_kni struct.

<...>

> +static void
> +kni_ctx_release_mz(struct rte_kni *ctx)
> +{
> +	rte_memzone_free(ctx->m_tx_q);
> +	rte_memzone_free(ctx->m_rx_q);
> +	rte_memzone_free(ctx->m_alloc_q);
> +	rte_memzone_free(ctx->m_free_q);
> +	rte_memzone_free(ctx->m_req_q);
> +	rte_memzone_free(ctx->m_resp_q);
> +	rte_memzone_free(ctx->m_sync_addr);


"ctx" sounds confusing to me, isn't this "rte_kni" object instance, why not just
call it "kni" or if it is too generic "kni_obj" or similar? For other APIs as well.

And this is just a detail, but about the order of the functions: would you mind
having the reserve() one first and the release() one after it?

<...>

> -/* Shall be called before any allocation happens */
> -void
> -rte_kni_init(unsigned int max_kni_ifaces)
> +static struct rte_kni *
> +kni_ctx_reserve(const char *name)
>  {
> -	uint32_t i;
> -	struct rte_kni_memzone_slot *it;
> +	struct rte_kni *ctx;
>  	const struct rte_memzone *mz;
> -#define OBJNAMSIZ 32
> -	char obj_name[OBJNAMSIZ];
>  	char mz_name[RTE_MEMZONE_NAMESIZE];
>  
> -	/* Immediately return if KNI is already initialized */
> -	if (kni_memzone_pool.initialized) {
> -		RTE_LOG(WARNING, KNI, "Double call to rte_kni_init()");
> -		return;
> -	}
> +	snprintf(mz_name, RTE_MEMZONE_NAMESIZE, "kni_info_%s", name);

Can you please convert memzone names, like "kni_info" to defines, for all of them?
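
Something along these lines is what I have in mind; this is only a sketch, and DEMO_NAMESIZE, DEMO_KNI_INFO_MZ_NAME_FMT and demo_info_mz_name() are hypothetical names (DEMO_NAMESIZE stands in for RTE_MEMZONE_NAMESIZE). The other memzone names would follow the same pattern, each with its own define.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_NAMESIZE 32 /* stand-in for RTE_MEMZONE_NAMESIZE */

/* One define per memzone name format, instead of inline string literals. */
#define DEMO_KNI_INFO_MZ_NAME_FMT "kni_info_%s"

/*
 * Build the memzone name for a given interface name.
 * Returns 0 on success, -1 if the result did not fit into 'buf'.
 */
static int
demo_info_mz_name(char *buf, size_t len, const char *name)
{
	assert(buf != NULL && name != NULL);

	int n = snprintf(buf, len, DEMO_KNI_INFO_MZ_NAME_FMT, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the snprintf() return value also catches interface names that would be silently truncated.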

<...>

> @@ -81,8 +81,12 @@ struct rte_kni_conf {
>   *
>   * @param max_kni_ifaces
>   *  The maximum number of KNI interfaces that can coexist concurrently
> + *
> + * @return
> + *  - 0 indicates success.
> + *  - negative value indicates failure.
>   */
> -void rte_kni_init(unsigned int max_kni_ifaces);
> +int rte_kni_init(unsigned int max_kni_ifaces);

This changes the API. The return type changes from "void" to "int". I agree "int"
makes more sense since the API can fail, but this changes the ABI/API.

Since existing binaries don't check the return value at all, there may be no
issue from an ABI point of view, but from an API point of view some apps may get
"return value not checked" warnings, not sure though.

And the need of the API is questionable at this stage, it may be possible to
move rte_kni_alloc() where it already has "kni_fd" check.

What do you think about keeping the API signature the same for now, but adding a
deprecation notice to remove the API, and then removing rte_kni_init() completely
in the next release (v19.02)?

<...>

>  /**
> diff --git a/test/test/test_kni.c b/test/test/test_kni.c
> index 1b876719a..56c98513a 100644
> --- a/test/test/test_kni.c
> +++ b/test/test/test_kni.c
> @@ -429,12 +429,6 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
>  	}
>  	test_kni_ctx = NULL;
>  
> -	/* test of releasing a released kni device */
> -	if (rte_kni_release(kni) == 0) {
> -		printf("should not release a released kni device\n");
> -		return -1;
> -	}

Why does this need to be removed?

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter
  2018-08-23 13:48  0%   ` Ferruh Yigit
@ 2018-08-27 15:14  0%     ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-08-27 15:14 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Thu, Aug 23, 2018 at 02:48:37PM +0100, Ferruh Yigit wrote:
> On 8/3/2018 2:36 PM, Adrien Mazarguil wrote:
> > This is a follow up to the "Flow API helpers enhancements" series submitted
> > almost a year ago [1]. The new title is due to the reduced scope of this
> > version.
> > 
> > rte_flow_conv() is a flexible replacement to rte_flow_copy(), itself a
> > temporary solution pending something better [2]. It replaces a lot of
> > duplicated code found in testpmd and removes some of the maintenance burden
> > that developers tend to forget (me included) when modifying pattern
> > item or actions (updating app/test-pmd/config.c to be clear).
> > 
> > This series was unearthed in order to complete the implementation of
> > RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> > duplicate existing code once again.
> > 
> > See individual patches for specific changes in this version.
> > 
> > v2 changes:
> > 
> > - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> > - Updated bonding PMD.
> > - No more automatic generation of rte_flow_conv.h.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> > [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> > [3] Currently the command-line parser (cmdline_flow.c) is aware of these
> >     actions, however config.c isn't. Flow rules with such actions cannot
> >     be created and cannot be validated with PMDs that implement them.
> > 
> > Adrien Mazarguil (7):
> >   ethdev: add flow API object converter
> >   ethdev: add flow API item/action name conversion
> >   app/testpmd: rely on flow API conversion function
> >   net/failsafe: switch to flow API object conversion function
> >   net/bonding: switch to flow API object conversion function
> >   ethdev: deprecate rte_flow_copy function
> >   ethdev: add missing item/actions to flow object converter
> 
> Patch needs to be rebased to target v18.11 (in map file),

Right, will do it for v3.

> and indeed new APIs
> (rte_flow_conv) needs to be experimental.

This is what I did at first. Problem is that experimental APIs cannot be
used in internal code without triggering a compilation error unless
ALLOW_EXPERIMENTAL_API is defined (bonding cannot rely on an API marked as
experimental).

Since this series reimplements rte_flow_copy() as a wrapper to
rte_flow_conv(), I thought it didn't make sense for internal code to keep
using the former either.

Considering this, shall I add -DALLOW_EXPERIMENTAL_API to the bonding PMD or
keep things not experimental?
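
For readers unfamiliar with the mechanism, the sketch below shows roughly how such a gate can work. The DEMO_ and demo_ names are hypothetical and the real DPDK macro may differ in detail; the assumption is simply that an experimental symbol carries a deprecation attribute unless the opt-in define is present, and that the build treats the resulting warning as an error.

```c
#include <assert.h>

/* Opt-in define; remove it to get a warning (an error under -Werror). */
#define DEMO_ALLOW_EXPERIMENTAL_API

#ifdef DEMO_ALLOW_EXPERIMENTAL_API
#define demo_experimental
#else
#define demo_experimental \
	__attribute__((deprecated("symbol is not yet part of the stable ABI")))
#endif

/* Hypothetical experimental function with a placeholder body. */
demo_experimental
static int
demo_flow_conv(int op)
{
	return op + 1;
}

/* Internal caller, e.g. a PMD, that relies on the experimental symbol. */
static int
demo_pmd_use(void)
{
	return demo_flow_conv(41);
}
```

With the define in place the internal caller builds cleanly, which is the trade-off being asked about: either each internal user opts in, or the new function is not marked experimental.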

> And needs to remove deprecation notice in this patchset.

Doesn't it make sense to deprecate this function immediately after providing
a replacement on top of which it is reimplemented? Users end up using the
new function whether they want it or not. I don't think maintaining the
old duplicated code around is the right thing to do either.

> Also, do you think it makes sense to announce this change in the release notes?

I'm not sure it's worth a release note. It's a rather obscure helper
function part of rte_flow. We didn't do it for rte_flow_copy() for
instance. Please confirm if you think it's needed.

> Apart from above, any volunteer for reviewing actual implementation?

I hope Gaetan will take a look, he added rte_flow_copy() after all :)

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: deprecate DEFERRED device state
  2018-08-24 14:51 11% [dpdk-dev] [PATCH] ethdev: deprecate DEFERRED device state Ferruh Yigit
@ 2018-08-27 15:00  0% ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-08-27 15:00 UTC (permalink / raw)
  To: Ferruh Yigit, Neil Horman, John McNamara, Marko Kovacevic
  Cc: dev, Thomas Monjalon, Matan Azrad

On 08/24/2018 05:51 PM, Ferruh Yigit wrote:
> Add a deprecation notice to remove the RTE_ETH_DEV_DEFERRED state, but this
> is mostly a reminder because of a missing target.
> It is not worth breaking the ABI because of this change, and the removal
> can be done when the ethdev ABI version is increased.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: Matan Azrad <matan@mellanox.com>
> ---
>   doc/guides/rel_notes/deprecation.rst | 4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index e2dbee317..9cd12ccd8 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -95,3 +95,7 @@ Deprecation Notices
>   
>     This is due to a lack of flexibility and reliance on a type unusable with
>     C++ programs (struct rte_flow_desc).
> +
> +* ethdev: remove deprecated RTE_ETH_DEV_DEFERRED device state.
> +  Since this is an enum field in the middle, removing this field will break
> +  the ABI, so the removal is postponed to the next ethdev ABI version increase.

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
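
To spell out why removing an entry from the middle of an enum is an ABI break, here is a small sketch. The enum and value names are hypothetical and only mimic the situation described in the notice: every entry after the removed one silently gets a new numeric value.

```c
#include <assert.h>

/* Hypothetical "before" enum: DEFERRED sits in the middle. */
enum demo_state_old {
	DEMO_OLD_UNUSED = 0,
	DEMO_OLD_ATTACHED,
	DEMO_OLD_DEFERRED,
	DEMO_OLD_REMOVED,
};

/* Hypothetical "after" enum with the middle entry simply deleted. */
enum demo_state_new {
	DEMO_NEW_UNUSED = 0,
	DEMO_NEW_ATTACHED,
	DEMO_NEW_REMOVED,
};

/* Value an application built against the old header uses for REMOVED. */
static int
demo_old_removed(void)
{
	return DEMO_OLD_REMOVED;
}

/* Value a library built against the new header uses for REMOVED. */
static int
demo_new_removed(void)
{
	return DEMO_NEW_REMOVED;
}
```

An application compiled against the old header and a library compiled against the new one would then disagree about what the stored value means, which is why the removal waits for an ABI version bump.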

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter
  2018-08-24 10:58  0%   ` Ferruh Yigit
@ 2018-08-27 14:12  0%     ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-08-27 14:12 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Jerin Jacob, Gavin Hu

On Fri, Aug 24, 2018 at 11:58:39AM +0100, Ferruh Yigit wrote:
> On 8/3/2018 2:36 PM, Adrien Mazarguil wrote:
> > This is a follow up to the "Flow API helpers enhancements" series submitted
> > almost a year ago [1]. The new title is due to the reduced scope of this
> > version.
> > 
> > rte_flow_conv() is a flexible replacement to rte_flow_copy(), itself a
> > temporary solution pending something better [2]. It replaces a lot of
> > duplicated code found in testpmd and removes some of the maintenance burden
> > that developers tend to forget (me included) when modifying pattern
> > item or actions (updating app/test-pmd/config.c to be clear).
> > 
> > This series was unearthed in order to complete the implementation of
> > RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> > duplicate existing code once again.
> > 
> > See individual patches for specific changes in this version.
> > 
> > v2 changes:
> > 
> > - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> > - Updated bonding PMD.
> > - No more automatic generation of rte_flow_conv.h.
> > 
> > [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> > [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> > [3] Currently the command-line parser (cmdline_flow.c) is aware of these
> >     actions, however config.c isn't. Flow rules with such actions cannot
> >     be created and cannot be validated with PMDs that implement them.
> > 
> > Adrien Mazarguil (7):
> >   ethdev: add flow API object converter
> >   ethdev: add flow API item/action name conversion
> >   app/testpmd: rely on flow API conversion function
> >   net/failsafe: switch to flow API object conversion function
> >   net/bonding: switch to flow API object conversion function
> >   ethdev: deprecate rte_flow_copy function
> >   ethdev: add missing item/actions to flow object converter
> 
> Causing build error for arm, it looks like related to rte_memcpy macro:
> 
> .../lib/librte_ethdev/rte_flow.c: In function ‘rte_flow_conv_item_spec’:
> .../lib/librte_ethdev/rte_flow.c:373:58: error: macro "rte_memcpy" passed 9
> arguments, but takes just 3
>        (size > sizeof(*dst.raw) ? sizeof(*dst.raw) : size));

Thanks, noticed it after sending v2. I'll fix it for v3.
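
For reference, one common way to hit this kind of error is a compound literal passed to a function-like macro; the failing call is not quoted in full above, so treating this as the cause is an assumption. The sketch below uses a hypothetical DEMO_COPY macro as a stand-in for a memcpy wrapper that is a macro on some architectures.

```c
#include <assert.h>
#include <string.h>

struct demo_pair {
	int a;
	int b;
};

/* Hypothetical function-like macro standing in for a memcpy wrapper. */
#define DEMO_COPY(dst, src, n) memcpy((dst), (src), (n))

static struct demo_pair
demo_make_pair(void)
{
	struct demo_pair out;

	/*
	 * DEMO_COPY(&out, &(struct demo_pair){ .a = 1, .b = 2 }, sizeof(out));
	 * would not compile: braces do not protect the comma, so the
	 * preprocessor sees four arguments instead of three.
	 * Wrapping the compound literal in parentheses keeps it as a
	 * single macro argument.
	 */
	DEMO_COPY(&out, (&(struct demo_pair){ .a = 1, .b = 2 }), sizeof(out));
	return out;
}
```

When the wrapper is a real function rather than a macro, the unparenthesized form compiles fine, which would explain why such an error shows up on only some targets.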

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries
@ 2018-08-27 12:24  3% Anatoly Burakov
  2018-09-19  8:56  3% ` Thomas Monjalon
  2018-09-20 15:41 17% ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
  0 siblings, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-08-27 12:24 UTC (permalink / raw)
  To: dev

Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.

Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.

This bumps the EAL ABI, however there's an EAL deprecation notice
already in place[1] for a different feature, so that's OK.

[1] http://patches.dpdk.org/patch/43502/

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 .../common/include/rte_eal_memconfig.h        |  4 ++++
 lib/librte_eal/linuxapp/eal/Makefile          |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c             | 20 +++++++++++++++++++
 lib/librte_eal/meson.build                    |  2 +-
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..62a21c2dc 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -77,6 +77,10 @@ struct rte_mem_config {
 	 * exact same address the primary process maps it.
 	 */
 	uint64_t mem_cfg_addr;
+
+	/* legacy mem and single file segments options are shared */
+	uint32_t legacy_mem;
+	uint32_t single_file_segments;
 } __attribute__((__packed__));
 
 
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..4a55d3b69 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -352,6 +352,24 @@ eal_proc_type_detect(void)
 	return ptype;
 }
 
+/* copies data from internal config to shared config */
+static void
+eal_update_mem_config(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	mcfg->legacy_mem = internal_config.legacy_mem;
+	mcfg->single_file_segments = internal_config.single_file_segments;
+}
+
+/* copies data from shared config to internal config */
+static void
+eal_update_internal_config(void)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	internal_config.legacy_mem = mcfg->legacy_mem;
+	internal_config.single_file_segments = mcfg->single_file_segments;
+}
+
 /* Sets up rte_config structure with the pointer to shared memory config.*/
 static void
 rte_config_init(void)
@@ -361,11 +379,13 @@ rte_config_init(void)
 	switch (rte_config.process_type){
 	case RTE_PROC_PRIMARY:
 		rte_eal_config_create();
+		eal_update_mem_config();
 		break;
 	case RTE_PROC_SECONDARY:
 		rte_eal_config_attach();
 		rte_eal_mcfg_wait_complete(rte_config.mem_config);
 		rte_eal_config_reattach();
+		eal_update_internal_config();
 		break;
 	case RTE_PROC_AUTO:
 	case RTE_PROC_INVALID:
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 8  # the version of the EAL API
+version = 9  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
-- 
2.17.1
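
The synchronisation this patch adds can be sketched outside of EAL roughly
as follows. This is only a minimal model, not the EAL code itself: the
names shared_mem_flags, local_flags, proc_role, publish_flags, adopt_flags
and sync_flags are hypothetical stand-ins for rte_mem_config,
internal_config and the switch in rte_config_init(). The only behaviour
taken from the patch is that the primary copies its two flags into the
shared config and a secondary overwrites its own flags from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch adds to rte_mem_config. */
struct shared_mem_flags {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

/* Hypothetical stand-in for the per-process internal_config. */
struct local_flags {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

/* Hypothetical process roles; the patch switches on rte_config.process_type. */
enum proc_role { ROLE_PRIMARY, ROLE_SECONDARY };

/* Primary: publish its own settings into the shared area. */
static void
publish_flags(struct shared_mem_flags *shared, const struct local_flags *local)
{
	shared->legacy_mem = local->legacy_mem;
	shared->single_file_segments = local->single_file_segments;
}

/* Secondary: overwrite its own settings with what the primary published. */
static void
adopt_flags(struct local_flags *local, const struct shared_mem_flags *shared)
{
	local->legacy_mem = shared->legacy_mem;
	local->single_file_segments = shared->single_file_segments;
}

/* Same shape as the switch in rte_config_init(): direction depends on role. */
static void
sync_flags(enum proc_role role, struct shared_mem_flags *shared,
		struct local_flags *local)
{
	assert(shared != NULL && local != NULL);

	switch (role) {
	case ROLE_PRIMARY:
		publish_flags(shared, local);
		break;
	case ROLE_SECONDARY:
		adopt_flags(local, shared);
		break;
	}
}
```

In this model, a secondary started with different flags than the primary
ends up with the primary's values once sync_flags() has run for both, so
the two processes agree on the memory mode regardless of what the secondary
was given on its own command line.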

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-stable] 18.05.1 patches review and test
  2018-08-27  9:29  0% ` Christian Ehrhardt
  2018-08-27 10:30  0%   ` [dpdk-dev] [dpdk-stable] " Marco Varlese
@ 2018-08-27 10:31  0%   ` Marco Varlese
  2018-09-05 14:41  0%   ` [dpdk-dev] " Christian Ehrhardt
  2 siblings, 0 replies; 200+ results
From: Marco Varlese @ 2018-08-27 10:31 UTC (permalink / raw)
  To: Christian Ehrhardt, stable; +Cc: dev

Hi Christian,

Apologies for being late on this... I tested 18.05.1-rc1 via the usual test-pmd, 
OvS-DPDK and VPP, and did not find any issues with it.


Cheers,
Marco

On Mon, 2018-08-27 at 11:29 +0200, Christian Ehrhardt wrote:
> On Wed, Aug 22, 2018 at 9:26 AM Christian Ehrhardt <
> christian.ehrhardt@canonical.com> wrote:
> 
> > Hi all,
> > 
> > Here is a list of patches targeted for stable release 18.05.1. Please
> > help review and test. The planned date for the final release is August
> > 29th. Before that, please shout if anyone has objections to these
> > patches being applied.
> > 
> 
> There has been neither positive nor negative feedback on 18.05.1-rc1 so far.
> Maybe 17.11.x priorities and general PTO time just beat 18.05, which is
> fine to some extent.
> The only private message I got was about one party needing some extra time.
> For all of the above I will do two things:
> 1. the deadline to get back with results on 18.05.1-rc1 is extended to
> Tuesday the 4th of September
> 2. I'd highly appreciate feedback from the people who intend to test it,
> so I know what to wait for (or not)
> 
> > Also for the companies committed to running regression tests,
> > please run the tests and report any issue before the release date.
> > 
> > A release candidate tarball can be found at:
> > 
> >     https://dpdk.org/browse/dpdk-stable/tag/?id=v18.05.1-rc1
> > 
> > These patches are located at branch 18.05 of dpdk-stable repo:
> > 
> >     https://git.dpdk.org/dpdk-stable/log/?h=18.05
> > 
> > Thanks.
> > 
> > Christian Ehrhardt <christian.ehrhardt@canonical.com>
> > 
> > ---
> > Adrien Mazarguil (8):
> >       app/testpmd: fix crash when attaching a device
> >       net/mlx4: fix minor resource leak during init
> >       net/mlx5: fix errno object in probe function
> >       net/mlx5: fix missing errno in probe function
> >       net/mlx5: fix error message in probe function
> >       net/mlx5: fix invalid error check
> >       maintainers: update for Mellanox PMDs
> >       net/mlx5: fix invalid network interface index
> > 
> > Ajit Khaparde (11):
> >       net/bnxt: fix clear port stats
> >       net/bnxt: fix close operation
> >       net/bnxt: fix HW Tx checksum offload check
> >       net/bnxt: check filter type before clearing it
> >       net/bnxt: fix set MTU
> >       net/bnxt: fix incorrect IO address handling in Tx
> >       net/bnxt: fix Rx ring count limitation
> >       net/bnxt: fix memory leaks in NVM commands
> >       net/bnxt: fix lock release on NVM write failure
> >       net/bnxt: check access denied for HWRM commands
> >       net/bnxt: fix RETA size
> > 
> > Alejandro Lucero (2):
> >       net/nfp: fix unused header reference
> >       net/nfp: fix field initialization in Tx descriptor
> > 
> > Alok Makhariya (1):
> >       bus/dpaa: fix phandle support for Linux 4.16
> > 
> > Anatoly Burakov (14):
> >       ipc: fix locking while sending messages
> >       mem: fix alignment of requested virtual areas
> >       eal/bsd: fix memory segment index display
> >       malloc: fix pad erasing
> >       eal/linux: fix invalid syntax in interrupts
> >       eal/linux: fix uninitialized value
> >       vfio: fix uninitialized variable
> >       malloc: do not skip pad on free
> >       test: fix EAL flags autotest on FreeBSD
> >       test: fix result printing
> >       test: fix code on report
> >       test: make autotest runner python 2/3 compliant
> >       test: print autotest categories
> >       test: improve filtering
> > 
> > Andrew Rybchenko (7):
> >       net/sfc: cut non VLAN ID bits from TCI
> >       net/sfc: discard packets with bad CRC on EF10 ESSB Rx
> >       net/sfc: fix double-free in EF10 ESSB Rx queue purge
> >       net/sfc: move Rx checksum offload check to device level
> >       net/sfc: fix Rx queue offloads reporting in queue info
> >       net/sfc: fix assert in set multicast address list
> >       net/sfc: handle unknown L3 packet class in EF10 event parser
> > 
> > Andy Green (2):
> >       ring: fix declaration after statement
> >       ring: fix sign conversion warning
> > 
> > Beilei Xing (5):
> >       net/i40e: fix shifts of 32-bit value
> >       net/i40e: fix PPPoL2TP packet type parsing
> >       net/i40e: fix packet type parsing with DDP
> >       net/i40e: fix setting TPID with AQ command
> >       net/i40e: fix device parameter parsing
> > 
> > Bruce Richardson (3):
> >       eal: fix error message for unsupported platforms
> >       examples/exception_path: fix out-of-bounds read
> >       mk: fix permissions when using make install
> > 
> > Chas Williams (2):
> >       net/bonding: always update bonding link status
> >       net/bonding: do not clear active slave count
> > 
> > Christian Ehrhardt (2):
> >       FIXUP: net/mlx5: fix invalid network interface index
> >       version: 18.05.1-rc1
> > 
> > Damjan Marion (1):
> >       net/i40e: do not reset device info data
> > 
> > Dan Gora (1):
> >       kni: fix crash with null name
> > 
> > Daria Kolistratova (1):
> >       net/ena: fix SIGFPE with 0 Rx queue
> > 
> > Dariusz Stojaczyk (7):
> >       mem: do not leave unmapped holes in EAL memory area
> >       mem: do not unmap overlapping region on mmap failure
> >       mem: avoid crash on memseg query with invalid address
> >       mem: fix alignment requested with --base-virtaddr
> >       mem: do not use --base-virtaddr in secondary processes
> >       eal: fix return codes on thread naming failure
> >       eal: fix return codes on control thread failure
> > 
> > David Marchand (1):
> >       net/bnxt: add missing ids in xstats
> > 
> > Drocula Lambda (1):
> >       kni: fix build on RHEL 7.5
> > 
> > Fan Zhang (1):
> >       crypto/virtio: fix IV physical address
> > 
> > Ferruh Yigit (4):
> >       kni: fix build with gcc 8.1
> >       net/thunderx: fix build with gcc optimization on
> >       app/testpmd: fix typo in setting Tx offload command
> >       drivers/net: fix crash in secondary process
> > 
> > Gage Eads (1):
> >       net: rename u16 to fix shadowed declaration
> > 
> > Gavin Hu (5):
> >       mk: fix cross build
> >       devtools: fix ninja command in build test
> >       build: fix for host clang and cross gcc
> >       net/dpaa2: remove loop for unused pool entries
> >       maintainers: claim maintainership for ARM v7 and v8
> > 
> > Haiyue Wang (1):
> >       net/i40e: workaround performance degradation
> > 
> > Harry van Haaren (2):
> >       net/i40e: fix rearm check in AVX2 Rx
> >       event: fix ring init failure handling
> > 
> > Hemant Agrawal (8):
> >       doc: fix limitations for dpaa crypto
> >       doc: fix limitations for dpaa2 crypto
> >       test/crypto: fix device id when stopping port
> >       bus/dpaa: fix SVR id fetch location
> >       bus/dpaa: fix buffer offset setting in FMAN
> >       net/dpaa: fix queue error handling and logs
> >       net/dpaa2: fix prefetch Rx to honor number of packets
> >       raw/dpaa2_qdma: fix IOVA as VA flag
> > 
> > Hyong Youb Kim (4):
> >       net/enic: fix receive packet types
> >       net/enic: update the UDP RSS detection mechanism
> >       net/enic: do not overwrite admin Tx queue limit
> >       net/enic: initialize RQ fetch index before enabling RQ
> > 
> > Ido Goshen (1):
> >       net/pcap: fix multiple queues
> > 
> > Igor Romanov (1):
> >       net/sfc: fix filter exceptions logic
> > 
> > Jananee Parthasarathy (1):
> >       mk: update targets for classified tests
> > 
> > Jay Ding (1):
> >       net/bnxt: check for invalid vNIC id
> > 
> > Jerin Jacob (3):
> >       doc: fix octeontx eventdev selftest argument
> >       ethdev: fix queue statistics mapping documentation
> >       eal: fix bitmap documentation
> > 
> > Kiran Kumar (3):
> >       net/bonding: fix MAC address reset
> >       ethdev: check queue stats mapping input arguments
> >       net/thunderx: avoid sq door bell write on zero packet
> > 
> > Konstantin Ananyev (3):
> >       examples/ipsec-secgw: fix IPv4 checksum at Tx
> >       examples/ipsec-secgw: fix bypass rule processing
> >       app/testpmd: fix DCB config
> > 
> > Krzysztof Kanas (2):
> >       app/testpmd: fix crash on TM command error
> >       app/testpmd: fix help for TM commit command
> > 
> > Lee Daly (1):
> >       compress/isal: fix offset usage
> > 
> > Matan Azrad (1):
> >       net/tap: fix zeroed flow mask configurations
> > 
> > Maxime Coquelin (2):
> >       vhost: fix missing increment of log cache count
> >       vhost: flush IOTLB cache on new mem table handling
> > 
> > Moti Haimovsky (2):
> >       net/mlx4: check RSS queues number limitation
> >       net/mlx4: advertise Rx jumbo frame support
> > 
> > Nelio Laranjeiro (3):
> >       net/mlx5: clean-up developer logs
> >       app/testpmd: fix missing count action fields
> >       net/mlx5: fix TCI mask filter
> > 
> > Nikhil Rao (5):
> >       eventdev: fix port in Rx adapter internal function
> >       eventdev: fix missing update to Rx adaper WRR position
> >       eventdev: add event buffer flush in Rx adapter
> >       eventdev: fix internal port logic in Rx adapter
> >       eventdev: fix Rx SW adapter stop
> > 
> > Nithin Dabilpuram (1):
> >       app/testpmd: fix buffer leak in TM command
> > 
> > Ophir Munk (1):
> >       net/mlx5: fix secondary process resource leakage
> > 
> > Pablo de Lara (13):
> >       cryptodev: fix ABI breakage
> >       net/ixgbe: fix crash on detach
> >       compress/isal: fix log type name
> >       compress/isal: set null pointer after freeing
> >       compress/isal: fix memory leak
> >       examples/l2fwd-crypto: fix digest with AEAD algo
> >       examples/l2fwd-crypto: check return value on IV size check
> >       examples/l2fwd-crypto: skip device not supporting operation
> >       devtools: remove already enabled nfp from build test
> >       test/hash: fix multiwriter with non consecutive cores
> >       test/hash: fix potential memory leak
> >       app/crypto-perf: fix auth IV offset
> >       hash: fix doxygen of return values
> > 
> > Pavan Nikhilesh (5):
> >       event/octeontx: fix flush callback
> >       mempool/octeontx: fix pool to aura mapping
> >       app/eventdev: fix order test service init
> >       event/octeontx: remove unnecessary port start and stop
> >       net/octeontx: fix stop clearing Rx/Tx functions
> > 
> > Qi Zhang (4):
> >       eal: fix hotplug add and remove
> >       vfio: fix PCI address comparison
> >       vfio: remove uneccessary IPC for group fd clear
> >       net/ixgbe: fix missing null check on detach
> > 
> > Radu Nicolau (4):
> >       security: fix crash on destroy null session
> >       net/bonding: fix invalid port id
> >       test: fix uninitialized port configuration
> >       net/bonding: fix race condition
> > 
> > Rafal Kozik (4):
> >       net/ena: check pointer before memset
> >       net/ena: change memory type
> >       net/ena: fix GENMASK_ULL macro
> >       net/ena: set link speed as none
> > 
> > Rahul Lakkireddy (4):
> >       net/cxgbe: report configured link auto-negotiation
> >       net/cxgbe: fix Rx channel map and queue type
> >       net/cxgbevf: add missing Tx byte counters
> >       net/cxgbe: fix init failure due to new flash parts
> > 
> > Rami Rosen (2):
> >       examples/l3fwd: remove useless include
> >       ethdev: fix a doxygen comment for port allocation
> > 
> > Rasesh Mody (11):
> >       net/qede: fix VF MTU update
> >       net/qede: fix for devargs
> >       net/qede: fix L2-handles used for RSS hash update
> >       net/qede: fix memory alloc for multiple port reconfig
> >       net/qede: remove primary MAC removal
> >       doc: update qede management firmware guide
> >       net/qede: fix default extended VLAN offload config
> >       net/qede/base: fix to clear HW indication
> >       net/qede/base: fix GRC attention callback
> >       net/bnx2x: fix FW command timeout during stop
> >       net/bnx2x: fix poll link status
> > 
> > Remy Horton (4):
> >       bitrate: add sanity check on parameters
> >       metrics: add check for invalid key
> >       metrics: do not fail silently when uninitialised
> >       metrics: disallow null as metric name
> > 
> > Reshma Pattan (3):
> >       test/flow_classify: fix return types
> >       mk: remove unnecessary test rules
> >       latency: free up the memzone
> > 
> > Rosen Xu (1):
> >       examples/flow_filtering: add flow director config for i40e
> > 
> > Shahaf Shuler (2):
> >       net/mlx5: separate generic tunnel TSO from the standard one
> >       net/mlx5: fix build with rdma-core v19
> > 
> > Shahed Shaikh (8):
> >       net/qede: fix incorrect link status update
> >       net/qede: fix link change event notification
> >       net/qede: fix unicast MAC address handling in VF
> >       net/qede: fix legacy interrupt mode
> >       net/qede: fix Rx/Tx offload flags
> >       net/qede: fix interrupt handler unregister
> >       net/qede: fix MAC address removal failure message
> >       net/qede: fix ntuple filter configuration
> > 
> > Shaopeng He (1):
> >       net/i40e: fix Tx queue setup after stop
> > 
> > Shreyansh Jain (1):
> >       doc: fix bonding command in testpmd
> > 
> > Somnath Kotur (4):
> >       net/bnxt: revert reset of L2 filter id
> >       net/bnxt: fix to move a flow to a different queue
> >       net/bnxt: use correct flags during VLAN configuration
> >       net/bnxt: fix filter freeing
> > 
> > Stephen Hemminger (2):
> >       net/mlx5: fix log initialization
> >       doc: fix typo in vdev_netvsc guide
> > 
> > Thomas Monjalon (2):
> >       bus/dpaa: fix build
> >       net/fm10k: remove unused constant
> > 
> > Timothy Redaelli (2):
> >       net/mlx4: avoid stripping the glue library
> >       net/mlx5: avoid stripping the glue library
> > 
> > Tiwei Bie (1):
> >       vhost: release locks on RARP packet failure
> > 
> > Tomasz Duszynski (1):
> >       net/mvpp2: check pointer before using it
> > 
> > Wei Zhao (7):
> >       net/ixgbe: add support for VLAN in IP mode FDIR
> >       net/ixgbe: fix tunnel id format error for FDIR
> >       net/ixgbe: fix tunnel type set error for FDIR
> >       net/ixgbe: fix mask bits register set error for FDIR
> >       app/testpmd: fix VLAN TCI mask set error for FDIR
> >       net/i40e: fix check of flow director programming status
> >       net/i40e: revert fix of flow director check
> > 
> > Xiaoxin Peng (1):
> >       net/bnxt: fix Tx with multiple mbuf
> > 
> > Xiaoyun Li (3):
> >       net/i40e: fix link speed
> >       app/testpmd: fix little performance drop
> >       net/avf: fix offload capabilities
> > 
> > Xueming Li (1):
> >       net/mlx5: fix crash in device probe
> > 
> > Yaroslav Brustinov (1):
> >       net/mlx5: fix linkage of glue lib with gcc 4.7.2
> > 
> > Yipeng Wang (3):
> >       hash: fix multiwriter lock memory allocation
> >       hash: fix a multi-writer race condition
> >       hash: fix key slot size accuracy
> > 
> > Yongseok Koh (6):
> >       net/mlx5: fix error number handling
> >       net/mlx5: fix Rx buffer replenishment threshold
> >       net/mlx5: fix assert for Tx completion queue count
> >       net/mlx5: fix queue rollback when starting device
> >       net/mlx5: preserve promiscuous flag for flow isolation mode
> >       net/mlx5: preserve allmulticast flag for flow isolation mode
> > 
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-stable] 18.05.1 patches review and test
  2018-08-27  9:29  0% ` Christian Ehrhardt
@ 2018-08-27 10:30  0%   ` Marco Varlese
  2018-08-27 10:31  0%   ` Marco Varlese
  2018-09-05 14:41  0%   ` [dpdk-dev] " Christian Ehrhardt
  2 siblings, 0 replies; 200+ results
From: Marco Varlese @ 2018-08-27 10:30 UTC (permalink / raw)
  To: Christian Ehrhardt, stable; +Cc: dev

Hi Christian,

Apologies for being late on this... I tested 18.05.1-rc1 via the usual test-pmd, 
OvS-DPDK and VPP, and did not find any issues with it.


Cheers,
Marco

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 18.05.1 patches review and test
  2018-08-22  7:25  2% [dpdk-dev] 18.05.1 patches review and test Christian Ehrhardt
@ 2018-08-27  9:29  0% ` Christian Ehrhardt
  2018-08-27 10:30  0%   ` [dpdk-dev] [dpdk-stable] " Marco Varlese
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Christian Ehrhardt @ 2018-08-27  9:29 UTC (permalink / raw)
  To: stable; +Cc: dev

On Wed, Aug 22, 2018 at 9:26 AM Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:

> Hi all,
>
> Here is a list of patches targeted for stable release 18.05.1. Please
> help review and test. The planned date for the final release is August,
> 29th. Before that, please shout if anyone has objections with these
> patches being applied.
>

There was neither positive nor negative feedback on 18.05.1-rc1 so far.
Maybe 17.11.x priorities and general PTO time just beats 18.05 - which is
fine to some extent.
The only private message I got was about one party needing some extra time.
For all of the above I will do two things:
1. the deadline to get back with results on 18.05.1-rc1 is extended to
Tuesday the 4th of September
2. I'd highly appreciate feedback of people involved that intend to test it
so I know what to wait for (or not)

> Also for the companies committed to running regression tests,
> please run the tests and report any issue before the release date.
>
> A release candidate tarball can be found at:
>
>     https://dpdk.org/browse/dpdk-stable/tag/?id=v18.05.1-rc1
>
> These patches are located at branch 18.05 of dpdk-stable repo:
>
>     https://git.dpdk.org/dpdk-stable/log/?h=18.05
>
> Thanks.
>
> Christian Ehrhardt <christian.ehrhardt@canonical.com>
>
> ---
> Adrien Mazarguil (8):
>       app/testpmd: fix crash when attaching a device
>       net/mlx4: fix minor resource leak during init
>       net/mlx5: fix errno object in probe function
>       net/mlx5: fix missing errno in probe function
>       net/mlx5: fix error message in probe function
>       net/mlx5: fix invalid error check
>       maintainers: update for Mellanox PMDs
>       net/mlx5: fix invalid network interface index
>
> Ajit Khaparde (11):
>       net/bnxt: fix clear port stats
>       net/bnxt: fix close operation
>       net/bnxt: fix HW Tx checksum offload check
>       net/bnxt: check filter type before clearing it
>       net/bnxt: fix set MTU
>       net/bnxt: fix incorrect IO address handling in Tx
>       net/bnxt: fix Rx ring count limitation
>       net/bnxt: fix memory leaks in NVM commands
>       net/bnxt: fix lock release on NVM write failure
>       net/bnxt: check access denied for HWRM commands
>       net/bnxt: fix RETA size
>
> Alejandro Lucero (2):
>       net/nfp: fix unused header reference
>       net/nfp: fix field initialization in Tx descriptor
>
> Alok Makhariya (1):
>       bus/dpaa: fix phandle support for Linux 4.16
>
> Anatoly Burakov (14):
>       ipc: fix locking while sending messages
>       mem: fix alignment of requested virtual areas
>       eal/bsd: fix memory segment index display
>       malloc: fix pad erasing
>       eal/linux: fix invalid syntax in interrupts
>       eal/linux: fix uninitialized value
>       vfio: fix uninitialized variable
>       malloc: do not skip pad on free
>       test: fix EAL flags autotest on FreeBSD
>       test: fix result printing
>       test: fix code on report
>       test: make autotest runner python 2/3 compliant
>       test: print autotest categories
>       test: improve filtering
>
> Andrew Rybchenko (7):
>       net/sfc: cut non VLAN ID bits from TCI
>       net/sfc: discard packets with bad CRC on EF10 ESSB Rx
>       net/sfc: fix double-free in EF10 ESSB Rx queue purge
>       net/sfc: move Rx checksum offload check to device level
>       net/sfc: fix Rx queue offloads reporting in queue info
>       net/sfc: fix assert in set multicast address list
>       net/sfc: handle unknown L3 packet class in EF10 event parser
>
> Andy Green (2):
>       ring: fix declaration after statement
>       ring: fix sign conversion warning
>
> Beilei Xing (5):
>       net/i40e: fix shifts of 32-bit value
>       net/i40e: fix PPPoL2TP packet type parsing
>       net/i40e: fix packet type parsing with DDP
>       net/i40e: fix setting TPID with AQ command
>       net/i40e: fix device parameter parsing
>
> Bruce Richardson (3):
>       eal: fix error message for unsupported platforms
>       examples/exception_path: fix out-of-bounds read
>       mk: fix permissions when using make install
>
> Chas Williams (2):
>       net/bonding: always update bonding link status
>       net/bonding: do not clear active slave count
>
> Christian Ehrhardt (2):
>       FIXUP: net/mlx5: fix invalid network interface index
>       version: 18.05.1-rc1
>
> Damjan Marion (1):
>       net/i40e: do not reset device info data
>
> Dan Gora (1):
>       kni: fix crash with null name
>
> Daria Kolistratova (1):
>       net/ena: fix SIGFPE with 0 Rx queue
>
> Dariusz Stojaczyk (7):
>       mem: do not leave unmapped holes in EAL memory area
>       mem: do not unmap overlapping region on mmap failure
>       mem: avoid crash on memseg query with invalid address
>       mem: fix alignment requested with --base-virtaddr
>       mem: do not use --base-virtaddr in secondary processes
>       eal: fix return codes on thread naming failure
>       eal: fix return codes on control thread failure
>
> David Marchand (1):
>       net/bnxt: add missing ids in xstats
>
> Drocula Lambda (1):
>       kni: fix build on RHEL 7.5
>
> Fan Zhang (1):
>       crypto/virtio: fix IV physical address
>
> Ferruh Yigit (4):
>       kni: fix build with gcc 8.1
>       net/thunderx: fix build with gcc optimization on
>       app/testpmd: fix typo in setting Tx offload command
>       drivers/net: fix crash in secondary process
>
> Gage Eads (1):
>       net: rename u16 to fix shadowed declaration
>
> Gavin Hu (5):
>       mk: fix cross build
>       devtools: fix ninja command in build test
>       build: fix for host clang and cross gcc
>       net/dpaa2: remove loop for unused pool entries
>       maintainers: claim maintainership for ARM v7 and v8
>
> Haiyue Wang (1):
>       net/i40e: workaround performance degradation
>
> Harry van Haaren (2):
>       net/i40e: fix rearm check in AVX2 Rx
>       event: fix ring init failure handling
>
> Hemant Agrawal (8):
>       doc: fix limitations for dpaa crypto
>       doc: fix limitations for dpaa2 crypto
>       test/crypto: fix device id when stopping port
>       bus/dpaa: fix SVR id fetch location
>       bus/dpaa: fix buffer offset setting in FMAN
>       net/dpaa: fix queue error handling and logs
>       net/dpaa2: fix prefetch Rx to honor number of packets
>       raw/dpaa2_qdma: fix IOVA as VA flag
>
> Hyong Youb Kim (4):
>       net/enic: fix receive packet types
>       net/enic: update the UDP RSS detection mechanism
>       net/enic: do not overwrite admin Tx queue limit
>       net/enic: initialize RQ fetch index before enabling RQ
>
> Ido Goshen (1):
>       net/pcap: fix multiple queues
>
> Igor Romanov (1):
>       net/sfc: fix filter exceptions logic
>
> Jananee Parthasarathy (1):
>       mk: update targets for classified tests
>
> Jay Ding (1):
>       net/bnxt: check for invalid vNIC id
>
> Jerin Jacob (3):
>       doc: fix octeontx eventdev selftest argument
>       ethdev: fix queue statistics mapping documentation
>       eal: fix bitmap documentation
>
> Kiran Kumar (3):
>       net/bonding: fix MAC address reset
>       ethdev: check queue stats mapping input arguments
>       net/thunderx: avoid sq door bell write on zero packet
>
> Konstantin Ananyev (3):
>       examples/ipsec-secgw: fix IPv4 checksum at Tx
>       examples/ipsec-secgw: fix bypass rule processing
>       app/testpmd: fix DCB config
>
> Krzysztof Kanas (2):
>       app/testpmd: fix crash on TM command error
>       app/testpmd: fix help for TM commit command
>
> Lee Daly (1):
>       compress/isal: fix offset usage
>
> Matan Azrad (1):
>       net/tap: fix zeroed flow mask configurations
>
> Maxime Coquelin (2):
>       vhost: fix missing increment of log cache count
>       vhost: flush IOTLB cache on new mem table handling
>
> Moti Haimovsky (2):
>       net/mlx4: check RSS queues number limitation
>       net/mlx4: advertise Rx jumbo frame support
>
> Nelio Laranjeiro (3):
>       net/mlx5: clean-up developer logs
>       app/testpmd: fix missing count action fields
>       net/mlx5: fix TCI mask filter
>
> Nikhil Rao (5):
>       eventdev: fix port in Rx adapter internal function
>       eventdev: fix missing update to Rx adaper WRR position
>       eventdev: add event buffer flush in Rx adapter
>       eventdev: fix internal port logic in Rx adapter
>       eventdev: fix Rx SW adapter stop
>
> Nithin Dabilpuram (1):
>       app/testpmd: fix buffer leak in TM command
>
> Ophir Munk (1):
>       net/mlx5: fix secondary process resource leakage
>
> Pablo de Lara (13):
>       cryptodev: fix ABI breakage
>       net/ixgbe: fix crash on detach
>       compress/isal: fix log type name
>       compress/isal: set null pointer after freeing
>       compress/isal: fix memory leak
>       examples/l2fwd-crypto: fix digest with AEAD algo
>       examples/l2fwd-crypto: check return value on IV size check
>       examples/l2fwd-crypto: skip device not supporting operation
>       devtools: remove already enabled nfp from build test
>       test/hash: fix multiwriter with non consecutive cores
>       test/hash: fix potential memory leak
>       app/crypto-perf: fix auth IV offset
>       hash: fix doxygen of return values
>
> Pavan Nikhilesh (5):
>       event/octeontx: fix flush callback
>       mempool/octeontx: fix pool to aura mapping
>       app/eventdev: fix order test service init
>       event/octeontx: remove unnecessary port start and stop
>       net/octeontx: fix stop clearing Rx/Tx functions
>
> Qi Zhang (4):
>       eal: fix hotplug add and remove
>       vfio: fix PCI address comparison
>       vfio: remove uneccessary IPC for group fd clear
>       net/ixgbe: fix missing null check on detach
>
> Radu Nicolau (4):
>       security: fix crash on destroy null session
>       net/bonding: fix invalid port id
>       test: fix uninitialized port configuration
>       net/bonding: fix race condition
>
> Rafal Kozik (4):
>       net/ena: check pointer before memset
>       net/ena: change memory type
>       net/ena: fix GENMASK_ULL macro
>       net/ena: set link speed as none
>
> Rahul Lakkireddy (4):
>       net/cxgbe: report configured link auto-negotiation
>       net/cxgbe: fix Rx channel map and queue type
>       net/cxgbevf: add missing Tx byte counters
>       net/cxgbe: fix init failure due to new flash parts
>
> Rami Rosen (2):
>       examples/l3fwd: remove useless include
>       ethdev: fix a doxygen comment for port allocation
>
> Rasesh Mody (11):
>       net/qede: fix VF MTU update
>       net/qede: fix for devargs
>       net/qede: fix L2-handles used for RSS hash update
>       net/qede: fix memory alloc for multiple port reconfig
>       net/qede: remove primary MAC removal
>       doc: update qede management firmware guide
>       net/qede: fix default extended VLAN offload config
>       net/qede/base: fix to clear HW indication
>       net/qede/base: fix GRC attention callback
>       net/bnx2x: fix FW command timeout during stop
>       net/bnx2x: fix poll link status
>
> Remy Horton (4):
>       bitrate: add sanity check on parameters
>       metrics: add check for invalid key
>       metrics: do not fail silently when uninitialised
>       metrics: disallow null as metric name
>
> Reshma Pattan (3):
>       test/flow_classify: fix return types
>       mk: remove unnecessary test rules
>       latency: free up the memzone
>
> Rosen Xu (1):
>       examples/flow_filtering: add flow director config for i40e
>
> Shahaf Shuler (2):
>       net/mlx5: separate generic tunnel TSO from the standard one
>       net/mlx5: fix build with rdma-core v19
>
> Shahed Shaikh (8):
>       net/qede: fix incorrect link status update
>       net/qede: fix link change event notification
>       net/qede: fix unicast MAC address handling in VF
>       net/qede: fix legacy interrupt mode
>       net/qede: fix Rx/Tx offload flags
>       net/qede: fix interrupt handler unregister
>       net/qede: fix MAC address removal failure message
>       net/qede: fix ntuple filter configuration
>
> Shaopeng He (1):
>       net/i40e: fix Tx queue setup after stop
>
> Shreyansh Jain (1):
>       doc: fix bonding command in testpmd
>
> Somnath Kotur (4):
>       net/bnxt: revert reset of L2 filter id
>       net/bnxt: fix to move a flow to a different queue
>       net/bnxt: use correct flags during VLAN configuration
>       net/bnxt: fix filter freeing
>
> Stephen Hemminger (2):
>       net/mlx5: fix log initialization
>       doc: fix typo in vdev_netvsc guide
>
> Thomas Monjalon (2):
>       bus/dpaa: fix build
>       net/fm10k: remove unused constant
>
> Timothy Redaelli (2):
>       net/mlx4: avoid stripping the glue library
>       net/mlx5: avoid stripping the glue library
>
> Tiwei Bie (1):
>       vhost: release locks on RARP packet failure
>
> Tomasz Duszynski (1):
>       net/mvpp2: check pointer before using it
>
> Wei Zhao (7):
>       net/ixgbe: add support for VLAN in IP mode FDIR
>       net/ixgbe: fix tunnel id format error for FDIR
>       net/ixgbe: fix tunnel type set error for FDIR
>       net/ixgbe: fix mask bits register set error for FDIR
>       app/testpmd: fix VLAN TCI mask set error for FDIR
>       net/i40e: fix check of flow director programming status
>       net/i40e: revert fix of flow director check
>
> Xiaoxin Peng (1):
>       net/bnxt: fix Tx with multiple mbuf
>
> Xiaoyun Li (3):
>       net/i40e: fix link speed
>       app/testpmd: fix little performance drop
>       net/avf: fix offload capabilities
>
> Xueming Li (1):
>       net/mlx5: fix crash in device probe
>
> Yaroslav Brustinov (1):
>       net/mlx5: fix linkage of glue lib with gcc 4.7.2
>
> Yipeng Wang (3):
>       hash: fix multiwriter lock memory allocation
>       hash: fix a multi-writer race condition
>       hash: fix key slot size accuracy
>
> Yongseok Koh (6):
>       net/mlx5: fix error number handling
>       net/mlx5: fix Rx buffer replenishment threshold
>       net/mlx5: fix assert for Tx completion queue count
>       net/mlx5: fix queue rollback when starting device
>       net/mlx5: preserve promiscuous flag for flow isolation mode
>       net/mlx5: preserve allmulticast flag for flow isolation mode
>


-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethdev: make rte_eth_is_valid_owner_id return bool
  2018-08-21 18:31  0%           ` Stephen Hemminger
@ 2018-08-26  7:49  0%             ` Matan Azrad
  0 siblings, 0 replies; 200+ results
From: Matan Azrad @ 2018-08-26  7:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger



From: Stephen Hemminger 
> On Tue, 21 Aug 2018 15:48:19 +0000
> Matan Azrad <matan@mellanox.com> wrote:
> 
> > Hi
> >
> > From: Stephen Hemminger
> > > On Tue, 21 Aug 2018 10:20:43 +0000
> > > Matan Azrad <matan@mellanox.com> wrote:
> > >
> > > > From: Stephen Hemminger
> > > > > Function is boolean so use that.
> > > >
> > > > Ethdev is not using bool type, see also:
> > > > rte_eth_dev_is_valid_port
> > > > rte_eth_dev_is_removed
> > > > rte_eth_dev_pool_ops_supported
> > > >
> > > > I think it should be a full solution to all.
> > > >
> > > > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> > >
> > > I didn't want to change the type of visible (exported by ABI) functions.
> > >
> > Since ethdev is not using the bool type now, I think it's better not to
> > change it only for this API.
> 
> I hate to pick nits but there is already a bool usage in internal function (static)
> in ethdev.
> 
> 
> static bool
> is_allocated(const struct rte_eth_dev *ethdev) {
> 	return ethdev->data->name[0] != '\0';
> }
> 
> Using bool functions doesn't really generate different code. It is more
> about using modern C conventions.

Agreed, but I think it should at least follow the same API style as rte_eth_dev_is_valid_port, to keep the ethdev convention.

Let's leave the decision to the maintainer.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
@ 2018-08-24 17:48  1% Konstantin Ananyev
  0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-08-24 17:48 UTC (permalink / raw)
  To: dev
  Cc: Konstantin Ananyev, Pablo de Lara, Akhil Goyal, Declan Doherty,
	Ravi Kumar, Jerin Jacob, Fan Zhang, Fiona Trahe,
	Tomasz Duszynski, Hemant Agrawal, Natalie Samsonov,
	Dmitri Epshtein, Jay Zhou

This RFC proposes several changes inside rte_cryptodev_sym_session.
Note that this is just an RFC, not a complete patch, so for now
I modified only librte_cryptodev itself,
some cryptodev PMDs, test-crypto-perf and the ipsec-secgw example.
The proposed changes mean ABI/API breakage inside cryptodev,
so I am looking for feedback from cryptodev lib and crypto PMD maintainers.
Below are the details and reasoning for the proposed changes.

1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
  operate based on cryptodev device id, though inside
  rte_cryptodev_sym_session device specific data is addressed
  by driver id (not device id).
  That creates a problem with the current implementation when we have
  two or more devices with the same driver used by the same session.
  Consider the following example:
 
  struct rte_cryptodev_sym_session *sess;
  rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
  rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
  rte_cryptodev_sym_session_clear(dev_id=X, sess);

  After that point, if X and Y use the same driver,
  then sess can't be used by device Y any more.
  The reason for that is that the session data is per driver
  (not per device), and there is no information about
  how many device instances use that data.
  Probably the simplest way to deal with that issue is to
  add a reference counter for each driver's data.
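
  A minimal sketch of the reference-counter idea in plain C, with no
  DPDK dependencies. The names drv_slot, slot_init and slot_clear are
  hypothetical and are not taken from the patch; the sketch only shows
  how a per-driver count lets a clear for device X leave the data in
  place for device Y.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver slot: one private-data pointer shared by all
 * devices that use the same driver, plus a count of those devices. */
struct drv_slot {
	void *data;
	uint16_t refcnt;
};

/* Attach one more device to the slot.
 * 'priv' is stored only when the slot has no users yet. */
static int
slot_init(struct drv_slot *s, void *priv)
{
	assert(s != NULL);
	if (s->refcnt == UINT16_MAX)
		return -1;		/* counter would overflow */
	if (s->refcnt == 0)
		s->data = priv;
	s->refcnt++;
	return 0;
}

/* Detach one device from the slot.
 * Returns the data pointer to release once the last user is gone,
 * NULL while other devices still reference it (or if the slot is unused). */
static void *
slot_clear(struct drv_slot *s)
{
	void *priv = NULL;

	assert(s != NULL);
	if (s->refcnt == 0)
		return NULL;
	if (--s->refcnt == 0) {
		priv = s->data;
		s->data = NULL;
	}
	return priv;
}
```

  With this scheme the init(X), init(Y), clear(X) sequence above leaves
  the slot populated, and only the later clear(Y) hands the data back
  for release.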

2.rte_cryptodev_sym_session_set_user_data() and
  rte_cryptodev_sym_session_get_user_data() -
  with the current implementation there is no defined way for the user to
  determine the max allowed size of the private data.
  Even within rte_cryptodev_sym_session_set_user_data() we just blindly
  copy user provided data without checking for memory boundary violations.
  To overcome that issue I added 'uint16_t priv_size' into the
  rte_cryptodev_sym_session structure.
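
  A minimal sketch of the bounds check that a stored priv_size makes
  possible. It is plain C and not the patch's code: demo_session,
  DEMO_PRIV_MAX and demo_set_user_data are hypothetical names, and the
  fixed-size priv[] array stands in for the space the real session
  would reserve after its driver data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PRIV_MAX 16	/* illustration only */

/* Hypothetical stand-in for a session with a recorded private-data size. */
struct demo_session {
	uint16_t priv_size;		/* usable bytes in priv[] */
	uint8_t priv[DEMO_PRIV_MAX];
};

/* Copy user data into the session only if it fits in priv_size bytes. */
static int
demo_set_user_data(struct demo_session *sess, const void *data, uint16_t size)
{
	if (sess == NULL || (data == NULL && size != 0))
		return -EINVAL;
	if (size > sess->priv_size)
		return -ENOMEM;		/* would write past the reserved area */
	if (size != 0)
		memcpy(sess->priv, data, size);
	return 0;
}
```

  The choice of -ENOMEM for an oversized request is an assumption made
  for the sketch; the point is only that the size is checked before
  the copy.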

3.rte_cryptodev_sym_session contains an array of variable size for
  driver specific data.
  The number of elements in that array is determined by the static
  variable nb_drivers, which could be modified by
  rte_cryptodev_allocate_driver().
  That construction seems to work ok so far, as right now users register
  all their PMDs at startup, though it doesn't mean that it would always
  remain like that.
  To make it less error prone I added 'uint16_t nb_drivers' into the
  rte_cryptodev_sym_session structure.
  At least that allows related functions to check that the provided
  driver id wouldn't overrun the variable array boundaries;
  it also allows determining the size of an already allocated session
  without accessing the global variable.

4.#2 and #3 above imply that now each struct rte_cryptodev_sym_session
  would have a sort of readonly data (init once at allocation time,
  kept unmodified through the session life-time).
  That requires more changes in the current cryptodev implementation:
  Right now inside the cryptodev framework both rte_cryptodev_sym_session
  and driver specific session data are two completely different structures
  (e.g. struct rte_cryptodev_sym_session and struct null_crypto_session).
  Though the current cryptodev implementation implicitly assumes that the
  driver will allocate both of them from within the same mempool.
  Plus this is done in a manner that they override each other's fields
  (reuse the same space - sort of an implicit C union).
  That's probably not the best programming practice,
  plus it makes it impossible to have readonly fields inside both of them.
  So to overcome that situation I changed the API a bit, to allow
  the use of two different mempools for these two distinct data structures.

5.Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
  I suppose that is self-explanatory, and it might be used in a lot of
  places (it would be quite useful for the ipsec library we are developing).

So the new proposed layout for rte_cryptodev_sym_session:
struct rte_cryptodev_sym_session {
        uint64_t userdata;
        /**< Can be used for external metadata */
        uint16_t nb_drivers;
        /**< number of elements in sess_data array */
        uint16_t priv_size;
        /**< session private data will be placed after sess_data */
        __extension__ struct {
                void *data;
                uint16_t refcnt;
        } sess_data[0];
        /**< Driver specific session material, variable size */
}; 
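
A minimal sketch of how the two new header fields could be used with
a layout like the one above. It is plain C, not the patch's code:
the demo_* names are hypothetical, calloc() stands in for the mempool
allocation, and a C99 flexible array member replaces the GNU
sess_data[0] form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_sess_data {
	void *data;
	uint16_t refcnt;
};

/* Hypothetical mirror of the proposed session layout. */
struct demo_sym_session {
	uint64_t userdata;
	uint16_t nb_drivers;	/* number of elements in sess_data[] */
	uint16_t priv_size;	/* private bytes placed after sess_data[] */
	struct demo_sess_data sess_data[];
};

/* Total size of a session, computed from its own header only. */
static size_t
demo_session_size(const struct demo_sym_session *s)
{
	return sizeof(*s) +
		(size_t)s->nb_drivers * sizeof(s->sess_data[0]) +
		s->priv_size;
}

/* Allocate a zeroed session and record its dimensions once. */
static struct demo_sym_session *
demo_session_alloc(uint16_t nb_drivers, uint16_t priv_size)
{
	size_t sz = sizeof(struct demo_sym_session) +
		(size_t)nb_drivers * sizeof(struct demo_sess_data) +
		priv_size;
	struct demo_sym_session *s = calloc(1, sz);

	if (s != NULL) {
		s->nb_drivers = nb_drivers;
		s->priv_size = priv_size;
	}
	return s;
}

/* Look up driver data, rejecting ids outside the allocated array. */
static void *
demo_get_driver_data(const struct demo_sym_session *s, uint16_t driver_id)
{
	if (s == NULL || driver_id >= s->nb_drivers)
		return NULL;
	return s->sess_data[driver_id].data;
}
```

Because nb_drivers and priv_size travel with the session, both the
bounds check and the size computation work without reading a global
driver count.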


Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-crypto-perf/cperf.h                       |   1 +
 app/test-crypto-perf/cperf_ops.c                   |  11 +-
 app/test-crypto-perf/cperf_ops.h                   |   2 +-
 app/test-crypto-perf/cperf_test_latency.c          |   5 +-
 app/test-crypto-perf/cperf_test_latency.h          |   1 +
 app/test-crypto-perf/cperf_test_pmd_cyclecount.c   |   5 +-
 app/test-crypto-perf/cperf_test_pmd_cyclecount.h   |   1 +
 app/test-crypto-perf/cperf_test_throughput.c       |   5 +-
 app/test-crypto-perf/cperf_test_throughput.h       |   1 +
 app/test-crypto-perf/cperf_test_verify.c           |   5 +-
 app/test-crypto-perf/cperf_test_verify.h           |   1 +
 app/test-crypto-perf/main.c                        | 111 +++++++++++------
 drivers/crypto/aesni_gcm/aesni_gcm_pmd.c           |  10 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c       |   5 +-
 drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h   |   4 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c         |  10 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c     |   5 +-
 drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h |   4 +-
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c        |   3 +-
 drivers/crypto/dpaa_sec/dpaa_sec.c                 |   3 +-
 drivers/crypto/null/null_crypto_pmd.c              |  14 ++-
 drivers/crypto/null/null_crypto_pmd_ops.c          |   5 +-
 drivers/crypto/null/null_crypto_pmd_private.h      |   4 +-
 drivers/crypto/scheduler/scheduler_pmd_ops.c       |   5 +-
 drivers/crypto/virtio/virtio_cryptodev.c           |   6 +-
 examples/ipsec-secgw/ipsec-secgw.c                 | 116 ++++++++++++------
 examples/ipsec-secgw/ipsec.h                       |   2 +
 lib/librte_cryptodev/rte_cryptodev.c               | 134 ++++++++++++---------
 lib/librte_cryptodev/rte_cryptodev.h               |  53 ++++++--
 lib/librte_cryptodev/rte_cryptodev_pmd.h           |  16 ++-
 30 files changed, 356 insertions(+), 192 deletions(-)

diff --git a/app/test-crypto-perf/cperf.h b/app/test-crypto-perf/cperf.h
index db58228dc..2e0acac62 100644
--- a/app/test-crypto-perf/cperf.h
+++ b/app/test-crypto-perf/cperf.h
@@ -15,6 +15,7 @@ struct cperf_op_fns;
 
 typedef void  *(*cperf_constructor_t)(
 		struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id,
 		uint16_t qp_id,
 		const struct cperf_options *options,
diff --git a/app/test-crypto-perf/cperf_ops.c b/app/test-crypto-perf/cperf_ops.c
index 8f320099d..2a202f90a 100644
--- a/app/test-crypto-perf/cperf_ops.c
+++ b/app/test-crypto-perf/cperf_ops.c
@@ -469,6 +469,7 @@ cperf_set_ops_aead(struct rte_crypto_op **ops,
 
 static struct rte_cryptodev_sym_session *
 cperf_create_session(struct rte_mempool *sess_mp,
+	struct rte_mempool *priv_mp,
 	uint8_t dev_id,
 	const struct cperf_options *options,
 	const struct cperf_test_vector *test_vector,
@@ -505,7 +506,7 @@ cperf_create_session(struct rte_mempool *sess_mp,
 		}
 		/* create crypto session */
 		rte_cryptodev_sym_session_init(dev_id, sess, &cipher_xform,
-				sess_mp);
+				priv_mp);
 	/*
 	 *  auth only
 	 */
@@ -532,7 +533,7 @@ cperf_create_session(struct rte_mempool *sess_mp,
 		}
 		/* create crypto session */
 		rte_cryptodev_sym_session_init(dev_id, sess, &auth_xform,
-				sess_mp);
+				priv_mp);
 	/*
 	 * cipher and auth
 	 */
@@ -589,12 +590,12 @@ cperf_create_session(struct rte_mempool *sess_mp,
 			cipher_xform.next = &auth_xform;
 			/* create crypto session */
 			rte_cryptodev_sym_session_init(dev_id,
-					sess, &cipher_xform, sess_mp);
+					sess, &cipher_xform, priv_mp);
 		} else { /* auth then cipher */
 			auth_xform.next = &cipher_xform;
 			/* create crypto session */
 			rte_cryptodev_sym_session_init(dev_id,
-					sess, &auth_xform, sess_mp);
+					sess, &auth_xform, priv_mp);
 		}
 	} else { /* options->op_type == CPERF_AEAD */
 		aead_xform.type = RTE_CRYPTO_SYM_XFORM_AEAD;
@@ -615,7 +616,7 @@ cperf_create_session(struct rte_mempool *sess_mp,
 
 		/* Create crypto session */
 		rte_cryptodev_sym_session_init(dev_id,
-					sess, &aead_xform, sess_mp);
+					sess, &aead_xform, priv_mp);
 	}
 
 	return sess;
diff --git a/app/test-crypto-perf/cperf_ops.h b/app/test-crypto-perf/cperf_ops.h
index 29e109f2a..80b38d537 100644
--- a/app/test-crypto-perf/cperf_ops.h
+++ b/app/test-crypto-perf/cperf_ops.h
@@ -13,7 +13,7 @@
 
 
 typedef struct rte_cryptodev_sym_session *(*cperf_sessions_create_t)(
-		struct rte_mempool *sess_mp,
+		struct rte_mempool *sess_mp, struct rte_mempool *priv_mp,
 		uint8_t dev_id, const struct cperf_options *options,
 		const struct cperf_test_vector *test_vector,
 		uint16_t iv_offset);
diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c
index c9c98dc50..8c8f759eb 100644
--- a/app/test-crypto-perf/cperf_test_latency.c
+++ b/app/test-crypto-perf/cperf_test_latency.c
@@ -62,6 +62,7 @@ cperf_latency_test_free(struct cperf_latency_ctx *ctx)
 
 void *
 cperf_latency_test_constructor(struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id, uint16_t qp_id,
 		const struct cperf_options *options,
 		const struct cperf_test_vector *test_vector,
@@ -86,8 +87,8 @@ cperf_latency_test_constructor(struct rte_mempool *sess_mp,
 		sizeof(struct rte_crypto_sym_op) +
 		sizeof(struct cperf_op_result *);
 
-	ctx->sess = op_fns->sess_create(sess_mp, dev_id, options, test_vector,
-			iv_offset);
+	ctx->sess = op_fns->sess_create(sess_mp, priv_mp, dev_id, options,
+			test_vector, iv_offset);
 	if (ctx->sess == NULL)
 		goto err;
 
diff --git a/app/test-crypto-perf/cperf_test_latency.h b/app/test-crypto-perf/cperf_test_latency.h
index d3fc3218d..85c61586f 100644
--- a/app/test-crypto-perf/cperf_test_latency.h
+++ b/app/test-crypto-perf/cperf_test_latency.h
@@ -17,6 +17,7 @@
 void *
 cperf_latency_test_constructor(
 		struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id,
 		uint16_t qp_id,
 		const struct cperf_options *options,
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
index c8d16db6d..b3a06f810 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c
@@ -80,6 +80,7 @@ cperf_pmd_cyclecount_test_free(struct cperf_pmd_cyclecount_ctx *ctx)
 
 void *
 cperf_pmd_cyclecount_test_constructor(struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id, uint16_t qp_id,
 		const struct cperf_options *options,
 		const struct cperf_test_vector *test_vector,
@@ -106,8 +107,8 @@ cperf_pmd_cyclecount_test_constructor(struct rte_mempool *sess_mp,
 	uint16_t iv_offset = sizeof(struct rte_crypto_op) +
 			sizeof(struct rte_crypto_sym_op);
 
-	ctx->sess = op_fns->sess_create(
-			sess_mp, dev_id, options, test_vector, iv_offset);
+	ctx->sess = op_fns->sess_create(sess_mp, priv_mp, dev_id, options,
+			test_vector, iv_offset);
 	if (ctx->sess == NULL)
 		goto err;
 
diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.h b/app/test-crypto-perf/cperf_test_pmd_cyclecount.h
index beb441991..1b22508dd 100644
--- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.h
+++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.h
@@ -18,6 +18,7 @@
 void *
 cperf_pmd_cyclecount_test_constructor(
 		struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id,
 		uint16_t qp_id,
 		const struct cperf_options *options,
diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c
index 8766d6e9b..abd04a332 100644
--- a/app/test-crypto-perf/cperf_test_throughput.c
+++ b/app/test-crypto-perf/cperf_test_throughput.c
@@ -47,6 +47,7 @@ cperf_throughput_test_free(struct cperf_throughput_ctx *ctx)
 
 void *
 cperf_throughput_test_constructor(struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id, uint16_t qp_id,
 		const struct cperf_options *options,
 		const struct cperf_test_vector *test_vector,
@@ -69,8 +70,8 @@ cperf_throughput_test_constructor(struct rte_mempool *sess_mp,
 	uint16_t iv_offset = sizeof(struct rte_crypto_op) +
 		sizeof(struct rte_crypto_sym_op);
 
-	ctx->sess = op_fns->sess_create(sess_mp, dev_id, options, test_vector,
-					iv_offset);
+	ctx->sess = op_fns->sess_create(sess_mp, priv_mp, dev_id, options,
+					test_vector, iv_offset);
 	if (ctx->sess == NULL)
 		goto err;
 
diff --git a/app/test-crypto-perf/cperf_test_throughput.h b/app/test-crypto-perf/cperf_test_throughput.h
index 439ec8e55..b7bb2e749 100644
--- a/app/test-crypto-perf/cperf_test_throughput.h
+++ b/app/test-crypto-perf/cperf_test_throughput.h
@@ -18,6 +18,7 @@
 void *
 cperf_throughput_test_constructor(
 		struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id,
 		uint16_t qp_id,
 		const struct cperf_options *options,
diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c
index 9134b921e..e645fc5e8 100644
--- a/app/test-crypto-perf/cperf_test_verify.c
+++ b/app/test-crypto-perf/cperf_test_verify.c
@@ -51,6 +51,7 @@ cperf_verify_test_free(struct cperf_verify_ctx *ctx)
 
 void *
 cperf_verify_test_constructor(struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id, uint16_t qp_id,
 		const struct cperf_options *options,
 		const struct cperf_test_vector *test_vector,
@@ -73,8 +74,8 @@ cperf_verify_test_constructor(struct rte_mempool *sess_mp,
 	uint16_t iv_offset = sizeof(struct rte_crypto_op) +
 		sizeof(struct rte_crypto_sym_op);
 
-	ctx->sess = op_fns->sess_create(sess_mp, dev_id, options, test_vector,
-			iv_offset);
+	ctx->sess = op_fns->sess_create(sess_mp, priv_mp, dev_id, options,
+		test_vector, iv_offset);
 	if (ctx->sess == NULL)
 		goto err;
 
diff --git a/app/test-crypto-perf/cperf_test_verify.h b/app/test-crypto-perf/cperf_test_verify.h
index 9f70ad87b..2484af697 100644
--- a/app/test-crypto-perf/cperf_test_verify.h
+++ b/app/test-crypto-perf/cperf_test_verify.h
@@ -18,6 +18,7 @@
 void *
 cperf_verify_test_constructor(
 		struct rte_mempool *sess_mp,
+		struct rte_mempool *priv_mp,
 		uint8_t dev_id,
 		uint16_t qp_id,
 		const struct cperf_options *options,
diff --git a/app/test-crypto-perf/main.c b/app/test-crypto-perf/main.c
index 5c7dadb60..42a34c74f 100644
--- a/app/test-crypto-perf/main.c
+++ b/app/test-crypto-perf/main.c
@@ -22,6 +22,11 @@
 #include "cperf_test_pmd_cyclecount.h"
 
 
+static struct {
+	struct rte_mempool *sess_mp;
+	struct rte_mempool *priv_mp;
+} session_pool_socket[RTE_MAX_NUMA_NODES];
+
 const char *cperf_test_type_strs[] = {
 	[CPERF_TEST_TYPE_THROUGHPUT] = "throughput",
 	[CPERF_TEST_TYPE_LATENCY] = "latency",
@@ -61,8 +66,59 @@ const struct cperf_test cperf_testmap[] = {
 };
 
 static int
-cperf_initialize_cryptodev(struct cperf_options *opts, uint8_t *enabled_cdevs,
-			struct rte_mempool *session_pool_socket[])
+fill_session_pool_socket(int32_t socket_id, uint32_t max_sess_size,
+	uint32_t nb_sessions)
+{
+	char mp_name[RTE_MEMPOOL_NAMESIZE];
+	struct rte_mempool *sess_mp;
+
+	if (session_pool_socket[socket_id].priv_mp == NULL) {
+
+		snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
+			"priv_sess_mp_%d", socket_id);
+
+		sess_mp = rte_mempool_create(mp_name,
+					nb_sessions,
+					max_sess_size,
+					0, 0, NULL, NULL, NULL,
+					NULL, socket_id,
+					0);
+
+		if (sess_mp == NULL) {
+			printf("Cannot create pool \"%s\" on socket %d\n",
+				mp_name, socket_id);
+			return -ENOMEM;
+		}
+
+		printf("Allocated pool \"%s\" on socket %d\n",
+			mp_name, socket_id);
+		session_pool_socket[socket_id].priv_mp = sess_mp;
+	}
+
+	if (session_pool_socket[socket_id].sess_mp == NULL) {
+
+		snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
+			"sess_mp_%d", socket_id);
+
+		sess_mp = rte_cryptodev_sym_session_pool_create(mp_name,
+					nb_sessions, 0, 0, socket_id);
+
+		if (sess_mp == NULL) {
+			printf("Cannot create pool \"%s\" on socket %d\n",
+				mp_name, socket_id);
+			return -ENOMEM;
+		}
+
+		printf("Allocated pool \"%s\" on socket %d\n",
+			mp_name, socket_id);
+		session_pool_socket[socket_id].sess_mp = sess_mp;
+	}
+
+	return 0;
+}
+
+static int
+cperf_initialize_cryptodev(struct cperf_options *opts, uint8_t *enabled_cdevs)
 {
 	uint8_t enabled_cdev_count = 0, nb_lcores, cdev_id;
 	uint32_t sessions_needed = 0;
@@ -96,7 +152,7 @@ cperf_initialize_cryptodev(struct cperf_options *opts, uint8_t *enabled_cdevs,
 	uint32_t max_sess_size = 0, sess_size;
 
 	for (cdev_id = 0; cdev_id < rte_cryptodev_count(); cdev_id++) {
-		sess_size = rte_cryptodev_sym_get_private_session_size(cdev_id);
+		sess_size = rte_cryptodev_sym_private_session_size(cdev_id);
 		if (sess_size > max_sess_size)
 			max_sess_size = sess_size;
 	}
@@ -144,10 +200,6 @@ cperf_initialize_cryptodev(struct cperf_options *opts, uint8_t *enabled_cdevs,
 			.socket_id = socket_id
 		};
 
-		struct rte_cryptodev_qp_conf qp_conf = {
-			.nb_descriptors = opts->nb_descriptors
-		};
-
 		/**
 		 * Device info specifies the min headroom and tailroom
 		 * requirement for the crypto PMD. This need to be honoured
@@ -194,29 +246,19 @@ cperf_initialize_cryptodev(struct cperf_options *opts, uint8_t *enabled_cdevs,
 				"%u sessions\n", opts->nb_qps);
 			return -ENOTSUP;
 		}
-		if (session_pool_socket[socket_id] == NULL) {
-			char mp_name[RTE_MEMPOOL_NAMESIZE];
-			struct rte_mempool *sess_mp;
-
-			snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
-				"sess_mp_%u", socket_id);
-			sess_mp = rte_mempool_create(mp_name,
-						sessions_needed,
-						max_sess_size,
-						0,
-						0, NULL, NULL, NULL,
-						NULL, socket_id,
-						0);
-
-			if (sess_mp == NULL) {
-				printf("Cannot create session pool on socket %d\n",
-					socket_id);
-				return -ENOMEM;
-			}
 
-			printf("Allocated session pool on socket %d\n", socket_id);
-			session_pool_socket[socket_id] = sess_mp;
-		}
+		ret = fill_session_pool_socket(socket_id, max_sess_size,
+				sessions_needed);
+		if (ret < 0)
+			return ret;
+
+		struct rte_cryptodev_qp_conf qp_conf = {
+			.nb_descriptors = opts->nb_descriptors,
+			.sess_pool = session_pool_socket[socket_id].sess_mp,
+			.priv_sess_pool =
+				session_pool_socket[socket_id].priv_mp,
+		};
+
 
 		ret = rte_cryptodev_configure(cdev_id, &conf);
 		if (ret < 0) {
@@ -226,8 +268,7 @@ cperf_initialize_cryptodev(struct cperf_options *opts, uint8_t *enabled_cdevs,
 
 		for (j = 0; j < opts->nb_qps; j++) {
 			ret = rte_cryptodev_queue_pair_setup(cdev_id, j,
-				&qp_conf, socket_id,
-				session_pool_socket[socket_id]);
+				&qp_conf, socket_id);
 			if (ret < 0) {
 				printf("Failed to setup queue pair %u on "
 					"cryptodev %u",	j, cdev_id);
@@ -445,7 +486,6 @@ main(int argc, char **argv)
 	struct cperf_op_fns op_fns;
 
 	void *ctx[RTE_MAX_LCORE] = { };
-	struct rte_mempool *session_pool_socket[RTE_MAX_NUMA_NODES] = { 0 };
 
 	int nb_cryptodevs = 0;
 	uint16_t total_nb_qps = 0;
@@ -479,8 +519,7 @@ main(int argc, char **argv)
 		goto err;
 	}
 
-	nb_cryptodevs = cperf_initialize_cryptodev(&opts, enabled_cdevs,
-			session_pool_socket);
+	nb_cryptodevs = cperf_initialize_cryptodev(&opts, enabled_cdevs);
 
 	if (!opts.silent)
 		cperf_options_dump(&opts);
@@ -548,7 +587,9 @@ main(int argc, char **argv)
 		uint8_t socket_id = rte_cryptodev_socket_id(cdev_id);
 
 		ctx[i] = cperf_testmap[opts.test].constructor(
-				session_pool_socket[socket_id], cdev_id, qp_id,
+				session_pool_socket[socket_id].sess_mp,
+				session_pool_socket[socket_id].priv_mp,
+				cdev_id, qp_id,
 				&opts, t_vec, &op_fns);
 		if (ctx[i] == NULL) {
 			RTE_LOG(ERR, USER1, "Test run constructor failed\n");
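Reviewer note on fill_session_pool_socket() in main.c above: the helper creates the two per-socket pools lazily and is meant to be safe to call once per device. The sketch below models that pattern without DPDK. Everything in it is an assumption made for illustration: struct pool, pool_create(), pools_created, MAX_SOCKETS and the socket_id range check are hypothetical stand-ins and are not part of the patch or of the mempool API.

```c
#include <assert.h>	/* for the precondition check and usage checks */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_SOCKETS 8		/* stand-in for RTE_MAX_NUMA_NODES */
#define POOL_NAME_SIZE 32	/* stand-in for RTE_MEMPOOL_NAMESIZE */

/* Hypothetical stand-in for struct rte_mempool. */
struct pool {
	char name[POOL_NAME_SIZE];
	unsigned int nb_elts;
	unsigned int elt_size;
};

static struct {
	struct pool *sess_mp;
	struct pool *priv_mp;
} pool_socket[MAX_SOCKETS];

/* Counts successful allocations so repeated calls can be observed. */
static unsigned int pools_created;

/* Hypothetical allocator; a real application would call a mempool API. */
static struct pool *
pool_create(const char *prefix, int socket_id, unsigned int nb_elts,
	unsigned int elt_size)
{
	struct pool *p;

	assert(prefix != NULL);
	p = malloc(sizeof(*p));
	if (p == NULL)
		return NULL;
	snprintf(p->name, sizeof(p->name), "%s_%d", prefix, socket_id);
	p->nb_elts = nb_elts;
	p->elt_size = elt_size;
	pools_created++;
	return p;
}

/* Create both pools for a socket on first use; later calls are no-ops. */
static int
fill_pools(int socket_id, unsigned int priv_size, unsigned int hdr_size,
	unsigned int nb_sessions)
{
	/* Assumed guard: reject ids that would index outside the table. */
	if (socket_id < 0 || socket_id >= MAX_SOCKETS)
		return -EINVAL;

	if (pool_socket[socket_id].priv_mp == NULL) {
		pool_socket[socket_id].priv_mp = pool_create("priv_sess_mp",
			socket_id, nb_sessions, priv_size);
		if (pool_socket[socket_id].priv_mp == NULL)
			return -ENOMEM;
	}

	if (pool_socket[socket_id].sess_mp == NULL) {
		pool_socket[socket_id].sess_mp = pool_create("sess_mp",
			socket_id, nb_sessions, hdr_size);
		if (pool_socket[socket_id].sess_mp == NULL)
			return -ENOMEM;
	}

	return 0;
}
```

Calling fill_pools() twice for the same socket allocates two pools in total, not four. The range check is worth considering for the real helper as well, since a negative socket id would index before the start of the array.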
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
index 752e0cd6a..c53705601 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd.c
@@ -137,7 +137,7 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
 		if (rte_mempool_get(qp->sess_mp, (void **)&_sess))
 			return NULL;
 
-		if (rte_mempool_get(qp->sess_mp, (void **)&_sess_private_data))
+		if (rte_mempool_get(qp->priv_mp, (void **)&_sess_private_data))
 			return NULL;
 
 		sess = (struct aesni_gcm_session *)_sess_private_data;
@@ -145,7 +145,7 @@ aesni_gcm_get_session(struct aesni_gcm_qp *qp, struct rte_crypto_op *op)
 		if (unlikely(aesni_gcm_set_session_parameters(qp->ops,
 				sess, sym_op->xform) != 0)) {
 			rte_mempool_put(qp->sess_mp, _sess);
-			rte_mempool_put(qp->sess_mp, _sess_private_data);
+			rte_mempool_put(qp->priv_mp, _sess_private_data);
 			sess = NULL;
 		}
 		sym_op->session = (struct rte_cryptodev_sym_session *)_sess;
@@ -391,9 +391,9 @@ handle_completed_gcm_crypto_op(struct aesni_gcm_qp *qp,
 	/* Free session if a session-less crypto op */
 	if (op->sess_type == RTE_CRYPTO_OP_SESSIONLESS) {
 		memset(sess, 0, sizeof(struct aesni_gcm_session));
-		memset(op->sym->session, 0,
-				rte_cryptodev_sym_get_header_session_size());
-		rte_mempool_put(qp->sess_mp, sess);
+		set_sym_session_private_data(op->sym->session,
+			cryptodev_driver_id, NULL);
+		rte_mempool_put(qp->priv_mp, sess);
 		rte_mempool_put(qp->sess_mp, op->sym->session);
 		op->sym->session = NULL;
 	}
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
index b6b4dd028..b02e43387 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_ops.c
@@ -201,7 +201,7 @@ aesni_gcm_pmd_qp_create_processed_pkts_ring(struct aesni_gcm_qp *qp,
 static int
 aesni_gcm_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		const struct rte_cryptodev_qp_conf *qp_conf,
-		int socket_id, struct rte_mempool *session_pool)
+		int socket_id)
 {
 	struct aesni_gcm_qp *qp = NULL;
 	struct aesni_gcm_private *internals = dev->data->dev_private;
@@ -229,7 +229,8 @@ aesni_gcm_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 	if (qp->processed_pkts == NULL)
 		goto qp_setup_cleanup;
 
-	qp->sess_mp = session_pool;
+	qp->sess_mp = qp_conf->sess_pool;
+	qp->priv_mp = qp_conf->priv_sess_pool;
 
 	memset(&qp->qp_stats, 0, sizeof(qp->qp_stats));
 
diff --git a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
index c13a12a57..bd42b3935 100644
--- a/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
+++ b/drivers/crypto/aesni_gcm/aesni_gcm_pmd_private.h
@@ -47,7 +47,9 @@ struct aesni_gcm_qp {
 	struct rte_cryptodev_stats qp_stats; /* 8 * 4 = 32 B */
 	/**< Queue pair statistics */
 	struct rte_mempool *sess_mp;
-	/**< Session Mempool */
+	/**< crypto session Mempool */
+	struct rte_mempool *priv_mp;
+	/**< private session Mempool */
 	uint16_t id;
 	/**< Queue Pair Identifier */
 	char name[RTE_CRYPTODEV_NAME_MAX_LEN];
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
index 93dc7a443..21e9fade7 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd.c
@@ -516,7 +516,7 @@ get_session(struct aesni_mb_qp *qp, struct rte_crypto_op *op)
 		if (rte_mempool_get(qp->sess_mp, (void **)&_sess))
 			return NULL;
 
-		if (rte_mempool_get(qp->sess_mp, (void **)&_sess_private_data))
+		if (rte_mempool_get(qp->priv_mp, (void **)&_sess_private_data))
 			return NULL;
 
 		sess = (struct aesni_mb_session *)_sess_private_data;
@@ -524,7 +524,7 @@ get_session(struct aesni_mb_qp *qp, struct rte_crypto_op *op)
 		if (unlikely(aesni_mb_set_session_parameters(qp->op_fns,
 				sess, op->sym->xform) != 0)) {
 			rte_mempool_put(qp->sess_mp, _sess);
-			rte_mempool_put(qp->sess_mp, _sess_private_data);
+			rte_mempool_put(qp->priv_mp, _sess_private_data);
 			sess = NULL;
 		}
 		op->sym->session = (struct rte_cryptodev_sym_session *)_sess;
@@ -741,9 +741,9 @@ post_process_mb_job(struct aesni_mb_qp *qp, JOB_AES_HMAC *job)
 	/* Free session if a session-less crypto op */
 	if (op->sess_type == RTE_CRYPTO_OP_SESSIONLESS) {
 		memset(sess, 0, sizeof(struct aesni_mb_session));
-		memset(op->sym->session, 0,
-				rte_cryptodev_sym_get_header_session_size());
-		rte_mempool_put(qp->sess_mp, sess);
+		set_sym_session_private_data(op->sym->session,
+			cryptodev_driver_id, NULL);
+		rte_mempool_put(qp->priv_mp, sess);
 		rte_mempool_put(qp->sess_mp, op->sym->session);
 		op->sym->session = NULL;
 	}
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
index ab26e5ae4..63cf82547 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_ops.c
@@ -482,7 +482,7 @@ aesni_mb_pmd_qp_create_processed_ops_ring(struct aesni_mb_qp *qp,
 static int
 aesni_mb_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		const struct rte_cryptodev_qp_conf *qp_conf,
-		int socket_id, struct rte_mempool *session_pool)
+		int socket_id)
 {
 	struct aesni_mb_qp *qp = NULL;
 	struct aesni_mb_private *internals = dev->data->dev_private;
@@ -520,7 +520,8 @@ aesni_mb_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		goto qp_setup_cleanup;
 	}
 
-	qp->sess_mp = session_pool;
+	qp->sess_mp = qp_conf->sess_pool;
+	qp->priv_mp = qp_conf->priv_sess_pool;
 
 	memset(&qp->stats, 0, sizeof(qp->stats));
 
diff --git a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
index 70e9d18e5..bca326851 100644
--- a/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
+++ b/drivers/crypto/aesni_mb/rte_aesni_mb_pmd_private.h
@@ -137,7 +137,9 @@ struct aesni_mb_qp {
 	struct rte_ring *ingress_queue;
        /**< Ring for placing operations ready for processing */
 	struct rte_mempool *sess_mp;
-	/**< Session Mempool */
+	/**< crypto session Mempool */
+	struct rte_mempool *priv_mp;
+	/**< private session Mempool */
 	struct rte_cryptodev_stats stats;
 	/**< Queue pair statistics */
 	uint8_t digest_idx;
diff --git a/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c b/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c
index 2a3c61c66..19935ee8e 100644
--- a/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c
+++ b/drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c
@@ -1416,8 +1416,7 @@ dpaa2_sec_queue_pair_release(struct rte_cryptodev *dev, uint16_t queue_pair_id)
 static int
 dpaa2_sec_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		__rte_unused const struct rte_cryptodev_qp_conf *qp_conf,
-		__rte_unused int socket_id,
-		__rte_unused struct rte_mempool *session_pool)
+		__rte_unused int socket_id)
 {
 	struct dpaa2_sec_dev_private *priv = dev->data->dev_private;
 	struct dpaa2_sec_qp *qp;
diff --git a/drivers/crypto/dpaa_sec/dpaa_sec.c b/drivers/crypto/dpaa_sec/dpaa_sec.c
index f571050b7..5583d939d 100644
--- a/drivers/crypto/dpaa_sec/dpaa_sec.c
+++ b/drivers/crypto/dpaa_sec/dpaa_sec.c
@@ -1576,8 +1576,7 @@ dpaa_sec_queue_pair_release(struct rte_cryptodev *dev,
 static int
 dpaa_sec_queue_pair_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		__rte_unused const struct rte_cryptodev_qp_conf *qp_conf,
-		__rte_unused int socket_id,
-		__rte_unused struct rte_mempool *session_pool)
+		__rte_unused int socket_id)
 {
 	struct dpaa_sec_dev_private *internals;
 	struct dpaa_sec_qp *qp = NULL;
diff --git a/drivers/crypto/null/null_crypto_pmd.c b/drivers/crypto/null/null_crypto_pmd.c
index 6e29a21a6..10096d450 100644
--- a/drivers/crypto/null/null_crypto_pmd.c
+++ b/drivers/crypto/null/null_crypto_pmd.c
@@ -49,16 +49,18 @@ null_crypto_set_session_parameters(
 /** Process crypto operation for mbuf */
 static int
 process_op(const struct null_crypto_qp *qp, struct rte_crypto_op *op,
-		struct null_crypto_session *sess __rte_unused)
+		struct null_crypto_session *sess)
 {
 	/* set status as successful by default */
 	op->status = RTE_CRYPTO_OP_STATUS_SUCCESS;
 
 	/* Free session if a session-less crypto op. */
 	if (op->sess_type == RTE_CRYPTO_OP_SESSIONLESS) {
-		memset(op->sym->session, 0,
-				sizeof(struct null_crypto_session));
-		rte_cryptodev_sym_session_free(op->sym->session);
+		memset(sess, 0, sizeof(*sess));
+		set_sym_session_private_data(op->sym->session,
+			cryptodev_driver_id, NULL);
+		rte_mempool_put(qp->priv_mp, sess);
+		rte_mempool_put(qp->sess_mp, op->sym->session);
 		op->sym->session = NULL;
 	}
 
@@ -87,7 +89,7 @@ get_session(struct null_crypto_qp *qp, struct rte_crypto_op *op)
 		if (rte_mempool_get(qp->sess_mp, (void **)&_sess))
 			return NULL;
 
-		if (rte_mempool_get(qp->sess_mp, (void **)&_sess_private_data))
+		if (rte_mempool_get(qp->priv_mp, (void **)&_sess_private_data))
 			return NULL;
 
 		sess = (struct null_crypto_session *)_sess_private_data;
@@ -95,7 +97,7 @@ get_session(struct null_crypto_qp *qp, struct rte_crypto_op *op)
 		if (unlikely(null_crypto_set_session_parameters(sess,
 				sym_op->xform) != 0)) {
 			rte_mempool_put(qp->sess_mp, _sess);
-			rte_mempool_put(qp->sess_mp, _sess_private_data);
+			rte_mempool_put(qp->priv_mp, _sess_private_data);
 			sess = NULL;
 		}
 		sym_op->session = (struct rte_cryptodev_sym_session *)_sess;
diff --git a/drivers/crypto/null/null_crypto_pmd_ops.c b/drivers/crypto/null/null_crypto_pmd_ops.c
index bb2b6e144..d25be42a6 100644
--- a/drivers/crypto/null/null_crypto_pmd_ops.c
+++ b/drivers/crypto/null/null_crypto_pmd_ops.c
@@ -184,7 +184,7 @@ null_crypto_pmd_qp_create_processed_pkts_ring(struct null_crypto_qp *qp,
 static int
 null_crypto_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		const struct rte_cryptodev_qp_conf *qp_conf,
-		int socket_id, struct rte_mempool *session_pool)
+		int socket_id)
 {
 	struct null_crypto_private *internals = dev->data->dev_private;
 	struct null_crypto_qp *qp;
@@ -228,7 +228,8 @@ null_crypto_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		goto qp_setup_cleanup;
 	}
 
-	qp->sess_mp = session_pool;
+	qp->sess_mp = qp_conf->sess_pool;
+	qp->priv_mp = qp_conf->priv_sess_pool;
 
 	memset(&qp->qp_stats, 0, sizeof(qp->qp_stats));
 
diff --git a/drivers/crypto/null/null_crypto_pmd_private.h b/drivers/crypto/null/null_crypto_pmd_private.h
index d5905afd8..1853e2153 100644
--- a/drivers/crypto/null/null_crypto_pmd_private.h
+++ b/drivers/crypto/null/null_crypto_pmd_private.h
@@ -30,7 +30,9 @@ struct null_crypto_qp {
 	struct rte_ring *processed_pkts;
 	/**< Ring for placing process packets */
 	struct rte_mempool *sess_mp;
-	/**< Session Mempool */
+	/**< crypto session Mempool */
+	struct rte_mempool *priv_mp;
+	/**< private session Mempool */
 	struct rte_cryptodev_stats qp_stats;
 	/**< Queue pair statistics */
 } __rte_cache_aligned;
diff --git a/drivers/crypto/scheduler/scheduler_pmd_ops.c b/drivers/crypto/scheduler/scheduler_pmd_ops.c
index 778071ca0..e6c431687 100644
--- a/drivers/crypto/scheduler/scheduler_pmd_ops.c
+++ b/drivers/crypto/scheduler/scheduler_pmd_ops.c
@@ -390,8 +390,7 @@ scheduler_pmd_qp_release(struct rte_cryptodev *dev, uint16_t qp_id)
 /** Setup a queue pair */
 static int
 scheduler_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
-	const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
-	struct rte_mempool *session_pool)
+	const struct rte_cryptodev_qp_conf *qp_conf, int socket_id)
 {
 	struct scheduler_ctx *sched_ctx = dev->data->dev_private;
 	struct scheduler_qp_ctx *qp_ctx;
@@ -419,7 +418,7 @@ scheduler_pmd_qp_setup(struct rte_cryptodev *dev, uint16_t qp_id,
 		 * must be big enough for all the drivers used.
 		 */
 		ret = rte_cryptodev_queue_pair_setup(slave_id, qp_id,
-				qp_conf, socket_id, session_pool);
+				qp_conf, socket_id);
 		if (ret < 0)
 			return ret;
 	}
diff --git a/drivers/crypto/virtio/virtio_cryptodev.c b/drivers/crypto/virtio/virtio_cryptodev.c
index 568b5a406..4bae3b865 100644
--- a/drivers/crypto/virtio/virtio_cryptodev.c
+++ b/drivers/crypto/virtio/virtio_cryptodev.c
@@ -36,8 +36,7 @@ static void virtio_crypto_dev_stats_reset(struct rte_cryptodev *dev);
 static int virtio_crypto_qp_setup(struct rte_cryptodev *dev,
 		uint16_t queue_pair_id,
 		const struct rte_cryptodev_qp_conf *qp_conf,
-		int socket_id,
-		struct rte_mempool *session_pool);
+		int socket_id);
 static int virtio_crypto_qp_release(struct rte_cryptodev *dev,
 		uint16_t queue_pair_id);
 static void virtio_crypto_dev_free_mbufs(struct rte_cryptodev *dev);
@@ -585,8 +584,7 @@ virtio_crypto_dev_stats_reset(struct rte_cryptodev *dev)
 static int
 virtio_crypto_qp_setup(struct rte_cryptodev *dev, uint16_t queue_pair_id,
 		const struct rte_cryptodev_qp_conf *qp_conf,
-		int socket_id,
-		struct rte_mempool *session_pool __rte_unused)
+		int socket_id)
 {
 	int ret;
 	struct virtqueue *vq;
diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index b45b87bde..cce0789bf 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -821,11 +821,15 @@ main_loop(__attribute__((unused)) void *dummy)
 	qconf->inbound.sa_ctx = socket_ctx[socket_id].sa_in;
 	qconf->inbound.cdev_map = cdev_map_in;
 	qconf->inbound.session_pool = socket_ctx[socket_id].session_pool;
+	qconf->inbound.priv_session_pool =
+		socket_ctx[socket_id].priv_session_pool;
 	qconf->outbound.sp4_ctx = socket_ctx[socket_id].sp_ip4_out;
 	qconf->outbound.sp6_ctx = socket_ctx[socket_id].sp_ip6_out;
 	qconf->outbound.sa_ctx = socket_ctx[socket_id].sa_out;
 	qconf->outbound.cdev_map = cdev_map_out;
 	qconf->outbound.session_pool = socket_ctx[socket_id].session_pool;
+	qconf->outbound.priv_session_pool =
+		socket_ctx[socket_id].priv_session_pool;
 
 	if (qconf->nb_rx_queue == 0) {
 		RTE_LOG(INFO, IPSEC, "lcore %u has nothing to do\n", lcore_id);
@@ -972,20 +976,19 @@ print_usage(const char *prgname)
 }
 
 static int32_t
-parse_portmask(const char *portmask)
+parse_portmask(const char *portmask, uint32_t *pmv)
 {
-	char *end = NULL;
+	char *end;
 	unsigned long pm;
 
 	/* parse hexadecimal string */
+	errno = 0;
 	pm = strtoul(portmask, &end, 16);
-	if ((portmask[0] == '\0') || (end == NULL) || (*end != '\0'))
+	if (errno != 0 || *end != '\0' || pm > UINT32_MAX)
 		return -1;
 
-	if ((pm == 0) && errno)
-		return -1;
-
-	return pm;
+	*pmv = pm;
+	return 0;
 }
 
 static int32_t
@@ -1063,6 +1066,7 @@ parse_args(int32_t argc, char **argv)
 	int32_t opt, ret;
 	char **argvopt;
 	int32_t option_index;
+	uint32_t v;
 	char *prgname = argv[0];
 	int32_t f_present = 0;
 
@@ -1073,8 +1077,8 @@ parse_args(int32_t argc, char **argv)
 
 		switch (opt) {
 		case 'p':
-			enabled_port_mask = parse_portmask(optarg);
-			if (enabled_port_mask == 0) {
+			ret = parse_portmask(optarg, &enabled_port_mask);
+			if (ret < 0 || enabled_port_mask == 0) {
 				printf("invalid portmask\n");
 				print_usage(prgname);
 				return -1;
@@ -1085,8 +1089,8 @@ parse_args(int32_t argc, char **argv)
 			promiscuous_on = 1;
 			break;
 		case 'u':
-			unprotected_port_mask = parse_portmask(optarg);
-			if (unprotected_port_mask == 0) {
+			ret = parse_portmask(optarg, &unprotected_port_mask);
+			if (ret < 0) {
 				printf("invalid unprotected portmask\n");
 				print_usage(prgname);
 				return -1;
@@ -1147,15 +1151,16 @@ parse_args(int32_t argc, char **argv)
 					single_sa_idx);
 			break;
 		case CMD_LINE_OPT_CRYPTODEV_MASK_NUM:
-			ret = parse_portmask(optarg);
+			ret = parse_portmask(optarg, &v);
 			if (ret == -1) {
-				printf("Invalid argument[portmask]\n");
+				printf("Invalid argument[%s]\n",
+					CMD_LINE_OPT_CRYPTODEV_MASK);
 				print_usage(prgname);
 				return -1;
 			}
 
 			/* else */
-			enabled_cryptodev_mask = ret;
+			enabled_cryptodev_mask = v;
 			break;
 		default:
 			print_usage(prgname);
@@ -1360,6 +1365,54 @@ check_cryptodev_mask(uint8_t cdev_id)
 
 	return -1;
 }
+static int
+fill_session_pool(int32_t socket_id, uint32_t max_sess_size)
+{
+	char mp_name[RTE_MEMPOOL_NAMESIZE];
+	struct rte_mempool *sess_mp;
+
+	if (!socket_ctx[socket_id].priv_session_pool) {
+
+		snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
+				"priv_sess_mp_%d", socket_id);
+		sess_mp = rte_mempool_create(mp_name,
+					CDEV_MP_NB_OBJS,
+					max_sess_size,
+					CDEV_MP_CACHE_SZ,
+					0, NULL, NULL, NULL,
+					NULL, socket_id,
+					0);
+		if (sess_mp == NULL) {
+			printf("Cannot create pool \"%s\" on socket %d\n",
+					mp_name, socket_id);
+			return -ENOMEM;
+		}
+
+		printf("Allocated pool \"%s\" on socket %d\n",
+				mp_name, socket_id);
+		socket_ctx[socket_id].priv_session_pool = sess_mp;
+	}
+
+	if (!socket_ctx[socket_id].session_pool) {
+
+		snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
+				"sess_mp_%d", socket_id);
+
+		sess_mp = rte_cryptodev_sym_session_pool_create(mp_name,
+			CDEV_MP_NB_OBJS, CDEV_MP_CACHE_SZ, 0, socket_id);
+		if (sess_mp == NULL) {
+			printf("Cannot create pool \"%s\" on socket %d\n",
+					mp_name, socket_id);
+			return -ENOMEM;
+		}
+
+		printf("Allocated pool \"%s\" on socket %d\n",
+				mp_name, socket_id);
+		socket_ctx[socket_id].session_pool = sess_mp;
+	}
+
+	return 0;
+}
 
 static int32_t
 cryptodevs_init(void)
@@ -1392,7 +1445,7 @@ cryptodevs_init(void)
 
 	uint32_t max_sess_sz = 0, sess_sz;
 	for (cdev_id = 0; cdev_id < rte_cryptodev_count(); cdev_id++) {
-		sess_sz = rte_cryptodev_sym_get_private_session_size(cdev_id);
+		sess_sz = rte_cryptodev_sym_private_session_size(cdev_id);
 		if (sess_sz > max_sess_sz)
 			max_sess_sz = sess_sz;
 	}
@@ -1448,38 +1501,23 @@ cryptodevs_init(void)
 				"Device does not support at least %u "
 				"sessions", CDEV_MP_NB_OBJS / 2);
 
-		if (!socket_ctx[dev_conf.socket_id].session_pool) {
-			char mp_name[RTE_MEMPOOL_NAMESIZE];
-			struct rte_mempool *sess_mp;
-
-			snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
-					"sess_mp_%u", dev_conf.socket_id);
-			sess_mp = rte_mempool_create(mp_name,
-					CDEV_MP_NB_OBJS,
-					max_sess_sz,
-					CDEV_MP_CACHE_SZ,
-					0, NULL, NULL, NULL,
-					NULL, dev_conf.socket_id,
-					0);
-			if (sess_mp == NULL)
-				rte_exit(EXIT_FAILURE,
-					"Cannot create session pool on socket %d\n",
-					dev_conf.socket_id);
-			else
-				printf("Allocated session pool on socket %d\n",
-					dev_conf.socket_id);
-			socket_ctx[dev_conf.socket_id].session_pool = sess_mp;
-		}
+		if (fill_session_pool(dev_conf.socket_id, max_sess_sz) != 0)
+			rte_exit(EXIT_FAILURE,
+				"Cannot create session pools on socket %d\n",
+				dev_conf.socket_id);
 
 		if (rte_cryptodev_configure(cdev_id, &dev_conf))
 			rte_panic("Failed to initialize cryptodev %u\n",
 					cdev_id);
 
 		qp_conf.nb_descriptors = CDEV_QUEUE_DESC;
+		qp_conf.sess_pool = socket_ctx[dev_conf.socket_id].session_pool;
+		qp_conf.priv_sess_pool =
+			socket_ctx[dev_conf.socket_id].priv_session_pool;
+
 		for (qp = 0; qp < dev_conf.nb_queue_pairs; qp++)
 			if (rte_cryptodev_queue_pair_setup(cdev_id, qp,
-					&qp_conf, dev_conf.socket_id,
-					socket_ctx[dev_conf.socket_id].session_pool))
+					&qp_conf, dev_conf.socket_id))
 				rte_panic("Failed to setup queue %u for "
 						"cdev_id %u\n",	0, cdev_id);
 
diff --git a/examples/ipsec-secgw/ipsec.h b/examples/ipsec-secgw/ipsec.h
index c998c8076..8ddc5d4f6 100644
--- a/examples/ipsec-secgw/ipsec.h
+++ b/examples/ipsec-secgw/ipsec.h
@@ -144,6 +144,7 @@ struct ipsec_ctx {
 	uint16_t last_qp;
 	struct cdev_qp tbl[MAX_QP_PER_LCORE];
 	struct rte_mempool *session_pool;
+	struct rte_mempool *priv_session_pool;
 	struct rte_mbuf *ol_pkts[MAX_PKT_BURST] __rte_aligned(sizeof(void *));
 	uint16_t ol_pkts_cnt;
 };
@@ -166,6 +167,7 @@ struct socket_ctx {
 	struct rt_ctx *rt_ip6;
 	struct rte_mempool *mbuf_pool;
 	struct rte_mempool *session_pool;
+	struct rte_mempool *priv_session_pool;
 };
 
 struct cnt_blk {
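Reviewer note on the parse_portmask() rework in ipsec-secgw.c above: returning a status and writing the mask through a pointer separates "parse error" from "mask is zero". A stricter variant is sketched below for comparison. It is an assumption, not what the patch does: parse_hex_mask() is a hypothetical name, and the extra first-character check is added here because strtoul() on its own accepts an empty string, leading whitespace and a sign.

```c
#include <assert.h>	/* for the precondition check and usage checks */
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a hexadecimal mask into *mask.
 * Returns 0 on success and -1 on any malformed or out-of-range input.
 */
static int
parse_hex_mask(const char *str, uint32_t *mask)
{
	char *end;
	unsigned long v;

	assert(mask != NULL);

	/* Reject empty input, leading blanks and signs up front. */
	if (str == NULL || !isxdigit((unsigned char)str[0]))
		return -1;

	errno = 0;
	v = strtoul(str, &end, 16);

	/* Overflow, trailing garbage, or a value wider than 32 bits. */
	if (errno != 0 || *end != '\0' || v > UINT32_MAX)
		return -1;

	*mask = (uint32_t)v;
	return 0;
}
```

With this shape a caller can accept a zero mask where that is meaningful (as the 'u' option does) and still reject it where it is not (as the 'p' option does), without overloading the return value.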
diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
index 63ae23f00..e25282445 100644
--- a/lib/librte_cryptodev/rte_cryptodev.c
+++ b/lib/librte_cryptodev/rte_cryptodev.c
@@ -943,8 +943,7 @@ rte_cryptodev_close(uint8_t dev_id)
 
 int
 rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
-		const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
-		struct rte_mempool *session_pool)
+		const struct rte_cryptodev_qp_conf *qp_conf, int socket_id)
 
 {
 	struct rte_cryptodev *dev;
@@ -954,6 +953,12 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
 		return -EINVAL;
 	}
 
+	if (qp_conf == NULL || qp_conf->sess_pool == NULL ||
+			qp_conf->priv_sess_pool == NULL) {
+		CDEV_LOG_ERR("Invalid queue_pair config");
+		return -EINVAL;
+	}
+
 	dev = &rte_crypto_devices[dev_id];
 	if (queue_pair_id >= dev->data->nb_queue_pairs) {
 		CDEV_LOG_ERR("Invalid queue_pair_id=%d", queue_pair_id);
@@ -969,7 +974,7 @@ rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_pair_setup, -ENOTSUP);
 
 	return (*dev->dev_ops->queue_pair_setup)(dev, queue_pair_id, qp_conf,
-			socket_id, session_pool);
+			socket_id);
 }
 
 
@@ -1146,6 +1151,41 @@ rte_cryptodev_pmd_callback_process(struct rte_cryptodev *dev,
 	rte_spinlock_unlock(&rte_cryptodev_cb_lock);
 }
 
+static void
+cryptodev_sym_session_init_elem(__rte_unused struct rte_mempool *pool,
+	void *arg, void *obj, __rte_unused uint32_t idx)
+{
+	struct rte_cryptodev_sym_session *ds;
+	const struct rte_cryptodev_sym_session *ss;
+
+	ds = obj;
+	ss = arg;
+
+	*ds = *ss;
+	memset(ds->sess_data, 0, rte_cryptodev_sym_session_data_size(ds));
+}
+
+struct rte_mempool *
+rte_cryptodev_sym_session_pool_create(const char *name,
+	uint32_t nb_elts, uint32_t cache_size, uint16_t priv_size,
+	int socket_id)
+{
+	struct rte_mempool *mp;
+	uint32_t elt_size;
+	struct rte_cryptodev_sym_session s = {
+		.nb_drivers = nb_drivers,
+		.priv_size = priv_size,
+	};
+
+	elt_size = rte_cryptodev_sym_session_max_size(priv_size);
+	mp = rte_mempool_create(name, nb_elts, elt_size, cache_size, 0,
+		NULL, NULL, cryptodev_sym_session_init_elem, &s,
+		socket_id, 0);
+	if (mp == NULL)
+		CDEV_LOG_ERR("%s(name=%s) failed, rte_errno=%d",
+			__func__, name, rte_errno);
+	return mp;
+}
 
 int
 rte_cryptodev_sym_session_init(uint8_t dev_id,
@@ -1163,12 +1203,15 @@ rte_cryptodev_sym_session_init(uint8_t dev_id,
 		return -EINVAL;
 
 	index = dev->driver_id;
+	if (index >= sess->nb_drivers)
+		return -EINVAL;
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_session_configure, -ENOTSUP);
 
-	if (sess->sess_private_data[index] == NULL) {
+	if (sess->sess_data[index].refcnt == 0) {
 		ret = dev->dev_ops->sym_session_configure(dev, xforms,
-							sess, mp);
+			sess, mp);
+
 		if (ret < 0) {
 			CDEV_LOG_ERR(
 				"dev_id %d failed to configure session details",
@@ -1177,6 +1220,7 @@ rte_cryptodev_sym_session_init(uint8_t dev_id,
 		}
 	}
 
+	sess->sess_data[index].refcnt++;
 	return 0;
 }
 
@@ -1229,8 +1273,7 @@ rte_cryptodev_sym_session_create(struct rte_mempool *mp)
 	/* Clear device session pointer.
 	 * Include the flag indicating presence of user data
 	 */
-	memset(sess, 0, (sizeof(void *) * nb_drivers) + sizeof(uint8_t));
-
+	memset(sess->sess_data, 0, rte_cryptodev_sym_session_data_size(sess));
 	return sess;
 }
 
@@ -1258,16 +1301,20 @@ rte_cryptodev_sym_session_clear(uint8_t dev_id,
 		struct rte_cryptodev_sym_session *sess)
 {
 	struct rte_cryptodev *dev;
+	uint32_t idx;
 
 	dev = rte_cryptodev_pmd_get_dev(dev_id);
 
-	if (dev == NULL || sess == NULL)
+	if (dev == NULL || sess == NULL || dev->driver_id >= sess->nb_drivers)
 		return -EINVAL;
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->sym_session_clear, -ENOTSUP);
 
-	dev->dev_ops->sym_session_clear(dev, sess);
+	idx = dev->driver_id;
+	if (--sess->sess_data[idx].refcnt != 0)
+		return -EBUSY;
 
+	dev->dev_ops->sym_session_clear(dev, sess);
 	return 0;
 }
 
@@ -1285,7 +1332,6 @@ rte_cryptodev_asym_session_clear(uint8_t dev_id,
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->asym_session_clear, -ENOTSUP);
 
 	dev->dev_ops->asym_session_clear(dev, sess);
-
 	return 0;
 }
 
@@ -1293,7 +1339,6 @@ int
 rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
 {
 	uint8_t i;
-	void *sess_priv;
 	struct rte_mempool *sess_mp;
 
 	if (sess == NULL)
@@ -1301,8 +1346,7 @@ rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
 
 	/* Check that all device private data has been freed */
 	for (i = 0; i < nb_drivers; i++) {
-		sess_priv = get_sym_session_private_data(sess, i);
-		if (sess_priv != NULL)
+		if (sess->sess_data[i].refcnt != 0)
 			return -EBUSY;
 	}
 
@@ -1313,6 +1357,23 @@ rte_cryptodev_sym_session_free(struct rte_cryptodev_sym_session *sess)
 	return 0;
 }
 
+unsigned int
+rte_cryptodev_sym_session_max_data_size(void)
+{
+	struct rte_cryptodev_sym_session *sess = NULL;
+
+	return (sizeof(sess->sess_data[0]) * nb_drivers);
+}
+
+size_t
+rte_cryptodev_sym_session_max_size(uint16_t priv_size)
+{
+	struct rte_cryptodev_sym_session *sess = NULL;
+
+	return (sizeof(*sess) + priv_size +
+		rte_cryptodev_sym_session_max_data_size());
+}
+
 int __rte_experimental
 rte_cryptodev_asym_session_free(struct rte_cryptodev_asym_session *sess)
 {
@@ -1337,18 +1398,6 @@ rte_cryptodev_asym_session_free(struct rte_cryptodev_asym_session *sess)
 	return 0;
 }
 
-
-unsigned int
-rte_cryptodev_sym_get_header_session_size(void)
-{
-	/*
-	 * Header contains pointers to the private data
-	 * of all registered drivers, and a flag which
-	 * indicates presence of user data
-	 */
-	return ((sizeof(void *) * nb_drivers) + sizeof(uint8_t));
-}
-
 unsigned int __rte_experimental
 rte_cryptodev_asym_get_header_session_size(void)
 {
@@ -1361,11 +1410,9 @@ rte_cryptodev_asym_get_header_session_size(void)
 }
 
 unsigned int
-rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
+rte_cryptodev_sym_private_session_size(uint8_t dev_id)
 {
 	struct rte_cryptodev *dev;
-	unsigned int header_size = sizeof(void *) * nb_drivers;
-	unsigned int priv_sess_size;
 
 	if (!rte_cryptodev_pmd_is_valid_dev(dev_id))
 		return 0;
@@ -1375,18 +1422,7 @@ rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
 	if (*dev->dev_ops->sym_session_get_size == NULL)
 		return 0;
 
-	priv_sess_size = (*dev->dev_ops->sym_session_get_size)(dev);
-
-	/*
-	 * If size is less than session header size,
-	 * return the latter, as this guarantees that
-	 * sessionless operations will work
-	 */
-	if (priv_sess_size < header_size)
-		return header_size;
-
-	return priv_sess_size;
-
+	return (*dev->dev_ops->sym_session_get_size)(dev);
 }
 
 unsigned int __rte_experimental
@@ -1409,7 +1445,6 @@ rte_cryptodev_asym_get_private_session_size(uint8_t dev_id)
 		return header_size;
 
 	return priv_sess_size;
-
 }
 
 int __rte_experimental
@@ -1418,15 +1453,10 @@ rte_cryptodev_sym_session_set_user_data(
 					void *data,
 					uint16_t size)
 {
-	uint16_t off_set = sizeof(void *) * nb_drivers;
-	uint8_t *user_data_present = (uint8_t *)sess + off_set;
-
-	if (sess == NULL)
+	if (sess == NULL || sess->priv_size < size)
 		return -EINVAL;
 
-	*user_data_present = 1;
-	off_set += sizeof(uint8_t);
-	rte_memcpy((uint8_t *)sess + off_set, data, size);
+	rte_memcpy(sess->sess_data + sess->nb_drivers, data, size);
 	return 0;
 }
 
@@ -1434,14 +1464,10 @@ void * __rte_experimental
 rte_cryptodev_sym_session_get_user_data(
 					struct rte_cryptodev_sym_session *sess)
 {
-	uint16_t off_set = sizeof(void *) * nb_drivers;
-	uint8_t *user_data_present = (uint8_t *)sess + off_set;
-
-	if (sess == NULL || !*user_data_present)
+	if (sess == NULL || sess->priv_size == 0)
 		return NULL;
 
-	off_set += sizeof(uint8_t);
-	return (uint8_t *)sess + off_set;
+	return (sess->sess_data + sess->nb_drivers);
 }
 
 /** Initialise rte_crypto_op mempool element */
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index 4099823f1..d88454f02 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -495,6 +495,14 @@ enum rte_cryptodev_event_type {
 /** Crypto device queue pair configuration structure. */
 struct rte_cryptodev_qp_conf {
 	uint32_t nb_descriptors; /**< Number of descriptors per queue pair */
+	struct rte_mempool *sess_pool;
+	/**< Pointer to crypto sessions mempool,
+	 * used for session-less operations.
+	 */
+	struct rte_mempool *priv_sess_pool;
+	/**< Pointer to device specific sessions mempool,
+	 * used for session-less operations.
+	 */
 };
 
 /**
@@ -680,17 +688,13 @@ rte_cryptodev_close(uint8_t dev_id);
  *				*SOCKET_ID_ANY* if there is no NUMA constraint
  *				for the DMA memory allocated for the receive
  *				queue pair.
- * @param	session_pool	Pointer to device session mempool, used
- *				for session-less operations.
- *
  * @return
  *   - 0: Success, queue pair correctly set up.
  *   - <0: Queue pair configuration failed
  */
 extern int
 rte_cryptodev_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
-		const struct rte_cryptodev_qp_conf *qp_conf, int socket_id,
-		struct rte_mempool *session_pool);
+		const struct rte_cryptodev_qp_conf *qp_conf, int socket_id);
 
 /**
  * Get the number of queue pairs on a specific crypto device
@@ -954,10 +958,43 @@ rte_cryptodev_enqueue_burst(uint8_t dev_id, uint16_t qp_id,
  * has a fixed algo, key, op-type, digest_len etc.
  */
 struct rte_cryptodev_sym_session {
-	__extension__ void *sess_private_data[0];
-	/**< Private symmetric session material */
+	uint64_t userdata;
+	/**< Can be used for external metadata */
+	uint16_t nb_drivers;
+	/**< number of elements in sess_data array */
+	uint16_t priv_size;
+	/**< session private data will be placed after sess_data */
+	__extension__ struct {
+		void *data;
+		uint16_t refcnt;
+	} sess_data[0];
+	/**< Driver specific session material, variable size */
 };
 
+static inline size_t
+rte_cryptodev_sym_session_data_size(const struct rte_cryptodev_sym_session *s)
+{
+	return (sizeof(s->sess_data[0]) * s->nb_drivers);
+}
+
+static inline size_t
+rte_cryptodev_sym_session_size(const struct rte_cryptodev_sym_session *s)
+{
+	return (sizeof(*s) + (s)->priv_size +
+		rte_cryptodev_sym_session_data_size(s));
+}
+
+unsigned int
+rte_cryptodev_sym_session_max_data_size(void);
+
+size_t
+rte_cryptodev_sym_session_max_size(uint16_t priv_size);
+
+struct rte_mempool *
+rte_cryptodev_sym_session_pool_create(const char *name,
+	uint32_t nb_elts, uint32_t cache_size, uint16_t priv_size,
+	int socket_id);
+
 /** Cryptodev asymmetric crypto session */
 struct rte_cryptodev_asym_session {
 	__extension__ void *sess_private_data[0];
@@ -1123,7 +1160,7 @@ rte_cryptodev_asym_get_header_session_size(void);
  *   symmetric session
  */
 unsigned int
-rte_cryptodev_sym_get_private_session_size(uint8_t dev_id);
+rte_cryptodev_sym_private_session_size(uint8_t dev_id);
 
 /**
  * Get the size of the private data for asymmetric session
diff --git a/lib/librte_cryptodev/rte_cryptodev_pmd.h b/lib/librte_cryptodev/rte_cryptodev_pmd.h
index 6ff49d64d..2f98f65d1 100644
--- a/lib/librte_cryptodev/rte_cryptodev_pmd.h
+++ b/lib/librte_cryptodev/rte_cryptodev_pmd.h
@@ -191,13 +191,12 @@ typedef void (*cryptodev_info_get_t)(struct rte_cryptodev *dev,
  * @param	qp_id		Queue Pair Index
  * @param	qp_conf		Queue configuration structure
  * @param	socket_id	Socket Index
- * @param	session_pool	Pointer to device session mempool
  *
  * @return	Returns 0 on success.
  */
 typedef int (*cryptodev_queue_pair_setup_t)(struct rte_cryptodev *dev,
 		uint16_t qp_id,	const struct rte_cryptodev_qp_conf *qp_conf,
-		int socket_id, struct rte_mempool *session_pool);
+		int socket_id);
 
 /**
  * Release memory resources allocated by given queue pair.
@@ -478,20 +477,25 @@ RTE_INIT(init_ ##driver_id)\
 
 static inline void *
 get_sym_session_private_data(const struct rte_cryptodev_sym_session *sess,
-		uint8_t driver_id) {
-	return sess->sess_private_data[driver_id];
+		uint8_t driver_id)
+{
+	if (driver_id < sess->nb_drivers)
+		return sess->sess_data[driver_id].data;
+	return NULL;
 }
 
 static inline void
 set_sym_session_private_data(struct rte_cryptodev_sym_session *sess,
 		uint8_t driver_id, void *private_data)
 {
-	sess->sess_private_data[driver_id] = private_data;
+	if (driver_id < sess->nb_drivers)
+		sess->sess_data[driver_id].data = private_data;
 }
 
 static inline void *
 get_asym_session_private_data(const struct rte_cryptodev_asym_session *sess,
-		uint8_t driver_id) {
+		uint8_t driver_id)
+{
 	return sess->sess_private_data[driver_id];
 }
 
-- 
2.13.6

^ permalink raw reply	[relevance 1%]
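
The session layout introduced by the patch above (a fixed header, then one sess_data[] entry per driver, then priv_size bytes of user data) can be sketched in isolation. This is only an illustration under stated assumptions: the demo_* names are hypothetical stand-ins, not DPDK API, and plain calloc()/memcpy() replace the mempool and rte_memcpy() used by the real code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-driver slot, mirroring the anonymous struct in the patch. */
struct demo_sess_data {
	void *data;
	uint16_t refcnt;
};

/* Hypothetical stand-in for the patched rte_cryptodev_sym_session. */
struct demo_session {
	uint64_t userdata;
	uint16_t nb_drivers;	/* number of elements in sess_data[] */
	uint16_t priv_size;	/* bytes of user data placed after sess_data[] */
	struct demo_sess_data sess_data[];	/* C99 flexible array member */
};

/* Total bytes needed: header + per-driver slots + user data area. */
static size_t
demo_session_size(uint16_t nb_drivers, uint16_t priv_size)
{
	return sizeof(struct demo_session) +
		sizeof(struct demo_sess_data) * nb_drivers + priv_size;
}

/* Allocate a zeroed session; the caller releases it with free(). */
static struct demo_session *
demo_session_create(uint16_t nb_drivers, uint16_t priv_size)
{
	struct demo_session *s;

	s = calloc(1, demo_session_size(nb_drivers, priv_size));
	if (s == NULL)
		return NULL;
	s->nb_drivers = nb_drivers;
	s->priv_size = priv_size;
	return s;
}

/* The user data area starts right after the last sess_data[] element. */
static void *
demo_session_user_data(struct demo_session *s)
{
	if (s == NULL || s->priv_size == 0)
		return NULL;
	return s->sess_data + s->nb_drivers;
}

/* Reject copies larger than the area reserved at creation time. */
static int
demo_session_set_user_data(struct demo_session *s, const void *data,
		uint16_t size)
{
	if (s == NULL || s->priv_size < size)
		return -1;
	memcpy(s->sess_data + s->nb_drivers, data, size);
	return 0;
}
```

Keeping nb_drivers and priv_size in the header is what lets the size and user-data helpers work from the session pointer alone, without consulting a global driver count.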

* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-24 15:00  0%           ` Alejandro Lucero
@ 2018-08-24 15:10  0%             ` Yongseok Koh
  0 siblings, 0 replies; 200+ results
From: Yongseok Koh @ 2018-08-24 15:10 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dpdk stable, dev


On Aug 24, 2018, at 8:00 AM, Alejandro Lucero <alejandro.lucero@netronome.com> wrote:



On Fri, Aug 24, 2018 at 4:31 PM, Yongseok Koh <yskoh@mellanox.com> wrote:

> On Aug 24, 2018, at 1:51 AM, Alejandro Lucero <alejandro.lucero@netronome.com> wrote:
>
>
>
> On Thu, Aug 23, 2018 at 6:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>
> > On Aug 22, 2018, at 5:19 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >
> > On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
> >> Hi Yongseok,
> >>
> >> There is a patchset aimed at 17.11.x:
> >>
> >> https://patches.dpdk.org/cover/42741/
> >>
> >> It was not accepted for master because the memory code has changed a lot
> >> since 17.11, and I'm working on another patchset to adjust to those
> >> changes.
> >>
> >> I wonder if there is any issue with adding this patchset to stable 17.11.4.
> >> Note that it makes a known limitation with emulated IOMMU inside VMs
> >> unlikely to be hit.
> >
> > This patchset seems quite large for a stable release and needs to be well verified
> > before GA. At the -rc1 stage, we don't usually take such a large patchset, as people
> > have already started verification. And we don't usually release -rc2. If you're
> > trying to solve a very critical issue with this patchset, I have to release -rc2
> > and ask people to verify again. How critical is your issue?
> >
> > For the patchset,
> > - "mem: add function for checking memsegs IOVAs addresses"
> >  This is adding a new API, so I don't expect any API/ABI breakage, but want to
> >  double-confirm with Thomas. Thomas?
> >
> > - "bus/pci: use IOVAs check when setting IOVA mode"
> >  All the patches got ack except for this one but from looking at the threads in
> >  dev mailing list, it looks okay. I have a question though.
> >
> >> @@ -640,13 +643,17 @@
> >> {
> >>        struct rte_pci_device *dev = NULL;
> >>        struct rte_pci_driver *drv = NULL;
> >> +       int iommu_dma_mask_check_done = 0;
> >>
> >>        FOREACH_DRIVER_ON_PCIBUS(drv) {
> >>                FOREACH_DEVICE_ON_PCIBUS(dev) {
> >>                        if (!rte_pci_match(drv, dev))
> >>                                continue;
> >> -                       if (!pci_one_device_iommu_support_va(dev))
> >> -                               return false;
> >> +                       if (!iommu_dma_mask_check_done) {
> >> +                               if (pci_one_device_iommu_support_va(dev) < 0)
> >
> > pci_one_device_iommu_support_va() returns true/false (1/0), so why do you
> > expect to see a negative return value in case of failure?
>
>
> Emulated IOMMU has a 39-bit addressing limitation in some QEMU versions. pci_one_device_iommu_support_va checks for this limitation, and if it exists, IOMMU with VA is not supported.
>
> This patch avoids such a coarse check by using the DMA mask code, which was added to allow IOMMU with VA if the allocated memory is below the addressing limitation. This is going to help with using IOMMU with VA in most of the systems out there, even systems with more than 512GB, as long as the DPDK allocated memory is below that limit.
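
As a rough illustration of the check described above: a 39-bit mask covers 2^39 bytes, which is the 512GB figure mentioned, and a memory range is usable only if it lies entirely below that boundary. The helper below is a hypothetical sketch, not the function added by the patchset; its name and signature are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: return true if the range [iova, iova + len) can be
 * addressed with maskbits address bits, i.e. lies entirely below 2^maskbits.
 */
static bool
demo_iova_fits_mask(uint64_t iova, uint64_t len, unsigned int maskbits)
{
	uint64_t limit;

	/* A 64-bit (or wider) mask only needs the range not to wrap around. */
	if (maskbits >= 64)
		return len <= UINT64_MAX - iova;

	limit = UINT64_C(1) << maskbits;

	/* Written to avoid overflow when computing iova + len. */
	return iova < limit && len <= limit - iova;
}
```

With maskbits set to 39, a segment ending exactly at the 512GB boundary passes, while one crossing it by a single byte fails, regardless of how much memory the system has in total.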

I was asking about this change:

from,
> >> -                       if (!pci_one_device_iommu_support_va(dev))

to,
> >> +                               if (pci_one_device_iommu_support_va(dev) < 0)


The original code checks for zero, but you changed it to check for a negative value.
But it looks like pci_one_device_iommu_support_va() doesn't return a negative value, right?

I think this is a bug, please let me know.
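
The concern can be reproduced in a few lines. In this sketch, demo_supports_va() is a hypothetical stand-in for pci_one_device_iommu_support_va(); the only assumption carried over from the thread is that it returns a boolean. A bool converts to 0 or 1, so a "< 0" test can never detect the failure case.

```c
#include <stdbool.h>

/* Hypothetical stand-in: reports support as a boolean, like the original. */
static bool
demo_supports_va(bool supported)
{
	return supported;
}

/* Check as written in the patch: the result is 0 or 1, never negative. */
static bool
demo_rejected_buggy(bool supported)
{
	int ret = demo_supports_va(supported);

	return ret < 0;	/* always false */
}

/* Check as in the original code: reject when support is missing. */
static bool
demo_rejected_fixed(bool supported)
{
	return !demo_supports_va(supported);
}
```

For an unsupported device, demo_rejected_buggy() still reports "not rejected", so the caller would go on to return true for the whole bus.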


Yes, you are right. I remember I initially changed pci_one_device_iommu_support_va to return an int instead of a boolean, but I left it as a boolean in the end. It seems I forgot to modify the call. I will send another version.

Is it OK if I send it just to stable@dpdk.org, tagged as a fix for 17.11?

Yes, it is.

Thanks
Yongseok


Thanks

Thanks,
Yongseok

>
> Alejandro,
>
> As I will release -rc2, I can integrate your patchset but this should be
> addressed. Please let me know.
>
> Thanks,
> Yongseok
>
> >> +                                       return false;
> >> +                               iommu_dma_mask_check_done  = 1;
> >> +                       }
> >>                }
> >>        }
> >>        return true;
> >>
> >>
> >>
> >> Thanks
> >>
> >> On Thu, Aug 16, 2018 at 8:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Here is a list of patches targeted for LTS release 17.11.4. Please help
> >>> review
> >>> and test. The planned date for the final release is August 23. Before that,
> >>> please shout if anyone has objections with these patches being applied.
> >>>
> >>> Also for the companies committed to running regression tests, please run
> >>> the
> >>> tests and report any issue before the release date.
> >>>
> >>> A release candidate tarball can be found at:
> >>>
> >>>    https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.4-rc1
> >>>
> >>> These patches are located at branch 17.11 of dpdk-stable repo:
> >>>    https://dpdk.org/browse/dpdk-stable/
> >>>
> >>> Thanks,
> >>> Yongseok
> >>>
> >>> ---
> >>> Adrien Mazarguil (2):
> >>>      maintainers: update for Mellanox PMDs
> >>>      net/mlx4: fix minor resource leak during init
> >>>
> >>> Ajit Khaparde (7):
> >>>      net/bnxt: fix HW Tx checksum offload check
> >>>      net/bnxt: fix set MTU
> >>>      net/bnxt: fix Rx ring count limitation
> >>>      net/bnxt: fix memory leaks in NVM commands
> >>>      net/bnxt: fix lock release on NVM write failure
> >>>      net/bnxt: check access denied for HWRM commands
> >>>      net/bnxt: fix RETA size
> >>>
> >>> Alejandro Lucero (1):
> >>>      net/nfp: fix field initialization in Tx descriptor
> >>>
> >>> Alok Makhariya (1):
> >>>      bus/dpaa: fix phandle support for Linux 4.16
> >>>
> >>> Anatoly Burakov (8):
> >>>      eal/linux: fix invalid syntax in interrupts
> >>>      eal/linux: fix uninitialized value
> >>>      test: fix EAL flags autotest on FreeBSD
> >>>      test: fix result printing
> >>>      test: fix code on report
> >>>      test: make autotest runner python 2/3 compliant
> >>>      test: print autotest categories
> >>>      test: improve filtering
> >>>
> >>> Andrew Rybchenko (2):
> >>>      net/sfc: cut non VLAN ID bits from TCI
> >>>      net/sfc: fix assert in set multicast address list
> >>>
> >>> Andy Green (1):
> >>>      ring: fix sign conversion warning
> >>>
> >>> Beilei Xing (3):
> >>>      net/i40e: fix shifts of 32-bit value
> >>>      net/i40e: fix packet type parsing with DDP
> >>>      net/i40e: fix setting TPID with AQ command
> >>>
> >>> Bruce Richardson (2):
> >>>      examples/exception_path: fix out-of-bounds read
> >>>      mk: fix permissions when using make install
> >>>
> >>> Chas Williams (2):
> >>>      net/bonding: always update bonding link status
> >>>      net/bonding: do not clear active slave count
> >>>
> >>> Dan Gora (1):
> >>>      kni: fix crash with null name
> >>>
> >>> Daria Kolistratova (1):
> >>>      net/ena: fix SIGFPE with 0 Rx queue
> >>>
> >>> Dariusz Stojaczyk (1):
> >>>      eal: fix return codes on thread naming failure
> >>>
> >>> David Marchand (1):
> >>>      net/bnxt: add missing ids in xstats
> >>>
> >>> Drocula Lambda (1):
> >>>      kni: fix build on RHEL 7.5
> >>>
> >>> Ferruh Yigit (2):
> >>>      kni: fix build with gcc 8.1
> >>>      net/thunderx: fix build with gcc optimization on
> >>>
> >>> Gavin Hu (3):
> >>>      mk: fix cross build
> >>>      net/dpaa2: remove loop for unused pool entries
> >>>      maintainers: claim maintainership for ARM v7 and v8
> >>>
> >>> Haiyue Wang (1):
> >>>      net/i40e: workaround performance degradation
> >>>
> >>> Harry van Haaren (1):
> >>>      event: fix ring init failure handling
> >>>
> >>> Hemant Agrawal (2):
> >>>      test/crypto: fix device id when stopping port
> >>>      bus/dpaa: fix buffer offset setting in FMAN
> >>>
> >>> Hyong Youb Kim (1):
> >>>      net/enic: do not overwrite admin Tx queue limit
> >>>
> >>> Ido Goshen (1):
> >>>      net/pcap: fix multiple queues
> >>>
> >>> Jananee Parthasarathy (1):
> >>>      mk: update targets for classified tests
> >>>
> >>> Jay Ding (1):
> >>>      net/bnxt: check for invalid vNIC id
> >>>
> >>> Jerin Jacob (2):
> >>>      ethdev: fix queue statistics mapping documentation
> >>>      eal: fix bitmap documentation
> >>>
> >>> Kiran Kumar (2):
> >>>      net/bonding: fix MAC address reset
> >>>      net/thunderx: avoid sq door bell write on zero packet
> >>>
> >>> Konstantin Ananyev (3):
> >>>      examples/ipsec-secgw: fix IPv4 checksum at Tx
> >>>      examples/ipsec-secgw: fix bypass rule processing
> >>>      app/testpmd: fix DCB config
> >>>
> >>> Maxime Coquelin (4):
> >>>      vhost: improve dirty pages logging performance
> >>>      vhost: fix missing increment of log cache count
> >>>      vhost: flush IOTLB cache on new mem table handling
> >>>      vhost: retranslate vring addr when memory table changes
> >>>
> >>> Moti Haimovsky (2):
> >>>      net/mlx5: fix build with old kernels
> >>>      net/mlx4: check RSS queues number limitation
> >>>
> >>> Nelio Laranjeiro (1):
> >>>      net/mlx5: fix TCI mask filter
> >>>
> >>> Nikhil Rao (5):
> >>>      eventdev: fix port in Rx adapter internal function
> >>>      eventdev: fix missing update to Rx adaper WRR position
> >>>      eventdev: add event buffer flush in Rx adapter
> >>>      eventdev: fix internal port logic in Rx adapter
> >>>      eventdev: fix Rx SW adapter stop
> >>>
> >>> Nithin Dabilpuram (1):
> >>>      app/testpmd: fix buffer leak in TM command
> >>>
> >>> Ophir Munk (1):
> >>>      net/mlx5: fix secondary process resource leakage
> >>>
> >>> Pablo de Lara (7):
> >>>      examples/l2fwd-crypto: fix digest with AEAD algo
> >>>      examples/l2fwd-crypto: check return value on IV size check
> >>>      examples/l2fwd-crypto: skip device not supporting operation
> >>>      test/hash: fix multiwriter with non consecutive cores
> >>>      test/hash: fix potential memory leak
> >>>      app/crypto-perf: fix auth IV offset
> >>>      hash: fix doxygen of return values
> >>>
> >>> Pavan Nikhilesh (2):
> >>>      event/octeontx: remove unnecessary port start and stop
> >>>      net/octeontx: fix stop clearing Rx/Tx functions
> >>>
> >>> Qi Zhang (1):
> >>>      vfio: fix PCI address comparison
> >>>
> >>> Radu Nicolau (3):
> >>>      security: fix crash on destroy null session
> >>>      test: fix uninitialized port configuration
> >>>      net/bonding: fix race condition
> >>>
> >>> Rafal Kozik (4):
> >>>      net/ena: fix GENMASK_ULL macro
> >>>      net/ena: set link speed as none
> >>>      net/ena: check pointer before memset
> >>>      net/ena: change memory type
> >>>
> >>> Rahul Lakkireddy (1):
> >>>      net/cxgbe: fix init failure due to new flash parts
> >>>
> >>> Rami Rosen (2):
> >>>      examples/l3fwd: remove useless include
> >>>      ethdev: fix a doxygen comment for port allocation
> >>>
> >>> Rasesh Mody (9):
> >>>      net/qede: fix VF MTU update
> >>>      net/qede: remove primary MAC removal
> >>>      net/qede: fix for devargs
> >>>      net/qede: fix default extended VLAN offload config
> >>>      doc: update qede management firmware guide
> >>>      net/qede/base: fix GRC attention callback
> >>>      net/bnx2x: fix FW command timeout during stop
> >>>      net/bnx2x: fix poll link status
> >>>      net/qede/base: fix to clear HW indication
> >>>
> >>> Remy Horton (4):
> >>>      bitrate: add sanity check on parameters
> >>>      metrics: add check for invalid key
> >>>      metrics: do not fail silently when uninitialised
> >>>      metrics: disallow null as metric name
> >>>
> >>> Reshma Pattan (2):
> >>>      test/flow_classify: fix return types
> >>>      mk: remove unnecessary test rules
> >>>
> >>> Rosen Xu (1):
> >>>      examples/flow_filtering: add flow director config for i40e
> >>>
> >>> Shahaf Shuler (1):
> >>>      net/mlx5: fix compilation for rdma-core v19
> >>>
> >>> Shahed Shaikh (7):
> >>>      net/qede: fix link change event notification
> >>>      net/qede: fix legacy interrupt mode
> >>>      net/qede: fix incorrect link status update
> >>>      net/qede: fix unicast MAC address handling in VF
> >>>      net/qede: fix interrupt handler unregister
> >>>      net/qede: fix MAC address removal failure message
> >>>      net/qede: fix ntuple filter configuration
> >>>
> >>> Shreyansh Jain (1):
> >>>      doc: fix bonding command in testpmd
> >>>
> >>> Somnath Kotur (3):
> >>>      net/bnxt: fix to move a flow to a different queue
> >>>      net/bnxt: use correct flags during VLAN configuration
> >>>      net/bnxt: fix filter freeing
> >>>
> >>> Thomas Monjalon (1):
> >>>      bus/dpaa: fix build
> >>>
> >>> Tomasz Duszynski (1):
> >>>      net/mvpp2: check pointer before using it
> >>>
> >>> Wei Zhao (7):
> >>>      net/ixgbe: add support for VLAN in IP mode FDIR
> >>>      net/ixgbe: fix tunnel id format error for FDIR
> >>>      net/ixgbe: fix tunnel type set error for FDIR
> >>>      net/ixgbe: fix mask bits register set error for FDIR
> >>>      app/testpmd: fix VLAN TCI mask set error for FDIR
> >>>      net/i40e: fix check of flow director programming status
> >>>      net/i40e: revert fix of flow director check
> >>>
> >>> Xiaoyun Li (1):
> >>>      net/i40e: fix link speed
> >>>
> >>> Xueming Li (1):
> >>>      net/mlx5: fix crash in device probe
> >>>
> >>> Yipeng Wang (3):
> >>>      hash: fix multiwriter lock memory allocation
> >>>      hash: fix a multi-writer race condition
> >>>      hash: fix key slot size accuracy
> >>>
> >>> Yongseok Koh (7):
> >>>      net/mlx5: fix Rx buffer replenishment threshold
> >>>      net/mlx5: add missing sanity checks for Tx completion queue
> >>>      net/mlx5: fix assert for Tx completion queue count
> >>>      net/mlx5: fix queue rollback when starting device
> >>>      net/mlx5: fix error number handling
> >>>      net/mlx5: preserve promiscuous flag for flow isolation mode
> >>>      net/mlx5: preserve allmulticast flag for flow isolation mode
> >>>
>
>



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-24 14:31  0%         ` Yongseok Koh
@ 2018-08-24 15:00  0%           ` Alejandro Lucero
  2018-08-24 15:10  0%             ` Yongseok Koh
  0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-08-24 15:00 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: dpdk stable, dev

On Fri, Aug 24, 2018 at 4:31 PM, Yongseok Koh <yskoh@mellanox.com> wrote:

>
> > On Aug 24, 2018, at 1:51 AM, Alejandro Lucero <alejandro.lucero@netronome.com> wrote:
> >
> >
> >
> > On Thu, Aug 23, 2018 at 6:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >
> > > On Aug 22, 2018, at 5:19 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > >
> > > On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
> > >> Hi Yongseok,
> > >>
> > >> There is a patchset aimed at 17.11.x:
> > >>
> > >> https://patches.dpdk.org/cover/42741/
> > >>
> > >> It was not accepted for master because the memory code has changed a lot
> > >> since 17.11, and I'm working on another patchset to adjust to those
> > >> changes.
> > >>
> > >> I wonder if there is any issue with adding this patchset to stable 17.11.4.
> > >> Note that it makes a known limitation with emulated IOMMU inside VMs
> > >> unlikely to be hit.
> > >
> > > This patchset seems quite large for a stable release and needs to be well verified
> > > before GA. At the -rc1 stage, we don't usually take such a large patchset, as people
> > > have already started verification. And we don't usually release -rc2. If you're
> > > trying to solve a very critical issue with this patchset, I have to release -rc2
> > > and ask people to verify again. How critical is your issue?
> > >
> > > For the patchset,
> > > - "mem: add function for checking memsegs IOVAs addresses"
> > >  This is adding a new API, so I don't expect any API/ABI breakage, but want to
> > >  double-confirm with Thomas. Thomas?
> > >
> > > - "bus/pci: use IOVAs check when setting IOVA mode"
> > >  All the patches got ack except for this one but from looking at the threads in
> > >  dev mailing list, it looks okay. I have a question though.
> > >
> > >> @@ -640,13 +643,17 @@
> > >> {
> > >>        struct rte_pci_device *dev = NULL;
> > >>        struct rte_pci_driver *drv = NULL;
> > >> +       int iommu_dma_mask_check_done = 0;
> > >>
> > >>        FOREACH_DRIVER_ON_PCIBUS(drv) {
> > >>                FOREACH_DEVICE_ON_PCIBUS(dev) {
> > >>                        if (!rte_pci_match(drv, dev))
> > >>                                continue;
> > >> -                       if (!pci_one_device_iommu_support_va(dev))
> > >> -                               return false;
> > >> +                       if (!iommu_dma_mask_check_done) {
> > >> +                               if (pci_one_device_iommu_support_va(dev) < 0)
> > >
> > > pci_one_device_iommu_support_va() returns true/false (1/0), so why do you
> > > expect to see a negative return value in case of failure?
> >
> >
> > Emulated IOMMU has a 39-bit addressing limitation in some QEMU versions. pci_one_device_iommu_support_va checks for this limitation, and if it exists, IOMMU with VA is not supported.
> >
> > This patch avoids such a coarse check by using the DMA mask code, which was added to allow IOMMU with VA if the allocated memory is below the addressing limitation. This is going to help with using IOMMU with VA in most of the systems out there, even systems with more than 512GB, as long as the DPDK allocated memory is below that limit.
>
> I was asking about this change:
>
> from,
> > >> -                       if (!pci_one_device_iommu_support_va(dev))
>
> to,
> > >> +                               if (pci_one_device_iommu_support_va(dev) < 0)
>
>
> The original code checks for zero, but you changed it to check for a negative value.
> But it looks like pci_one_device_iommu_support_va() doesn't return a negative
> value, right?
>
> I think this is a bug, please let me know.
>
>
Yes, you are right. I remember I initially changed
pci_one_device_iommu_support_va to return an int instead of a boolean,
but I left it as a boolean in the end. It seems I forgot to modify the
call. I will send another version.

Is it OK if I send it just to stable@dpdk.org, tagged as a fix for 17.11?

Thanks

Thanks,
> Yongseok
>
> >
> > Alejandro,
> >
> > As I will release -rc2, I can integrate your patchset but this should be
> > addressed. Please let me know.
> >
> > Thanks,
> > Yongseok
> >
> > >> +                                       return false;
> > >> +                               iommu_dma_mask_check_done  = 1;
> > >> +                       }
> > >>                }
> > >>        }
> > >>        return true;
> > >>
> > >>
> > >>
> > >> Thanks
> > >>
> > >> On Thu, Aug 16, 2018 at 8:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> Here is a list of patches targeted for LTS release 17.11.4. Please help
> > >>> review
> > >>> and test. The planned date for the final release is August 23. Before that,
> > >>> please shout if anyone has objections with these patches being applied.
> > >>>
> > >>> Also for the companies committed to running regression tests, please run
> > >>> the
> > >>> tests and report any issue before the release date.
> > >>>
> > >>> A release candidate tarball can be found at:
> > >>>
> > >>>    https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.4-rc1
> > >>>
> > >>> These patches are located at branch 17.11 of dpdk-stable repo:
> > >>>    https://dpdk.org/browse/dpdk-stable/
> > >>>
> > >>> Thanks,
> > >>> Yongseok
> > >>>
> > >>> ---
> > >>> Adrien Mazarguil (2):
> > >>>      maintainers: update for Mellanox PMDs
> > >>>      net/mlx4: fix minor resource leak during init
> > >>>
> > >>> Ajit Khaparde (7):
> > >>>      net/bnxt: fix HW Tx checksum offload check
> > >>>      net/bnxt: fix set MTU
> > >>>      net/bnxt: fix Rx ring count limitation
> > >>>      net/bnxt: fix memory leaks in NVM commands
> > >>>      net/bnxt: fix lock release on NVM write failure
> > >>>      net/bnxt: check access denied for HWRM commands
> > >>>      net/bnxt: fix RETA size
> > >>>
> > >>> Alejandro Lucero (1):
> > >>>      net/nfp: fix field initialization in Tx descriptor
> > >>>
> > >>> Alok Makhariya (1):
> > >>>      bus/dpaa: fix phandle support for Linux 4.16
> > >>>
> > >>> Anatoly Burakov (8):
> > >>>      eal/linux: fix invalid syntax in interrupts
> > >>>      eal/linux: fix uninitialized value
> > >>>      test: fix EAL flags autotest on FreeBSD
> > >>>      test: fix result printing
> > >>>      test: fix code on report
> > >>>      test: make autotest runner python 2/3 compliant
> > >>>      test: print autotest categories
> > >>>      test: improve filtering
> > >>>
> > >>> Andrew Rybchenko (2):
> > >>>      net/sfc: cut non VLAN ID bits from TCI
> > >>>      net/sfc: fix assert in set multicast address list
> > >>>
> > >>> Andy Green (1):
> > >>>      ring: fix sign conversion warning
> > >>>
> > >>> Beilei Xing (3):
> > >>>      net/i40e: fix shifts of 32-bit value
> > >>>      net/i40e: fix packet type parsing with DDP
> > >>>      net/i40e: fix setting TPID with AQ command
> > >>>
> > >>> Bruce Richardson (2):
> > >>>      examples/exception_path: fix out-of-bounds read
> > >>>      mk: fix permissions when using make install
> > >>>
> > >>> Chas Williams (2):
> > >>>      net/bonding: always update bonding link status
> > >>>      net/bonding: do not clear active slave count
> > >>>
> > >>> Dan Gora (1):
> > >>>      kni: fix crash with null name
> > >>>
> > >>> Daria Kolistratova (1):
> > >>>      net/ena: fix SIGFPE with 0 Rx queue
> > >>>
> > >>> Dariusz Stojaczyk (1):
> > >>>      eal: fix return codes on thread naming failure
> > >>>
> > >>> David Marchand (1):
> > >>>      net/bnxt: add missing ids in xstats
> > >>>
> > >>> Drocula Lambda (1):
> > >>>      kni: fix build on RHEL 7.5
> > >>>
> > >>> Ferruh Yigit (2):
> > >>>      kni: fix build with gcc 8.1
> > >>>      net/thunderx: fix build with gcc optimization on
> > >>>
> > >>> Gavin Hu (3):
> > >>>      mk: fix cross build
> > >>>      net/dpaa2: remove loop for unused pool entries
> > >>>      maintainers: claim maintainership for ARM v7 and v8
> > >>>
> > >>> Haiyue Wang (1):
> > >>>      net/i40e: workaround performance degradation
> > >>>
> > >>> Harry van Haaren (1):
> > >>>      event: fix ring init failure handling
> > >>>
> > >>> Hemant Agrawal (2):
> > >>>      test/crypto: fix device id when stopping port
> > >>>      bus/dpaa: fix buffer offset setting in FMAN
> > >>>
> > >>> Hyong Youb Kim (1):
> > >>>      net/enic: do not overwrite admin Tx queue limit
> > >>>
> > >>> Ido Goshen (1):
> > >>>      net/pcap: fix multiple queues
> > >>>
> > >>> Jananee Parthasarathy (1):
> > >>>      mk: update targets for classified tests
> > >>>
> > >>> Jay Ding (1):
> > >>>      net/bnxt: check for invalid vNIC id
> > >>>
> > >>> Jerin Jacob (2):
> > >>>      ethdev: fix queue statistics mapping documentation
> > >>>      eal: fix bitmap documentation
> > >>>
> > >>> Kiran Kumar (2):
> > >>>      net/bonding: fix MAC address reset
> > >>>      net/thunderx: avoid sq door bell write on zero packet
> > >>>
> > >>> Konstantin Ananyev (3):
> > >>>      examples/ipsec-secgw: fix IPv4 checksum at Tx
> > >>>      examples/ipsec-secgw: fix bypass rule processing
> > >>>      app/testpmd: fix DCB config
> > >>>
> > >>> Maxime Coquelin (4):
> > >>>      vhost: improve dirty pages logging performance
> > >>>      vhost: fix missing increment of log cache count
> > >>>      vhost: flush IOTLB cache on new mem table handling
> > >>>      vhost: retranslate vring addr when memory table changes
> > >>>
> > >>> Moti Haimovsky (2):
> > >>>      net/mlx5: fix build with old kernels
> > >>>      net/mlx4: check RSS queues number limitation
> > >>>
> > >>> Nelio Laranjeiro (1):
> > >>>      net/mlx5: fix TCI mask filter
> > >>>
> > >>> Nikhil Rao (5):
> > >>>      eventdev: fix port in Rx adapter internal function
> > >>>      eventdev: fix missing update to Rx adaper WRR position
> > >>>      eventdev: add event buffer flush in Rx adapter
> > >>>      eventdev: fix internal port logic in Rx adapter
> > >>>      eventdev: fix Rx SW adapter stop
> > >>>
> > >>> Nithin Dabilpuram (1):
> > >>>      app/testpmd: fix buffer leak in TM command
> > >>>
> > >>> Ophir Munk (1):
> > >>>      net/mlx5: fix secondary process resource leakage
> > >>>
> > >>> Pablo de Lara (7):
> > >>>      examples/l2fwd-crypto: fix digest with AEAD algo
> > >>>      examples/l2fwd-crypto: check return value on IV size check
> > >>>      examples/l2fwd-crypto: skip device not supporting operation
> > >>>      test/hash: fix multiwriter with non consecutive cores
> > >>>      test/hash: fix potential memory leak
> > >>>      app/crypto-perf: fix auth IV offset
> > >>>      hash: fix doxygen of return values
> > >>>
> > >>> Pavan Nikhilesh (2):
> > >>>      event/octeontx: remove unnecessary port start and stop
> > >>>      net/octeontx: fix stop clearing Rx/Tx functions
> > >>>
> > >>> Qi Zhang (1):
> > >>>      vfio: fix PCI address comparison
> > >>>
> > >>> Radu Nicolau (3):
> > >>>      security: fix crash on destroy null session
> > >>>      test: fix uninitialized port configuration
> > >>>      net/bonding: fix race condition
> > >>>
> > >>> Rafal Kozik (4):
> > >>>      net/ena: fix GENMASK_ULL macro
> > >>>      net/ena: set link speed as none
> > >>>      net/ena: check pointer before memset
> > >>>      net/ena: change memory type
> > >>>
> > >>> Rahul Lakkireddy (1):
> > >>>      net/cxgbe: fix init failure due to new flash parts
> > >>>
> > >>> Rami Rosen (2):
> > >>>      examples/l3fwd: remove useless include
> > >>>      ethdev: fix a doxygen comment for port allocation
> > >>>
> > >>> Rasesh Mody (9):
> > >>>      net/qede: fix VF MTU update
> > >>>      net/qede: remove primary MAC removal
> > >>>      net/qede: fix for devargs
> > >>>      net/qede: fix default extended VLAN offload config
> > >>>      doc: update qede management firmware guide
> > >>>      net/qede/base: fix GRC attention callback
> > >>>      net/bnx2x: fix FW command timeout during stop
> > >>>      net/bnx2x: fix poll link status
> > >>>      net/qede/base: fix to clear HW indication
> > >>>
> > >>> Remy Horton (4):
> > >>>      bitrate: add sanity check on parameters
> > >>>      metrics: add check for invalid key
> > >>>      metrics: do not fail silently when uninitialised
> > >>>      metrics: disallow null as metric name
> > >>>
> > >>> Reshma Pattan (2):
> > >>>      test/flow_classify: fix return types
> > >>>      mk: remove unnecessary test rules
> > >>>
> > >>> Rosen Xu (1):
> > >>>      examples/flow_filtering: add flow director config for i40e
> > >>>
> > >>> Shahaf Shuler (1):
> > >>>      net/mlx5: fix compilation for rdma-core v19
> > >>>
> > >>> Shahed Shaikh (7):
> > >>>      net/qede: fix link change event notification
> > >>>      net/qede: fix legacy interrupt mode
> > >>>      net/qede: fix incorrect link status update
> > >>>      net/qede: fix unicast MAC address handling in VF
> > >>>      net/qede: fix interrupt handler unregister
> > >>>      net/qede: fix MAC address removal failure message
> > >>>      net/qede: fix ntuple filter configuration
> > >>>
> > >>> Shreyansh Jain (1):
> > >>>      doc: fix bonding command in testpmd
> > >>>
> > >>> Somnath Kotur (3):
> > >>>      net/bnxt: fix to move a flow to a different queue
> > >>>      net/bnxt: use correct flags during VLAN configuration
> > >>>      net/bnxt: fix filter freeing
> > >>>
> > >>> Thomas Monjalon (1):
> > >>>      bus/dpaa: fix build
> > >>>
> > >>> Tomasz Duszynski (1):
> > >>>      net/mvpp2: check pointer before using it
> > >>>
> > >>> Wei Zhao (7):
> > >>>      net/ixgbe: add support for VLAN in IP mode FDIR
> > >>>      net/ixgbe: fix tunnel id format error for FDIR
> > >>>      net/ixgbe: fix tunnel type set error for FDIR
> > >>>      net/ixgbe: fix mask bits register set error for FDIR
> > >>>      app/testpmd: fix VLAN TCI mask set error for FDIR
> > >>>      net/i40e: fix check of flow director programming status
> > >>>      net/i40e: revert fix of flow director check
> > >>>
> > >>> Xiaoyun Li (1):
> > >>>      net/i40e: fix link speed
> > >>>
> > >>> Xueming Li (1):
> > >>>      net/mlx5: fix crash in device probe
> > >>>
> > >>> Yipeng Wang (3):
> > >>>      hash: fix multiwriter lock memory allocation
> > >>>      hash: fix a multi-writer race condition
> > >>>      hash: fix key slot size accuracy
> > >>>
> > >>> Yongseok Koh (7):
> > >>>      net/mlx5: fix Rx buffer replenishment threshold
> > >>>      net/mlx5: add missing sanity checks for Tx completion queue
> > >>>      net/mlx5: fix assert for Tx completion queue count
> > >>>      net/mlx5: fix queue rollback when starting device
> > >>>      net/mlx5: fix error number handling
> > >>>      net/mlx5: preserve promiscuous flag for flow isolation mode
> > >>>      net/mlx5: preserve allmulticast flag for flow isolation mode
> > >>>
> >
> >
>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-24  8:51  0%       ` Alejandro Lucero
@ 2018-08-24 14:31  0%         ` Yongseok Koh
  2018-08-24 15:00  0%           ` Alejandro Lucero
  0 siblings, 1 reply; 200+ results
From: Yongseok Koh @ 2018-08-24 14:31 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dpdk stable, dev


> On Aug 24, 2018, at 1:51 AM, Alejandro Lucero <alejandro.lucero@netronome.com> wrote:
> 
> 
> 
> On Thu, Aug 23, 2018 at 6:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> > On Aug 22, 2018, at 5:19 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > 
> > On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
> >> Hi Yongseok,
> >> 
> >> There is a patchset aimed at 17.11.x:
> >> 
> >> https://patches.dpdk.org/cover/42741/
> >> 
> >> It was not accepted for master because the memory code has changed a lot
> >> since 17.11, and I'm working on another patchset for adjusting to those
> >> changes.
> >> 
> >> I wonder if there is any issue with adding this patchset to stable 17.11.4.
> >> Note that this makes a known limitation with emulated IOMMU inside VMs
> >> unlikely to be hit.
> > 
> > This patchset seems quite large for a stable release and needs to be well verified
> > before GA. In -rc1 stage, we don't usually take such a large patchset as people
> > have already started verification. And we don't usually release -rc2. If you're
> > trying to solve a very critical issue with this patchset, I have to release -rc2
> > and ask people to verify again. How critical is your issue?
> > 
> > For the patchset,
> > - "mem: add function for checking memsegs IOVAs addresses"
> >  This is adding a new API, so I don't expect any API/ABI breakage, but want to
> >  double-confirm with Thomas. Thomas?
> > 
> > - "bus/pci: use IOVAs check when setting IOVA mode"
> >  All the patches got ack except for this one but from looking at the threads in
> >  dev mailing list, it looks okay. I have a question though.
> > 
> >> @@ -640,13 +643,17 @@
> >> {
> >>        struct rte_pci_device *dev = NULL;
> >>        struct rte_pci_driver *drv = NULL;
> >> +       int iommu_dma_mask_check_done = 0;
> >> 
> >>        FOREACH_DRIVER_ON_PCIBUS(drv) {
> >>                FOREACH_DEVICE_ON_PCIBUS(dev) {
> >>                        if (!rte_pci_match(drv, dev))
> >>                                continue;
> >> -                       if (!pci_one_device_iommu_support_va(dev))
> >> -                               return false;
> >> +                       if (!iommu_dma_mask_check_done) {
> >> +                               if (pci_one_device_iommu_support_va(dev) < 0)
> > 
> > pci_one_device_iommu_support_va() returns true/false(1/0), then why do you
> > expect to see a negative return value in case of failure?
> 
> 
> Emulated IOMMU has a 39-bit addressing limitation in some QEMU versions. pci_one_device_iommu_support_va checks for this, and if the limitation exists, IOMMU with VA is not supported.
> 
> This patch avoids such a coarse check by using the DMA mask code, which allows IOMMU with VA if the allocated memory is below the addressing limitation. This will help with using IOMMU with VA on most systems out there, and even on systems with more than 512GB, as long as the DPDK allocated memory is below that limit.
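To make the address-limit argument concrete, here is a minimal sketch of a DMA mask check. The helper name iova_range_fits_mask is hypothetical and is not the function added by the patchset; it only shows the arithmetic, assuming a limit expressed as a number of address bits (39 bits covers 512GB).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns true when the whole
 * range [iova, iova + len) is addressable with `maskbits` address bits.
 */
static bool
iova_range_fits_mask(uint64_t iova, uint64_t len, unsigned int maskbits)
{
	uint64_t mask;
	uint64_t last;

	/* A 64-bit (or wider) mask cannot exclude any 64-bit address. */
	if (maskbits >= 64)
		return true;

	/* Bits at position maskbits and above must stay clear. */
	mask = ~0ULL << maskbits;

	if (len == 0)
		return (iova & mask) == 0;

	last = iova + len - 1;
	if (last < iova)	/* the range wraps around the address space */
		return false;

	return (last & mask) == 0;
}
```

With a check of this kind applied to the memory DPDK actually allocated, a 39-bit emulated IOMMU only has to be rejected when some segment really lies above the limit.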

I was asking about this change:

from,
> >> -                       if (!pci_one_device_iommu_support_va(dev))

to,
> >> +                               if (pci_one_device_iommu_support_va(dev) < 0)


The original code checks for zero, but you changed it to check for a negative value.
But it looks like pci_one_device_iommu_support_va() doesn't return a negative value, right?

I think this is a bug, please let me know.
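A minimal sketch of the concern, using a hypothetical stand-in (device_supports_va) rather than the real pci_one_device_iommu_support_va(). It assumes only what is stated above: the check returns 1 or 0 and never a negative value.

```c
/* Hypothetical stand-in: returns 1 (supported) or 0 (not supported). */
static int
device_supports_va(int addr_bits)
{
	return addr_bits >= 48;
}

/* Original style: a 0 return is treated as failure. */
static int
all_support_va_not_check(const int *addr_bits, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!device_supports_va(addr_bits[i]))
			return 0;
	return 1;
}

/* Patched style: the result is never negative, so this branch is dead. */
static int
all_support_va_lt0_check(const int *addr_bits, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (device_supports_va(addr_bits[i]) < 0)
			return 0;
	return 1;
}
```

Given one device with 48 address bits and one with 39, the first loop reports failure while the second reports success, which is why the "< 0" comparison looks wrong unless the callee is also changed to return a negative error code.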

Thanks,
Yongseok

> 
> Alejandro,
> 
> As I will release -rc2, I can integrate your patchset but this should be
> addressed. Please let me know.
> 
> Thanks,
> Yongseok
> 
> >> +                                       return false;
> >> +                               iommu_dma_mask_check_done  = 1;
> >> +                       }
> >>                }
> >>        }
> >>        return true;
> >> 
> >> 
> >> 
> >> Thanks
> >> 
> >> On Thu, Aug 16, 2018 at 8:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >> 
> >>> Hi all,
> >>> 
> >>> Here is a list of patches targeted for LTS release 17.11.4. Please help
> >>> review and test. The planned date for the final release is August 23.
> >>> Before that, please shout if anyone has objections with these patches
> >>> being applied.
> >>> 
> >>> Also for the companies committed to running regression tests, please run
> >>> the tests and report any issue before the release date.
> >>> 
> >>> A release candidate tarball can be found at:
> >>> 
> >>>    https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.4-rc1
> >>> 
> >>> These patches are located at branch 17.11 of dpdk-stable repo:
> >>>    https://dpdk.org/browse/dpdk-stable/
> >>> 
> >>> Thanks,
> >>> Yongseok
> >>> 
> >>> ---
> >>> Adrien Mazarguil (2):
> >>>      maintainers: update for Mellanox PMDs
> >>>      net/mlx4: fix minor resource leak during init
> >>> 
> >>> Ajit Khaparde (7):
> >>>      net/bnxt: fix HW Tx checksum offload check
> >>>      net/bnxt: fix set MTU
> >>>      net/bnxt: fix Rx ring count limitation
> >>>      net/bnxt: fix memory leaks in NVM commands
> >>>      net/bnxt: fix lock release on NVM write failure
> >>>      net/bnxt: check access denied for HWRM commands
> >>>      net/bnxt: fix RETA size
> >>> 
> >>> Alejandro Lucero (1):
> >>>      net/nfp: fix field initialization in Tx descriptor
> >>> 
> >>> Alok Makhariya (1):
> >>>      bus/dpaa: fix phandle support for Linux 4.16
> >>> 
> >>> Anatoly Burakov (8):
> >>>      eal/linux: fix invalid syntax in interrupts
> >>>      eal/linux: fix uninitialized value
> >>>      test: fix EAL flags autotest on FreeBSD
> >>>      test: fix result printing
> >>>      test: fix code on report
> >>>      test: make autotest runner python 2/3 compliant
> >>>      test: print autotest categories
> >>>      test: improve filtering
> >>> 
> >>> Andrew Rybchenko (2):
> >>>      net/sfc: cut non VLAN ID bits from TCI
> >>>      net/sfc: fix assert in set multicast address list
> >>> 
> >>> Andy Green (1):
> >>>      ring: fix sign conversion warning
> >>> 
> >>> Beilei Xing (3):
> >>>      net/i40e: fix shifts of 32-bit value
> >>>      net/i40e: fix packet type parsing with DDP
> >>>      net/i40e: fix setting TPID with AQ command
> >>> 
> >>> Bruce Richardson (2):
> >>>      examples/exception_path: fix out-of-bounds read
> >>>      mk: fix permissions when using make install
> >>> 
> >>> Chas Williams (2):
> >>>      net/bonding: always update bonding link status
> >>>      net/bonding: do not clear active slave count
> >>> 
> >>> Dan Gora (1):
> >>>      kni: fix crash with null name
> >>> 
> >>> Daria Kolistratova (1):
> >>>      net/ena: fix SIGFPE with 0 Rx queue
> >>> 
> >>> Dariusz Stojaczyk (1):
> >>>      eal: fix return codes on thread naming failure
> >>> 
> >>> David Marchand (1):
> >>>      net/bnxt: add missing ids in xstats
> >>> 
> >>> Drocula Lambda (1):
> >>>      kni: fix build on RHEL 7.5
> >>> 
> >>> Ferruh Yigit (2):
> >>>      kni: fix build with gcc 8.1
> >>>      net/thunderx: fix build with gcc optimization on
> >>> 
> >>> Gavin Hu (3):
> >>>      mk: fix cross build
> >>>      net/dpaa2: remove loop for unused pool entries
> >>>      maintainers: claim maintainership for ARM v7 and v8
> >>> 
> >>> Haiyue Wang (1):
> >>>      net/i40e: workaround performance degradation
> >>> 
> >>> Harry van Haaren (1):
> >>>      event: fix ring init failure handling
> >>> 
> >>> Hemant Agrawal (2):
> >>>      test/crypto: fix device id when stopping port
> >>>      bus/dpaa: fix buffer offset setting in FMAN
> >>> 
> >>> Hyong Youb Kim (1):
> >>>      net/enic: do not overwrite admin Tx queue limit
> >>> 
> >>> Ido Goshen (1):
> >>>      net/pcap: fix multiple queues
> >>> 
> >>> Jananee Parthasarathy (1):
> >>>      mk: update targets for classified tests
> >>> 
> >>> Jay Ding (1):
> >>>      net/bnxt: check for invalid vNIC id
> >>> 
> >>> Jerin Jacob (2):
> >>>      ethdev: fix queue statistics mapping documentation
> >>>      eal: fix bitmap documentation
> >>> 
> >>> Kiran Kumar (2):
> >>>      net/bonding: fix MAC address reset
> >>>      net/thunderx: avoid sq door bell write on zero packet
> >>> 
> >>> Konstantin Ananyev (3):
> >>>      examples/ipsec-secgw: fix IPv4 checksum at Tx
> >>>      examples/ipsec-secgw: fix bypass rule processing
> >>>      app/testpmd: fix DCB config
> >>> 
> >>> Maxime Coquelin (4):
> >>>      vhost: improve dirty pages logging performance
> >>>      vhost: fix missing increment of log cache count
> >>>      vhost: flush IOTLB cache on new mem table handling
> >>>      vhost: retranslate vring addr when memory table changes
> >>> 
> >>> Moti Haimovsky (2):
> >>>      net/mlx5: fix build with old kernels
> >>>      net/mlx4: check RSS queues number limitation
> >>> 
> >>> Nelio Laranjeiro (1):
> >>>      net/mlx5: fix TCI mask filter
> >>> 
> >>> Nikhil Rao (5):
> >>>      eventdev: fix port in Rx adapter internal function
> >>>      eventdev: fix missing update to Rx adaper WRR position
> >>>      eventdev: add event buffer flush in Rx adapter
> >>>      eventdev: fix internal port logic in Rx adapter
> >>>      eventdev: fix Rx SW adapter stop
> >>> 
> >>> Nithin Dabilpuram (1):
> >>>      app/testpmd: fix buffer leak in TM command
> >>> 
> >>> Ophir Munk (1):
> >>>      net/mlx5: fix secondary process resource leakage
> >>> 
> >>> Pablo de Lara (7):
> >>>      examples/l2fwd-crypto: fix digest with AEAD algo
> >>>      examples/l2fwd-crypto: check return value on IV size check
> >>>      examples/l2fwd-crypto: skip device not supporting operation
> >>>      test/hash: fix multiwriter with non consecutive cores
> >>>      test/hash: fix potential memory leak
> >>>      app/crypto-perf: fix auth IV offset
> >>>      hash: fix doxygen of return values
> >>> 
> >>> Pavan Nikhilesh (2):
> >>>      event/octeontx: remove unnecessary port start and stop
> >>>      net/octeontx: fix stop clearing Rx/Tx functions
> >>> 
> >>> Qi Zhang (1):
> >>>      vfio: fix PCI address comparison
> >>> 
> >>> Radu Nicolau (3):
> >>>      security: fix crash on destroy null session
> >>>      test: fix uninitialized port configuration
> >>>      net/bonding: fix race condition
> >>> 
> >>> Rafal Kozik (4):
> >>>      net/ena: fix GENMASK_ULL macro
> >>>      net/ena: set link speed as none
> >>>      net/ena: check pointer before memset
> >>>      net/ena: change memory type
> >>> 
> >>> Rahul Lakkireddy (1):
> >>>      net/cxgbe: fix init failure due to new flash parts
> >>> 
> >>> Rami Rosen (2):
> >>>      examples/l3fwd: remove useless include
> >>>      ethdev: fix a doxygen comment for port allocation
> >>> 
> >>> Rasesh Mody (9):
> >>>      net/qede: fix VF MTU update
> >>>      net/qede: remove primary MAC removal
> >>>      net/qede: fix for devargs
> >>>      net/qede: fix default extended VLAN offload config
> >>>      doc: update qede management firmware guide
> >>>      net/qede/base: fix GRC attention callback
> >>>      net/bnx2x: fix FW command timeout during stop
> >>>      net/bnx2x: fix poll link status
> >>>      net/qede/base: fix to clear HW indication
> >>> 
> >>> Remy Horton (4):
> >>>      bitrate: add sanity check on parameters
> >>>      metrics: add check for invalid key
> >>>      metrics: do not fail silently when uninitialised
> >>>      metrics: disallow null as metric name
> >>> 
> >>> Reshma Pattan (2):
> >>>      test/flow_classify: fix return types
> >>>      mk: remove unnecessary test rules
> >>> 
> >>> Rosen Xu (1):
> >>>      examples/flow_filtering: add flow director config for i40e
> >>> 
> >>> Shahaf Shuler (1):
> >>>      net/mlx5: fix compilation for rdma-core v19
> >>> 
> >>> Shahed Shaikh (7):
> >>>      net/qede: fix link change event notification
> >>>      net/qede: fix legacy interrupt mode
> >>>      net/qede: fix incorrect link status update
> >>>      net/qede: fix unicast MAC address handling in VF
> >>>      net/qede: fix interrupt handler unregister
> >>>      net/qede: fix MAC address removal failure message
> >>>      net/qede: fix ntuple filter configuration
> >>> 
> >>> Shreyansh Jain (1):
> >>>      doc: fix bonding command in testpmd
> >>> 
> >>> Somnath Kotur (3):
> >>>      net/bnxt: fix to move a flow to a different queue
> >>>      net/bnxt: use correct flags during VLAN configuration
> >>>      net/bnxt: fix filter freeing
> >>> 
> >>> Thomas Monjalon (1):
> >>>      bus/dpaa: fix build
> >>> 
> >>> Tomasz Duszynski (1):
> >>>      net/mvpp2: check pointer before using it
> >>> 
> >>> Wei Zhao (7):
> >>>      net/ixgbe: add support for VLAN in IP mode FDIR
> >>>      net/ixgbe: fix tunnel id format error for FDIR
> >>>      net/ixgbe: fix tunnel type set error for FDIR
> >>>      net/ixgbe: fix mask bits register set error for FDIR
> >>>      app/testpmd: fix VLAN TCI mask set error for FDIR
> >>>      net/i40e: fix check of flow director programming status
> >>>      net/i40e: revert fix of flow director check
> >>> 
> >>> Xiaoyun Li (1):
> >>>      net/i40e: fix link speed
> >>> 
> >>> Xueming Li (1):
> >>>      net/mlx5: fix crash in device probe
> >>> 
> >>> Yipeng Wang (3):
> >>>      hash: fix multiwriter lock memory allocation
> >>>      hash: fix a multi-writer race condition
> >>>      hash: fix key slot size accuracy
> >>> 
> >>> Yongseok Koh (7):
> >>>      net/mlx5: fix Rx buffer replenishment threshold
> >>>      net/mlx5: add missing sanity checks for Tx completion queue
> >>>      net/mlx5: fix assert for Tx completion queue count
> >>>      net/mlx5: fix queue rollback when starting device
> >>>      net/mlx5: fix error number handling
> >>>      net/mlx5: preserve promiscuous flag for flow isolation mode
> >>>      net/mlx5: preserve allmulticast flag for flow isolation mode
> >>> 
> 
> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 1/2] ethdev: fix MAC changes when live change not supported
  @ 2018-08-24 14:25  4% ` Alejandro Lucero
  0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-08-24 14:25 UTC (permalink / raw)
  To: dev; +Cc: stable, ferruh.yigit

Current code assumes a MAC change can occur when the port has been
started. In fact, some NICs require this port state for the change to
succeed, but other NICs do not always support a MAC change in that
case.

This patch adds a new device flag for a device to advertise this
limitation, and if the flag is set, the MAC is changed before the
port starts.
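The ordering described above can be sketched in isolation as follows. All sketch_* names are hypothetical and exist only for this illustration; the flag value mirrors the one defined in the patch, and the trace array is just a way to observe the call order.

```c
#include <stdint.h>

/* Same bit value as RTE_ETH_DEV_NOLIVE_MAC_ADDR in the patch below. */
#define SKETCH_DEV_NOLIVE_MAC_ADDR 0x0020u

enum sketch_step { STEP_MAC_RESTORE = 1, STEP_PMD_START = 2 };

/* Hypothetical device model that records the order of operations. */
struct sketch_dev {
	uint32_t dev_flags;
	enum sketch_step trace[2];
	int ntrace;
};

static void
sketch_mac_restore(struct sketch_dev *dev)
{
	dev->trace[dev->ntrace++] = STEP_MAC_RESTORE;
}

static void
sketch_pmd_start(struct sketch_dev *dev)
{
	dev->trace[dev->ntrace++] = STEP_PMD_START;
}

/* MAC is replayed before the PMD start callback only when the flag is set. */
static void
sketch_dev_start(struct sketch_dev *dev)
{
	int nolive = (dev->dev_flags & SKETCH_DEV_NOLIVE_MAC_ADDR) != 0;

	if (nolive)
		sketch_mac_restore(dev);

	sketch_pmd_start(dev);

	if (!nolive)
		sketch_mac_restore(dev);
}
```

Devices that do not set the flag keep the existing behaviour (start, then MAC replay), so the change should be invisible to PMDs that support live MAC changes.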

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 doc/guides/rel_notes/release_18_11.rst |  6 +++++-
 lib/librte_ethdev/rte_ethdev.c         | 28 +++++++++++++++++++---------
 lib/librte_ethdev/rte_ethdev.h         |  6 ++++++
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f..b0c73bd 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,7 +67,11 @@ API Changes
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =========================================================
-
+   * A new device flag, RTE_ETH_DEV_NOLIVE_MAC_ADDR, changes the order of
+     actions inside rte_eth_dev_start regarding MAC set. Some NICs do not
+     support MAC changes once the port has started and with this new device
+     flag the MAC can be properly configured in any case. This is particularly
+     important for bonding.
 
 ABI Changes
 -----------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index f32722f..16825bf 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1219,19 +1219,14 @@ struct rte_eth_dev *
 }
 
 static void
-rte_eth_dev_config_restore(uint16_t port_id)
+rte_eth_dev_mac_restore(struct rte_eth_dev *dev,
+			struct rte_eth_dev_info *dev_info)
 {
-	struct rte_eth_dev *dev;
-	struct rte_eth_dev_info dev_info;
 	struct ether_addr *addr;
 	uint16_t i;
 	uint32_t pool = 0;
 	uint64_t pool_mask;
 
-	dev = &rte_eth_devices[port_id];
-
-	rte_eth_dev_info_get(port_id, &dev_info);
-
 	/* replay MAC address configuration including default MAC */
 	addr = &dev->data->mac_addrs[0];
 	if (*dev->dev_ops->mac_addr_set != NULL)
@@ -1240,7 +1235,7 @@ struct rte_eth_dev *
 		(*dev->dev_ops->mac_addr_add)(dev, addr, 0, pool);
 
 	if (*dev->dev_ops->mac_addr_add != NULL) {
-		for (i = 1; i < dev_info.max_mac_addrs; i++) {
+		for (i = 1; i < dev_info->max_mac_addrs; i++) {
 			addr = &dev->data->mac_addrs[i];
 
 			/* skip zero address */
@@ -1259,6 +1254,14 @@ struct rte_eth_dev *
 			} while (pool_mask);
 		}
 	}
+}
+
+static void
+rte_eth_dev_config_restore(struct rte_eth_dev *dev,
+			   struct rte_eth_dev_info *dev_info, uint16_t port_id)
+{
+	if (!(*dev_info->dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR))
+		rte_eth_dev_mac_restore(dev, dev_info);
 
 	/* replay promiscuous configuration */
 	if (rte_eth_promiscuous_get(port_id) == 1)
@@ -1277,6 +1280,7 @@ struct rte_eth_dev *
 rte_eth_dev_start(uint16_t port_id)
 {
 	struct rte_eth_dev *dev;
+	struct rte_eth_dev_info dev_info;
 	int diag;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -1292,13 +1296,19 @@ struct rte_eth_dev *
 		return 0;
 	}
 
+	rte_eth_dev_info_get(port_id, &dev_info);
+
+	/* Lets restore MAC now if device does not support live change */
+	if (*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR)
+		rte_eth_dev_mac_restore(dev, &dev_info);
+
 	diag = (*dev->dev_ops->dev_start)(dev);
 	if (diag == 0)
 		dev->data->dev_started = 1;
 	else
 		return eth_err(port_id, diag);
 
-	rte_eth_dev_config_restore(port_id);
+	rte_eth_dev_config_restore(dev, &dev_info, port_id);
 
 	if (dev->data->dev_conf.intr_conf.lsc == 0) {
 		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->link_update, -ENOTSUP);
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 7070e9a..fa2812b 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1268,6 +1268,8 @@ struct rte_eth_dev_owner {
 #define RTE_ETH_DEV_INTR_RMV     0x0008
 /** Device is port representor */
 #define RTE_ETH_DEV_REPRESENTOR  0x0010
+/** Device does not support MAC change after started */
+#define RTE_ETH_DEV_NOLIVE_MAC_ADDR  0x0020
 
 /**
  * Iterates over valid ethdev ports owned by a specific owner.
@@ -1750,6 +1752,10 @@ int rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
  * The device start step is the last one and consists of setting the configured
  * offload features and in starting the transmit and the receive units of the
  * device.
+ *
+ * Device RTE_ETH_DEV_NOLIVE_MAC_ADDR flag causes MAC address to be set before
+ * PMD port start callback function is invoked.
+ *
  * On success, all basic functions exported by the Ethernet API (link status,
  * receive/transmit, and so on) can be invoked.
  *
-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH] ethdev: deprecate DEFERRED device state
@ 2018-08-24 14:51 11% Ferruh Yigit
  2018-08-27 15:00  0% ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-08-24 14:51 UTC (permalink / raw)
  To: Neil Horman, John McNamara, Marko Kovacevic
  Cc: dev, Ferruh Yigit, Thomas Monjalon, Andrew Rybchenko, Matan Azrad

Add a deprecation notice to remove RTE_ETH_DEV_DEFERRED state, but this
is mostly a reminder because of a missing target.
It is not worth breaking the ABI because of this change, and the removal
can be done when the ethdev ABI version is increased.
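A short sketch of why dropping an entry from the middle of an enum breaks the ABI. The enums below are hypothetical stand-ins, not the real ethdev state enum; they only assume a state list where the deprecated entry is followed by at least one other value.

```c
/* Values as compiled into an existing application (hypothetical names). */
enum old_state {
	OLD_UNUSED = 0,
	OLD_ATTACHED,
	OLD_DEFERRED,	/* the entry to be removed */
	OLD_REMOVED,
};

/* Values after the middle entry is deleted in a newer library. */
enum new_state {
	NEW_UNUSED = 0,
	NEW_ATTACHED,
	NEW_REMOVED,	/* silently renumbered from 3 to 2 */
};

/* Newer library code testing a state value handed over by the application. */
static int
new_lib_is_removed(int state)
{
	return state == NEW_REMOVED;
}
```

An application built against the old header passes 3 for "removed", which the new library no longer recognises, while the old "deferred" value 2 is now read as "removed". Keeping the entry until the next ABI version bump avoids that renumbering.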

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Andrew Rybchenko <arybchenko@solarflare.com>
Cc: Matan Azrad <matan@mellanox.com>
---
 doc/guides/rel_notes/deprecation.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e2dbee317..9cd12ccd8 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -95,3 +95,7 @@ Deprecation Notices
 
   This is due to a lack of flexibility and reliance on a type unusable with
   C++ programs (struct rte_flow_desc).
+
+* ethdev: remove deprecated RTE_ETH_DEV_DEFERRED device state.
+  Since this is an enum field in the middle, removing this field will break
+  the ABI, so the removal is postponed to the next ethdev ABI version increase.
-- 
2.17.1

^ permalink raw reply	[relevance 11%]

* [dpdk-dev] [PATCH v2 3/3] doc: comment rte_eth_dev_start change
  @ 2018-08-24 11:15  4% ` Alejandro Lucero
  0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-08-24 11:15 UTC (permalink / raw)
  To: dev; +Cc: stable, ferruh.yigit

The new device flag RTE_ETH_DEV_NOLIVE_MAC_ADDR modifies how MAC is set
when calling rte_eth_dev_start.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 doc/guides/rel_notes/release_18_11.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f..b0c73bd 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,7 +67,11 @@ API Changes
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =========================================================
-
+   * A new device flag, RTE_ETH_DEV_NOLIVE_MAC_ADDR, changes the order of
+     actions inside rte_eth_dev_start regarding MAC set. Some NICs do not
+     support MAC changes once the port has started and with this new device
+     flag the MAC can be properly configured in any case. This is particularly
+     important for bonding.
 
 ABI Changes
 -----------
-- 
1.9.1

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter
  2018-08-03 13:36  3% ` [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter Adrien Mazarguil
  2018-08-23 13:48  0%   ` Ferruh Yigit
@ 2018-08-24 10:58  0%   ` Ferruh Yigit
  2018-08-27 14:12  0%     ` Adrien Mazarguil
  2018-08-31  9:00  3%   ` [dpdk-dev] [PATCH v3 " Adrien Mazarguil
  2 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-08-24 10:58 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev, Jerin Jacob, Gavin Hu

On 8/3/2018 2:36 PM, Adrien Mazarguil wrote:
> This is a follow up to the "Flow API helpers enhancements" series submitted
> almost a year ago [1]. The new title is due to the reduced scope of this
> version.
> 
> rte_flow_conv() is a flexible replacement for rte_flow_copy(), itself a
> temporary solution pending something better [2]. It replaces a lot of
> duplicated code found in testpmd and removes some of the maintenance burden
> that developers tend to forget (me included) when modifying pattern
> item or actions (updating app/test-pmd/config.c to be clear).
> 
> This series was unearthed in order to complete the implementation of
> RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> duplicate existing code once again.
> 
> See individual patches for specific changes in this version.
> 
> v2 changes:
> 
> - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> - Updated bonding PMD.
> - No more automatic generation of rte_flow_conv.h.
> 
> [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> [3] Currently the command-line parser (cmdline_flow.c) is aware of these
>     actions, however config.c isn't. Flow rules with such actions cannot
>     be created and cannot be validated with PMDs that implement them.
> 
> Adrien Mazarguil (7):
>   ethdev: add flow API object converter
>   ethdev: add flow API item/action name conversion
>   app/testpmd: rely on flow API conversion function
>   net/failsafe: switch to flow API object conversion function
>   net/bonding: switch to flow API object conversion function
>   ethdev: deprecate rte_flow_copy function
>   ethdev: add missing item/actions to flow object converter

This causes a build error on arm; it looks related to the rte_memcpy macro:

.../lib/librte_ethdev/rte_flow.c: In function ‘rte_flow_conv_item_spec’:
.../lib/librte_ethdev/rte_flow.c:373:58: error: macro "rte_memcpy" passed 9
arguments, but takes just 3
       (size > sizeof(*dst.raw) ? sizeof(*dst.raw) : size));
                                                          ^
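One common cause of this kind of error is a compound literal passed to a function-like macro: commas inside braces are not protected from the preprocessor. Whether that is what happens at rte_flow.c:373 is an assumption, since the log does not show the full call. A minimal sketch with a hypothetical COPY3 macro standing in for a macro-based rte_memcpy:

```c
#include <string.h>

/* Hypothetical three-argument macro, standing in for a macro-based memcpy. */
#define COPY3(dst, src, n) memcpy((dst), (src), (n))

struct pair {
	int a;
	int b;
};

static int
copy_pair_sum(void)
{
	struct pair out;

	/*
	 * COPY3(&out, &(struct pair){ .a = 1, .b = 2 }, sizeof(out));
	 * does not compile: the comma inside the braces splits the second
	 * argument, so the preprocessor counts four arguments instead of three.
	 *
	 * Wrapping the compound literal in parentheses keeps it as one argument.
	 */
	COPY3(&out, (&(struct pair){ .a = 1, .b = 2 }), sizeof(out));

	return out.a + out.b;
}
```

On targets where the copy routine is a real function, the unparenthesized form compiles fine, which would explain why the failure only shows up on arm.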

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-23 16:18  0%     ` Yongseok Koh
@ 2018-08-24  8:51  0%       ` Alejandro Lucero
  2018-08-24 14:31  0%         ` Yongseok Koh
  0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-08-24  8:51 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: dpdk stable, dev

On Thu, Aug 23, 2018 at 6:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:

>
> > On Aug 22, 2018, at 5:19 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >
> > On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
> >> Hi Yonngseok,
> >>
> >> There is a patchset aimed at 17.11.x:
> >>
> >> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F42741%2F&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=8f12c1IuUe4mw2EaTZ18vVTuLTjXOD2cSe%2B%2B7f6OFfk%3D&amp;reserved=0
> >>
> >> It was not accepted for master because the memory code has changed a lot
> >> since 17.11, and I'm working on another patchset for adjusting to the those
> >> changes.
> >>
> >> I wonder if there is any issue with adding this patchset to stable 17.11.4.
> >> Note that this makes unlikely a known limitation with emulated IOMMU inside
> >> VMs.
> >
> > This patchset seems quite large for stable release and need to be well verified
> > before GA. In -rc1 stage, we don't usually take such a large patchset as people
> > have already started verification. And we don't usually release -rc2. If you're
> > trying to solve a very critical issue with this patchset, I have to release -rc2
> > and ask people to verify again. How critical is your issue?
> >
> > For the patchset,
> > - "mem: add function for checking memsegs IOVAs addresses"
> >  This is adding a new API, so I don't expect any API/ABI breakage, but want to
> >  double-confirm with Thomas. Thomas?
> >
> > - "bus/pci: use IOVAs check when setting IOVA mode"
> >  All the patches got ack except for this one but from looking at the threads in
> >  dev mailing list, it looks okay. I have a question though.
> >
> >> @@ -640,13 +643,17 @@
> >> {
> >>        struct rte_pci_device *dev = NULL;
> >>        struct rte_pci_driver *drv = NULL;
> >> +       int iommu_dma_mask_check_done = 0;
> >>
> >>        FOREACH_DRIVER_ON_PCIBUS(drv) {
> >>                FOREACH_DEVICE_ON_PCIBUS(dev) {
> >>                        if (!rte_pci_match(drv, dev))
> >>                                continue;
> >> -                       if (!pci_one_device_iommu_support_va(dev))
> >> -                               return false;
> >> +                       if (!iommu_dma_mask_check_done) {
> >> +                               if (pci_one_device_iommu_support_va(dev) < 0)
> >
> > pci_one_device_iommu_support_va() returns true/false(1/0), then why do you
> > expect to see a negative return value in case of failure?
>
>
Emulated IOMMU has a 39-bit addressing limitation in some QEMU versions.
pci_one_device_iommu_support_va checks for this limitation and, if it
exists, IOMMU with VA is not supported.

This patch avoids such a coarse check by using the DMA mask code, which
allows IOMMU with VA if the allocated memory is below the addressing
limitation. This is going to help with using IOMMU with VA in most of the
systems out there, and even in systems with more than 512GB as long as the
DPDK allocated memory is below that limit.
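To make the idea concrete, here is a minimal sketch of such a DMA mask check. It is not the code from the patchset: struct mem_seg and both function names are hypothetical, and the only assumption is that each allocated segment is described by a start IOVA and a length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one allocated memory segment. */
struct mem_seg {
	uint64_t iova; /* first IO virtual address of the segment */
	uint64_t len;  /* length in bytes */
};

/* Return true if the whole segment is addressable with maskbits address bits. */
static bool seg_fits_dma_mask(const struct mem_seg *seg, unsigned int maskbits)
{
	uint64_t forbidden;
	uint64_t last;

	if (seg->len == 0)
		return true;
	last = seg->iova + seg->len - 1;
	if (last < seg->iova)
		return false; /* the address range wraps around */
	if (maskbits >= 64)
		return true;
	/* Bits at or above maskbits must be clear in the last address. */
	forbidden = ~((UINT64_C(1) << maskbits) - 1);
	return (last & forbidden) == 0;
}

/* Return true only if every segment fits within the DMA mask. */
bool all_segs_fit_dma_mask(const struct mem_seg *segs, size_t n,
			   unsigned int maskbits)
{
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!seg_fits_dma_mask(&segs[i], maskbits))
			return false;
	return true;
}
```

With maskbits set to 39 the limit is 2^39 bytes, which is the 512GB figure above: a system with more memory still passes as long as no allocated segment reaches that address.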

> Alejandro,
>
> As I will release -rc2, I can integrate your patchset but this should be
> addressed. Please let me know.
>
> Thanks,
> Yongseok
>
> >> +                                       return false;
> >> +                               iommu_dma_mask_check_done  = 1;
> >> +                       }
> >>                }
> >>        }
> >>        return true;
> >>
> >>
> >>
> >> Thanks
> >>
> >> On Thu, Aug 16, 2018 at 8:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Here is a list of patches targeted for LTS release 17.11.4. Please help
> >>> review
> >>> and test. The planned date for the final release is August 23. Before that,
> >>> please shout if anyone has objections with these patches being applied.
> >>>
> >>> Also for the companies committed to running regression tests, please run
> >>> the
> >>> tests and report any issue before the release date.
> >>>
> >>> A release candidate tarball can be found at:
> >>>
> >>>    https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdpdk.org%2Fbrowse%2Fdpdk-stable%2Ftag%2F%3Fid%3Dv17.11.4-rc1&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=AMgyJMFIs512o5zfZ4aNSy1Ptp%2BhEIMUCVZ6HaL2F40%3D&amp;reserved=0
> >>>
> >>> These patches are located at branch 17.11 of dpdk-stable repo:
> >>>    https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdpdk.org%2Fbrowse%2Fdpdk-stable%2F&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=hvOhEk502vVzboCbRbCZXqJXcsiI3DTtgQypQJi0Aro%3D&amp;reserved=0
> >>>
> >>> Thanks,
> >>> Yongseok
> >>>
> >>> ---
> >>> Adrien Mazarguil (2):
> >>>      maintainers: update for Mellanox PMDs
> >>>      net/mlx4: fix minor resource leak during init
> >>>
> >>> Ajit Khaparde (7):
> >>>      net/bnxt: fix HW Tx checksum offload check
> >>>      net/bnxt: fix set MTU
> >>>      net/bnxt: fix Rx ring count limitation
> >>>      net/bnxt: fix memory leaks in NVM commands
> >>>      net/bnxt: fix lock release on NVM write failure
> >>>      net/bnxt: check access denied for HWRM commands
> >>>      net/bnxt: fix RETA size
> >>>
> >>> Alejandro Lucero (1):
> >>>      net/nfp: fix field initialization in Tx descriptor
> >>>
> >>> Alok Makhariya (1):
> >>>      bus/dpaa: fix phandle support for Linux 4.16
> >>>
> >>> Anatoly Burakov (8):
> >>>      eal/linux: fix invalid syntax in interrupts
> >>>      eal/linux: fix uninitialized value
> >>>      test: fix EAL flags autotest on FreeBSD
> >>>      test: fix result printing
> >>>      test: fix code on report
> >>>      test: make autotest runner python 2/3 compliant
> >>>      test: print autotest categories
> >>>      test: improve filtering
> >>>
> >>> Andrew Rybchenko (2):
> >>>      net/sfc: cut non VLAN ID bits from TCI
> >>>      net/sfc: fix assert in set multicast address list
> >>>
> >>> Andy Green (1):
> >>>      ring: fix sign conversion warning
> >>>
> >>> Beilei Xing (3):
> >>>      net/i40e: fix shifts of 32-bit value
> >>>      net/i40e: fix packet type parsing with DDP
> >>>      net/i40e: fix setting TPID with AQ command
> >>>
> >>> Bruce Richardson (2):
> >>>      examples/exception_path: fix out-of-bounds read
> >>>      mk: fix permissions when using make install
> >>>
> >>> Chas Williams (2):
> >>>      net/bonding: always update bonding link status
> >>>      net/bonding: do not clear active slave count
> >>>
> >>> Dan Gora (1):
> >>>      kni: fix crash with null name
> >>>
> >>> Daria Kolistratova (1):
> >>>      net/ena: fix SIGFPE with 0 Rx queue
> >>>
> >>> Dariusz Stojaczyk (1):
> >>>      eal: fix return codes on thread naming failure
> >>>
> >>> David Marchand (1):
> >>>      net/bnxt: add missing ids in xstats
> >>>
> >>> Drocula Lambda (1):
> >>>      kni: fix build on RHEL 7.5
> >>>
> >>> Ferruh Yigit (2):
> >>>      kni: fix build with gcc 8.1
> >>>      net/thunderx: fix build with gcc optimization on
> >>>
> >>> Gavin Hu (3):
> >>>      mk: fix cross build
> >>>      net/dpaa2: remove loop for unused pool entries
> >>>      maintainers: claim maintainership for ARM v7 and v8
> >>>
> >>> Haiyue Wang (1):
> >>>      net/i40e: workaround performance degradation
> >>>
> >>> Harry van Haaren (1):
> >>>      event: fix ring init failure handling
> >>>
> >>> Hemant Agrawal (2):
> >>>      test/crypto: fix device id when stopping port
> >>>      bus/dpaa: fix buffer offset setting in FMAN
> >>>
> >>> Hyong Youb Kim (1):
> >>>      net/enic: do not overwrite admin Tx queue limit
> >>>
> >>> Ido Goshen (1):
> >>>      net/pcap: fix multiple queues
> >>>
> >>> Jananee Parthasarathy (1):
> >>>      mk: update targets for classified tests
> >>>
> >>> Jay Ding (1):
> >>>      net/bnxt: check for invalid vNIC id
> >>>
> >>> Jerin Jacob (2):
> >>>      ethdev: fix queue statistics mapping documentation
> >>>      eal: fix bitmap documentation
> >>>
> >>> Kiran Kumar (2):
> >>>      net/bonding: fix MAC address reset
> >>>      net/thunderx: avoid sq door bell write on zero packet
> >>>
> >>> Konstantin Ananyev (3):
> >>>      examples/ipsec-secgw: fix IPv4 checksum at Tx
> >>>      examples/ipsec-secgw: fix bypass rule processing
> >>>      app/testpmd: fix DCB config
> >>>
> >>> Maxime Coquelin (4):
> >>>      vhost: improve dirty pages logging performance
> >>>      vhost: fix missing increment of log cache count
> >>>      vhost: flush IOTLB cache on new mem table handling
> >>>      vhost: retranslate vring addr when memory table changes
> >>>
> >>> Moti Haimovsky (2):
> >>>      net/mlx5: fix build with old kernels
> >>>      net/mlx4: check RSS queues number limitation
> >>>
> >>> Nelio Laranjeiro (1):
> >>>      net/mlx5: fix TCI mask filter
> >>>
> >>> Nikhil Rao (5):
> >>>      eventdev: fix port in Rx adapter internal function
> >>>      eventdev: fix missing update to Rx adaper WRR position
> >>>      eventdev: add event buffer flush in Rx adapter
> >>>      eventdev: fix internal port logic in Rx adapter
> >>>      eventdev: fix Rx SW adapter stop
> >>>
> >>> Nithin Dabilpuram (1):
> >>>      app/testpmd: fix buffer leak in TM command
> >>>
> >>> Ophir Munk (1):
> >>>      net/mlx5: fix secondary process resource leakage
> >>>
> >>> Pablo de Lara (7):
> >>>      examples/l2fwd-crypto: fix digest with AEAD algo
> >>>      examples/l2fwd-crypto: check return value on IV size check
> >>>      examples/l2fwd-crypto: skip device not supporting operation
> >>>      test/hash: fix multiwriter with non consecutive cores
> >>>      test/hash: fix potential memory leak
> >>>      app/crypto-perf: fix auth IV offset
> >>>      hash: fix doxygen of return values
> >>>
> >>> Pavan Nikhilesh (2):
> >>>      event/octeontx: remove unnecessary port start and stop
> >>>      net/octeontx: fix stop clearing Rx/Tx functions
> >>>
> >>> Qi Zhang (1):
> >>>      vfio: fix PCI address comparison
> >>>
> >>> Radu Nicolau (3):
> >>>      security: fix crash on destroy null session
> >>>      test: fix uninitialized port configuration
> >>>      net/bonding: fix race condition
> >>>
> >>> Rafal Kozik (4):
> >>>      net/ena: fix GENMASK_ULL macro
> >>>      net/ena: set link speed as none
> >>>      net/ena: check pointer before memset
> >>>      net/ena: change memory type
> >>>
> >>> Rahul Lakkireddy (1):
> >>>      net/cxgbe: fix init failure due to new flash parts
> >>>
> >>> Rami Rosen (2):
> >>>      examples/l3fwd: remove useless include
> >>>      ethdev: fix a doxygen comment for port allocation
> >>>
> >>> Rasesh Mody (9):
> >>>      net/qede: fix VF MTU update
> >>>      net/qede: remove primary MAC removal
> >>>      net/qede: fix for devargs
> >>>      net/qede: fix default extended VLAN offload config
> >>>      doc: update qede management firmware guide
> >>>      net/qede/base: fix GRC attention callback
> >>>      net/bnx2x: fix FW command timeout during stop
> >>>      net/bnx2x: fix poll link status
> >>>      net/qede/base: fix to clear HW indication
> >>>
> >>> Remy Horton (4):
> >>>      bitrate: add sanity check on parameters
> >>>      metrics: add check for invalid key
> >>>      metrics: do not fail silently when uninitialised
> >>>      metrics: disallow null as metric name
> >>>
> >>> Reshma Pattan (2):
> >>>      test/flow_classify: fix return types
> >>>      mk: remove unnecessary test rules
> >>>
> >>> Rosen Xu (1):
> >>>      examples/flow_filtering: add flow director config for i40e
> >>>
> >>> Shahaf Shuler (1):
> >>>      net/mlx5: fix compilation for rdma-core v19
> >>>
> >>> Shahed Shaikh (7):
> >>>      net/qede: fix link change event notification
> >>>      net/qede: fix legacy interrupt mode
> >>>      net/qede: fix incorrect link status update
> >>>      net/qede: fix unicast MAC address handling in VF
> >>>      net/qede: fix interrupt handler unregister
> >>>      net/qede: fix MAC address removal failure message
> >>>      net/qede: fix ntuple filter configuration
> >>>
> >>> Shreyansh Jain (1):
> >>>      doc: fix bonding command in testpmd
> >>>
> >>> Somnath Kotur (3):
> >>>      net/bnxt: fix to move a flow to a different queue
> >>>      net/bnxt: use correct flags during VLAN configuration
> >>>      net/bnxt: fix filter freeing
> >>>
> >>> Thomas Monjalon (1):
> >>>      bus/dpaa: fix build
> >>>
> >>> Tomasz Duszynski (1):
> >>>      net/mvpp2: check pointer before using it
> >>>
> >>> Wei Zhao (7):
> >>>      net/ixgbe: add support for VLAN in IP mode FDIR
> >>>      net/ixgbe: fix tunnel id format error for FDIR
> >>>      net/ixgbe: fix tunnel type set error for FDIR
> >>>      net/ixgbe: fix mask bits register set error for FDIR
> >>>      app/testpmd: fix VLAN TCI mask set error for FDIR
> >>>      net/i40e: fix check of flow director programming status
> >>>      net/i40e: revert fix of flow director check
> >>>
> >>> Xiaoyun Li (1):
> >>>      net/i40e: fix link speed
> >>>
> >>> Xueming Li (1):
> >>>      net/mlx5: fix crash in device probe
> >>>
> >>> Yipeng Wang (3):
> >>>      hash: fix multiwriter lock memory allocation
> >>>      hash: fix a multi-writer race condition
> >>>      hash: fix key slot size accuracy
> >>>
> >>> Yongseok Koh (7):
> >>>      net/mlx5: fix Rx buffer replenishment threshold
> >>>      net/mlx5: add missing sanity checks for Tx completion queue
> >>>      net/mlx5: fix assert for Tx completion queue count
> >>>      net/mlx5: fix queue rollback when starting device
> >>>      net/mlx5: fix error number handling
> >>>      net/mlx5: preserve promiscuous flag for flow isolation mode
> >>>      net/mlx5: preserve allmulticast flag for flow isolation mode
> >>>
>
>


* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-23  0:19  3%   ` Yongseok Koh
  2018-08-23  1:23  0%     ` Yongseok Koh
  2018-08-23 11:29  3%     ` Thomas Monjalon
@ 2018-08-23 16:18  0%     ` Yongseok Koh
  2018-08-24  8:51  0%       ` Alejandro Lucero
  2 siblings, 1 reply; 200+ results
From: Yongseok Koh @ 2018-08-23 16:18 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dpdk stable, dev


> On Aug 22, 2018, at 5:19 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
>> Hi Yonngseok,
>> 
>> There is a patchset aimed at 17.11.x:
>> 
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F42741%2F&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=8f12c1IuUe4mw2EaTZ18vVTuLTjXOD2cSe%2B%2B7f6OFfk%3D&amp;reserved=0
>> 
>> It was not accepted for master because the memory code has changed a lot
>> since 17.11, and I'm working on another patchset for adjusting to the those
>> changes.
>> 
>> I wonder if there is any issue with adding this patchset to stable 17.11.4.
>> Note that this makes unlikely a known limitation with emulated IOMMU inside
>> VMs.
> 
> This patchset seems quite large for stable release and need to be well verified
> before GA. In -rc1 stage, we don't usually take such a large patchset as people
> have already started verification. And we don't usually release -rc2. If you're
> trying to solve a very critical issue with this patchset, I have to release -rc2
> and ask people to verify again. How critical is your issue?
> 
> For the patchset,
> - "mem: add function for checking memsegs IOVAs addresses"
>  This is adding a new API, so I don't expect any API/ABI breakage, but want to
>  double-confirm with Thomas. Thomas?
> 
> - "bus/pci: use IOVAs check when setting IOVA mode"
>  All the patches got ack except for this one but from looking at the threads in
>  dev mailing list, it looks okay. I have a question though.
> 
>> @@ -640,13 +643,17 @@
>> {
>>        struct rte_pci_device *dev = NULL;
>>        struct rte_pci_driver *drv = NULL;
>> +       int iommu_dma_mask_check_done = 0;
>> 
>>        FOREACH_DRIVER_ON_PCIBUS(drv) {
>>                FOREACH_DEVICE_ON_PCIBUS(dev) {
>>                        if (!rte_pci_match(drv, dev))
>>                                continue;
>> -                       if (!pci_one_device_iommu_support_va(dev))
>> -                               return false;
>> +                       if (!iommu_dma_mask_check_done) {
>> +                               if (pci_one_device_iommu_support_va(dev) < 0)
> 
> pci_one_device_iommu_support_va() returns true/false(1/0), then why do you
> expect to see a negative return value in case of failure?

Alejandro,

As I will release -rc2, I can integrate your patchset but this should be
addressed. Please let me know.

Thanks,
Yongseok
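The concern with the "< 0" test can be illustrated with a small sketch. The helper below is a hypothetical stand-in (its name and the 48-bit threshold are made up for illustration); the only property taken from the discussion is that it reports failure as false (0) rather than as a negative value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a helper that reports failure as false (0). */
static bool device_supports_va(unsigned int addr_bits)
{
	assert(addr_bits <= 64);
	return addr_bits >= 48; /* illustrative threshold only */
}

/* Tests for a negative value: never detects failure from a bool helper. */
bool accepts_with_negative_test(unsigned int addr_bits)
{
	int ret = device_supports_va(addr_bits); /* can only be 0 or 1 */

	if (ret < 0)
		return false;
	return true;
}

/* Tests for false: rejects the device when the helper reports failure. */
bool accepts_with_boolean_test(unsigned int addr_bits)
{
	if (!device_supports_va(addr_bits))
		return false;
	return true;
}
```

For a device the helper rejects, the first variant still accepts it, so the check silently becomes a no-op unless the helper is changed to return a negative error code.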

>> +                                       return false;
>> +                               iommu_dma_mask_check_done  = 1;
>> +                       }
>>                }
>>        }
>>        return true;
>> 
>> 
>> 
>> Thanks
>> 
>> On Thu, Aug 16, 2018 at 8:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>> 
>>> Hi all,
>>> 
>>> Here is a list of patches targeted for LTS release 17.11.4. Please help
>>> review
>>> and test. The planned date for the final release is August 23. Before that,
>>> please shout if anyone has objections with these patches being applied.
>>> 
>>> Also for the companies committed to running regression tests, please run
>>> the
>>> tests and report any issue before the release date.
>>> 
>>> A release candidate tarball can be found at:
>>> 
>>>    https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdpdk.org%2Fbrowse%2Fdpdk-stable%2Ftag%2F%3Fid%3Dv17.11.4-rc1&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=AMgyJMFIs512o5zfZ4aNSy1Ptp%2BhEIMUCVZ6HaL2F40%3D&amp;reserved=0
>>> 
>>> These patches are located at branch 17.11 of dpdk-stable repo:
>>>    https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdpdk.org%2Fbrowse%2Fdpdk-stable%2F&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=hvOhEk502vVzboCbRbCZXqJXcsiI3DTtgQypQJi0Aro%3D&amp;reserved=0
>>> 
>>> Thanks,
>>> Yongseok


* Re: [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter
  2018-08-03 13:36  3% ` [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter Adrien Mazarguil
@ 2018-08-23 13:48  0%   ` Ferruh Yigit
  2018-08-27 15:14  0%     ` Adrien Mazarguil
  2018-08-24 10:58  0%   ` Ferruh Yigit
  2018-08-31  9:00  3%   ` [dpdk-dev] [PATCH v3 " Adrien Mazarguil
  2 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-08-23 13:48 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev

On 8/3/2018 2:36 PM, Adrien Mazarguil wrote:
> This is a follow up to the "Flow API helpers enhancements" series submitted
> almost a year ago [1]. The new title is due to the reduced scope of this
> version.
> 
> rte_flow_conv() is a flexible replacement to rte_flow_copy(), itself a
> temporary solution pending something better [2]. It replaces a lot of
> duplicated code found in testpmd and removes some of the maintenance burden
> that developers tend to forget (me included) when modifying pattern
> item or actions (updating app/test-pmd/config.c to be clear).
> 
> This series was unearthed in order to complete the implementation of
> RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
> duplicate existing code once again.
> 
> See individual patches for specific changes in this version.
> 
> v2 changes:
> 
> - rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
> - Updated bonding PMD.
> - No more automatic generation of rte_flow_conv.h.
> 
> [1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
> [2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
> [3] Currently the command-line parser (cmdline_flow.c) is aware of these
>     actions, however config.c isn't. Flow rules with such actions cannot
>     be created and cannot be validated with PMDs that implement them.
> 
> Adrien Mazarguil (7):
>   ethdev: add flow API object converter
>   ethdev: add flow API item/action name conversion
>   app/testpmd: rely on flow API conversion function
>   net/failsafe: switch to flow API object conversion function
>   net/bonding: switch to flow API object conversion function
>   ethdev: deprecate rte_flow_copy function
>   ethdev: add missing item/actions to flow object converter

The patch needs to be rebased to target v18.11 (in the map file), and the new
APIs (rte_flow_conv) need to be marked experimental.

The deprecation notice also needs to be removed in this patchset.
Also, do you think it makes sense to announce this change in the release notes?

Apart from the above, any volunteers for reviewing the actual implementation?


* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-23  0:19  3%   ` Yongseok Koh
  2018-08-23  1:23  0%     ` Yongseok Koh
@ 2018-08-23 11:29  3%     ` Thomas Monjalon
  2018-08-23 16:18  0%     ` Yongseok Koh
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-23 11:29 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: Alejandro Lucero, dpdk stable, dev

23/08/2018 02:19, Yongseok Koh:
> For the patchset,
> - "mem: add function for checking memsegs IOVAs addresses"
>   This is adding a new API, so I don't expect any API/ABI breakage, but want to
>   double-confirm with Thomas. Thomas?

Yes, adding a function does not break the API/ABI.


* Re: [dpdk-dev] 17.11.4 patches review and test
  2018-08-23  0:19  3%   ` Yongseok Koh
@ 2018-08-23  1:23  0%     ` Yongseok Koh
  2018-08-23 11:29  3%     ` Thomas Monjalon
  2018-08-23 16:18  0%     ` Yongseok Koh
  2 siblings, 0 replies; 200+ results
From: Yongseok Koh @ 2018-08-23  1:23 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dpdk stable, dev, Thomas Monjalon


> On Aug 22, 2018, at 5:19 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
>> Hi Yonngseok,
>> 
>> There is a patchset aimed at 17.11.x:
>> 
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F42741%2F&amp;data=02%7C01%7Cyskoh%40mellanox.com%7Cb1b0e3eff71c499ff3fb08d6088e1ede%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636705803846548967&amp;sdata=8f12c1IuUe4mw2EaTZ18vVTuLTjXOD2cSe%2B%2B7f6OFfk%3D&amp;reserved=0
>> 
>> It was not accepted for master because the memory code has changed a lot
>> since 17.11, and I'm working on another patchset for adjusting to the those
>> changes.
>> 
>> I wonder if there is any issue with adding this patchset to stable 17.11.4.
>> Note that this makes unlikely a known limitation with emulated IOMMU inside
>> VMs.
> 
> This patchset seems quite large for stable release and need to be well verified
> before GA. In -rc1 stage, we don't usually take such a large patchset as people
> have already started verification. And we don't usually release -rc2. If you're
> trying to solve a very critical issue with this patchset, I have to release -rc2
> and ask people to verify again. How critical is your issue?

Looks like you have sent a mail to stable@dpdk.org in July.
http://mails.dpdk.org/archives/stable/2018-July/008589.html

I don't know why some of the emails to stable@dpdk.org haven't arrived at my mailbox.
I'm still trying to figure out the reason with the IT department in my company and
re-subscribed to the mailing list as well.

My apologies for that.

Yongseok

> For the patchset,
> - "mem: add function for checking memsegs IOVAs addresses"
>  This is adding a new API, so I don't expect any API/ABI breakage, but want to
>  double-confirm with Thomas. Thomas?
> 
> - "bus/pci: use IOVAs check when setting IOVA mode"
>  All the patches got ack except for this one but from looking at the threads in
>  dev mailing list, it looks okay. I have a question though.
> 
>> @@ -640,13 +643,17 @@
>> {
>>        struct rte_pci_device *dev = NULL;
>>        struct rte_pci_driver *drv = NULL;
>> +       int iommu_dma_mask_check_done = 0;
>> 
>>        FOREACH_DRIVER_ON_PCIBUS(drv) {
>>                FOREACH_DEVICE_ON_PCIBUS(dev) {
>>                        if (!rte_pci_match(drv, dev))
>>                                continue;
>> -                       if (!pci_one_device_iommu_support_va(dev))
>> -                               return false;
>> +                       if (!iommu_dma_mask_check_done) {
>> +                               if (pci_one_device_iommu_support_va(dev) < 0)
> 
> pci_one_device_iommu_support_va() returns true/false (1/0), so why do you
> expect to see a negative return value in case of failure?
> 
>> +                                       return false;
>> +                               iommu_dma_mask_check_done  = 1;
>> +                       }
>>                }
>>        }
>>        return true;

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 17.11.4 patches review and test
  @ 2018-08-23  0:19  3%   ` Yongseok Koh
  2018-08-23  1:23  0%     ` Yongseok Koh
                       ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Yongseok Koh @ 2018-08-23  0:19 UTC (permalink / raw)
  To: Alejandro Lucero; +Cc: dpdk stable, dev, Thomas Monjalon

On Tue, Aug 21, 2018 at 12:07:49PM +0200, Alejandro Lucero wrote:
> Hi Yongseok,
> 
> There is a patchset aimed at 17.11.x:
> 
> https://patches.dpdk.org/cover/42741/
> 
> It was not accepted for master because the memory code has changed a lot
> since 17.11, and I'm working on another patchset for adjusting to those
> changes.
> 
> I wonder if there is any issue with adding this patchset to stable 17.11.4.
> Note that this makes it unlikely to hit a known limitation with emulated
> IOMMU inside VMs.

This patchset seems quite large for a stable release and needs to be well verified
before GA. In -rc1 stage, we don't usually take such a large patchset as people
have already started verification. And we don't usually release -rc2. If you're
trying to solve a very critical issue with this patchset, I have to release -rc2
and ask people to verify again. How critical is your issue?

For the patchset,
- "mem: add function for checking memsegs IOVAs addresses"
  This is adding a new API, so I don't expect any API/ABI breakage, but want to
  double-confirm with Thomas. Thomas?

- "bus/pci: use IOVAs check when setting IOVA mode"
  All the patches got ack except for this one but from looking at the threads in
  dev mailing list, it looks okay. I have a question though.

> @@ -640,13 +643,17 @@
>  {
>         struct rte_pci_device *dev = NULL;
>         struct rte_pci_driver *drv = NULL;
> +       int iommu_dma_mask_check_done = 0;
> 
>         FOREACH_DRIVER_ON_PCIBUS(drv) {
>                 FOREACH_DEVICE_ON_PCIBUS(dev) {
>                         if (!rte_pci_match(drv, dev))
>                                 continue;
> -                       if (!pci_one_device_iommu_support_va(dev))
> -                               return false;
> +                       if (!iommu_dma_mask_check_done) {
> +                               if (pci_one_device_iommu_support_va(dev) < 0)

pci_one_device_iommu_support_va() returns true/false (1/0), so why do you
expect to see a negative return value in case of failure?

> +                                       return false;
> +                               iommu_dma_mask_check_done  = 1;
> +                       }
>                 }
>         }
>         return true;
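
To make the concern concrete, here is a minimal sketch, assuming (as stated
in the comment above) that pci_one_device_iommu_support_va() returns a bool.
The stub and both check functions are hypothetical names for illustration
only, not DPDK code. A bool converts to 0 or 1, so a "< 0" test can never
detect the failure case, while the original negation does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for pci_one_device_iommu_support_va(): the real
 * function inspects the device, this one just returns the flag it is given. */
static bool support_va_stub(bool supported)
{
	return supported;
}

/* Check in the style of the quoted hunk: the bool result converts to
 * 0 or 1, so "< 0" is never true and the failure path is unreachable. */
bool check_with_negative_test(bool supported)
{
	int rc = support_va_stub(supported);

	if (rc < 0)
		return false;
	return true;
}

/* Check in the style of the removed lines: failure is detected by negation. */
bool check_with_negation(bool supported)
{
	if (!support_va_stub(supported))
		return false;
	return true;
}
```

With an unsupported device (false), the first check still reports success,
whereas the second one reports failure.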


> 
> Thanks
> 
> On Thu, Aug 16, 2018 at 8:18 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> > Hi all,
> >
> > Here is a list of patches targeted for LTS release 17.11.4. Please help
> > review
> > and test. The planned date for the final release is August 23. Before that,
> > please shout if anyone has objections with these patches being applied.
> >
> > Also for the companies committed to running regression tests, please run
> > the
> > tests and report any issue before the release date.
> >
> > A release candidate tarball can be found at:
> >
> >     https://dpdk.org/browse/dpdk-stable/tag/?id=v17.11.4-rc1
> >
> > These patches are located at branch 17.11 of dpdk-stable repo:
> >     https://dpdk.org/browse/dpdk-stable/
> >
> > Thanks,
> > Yongseok
> >
> > ---
> > Adrien Mazarguil (2):
> >       maintainers: update for Mellanox PMDs
> >       net/mlx4: fix minor resource leak during init
> >
> > Ajit Khaparde (7):
> >       net/bnxt: fix HW Tx checksum offload check
> >       net/bnxt: fix set MTU
> >       net/bnxt: fix Rx ring count limitation
> >       net/bnxt: fix memory leaks in NVM commands
> >       net/bnxt: fix lock release on NVM write failure
> >       net/bnxt: check access denied for HWRM commands
> >       net/bnxt: fix RETA size
> >
> > Alejandro Lucero (1):
> >       net/nfp: fix field initialization in Tx descriptor
> >
> > Alok Makhariya (1):
> >       bus/dpaa: fix phandle support for Linux 4.16
> >
> > Anatoly Burakov (8):
> >       eal/linux: fix invalid syntax in interrupts
> >       eal/linux: fix uninitialized value
> >       test: fix EAL flags autotest on FreeBSD
> >       test: fix result printing
> >       test: fix code on report
> >       test: make autotest runner python 2/3 compliant
> >       test: print autotest categories
> >       test: improve filtering
> >
> > Andrew Rybchenko (2):
> >       net/sfc: cut non VLAN ID bits from TCI
> >       net/sfc: fix assert in set multicast address list
> >
> > Andy Green (1):
> >       ring: fix sign conversion warning
> >
> > Beilei Xing (3):
> >       net/i40e: fix shifts of 32-bit value
> >       net/i40e: fix packet type parsing with DDP
> >       net/i40e: fix setting TPID with AQ command
> >
> > Bruce Richardson (2):
> >       examples/exception_path: fix out-of-bounds read
> >       mk: fix permissions when using make install
> >
> > Chas Williams (2):
> >       net/bonding: always update bonding link status
> >       net/bonding: do not clear active slave count
> >
> > Dan Gora (1):
> >       kni: fix crash with null name
> >
> > Daria Kolistratova (1):
> >       net/ena: fix SIGFPE with 0 Rx queue
> >
> > Dariusz Stojaczyk (1):
> >       eal: fix return codes on thread naming failure
> >
> > David Marchand (1):
> >       net/bnxt: add missing ids in xstats
> >
> > Drocula Lambda (1):
> >       kni: fix build on RHEL 7.5
> >
> > Ferruh Yigit (2):
> >       kni: fix build with gcc 8.1
> >       net/thunderx: fix build with gcc optimization on
> >
> > Gavin Hu (3):
> >       mk: fix cross build
> >       net/dpaa2: remove loop for unused pool entries
> >       maintainers: claim maintainership for ARM v7 and v8
> >
> > Haiyue Wang (1):
> >       net/i40e: workaround performance degradation
> >
> > Harry van Haaren (1):
> >       event: fix ring init failure handling
> >
> > Hemant Agrawal (2):
> >       test/crypto: fix device id when stopping port
> >       bus/dpaa: fix buffer offset setting in FMAN
> >
> > Hyong Youb Kim (1):
> >       net/enic: do not overwrite admin Tx queue limit
> >
> > Ido Goshen (1):
> >       net/pcap: fix multiple queues
> >
> > Jananee Parthasarathy (1):
> >       mk: update targets for classified tests
> >
> > Jay Ding (1):
> >       net/bnxt: check for invalid vNIC id
> >
> > Jerin Jacob (2):
> >       ethdev: fix queue statistics mapping documentation
> >       eal: fix bitmap documentation
> >
> > Kiran Kumar (2):
> >       net/bonding: fix MAC address reset
> >       net/thunderx: avoid sq door bell write on zero packet
> >
> > Konstantin Ananyev (3):
> >       examples/ipsec-secgw: fix IPv4 checksum at Tx
> >       examples/ipsec-secgw: fix bypass rule processing
> >       app/testpmd: fix DCB config
> >
> > Maxime Coquelin (4):
> >       vhost: improve dirty pages logging performance
> >       vhost: fix missing increment of log cache count
> >       vhost: flush IOTLB cache on new mem table handling
> >       vhost: retranslate vring addr when memory table changes
> >
> > Moti Haimovsky (2):
> >       net/mlx5: fix build with old kernels
> >       net/mlx4: check RSS queues number limitation
> >
> > Nelio Laranjeiro (1):
> >       net/mlx5: fix TCI mask filter
> >
> > Nikhil Rao (5):
> >       eventdev: fix port in Rx adapter internal function
> >       eventdev: fix missing update to Rx adaper WRR position
> >       eventdev: add event buffer flush in Rx adapter
> >       eventdev: fix internal port logic in Rx adapter
> >       eventdev: fix Rx SW adapter stop
> >
> > Nithin Dabilpuram (1):
> >       app/testpmd: fix buffer leak in TM command
> >
> > Ophir Munk (1):
> >       net/mlx5: fix secondary process resource leakage
> >
> > Pablo de Lara (7):
> >       examples/l2fwd-crypto: fix digest with AEAD algo
> >       examples/l2fwd-crypto: check return value on IV size check
> >       examples/l2fwd-crypto: skip device not supporting operation
> >       test/hash: fix multiwriter with non consecutive cores
> >       test/hash: fix potential memory leak
> >       app/crypto-perf: fix auth IV offset
> >       hash: fix doxygen of return values
> >
> > Pavan Nikhilesh (2):
> >       event/octeontx: remove unnecessary port start and stop
> >       net/octeontx: fix stop clearing Rx/Tx functions
> >
> > Qi Zhang (1):
> >       vfio: fix PCI address comparison
> >
> > Radu Nicolau (3):
> >       security: fix crash on destroy null session
> >       test: fix uninitialized port configuration
> >       net/bonding: fix race condition
> >
> > Rafal Kozik (4):
> >       net/ena: fix GENMASK_ULL macro
> >       net/ena: set link speed as none
> >       net/ena: check pointer before memset
> >       net/ena: change memory type
> >
> > Rahul Lakkireddy (1):
> >       net/cxgbe: fix init failure due to new flash parts
> >
> > Rami Rosen (2):
> >       examples/l3fwd: remove useless include
> >       ethdev: fix a doxygen comment for port allocation
> >
> > Rasesh Mody (9):
> >       net/qede: fix VF MTU update
> >       net/qede: remove primary MAC removal
> >       net/qede: fix for devargs
> >       net/qede: fix default extended VLAN offload config
> >       doc: update qede management firmware guide
> >       net/qede/base: fix GRC attention callback
> >       net/bnx2x: fix FW command timeout during stop
> >       net/bnx2x: fix poll link status
> >       net/qede/base: fix to clear HW indication
> >
> > Remy Horton (4):
> >       bitrate: add sanity check on parameters
> >       metrics: add check for invalid key
> >       metrics: do not fail silently when uninitialised
> >       metrics: disallow null as metric name
> >
> > Reshma Pattan (2):
> >       test/flow_classify: fix return types
> >       mk: remove unnecessary test rules
> >
> > Rosen Xu (1):
> >       examples/flow_filtering: add flow director config for i40e
> >
> > Shahaf Shuler (1):
> >       net/mlx5: fix compilation for rdma-core v19
> >
> > Shahed Shaikh (7):
> >       net/qede: fix link change event notification
> >       net/qede: fix legacy interrupt mode
> >       net/qede: fix incorrect link status update
> >       net/qede: fix unicast MAC address handling in VF
> >       net/qede: fix interrupt handler unregister
> >       net/qede: fix MAC address removal failure message
> >       net/qede: fix ntuple filter configuration
> >
> > Shreyansh Jain (1):
> >       doc: fix bonding command in testpmd
> >
> > Somnath Kotur (3):
> >       net/bnxt: fix to move a flow to a different queue
> >       net/bnxt: use correct flags during VLAN configuration
> >       net/bnxt: fix filter freeing
> >
> > Thomas Monjalon (1):
> >       bus/dpaa: fix build
> >
> > Tomasz Duszynski (1):
> >       net/mvpp2: check pointer before using it
> >
> > Wei Zhao (7):
> >       net/ixgbe: add support for VLAN in IP mode FDIR
> >       net/ixgbe: fix tunnel id format error for FDIR
> >       net/ixgbe: fix tunnel type set error for FDIR
> >       net/ixgbe: fix mask bits register set error for FDIR
> >       app/testpmd: fix VLAN TCI mask set error for FDIR
> >       net/i40e: fix check of flow director programming status
> >       net/i40e: revert fix of flow director check
> >
> > Xiaoyun Li (1):
> >       net/i40e: fix link speed
> >
> > Xueming Li (1):
> >       net/mlx5: fix crash in device probe
> >
> > Yipeng Wang (3):
> >       hash: fix multiwriter lock memory allocation
> >       hash: fix a multi-writer race condition
> >       hash: fix key slot size accuracy
> >
> > Yongseok Koh (7):
> >       net/mlx5: fix Rx buffer replenishment threshold
> >       net/mlx5: add missing sanity checks for Tx completion queue
> >       net/mlx5: fix assert for Tx completion queue count
> >       net/mlx5: fix queue rollback when starting device
> >       net/mlx5: fix error number handling
> >       net/mlx5: preserve promiscuous flag for flow isolation mode
> >       net/mlx5: preserve allmulticast flag for flow isolation mode
> >

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] 18.05.1 patches review and test
@ 2018-08-22  7:25  2% Christian Ehrhardt
  2018-08-27  9:29  0% ` Christian Ehrhardt
  0 siblings, 1 reply; 200+ results
From: Christian Ehrhardt @ 2018-08-22  7:25 UTC (permalink / raw)
  To: christian.ehrhardt, dpdk stable; +Cc: dev

Hi all,

Here is a list of patches targeted for stable release 18.05.1. Please
help review and test. The planned date for the final release is August
29th. Before that, please shout if anyone has objections with these
patches being applied.

Also for the companies committed to running regression tests,
please run the tests and report any issue before the release date.

A release candidate tarball can be found at:

    https://dpdk.org/browse/dpdk-stable/tag/?id=v18.05.1-rc1

These patches are located at branch 18.05 of dpdk-stable repo:

    https://git.dpdk.org/dpdk-stable/log/?h=18.05

Thanks.

Christian Ehrhardt <christian.ehrhardt@canonical.com>

---
Adrien Mazarguil (8):
      app/testpmd: fix crash when attaching a device
      net/mlx4: fix minor resource leak during init
      net/mlx5: fix errno object in probe function
      net/mlx5: fix missing errno in probe function
      net/mlx5: fix error message in probe function
      net/mlx5: fix invalid error check
      maintainers: update for Mellanox PMDs
      net/mlx5: fix invalid network interface index

Ajit Khaparde (11):
      net/bnxt: fix clear port stats
      net/bnxt: fix close operation
      net/bnxt: fix HW Tx checksum offload check
      net/bnxt: check filter type before clearing it
      net/bnxt: fix set MTU
      net/bnxt: fix incorrect IO address handling in Tx
      net/bnxt: fix Rx ring count limitation
      net/bnxt: fix memory leaks in NVM commands
      net/bnxt: fix lock release on NVM write failure
      net/bnxt: check access denied for HWRM commands
      net/bnxt: fix RETA size

Alejandro Lucero (2):
      net/nfp: fix unused header reference
      net/nfp: fix field initialization in Tx descriptor

Alok Makhariya (1):
      bus/dpaa: fix phandle support for Linux 4.16

Anatoly Burakov (14):
      ipc: fix locking while sending messages
      mem: fix alignment of requested virtual areas
      eal/bsd: fix memory segment index display
      malloc: fix pad erasing
      eal/linux: fix invalid syntax in interrupts
      eal/linux: fix uninitialized value
      vfio: fix uninitialized variable
      malloc: do not skip pad on free
      test: fix EAL flags autotest on FreeBSD
      test: fix result printing
      test: fix code on report
      test: make autotest runner python 2/3 compliant
      test: print autotest categories
      test: improve filtering

Andrew Rybchenko (7):
      net/sfc: cut non VLAN ID bits from TCI
      net/sfc: discard packets with bad CRC on EF10 ESSB Rx
      net/sfc: fix double-free in EF10 ESSB Rx queue purge
      net/sfc: move Rx checksum offload check to device level
      net/sfc: fix Rx queue offloads reporting in queue info
      net/sfc: fix assert in set multicast address list
      net/sfc: handle unknown L3 packet class in EF10 event parser

Andy Green (2):
      ring: fix declaration after statement
      ring: fix sign conversion warning

Beilei Xing (5):
      net/i40e: fix shifts of 32-bit value
      net/i40e: fix PPPoL2TP packet type parsing
      net/i40e: fix packet type parsing with DDP
      net/i40e: fix setting TPID with AQ command
      net/i40e: fix device parameter parsing

Bruce Richardson (3):
      eal: fix error message for unsupported platforms
      examples/exception_path: fix out-of-bounds read
      mk: fix permissions when using make install

Chas Williams (2):
      net/bonding: always update bonding link status
      net/bonding: do not clear active slave count

Christian Ehrhardt (2):
      FIXUP: net/mlx5: fix invalid network interface index
      version: 18.05.1-rc1

Damjan Marion (1):
      net/i40e: do not reset device info data

Dan Gora (1):
      kni: fix crash with null name

Daria Kolistratova (1):
      net/ena: fix SIGFPE with 0 Rx queue

Dariusz Stojaczyk (7):
      mem: do not leave unmapped holes in EAL memory area
      mem: do not unmap overlapping region on mmap failure
      mem: avoid crash on memseg query with invalid address
      mem: fix alignment requested with --base-virtaddr
      mem: do not use --base-virtaddr in secondary processes
      eal: fix return codes on thread naming failure
      eal: fix return codes on control thread failure

David Marchand (1):
      net/bnxt: add missing ids in xstats

Drocula Lambda (1):
      kni: fix build on RHEL 7.5

Fan Zhang (1):
      crypto/virtio: fix IV physical address

Ferruh Yigit (4):
      kni: fix build with gcc 8.1
      net/thunderx: fix build with gcc optimization on
      app/testpmd: fix typo in setting Tx offload command
      drivers/net: fix crash in secondary process

Gage Eads (1):
      net: rename u16 to fix shadowed declaration

Gavin Hu (5):
      mk: fix cross build
      devtools: fix ninja command in build test
      build: fix for host clang and cross gcc
      net/dpaa2: remove loop for unused pool entries
      maintainers: claim maintainership for ARM v7 and v8

Haiyue Wang (1):
      net/i40e: workaround performance degradation

Harry van Haaren (2):
      net/i40e: fix rearm check in AVX2 Rx
      event: fix ring init failure handling

Hemant Agrawal (8):
      doc: fix limitations for dpaa crypto
      doc: fix limitations for dpaa2 crypto
      test/crypto: fix device id when stopping port
      bus/dpaa: fix SVR id fetch location
      bus/dpaa: fix buffer offset setting in FMAN
      net/dpaa: fix queue error handling and logs
      net/dpaa2: fix prefetch Rx to honor number of packets
      raw/dpaa2_qdma: fix IOVA as VA flag

Hyong Youb Kim (4):
      net/enic: fix receive packet types
      net/enic: update the UDP RSS detection mechanism
      net/enic: do not overwrite admin Tx queue limit
      net/enic: initialize RQ fetch index before enabling RQ

Ido Goshen (1):
      net/pcap: fix multiple queues

Igor Romanov (1):
      net/sfc: fix filter exceptions logic

Jananee Parthasarathy (1):
      mk: update targets for classified tests

Jay Ding (1):
      net/bnxt: check for invalid vNIC id

Jerin Jacob (3):
      doc: fix octeontx eventdev selftest argument
      ethdev: fix queue statistics mapping documentation
      eal: fix bitmap documentation

Kiran Kumar (3):
      net/bonding: fix MAC address reset
      ethdev: check queue stats mapping input arguments
      net/thunderx: avoid sq door bell write on zero packet

Konstantin Ananyev (3):
      examples/ipsec-secgw: fix IPv4 checksum at Tx
      examples/ipsec-secgw: fix bypass rule processing
      app/testpmd: fix DCB config

Krzysztof Kanas (2):
      app/testpmd: fix crash on TM command error
      app/testpmd: fix help for TM commit command

Lee Daly (1):
      compress/isal: fix offset usage

Matan Azrad (1):
      net/tap: fix zeroed flow mask configurations

Maxime Coquelin (2):
      vhost: fix missing increment of log cache count
      vhost: flush IOTLB cache on new mem table handling

Moti Haimovsky (2):
      net/mlx4: check RSS queues number limitation
      net/mlx4: advertise Rx jumbo frame support

Nelio Laranjeiro (3):
      net/mlx5: clean-up developer logs
      app/testpmd: fix missing count action fields
      net/mlx5: fix TCI mask filter

Nikhil Rao (5):
      eventdev: fix port in Rx adapter internal function
      eventdev: fix missing update to Rx adaper WRR position
      eventdev: add event buffer flush in Rx adapter
      eventdev: fix internal port logic in Rx adapter
      eventdev: fix Rx SW adapter stop

Nithin Dabilpuram (1):
      app/testpmd: fix buffer leak in TM command

Ophir Munk (1):
      net/mlx5: fix secondary process resource leakage

Pablo de Lara (13):
      cryptodev: fix ABI breakage
      net/ixgbe: fix crash on detach
      compress/isal: fix log type name
      compress/isal: set null pointer after freeing
      compress/isal: fix memory leak
      examples/l2fwd-crypto: fix digest with AEAD algo
      examples/l2fwd-crypto: check return value on IV size check
      examples/l2fwd-crypto: skip device not supporting operation
      devtools: remove already enabled nfp from build test
      test/hash: fix multiwriter with non consecutive cores
      test/hash: fix potential memory leak
      app/crypto-perf: fix auth IV offset
      hash: fix doxygen of return values

Pavan Nikhilesh (5):
      event/octeontx: fix flush callback
      mempool/octeontx: fix pool to aura mapping
      app/eventdev: fix order test service init
      event/octeontx: remove unnecessary port start and stop
      net/octeontx: fix stop clearing Rx/Tx functions

Qi Zhang (4):
      eal: fix hotplug add and remove
      vfio: fix PCI address comparison
      vfio: remove uneccessary IPC for group fd clear
      net/ixgbe: fix missing null check on detach

Radu Nicolau (4):
      security: fix crash on destroy null session
      net/bonding: fix invalid port id
      test: fix uninitialized port configuration
      net/bonding: fix race condition

Rafal Kozik (4):
      net/ena: check pointer before memset
      net/ena: change memory type
      net/ena: fix GENMASK_ULL macro
      net/ena: set link speed as none

Rahul Lakkireddy (4):
      net/cxgbe: report configured link auto-negotiation
      net/cxgbe: fix Rx channel map and queue type
      net/cxgbevf: add missing Tx byte counters
      net/cxgbe: fix init failure due to new flash parts

Rami Rosen (2):
      examples/l3fwd: remove useless include
      ethdev: fix a doxygen comment for port allocation

Rasesh Mody (11):
      net/qede: fix VF MTU update
      net/qede: fix for devargs
      net/qede: fix L2-handles used for RSS hash update
      net/qede: fix memory alloc for multiple port reconfig
      net/qede: remove primary MAC removal
      doc: update qede management firmware guide
      net/qede: fix default extended VLAN offload config
      net/qede/base: fix to clear HW indication
      net/qede/base: fix GRC attention callback
      net/bnx2x: fix FW command timeout during stop
      net/bnx2x: fix poll link status

Remy Horton (4):
      bitrate: add sanity check on parameters
      metrics: add check for invalid key
      metrics: do not fail silently when uninitialised
      metrics: disallow null as metric name

Reshma Pattan (3):
      test/flow_classify: fix return types
      mk: remove unnecessary test rules
      latency: free up the memzone

Rosen Xu (1):
      examples/flow_filtering: add flow director config for i40e

Shahaf Shuler (2):
      net/mlx5: separate generic tunnel TSO from the standard one
      net/mlx5: fix build with rdma-core v19

Shahed Shaikh (8):
      net/qede: fix incorrect link status update
      net/qede: fix link change event notification
      net/qede: fix unicast MAC address handling in VF
      net/qede: fix legacy interrupt mode
      net/qede: fix Rx/Tx offload flags
      net/qede: fix interrupt handler unregister
      net/qede: fix MAC address removal failure message
      net/qede: fix ntuple filter configuration

Shaopeng He (1):
      net/i40e: fix Tx queue setup after stop

Shreyansh Jain (1):
      doc: fix bonding command in testpmd

Somnath Kotur (4):
      net/bnxt: revert reset of L2 filter id
      net/bnxt: fix to move a flow to a different queue
      net/bnxt: use correct flags during VLAN configuration
      net/bnxt: fix filter freeing

Stephen Hemminger (2):
      net/mlx5: fix log initialization
      doc: fix typo in vdev_netvsc guide

Thomas Monjalon (2):
      bus/dpaa: fix build
      net/fm10k: remove unused constant

Timothy Redaelli (2):
      net/mlx4: avoid stripping the glue library
      net/mlx5: avoid stripping the glue library

Tiwei Bie (1):
      vhost: release locks on RARP packet failure

Tomasz Duszynski (1):
      net/mvpp2: check pointer before using it

Wei Zhao (7):
      net/ixgbe: add support for VLAN in IP mode FDIR
      net/ixgbe: fix tunnel id format error for FDIR
      net/ixgbe: fix tunnel type set error for FDIR
      net/ixgbe: fix mask bits register set error for FDIR
      app/testpmd: fix VLAN TCI mask set error for FDIR
      net/i40e: fix check of flow director programming status
      net/i40e: revert fix of flow director check

Xiaoxin Peng (1):
      net/bnxt: fix Tx with multiple mbuf

Xiaoyun Li (3):
      net/i40e: fix link speed
      app/testpmd: fix little performance drop
      net/avf: fix offload capabilities

Xueming Li (1):
      net/mlx5: fix crash in device probe

Yaroslav Brustinov (1):
      net/mlx5: fix linkage of glue lib with gcc 4.7.2

Yipeng Wang (3):
      hash: fix multiwriter lock memory allocation
      hash: fix a multi-writer race condition
      hash: fix key slot size accuracy

Yongseok Koh (6):
      net/mlx5: fix error number handling
      net/mlx5: fix Rx buffer replenishment threshold
      net/mlx5: fix assert for Tx completion queue count
      net/mlx5: fix queue rollback when starting device
      net/mlx5: preserve promiscuous flag for flow isolation mode
      net/mlx5: preserve allmulticast flag for flow isolation mode

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 2/2] ethdev: make rte_eth_is_valid_owner_id return bool
  2018-08-21 15:48  0%         ` Matan Azrad
@ 2018-08-21 18:31  0%           ` Stephen Hemminger
  2018-08-26  7:49  0%             ` Matan Azrad
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-08-21 18:31 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Stephen Hemminger

On Tue, 21 Aug 2018 15:48:19 +0000
Matan Azrad <matan@mellanox.com> wrote:

> Hi
> 
> From: Stephen Hemminger
> > On Tue, 21 Aug 2018 10:20:43 +0000
> > Matan Azrad <matan@mellanox.com> wrote:
> >   
> > > From: Stephen Hemminger  
> > > > Function is boolean so use that.  
> > >
> > > Ethdev is not using bool type, see also:
> > > rte_eth_dev_is_valid_port
> > > rte_eth_dev_is_removed
> > > rte_eth_dev_pool_ops_supported
> > >
> > > I think it should be a full solution to all.
> > >  
> > > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>  
> > 
> > I didn't want to change the type of visible (exported by ABI) functions.
> >   
> Since ethdev does not currently use the bool type, I think it's better not to change it only for this API.

I hate to pick nits, but there is already a bool usage in an internal
(static) function in ethdev.


static bool
is_allocated(const struct rte_eth_dev *ethdev)
{
	return ethdev->data->name[0] != '\0';
}

Using bool functions doesn't really generate different code. It is more
about using modern C conventions.
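
As a rough illustration of the two conventions, here is a minimal sketch.
The struct and function names are hypothetical and much simpler than the
real struct rte_eth_dev; the only point is that both predicates evaluate the
same expression and differ only in the declared return type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device record; not the real struct rte_eth_dev. */
struct demo_dev {
	char name[64];
};

/* int-returning style, as used by the exported ethdev predicates. */
int demo_is_allocated_int(const struct demo_dev *dev)
{
	assert(dev != NULL);
	return dev->name[0] != '\0';
}

/* bool-returning style, as in the static helper quoted above. */
bool demo_is_allocated_bool(const struct demo_dev *dev)
{
	assert(dev != NULL);
	return dev->name[0] != '\0';
}
```

Callers that only test the result in a condition behave the same with either
version; the bool variant mainly documents the intent in the signature.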

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethdev: make rte_eth_is_valid_owner_id return bool
  2018-08-21 15:06  3%       ` Stephen Hemminger
@ 2018-08-21 15:48  0%         ` Matan Azrad
  2018-08-21 18:31  0%           ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Matan Azrad @ 2018-08-21 15:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Stephen Hemminger

Hi

From: Stephen Hemminger
> On Tue, 21 Aug 2018 10:20:43 +0000
> Matan Azrad <matan@mellanox.com> wrote:
> 
> > From: Stephen Hemminger
> > > Function is boolean so use that.
> >
> > Ethdev is not using bool type, see also:
> > rte_eth_dev_is_valid_port
> > rte_eth_dev_is_removed
> > rte_eth_dev_pool_ops_supported
> >
> > I think it should be a full solution to all.
> >
> > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> 
> I didn't want to change the type of visible (exported by ABI) functions.
> 
Since ethdev does not currently use the bool type, I think it's better not to change it only for this API.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] ethdev: make rte_eth_is_valid_owner_id return bool
  @ 2018-08-21 15:06  3%       ` Stephen Hemminger
  2018-08-21 15:48  0%         ` Matan Azrad
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-08-21 15:06 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev, Stephen Hemminger

On Tue, 21 Aug 2018 10:20:43 +0000
Matan Azrad <matan@mellanox.com> wrote:

> From: Stephen Hemminger
> > Function is boolean so use that.  
> 
> Ethdev is not using bool type, see also:
> rte_eth_dev_is_valid_port
> rte_eth_dev_is_removed
> rte_eth_dev_pool_ops_supported
> 
> I think it should be a full solution to all.
>  
> > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>

I didn't want to change the type of visible (exported by ABI) functions.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
  2018-08-16  6:19  4%           ` Rao, Nikhil
@ 2018-08-16 10:42  4%             ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2018-08-16 10:42 UTC (permalink / raw)
  To: Rao, Nikhil
  Cc: dev, thomas, Mcnamara, John, Richardson, Bruce, Yigit, Ferruh,
	stephen, toggle-mailboxes

On Thu, Aug 16, 2018 at 06:19:40AM +0000, Rao, Nikhil wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Wednesday, August 15, 2018 4:19 PM
> > To: Rao, Nikhil <nikhil.rao@intel.com>
> > Cc: dev@dpdk.org; thomas@monjalon.net; Mcnamara, John
> > <john.mcnamara@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> > stephen@networkplumber.org;
> > toggle-mailboxes@hmswarspite.think-freely.org
> > Subject: Re: [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
> > 
> > 
> > 
> > Thanks, I think I made a mistake in how I detect section names in the awk
> > script.  The rule assumes that the entire section is getting added (i.e. we are
> > adding the EXPERIMENTAL section as a whole unit, hence starting a line
> > with + to identify the section name), and that's not the case here.  I think
> > the rule needs to be that any line in a map file that ends with a { (based on
> > our coding practice) is a section start, and the section name is the
> > next-to-last field in the line (i.e. $(NF-1)).  Please apply the patch below
> > and confirm that this works for you.
> > 
> > Best
> > Neil
> > 
> > 
> Thanks for the fix, it works.
> 
> Nikhil
Thanks, I'll propose it as a fix here shortly.
Neil

> 
> > 
> > diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
> > index daaf45e14..cf9cfc745 100755
> > --- a/devtools/check-symbol-change.sh
> > +++ b/devtools/check-symbol-change.sh
> > @@ -25,14 +25,14 @@ build_map_changes()
> >  		# supresses the subordonate rules below
> >  		/[-+] a\/.*\.^(map)/ {in_map=0}
> > 
> > -		# Triggering this rule, which starts a line with a + and ends it
> > +		# Triggering this rule, which starts a line and ends it
> >  		# with a { identifies a versioned section.  The section name is
> >  		# the rest of the line with the + and { symbols remvoed.
> >  		# Triggering this rule sets in_sec to 1, which actives the
> >  		# symbol rule below
> > -		/+.*{/ {gsub("+","");
> > +		/^.*{/ {
> >  			if (in_map == 1) {
> > -				sec=$1; in_sec=1;
> > +				sec=$(NF-1); in_sec=1;
> >  			}
> >  		}
> > 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
  2018-08-15 10:48  4%         ` Neil Horman
@ 2018-08-16  6:19  4%           ` Rao, Nikhil
  2018-08-16 10:42  4%             ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Rao, Nikhil @ 2018-08-16  6:19 UTC (permalink / raw)
  To: Neil Horman
  Cc: dev, thomas, Mcnamara, John, Richardson, Bruce, Yigit, Ferruh,
	stephen, toggle-mailboxes



> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Wednesday, August 15, 2018 4:19 PM
> To: Rao, Nikhil <nikhil.rao@intel.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; Mcnamara, John
> <john.mcnamara@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>;
> stephen@networkplumber.org;
> toggle-mailboxes@hmswarspite.think-freely.org
> Subject: Re: [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
> 
> 
> 
> Thanks, I think I made a mistake in how I detect section names in the awk
> script.  The rule assumes that the entire section is getting added (i.e. we are
> adding the EXPERIMENTAL section as a whole unit, hence starting a line
> with + to identify the section name), and that's not the case here.  I think
> the rule needs to be that any line in a map file that ends with a { (based on
> our coding practice) is a section start, and the section name is the
> next-to-last field in the line (i.e. $(NF-1)).  Please apply the patch below
> and confirm that this works for you.
> 
> Best
> Neil
> 
> 
Thanks for the fix, it works.

Nikhil

> 
> diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
> index daaf45e14..cf9cfc745 100755
> --- a/devtools/check-symbol-change.sh
> +++ b/devtools/check-symbol-change.sh
> @@ -25,14 +25,14 @@ build_map_changes()
>  		# supresses the subordonate rules below
>  		/[-+] a\/.*\.^(map)/ {in_map=0}
> 
> -		# Triggering this rule, which starts a line with a + and ends it
> +		# Triggering this rule, which starts a line and ends it
>  		# with a { identifies a versioned section.  The section name is
>  		# the rest of the line with the + and { symbols remvoed.
>  		# Triggering this rule sets in_sec to 1, which actives the
>  		# symbol rule below
> -		/+.*{/ {gsub("+","");
> +		/^.*{/ {
>  			if (in_map == 1) {
> -				sec=$1; in_sec=1;
> +				sec=$(NF-1); in_sec=1;
>  			}
>  		}
> 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v15 7/7] doc: update release notes for mulit-process hotplug
  2018-08-16  3:04  1% ` [dpdk-dev] [PATCH v15 0/7] enable hotplug on multi-process Qi Zhang
@ 2018-08-16  3:04  4%   ` Qi Zhang
  0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-08-16  3:04 UTC (permalink / raw)
  To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

Update release notes for the new multi-process hotplug feature.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/rel_notes/release_18_11.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..f08793c17 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -54,6 +54,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Support device multi-process hotplug.**
+
+  Hotplug and hot-unplug for devices will now be supported in multiprocessing
+  scenario. Any ethdev devices created in the primary process will be regarded
+  as shared and will be available for all DPDK processes. Synchronization between
+  processes will be done using DPDK IPC.
 
 API Changes
 -----------
@@ -68,6 +74,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+  In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+  that device be attached on all processes, while ``rte_eal_hotplug_remove``
+  will guarantee device be detached on all processes.
 
 ABI Changes
 -----------
-- 
2.13.6

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v15 0/7] enable hotplug on multi-process
                     ` (6 preceding siblings ...)
  2018-08-10  0:42  1% ` [dpdk-dev] [PATCH v14 0/6] " Qi Zhang
@ 2018-08-16  3:04  1% ` Qi Zhang
  2018-08-16  3:04  4%   ` [dpdk-dev] [PATCH v15 7/7] doc: update release notes for mulit-process hotplug Qi Zhang
  2018-09-28  4:23  1% ` [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process Qi Zhang
  8 siblings, 1 reply; 200+ results
From: Qi Zhang @ 2018-08-16  3:04 UTC (permalink / raw)
  To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v15:
- fix missing return in rte_eth_dev_pci_release.
- minor fixes and more detailed comments for patch 5/7.
- update release notes for v18.11.

v14:
- rebase.
- All changes belong to patch 1/6.
  1) rename rte_eth_dev_release_port_private to rte_eth_dev_release_port_secondary
     since it is only used by secondary processes.
  2) in rte_eth_dev_pci_generic_remove, even on the secondary process,
     I think it's better to call rte_eth_dev_release_port_secondary after
     dev_uninit, since the secondary process may need to release
     some local resources in dev_uninit before releasing the port and returning.
     This does not break any existing users of rte_eth_dev_pci_generic_remove,
     because no existing dev_uninit has special handling for the secondary
     process.
  3) add rte_eth_dev_release_port_secondary into rte_eth_dev_destroy as a
     general step, so we don't need patches for i40e and ixgbe.
  4) fix missing update on rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
  when detaching a device. A device may be out of sync in
  some situations, so attaching an existing device in primary still needs to
  sync with secondary. Also, it's not necessary to roll back if we fail to
  attach an existing device or detach a nonexistent device on secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.

v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
  modify the sample code to use rte_eal_hotplug_add and
  rte_eal_hotplug_remove to attach/detach devices.

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error logs in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
  any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process now syncs on
  all processes, it is not necessary to support a private vdev for
  a secondary process, which is identified by a non-NULL devargs in
  "--vdev". So rework all vdev driver changes to simplify the device
  probe scenario on a secondary process; devargs will be ignored on
  a secondary process now.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
  devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, because
     /dev/vfio/<group_fd> is still busy (see patches 3/19 and 4/19).
  2) Repeatedly detaching and attaching a device causes "cannot find TAILQ
     entry for PCI device" due to an incorrect PCI address comparison
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues unrelated to this patchset are expected when
  playing with the hotplug_mp sample, as described below.

- Attaching a PCI device twice may leave the device impossible to detach;
  the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: the current patchset depends on the IPC fix below; without it,
  attaching a shared vdev may fail.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since an ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch a PMD if the PMD uses the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in
  a separate thread.

v5:
- since we will keep the mp thread separate from the interrupt thread,
  it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on secondary due to
  the previous rework.
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
  still open and under discussion, see below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the v3 fix
  for the sync IPC deadlock will not work. The new version enables a
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules initialize
  the mp channel during rte_eal_init.
- fix attaching a shared device from secondary.
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process when handling a secondary request to attach a device; the
     solution is for the primary process to issue the shared device
     attach/detach in the interrupt thread.
  2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once.
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal functions in ethdev_mp.c.
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) clean up header includes.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal functions to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling.
- fix meson.build.
- improve comments.

Background:
===========

Currently a secondary process only syncs ethdevs from the primary
process at the init stage; it is not aware of devices
attached or detached on the primary process at runtime.

However, some applications use the primary-secondary process model
with the primary process working as a
resource management process: it creates and destroys virtual devices
at runtime, while the secondary processes handle the network traffic
on these devices.

Solution:
=========

The original intention was to close this gap, but beyond that
the patch set provides a more comprehensive solution that handles
different hotplug cases in multi-process situations. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, and device
information is then synchronized on all processes.

Any failure while attaching or detaching will leave the processes in an
inconsistent state, so a proper rollback action should be considered.

Scenario for Case 1, 2:

attach device from primary:
a) primary attaches the new device; if this fails, goto h).
b) primary sends an attach sync request to all secondaries.
c) secondaries receive the request, attach the device and send a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) secondaries receive the request, detach the device and send a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach failed.
i) attach succeeded.

detach device from primary:
a) primary performs the pre-detach check; if the device is locked, goto i).
b) primary sends a pre-detach sync request to all secondaries.
c) secondaries perform the pre-detach check and send a reply.
d) primary checks the replies; if any failed, goto i).
e) primary sends a detach sync request to all secondaries.
f) secondaries detach the device and send a reply (assume no failure).
g) primary detaches the device.
h) detach succeeded.
i) detach failed.

Scenario for case 3, 4:

attach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and attaches the new device; if this fails,
   goto i).
c) primary forwards the attach request to all secondaries as an async request
   (because this is in mp thread context, a sync request would deadlock;
    the same reason applies to all following async requests).
d) secondaries receive the request, attach the device and send a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback async request to all secondaries.
g) secondaries receive the request, detach the device and send a reply.
h) primary receives the replies and detaches the device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receives the response and returns.
 
detach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and performs the pre-detach check; if the
   device is locked, goto j).
c) primary sends a pre-detach async request to all secondaries.
d) secondaries perform the pre-detach check and send a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach async request to all secondaries.
g) secondaries detach the device and send a reply.
h) primary detaches the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receives the response and returns.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.


PMD Impact:
===========

Currently device removal is not handled well in secondary processes by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_secondary, which only resets the ethdev's state to
unused and does not touch shared data, so other processes are not impacted.
Since not every device driver is meant to support the primary-secondary
process model, the patch set only fixes this for PCI devices whose drivers use
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. It can be used as a reference by other
drivers when an equivalent fix is required.

Example:
========

The patchset also contains an example that demonstrates device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

> help
> list

/* attach a pci device */
> attach 0000:81:00.0

/* detach the pci device */
> detach 0000:81:00.0

/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0

/* detach the vdev af_packet device */
> detach net_af_packet

Qi Zhang (7):
  ethdev: add function to release port in secondary process
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  drivers/net: enable hotplug on secondary process
  drivers/net: enable device detach on secondary
  examples/multi_process: add hotplug sample
  doc: update release notes for mulit-process hotplug

 doc/guides/rel_notes/release_18_11.rst       |  11 +
 drivers/net/af_packet/rte_eth_af_packet.c    |   6 +-
 drivers/net/bnxt/bnxt_ethdev.c               |   6 +-
 drivers/net/bonding/rte_eth_bond_pmd.c       |   6 +-
 drivers/net/ena/ena_ethdev.c                 |   2 +-
 drivers/net/kni/rte_eth_kni.c                |   6 +-
 drivers/net/liquidio/lio_ethdev.c            |   2 +-
 drivers/net/null/rte_eth_null.c              |   6 +-
 drivers/net/octeontx/octeontx_ethdev.c       |   8 +
 drivers/net/pcap/rte_eth_pcap.c              |   6 +-
 drivers/net/tap/rte_eth_tap.c                |   8 +-
 drivers/net/vhost/rte_eth_vhost.c            |   6 +-
 drivers/net/virtio/virtio_ethdev.c           |   2 +-
 examples/multi_process/Makefile              |   1 +
 examples/multi_process/hotplug_mp/Makefile   |  23 ++
 examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++++
 examples/multi_process/hotplug_mp/commands.h |  10 +
 examples/multi_process/hotplug_mp/main.c     |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile           |   1 +
 lib/librte_eal/common/eal_common_dev.c       | 198 ++++++++++++++-
 lib/librte_eal/common/eal_private.h          |  37 +++
 lib/librte_eal/common/hotplug_mp.c           | 363 +++++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h           |  48 ++++
 lib/librte_eal/common/include/rte_dev.h      |   6 +
 lib/librte_eal/common/meson.build            |   1 +
 lib/librte_eal/linuxapp/eal/Makefile         |   1 +
 lib/librte_eal/linuxapp/eal/eal.c            |   6 +
 lib/librte_ethdev/rte_ethdev.c               |  17 +-
 lib/librte_ethdev/rte_ethdev_driver.h        |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h           |  10 +-
 lib/librte_ethdev/rte_ethdev_version.map     |   7 +
 31 files changed, 1046 insertions(+), 29 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
  2018-08-15  6:10  2%       ` Nikhil Rao
@ 2018-08-15 10:48  4%         ` Neil Horman
  2018-08-16  6:19  4%           ` Rao, Nikhil
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-08-15 10:48 UTC (permalink / raw)
  To: Nikhil Rao
  Cc: dev, thomas, john.mcnamara, bruce.richardson, ferruh.yigit,
	stephen, toggle-mailboxes

On Wed, Aug 15, 2018 at 11:40:42AM +0530, Nikhil Rao wrote:
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > I was about to say that it's because you've not got enough context to let the
> > awk file figure out what your section name is, but that doesn't appear to be
> > the case. Can you provide the exact command line you are running to do your
> > symbol check, as well as the full patch that you are checking?  I'd like to try
> > to recreate the issue here
> > 
> > Best
> > Neil
> > 
> 
> Complete patch is below
> 
> 

Thanks, I think I made a mistake in how I detect section names in the awk
script.  The rule assumes that the entire section is getting added (i.e. we are
adding the EXPERIMENTAL section as a whole unit, hence starting a line with
+ to identify the section name), and that's not the case here.  I think the rule
needs to be that any line in a map file that ends with a { (based on our coding
practice) is a section start, and the section name is the next-to-last field in
the line (i.e. $(NF-1)).  Please apply the patch below and confirm that this
works for you.

Best
Neil



diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
index daaf45e14..cf9cfc745 100755
--- a/devtools/check-symbol-change.sh
+++ b/devtools/check-symbol-change.sh
@@ -25,14 +25,14 @@ build_map_changes()
 		# supresses the subordonate rules below
 		/[-+] a\/.*\.^(map)/ {in_map=0}
 
-		# Triggering this rule, which starts a line with a + and ends it
+		# Triggering this rule, which starts a line and ends it
 		# with a { identifies a versioned section.  The section name is
 		# the rest of the line with the + and { symbols remvoed.
 		# Triggering this rule sets in_sec to 1, which actives the
 		# symbol rule below
-		/+.*{/ {gsub("+","");
+		/^.*{/ {
 			if (in_map == 1) {
-				sec=$1; in_sec=1;
+				sec=$(NF-1); in_sec=1;
 			}
 		}
 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
  2018-08-14 11:04  4%     ` Neil Horman
@ 2018-08-15  6:10  2%       ` Nikhil Rao
  2018-08-15 10:48  4%         ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Nikhil Rao @ 2018-08-15  6:10 UTC (permalink / raw)
  To: nhorman
  Cc: dev, thomas, john.mcnamara, bruce.richardson, ferruh.yigit,
	stephen, toggle-mailboxes, nikhil.rao

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> I was about to say that it's because you've not got enough context to let the
> awk file figure out what your section name is, but that doesn't appear to be
> the case. Can you provide the exact command line you are running to do your
> symbol check, as well as the full patch that you are checking?  I'd like to try
> to recreate the issue here
> 
> Best
> Neil
> 

Complete patch is below

Thanks,
Nikhil

From: Nikhil Rao <nikhil.rao@intel.com>
Date: Thu, 5 Jul 2018 14:17:16 +0530
Subject: [PATCH v2 3/4] eventdev: add eth Tx adapter implementation

This patch implements the Tx adapter APIs by invoking the
corresponding eventdev PMD callbacks and also provides
the common rte_service function based implementation when
the eventdev PMD support is absent.

Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
---
 config/rte_config.h                            |    1 +
 lib/librte_eventdev/rte_event_eth_tx_adapter.c | 1120 ++++++++++++++++++++++++
 config/common_base                             |    2 +-
 lib/librte_eventdev/Makefile                   |    2 +
 lib/librte_eventdev/meson.build                |    6 +-
 lib/librte_eventdev/rte_eventdev_version.map   |   13 +
 6 files changed, 1141 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eventdev/rte_event_eth_tx_adapter.c

diff --git a/config/rte_config.h b/config/rte_config.h
index b1fb8cd..63f51a9 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -66,6 +66,7 @@
 #define RTE_EVENT_TIMER_ADAPTER_NUM_MAX 32
 #define RTE_EVENT_ETH_INTR_RING_SIZE 1024
 #define RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE 32
+#define RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE 32
 
 /* rawdev defines */
 #define RTE_RAWDEV_MAX_DEVS 10
diff --git a/lib/librte_eventdev/rte_event_eth_tx_adapter.c b/lib/librte_eventdev/rte_event_eth_tx_adapter.c
new file mode 100644
index 0000000..05253d4
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_eth_tx_adapter.c
@@ -0,0 +1,1120 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation.
+ */
+#include <rte_spinlock.h>
+#include <rte_service_component.h>
+#include <rte_ethdev.h>
+
+#include "rte_eventdev_pmd.h"
+#include "rte_event_eth_tx_adapter.h"
+
+#define TXA_BATCH_SIZE		32
+#define TXA_SERVICE_NAME_LEN	32
+#define TXA_MEM_NAME_LEN	32
+#define TXA_FLUSH_THRESHOLD	1024
+#define TXA_RETRY_CNT		100
+#define TXA_MAX_NB_TX		128
+#define TXA_INVALID_DEV_ID	INT32_C(-1)
+#define TXA_INVALID_SERVICE_ID	INT64_C(-1)
+
+#define txa_evdev(id) (&rte_eventdevs[txa_dev_id_array[(id)]])
+
+#define txa_dev_caps_get(id) txa_evdev((id))->dev_ops->eth_tx_adapter_caps_get
+
+#define txa_dev_adapter_create(t) txa_evdev(t)->dev_ops->eth_tx_adapter_create
+
+#define txa_dev_adapter_create_ext(t) \
+				txa_evdev(t)->dev_ops->eth_tx_adapter_create
+
+#define txa_dev_adapter_free(t) txa_evdev(t)->dev_ops->eth_tx_adapter_free
+
+#define txa_dev_queue_add(id) txa_evdev(id)->dev_ops->eth_tx_adapter_queue_add
+
+#define txa_dev_queue_del(t) txa_evdev(t)->dev_ops->eth_tx_adapter_queue_del
+
+#define txa_dev_start(t) txa_evdev(t)->dev_ops->eth_tx_adapter_start
+
+#define txa_dev_stop(t) txa_evdev(t)->dev_ops->eth_tx_adapter_stop
+
+#define txa_dev_stats_reset(t) txa_evdev(t)->dev_ops->eth_tx_adapter_stats_reset
+
+#define txa_dev_stats_get(t) txa_evdev(t)->dev_ops->eth_tx_adapter_stats_get
+
+#define RTE_EVENT_ETH_TX_ADAPTER_ID_VALID_OR_ERR_RET(id, retval) \
+do { \
+	if (!txa_valid_id(id)) { \
+		RTE_EDEV_LOG_ERR("Invalid eth Tx adapter id = %d", id); \
+		return retval; \
+	} \
+} while (0)
+
+#define TXA_CHECK_OR_ERR_RET(id) \
+do {\
+	int ret; \
+	RTE_EVENT_ETH_TX_ADAPTER_ID_VALID_OR_ERR_RET((id), -EINVAL); \
+	ret = txa_init(); \
+	if (ret != 0) \
+		return ret; \
+	if (!txa_adapter_exist((id))) \
+		return -EINVAL; \
+} while (0)
+
+/* Tx retry callback structure */
+struct txa_retry {
+	/* Ethernet port id */
+	uint16_t port_id;
+	/* Tx queue */
+	uint16_t tx_queue;
+	/* Adapter ID */
+	uint8_t id;
+};
+
+/* Per queue structure */
+struct txa_service_queue_info {
+	/* Queue has been added */
+	uint8_t added;
+	/* Retry callback argument */
+	struct txa_retry txa_retry;
+	/* Tx buffer */
+	struct rte_eth_dev_tx_buffer *tx_buf;
+};
+
+/* PMD private structure */
+struct txa_service_data {
+	/* Max mbufs processed in any service function invocation */
+	uint32_t max_nb_tx;
+	/* Number of Tx queues in adapter */
+	uint32_t nb_queues;
+	/*  Synchronization with data path */
+	rte_spinlock_t tx_lock;
+	/* Event port ID */
+	uint8_t port_id;
+	/* Event device identifier */
+	uint8_t eventdev_id;
+	/* Highest port id supported + 1 */
+	uint16_t dev_count;
+	/* Loop count to flush Tx buffers */
+	int loop_cnt;
+	/* Per ethernet device structure */
+	struct txa_service_ethdev *txa_ethdev;
+	/* Statistics */
+	struct rte_event_eth_tx_adapter_stats stats;
+	/* Adapter Identifier */
+	uint8_t id;
+	/* Conf arg must be freed */
+	uint8_t conf_free;
+	/* Configuration callback */
+	rte_event_eth_tx_adapter_conf_cb conf_cb;
+	/* Configuration callback argument */
+	void *conf_arg;
+	/* socket id */
+	int socket_id;
+	/* Per adapter EAL service */
+	int64_t service_id;
+	/* Memory allocation name */
+	char mem_name[TXA_MEM_NAME_LEN];
+} __rte_cache_aligned;
+
+/* Per eth device structure */
+struct txa_service_ethdev {
+	/* Pointer to ethernet device */
+	struct rte_eth_dev *dev;
+	/* Number of queues added */
+	uint16_t nb_queues;
+	/* PMD specific queue data */
+	void *queues;
+};
+
+/* Array of adapter instances, initialized with event device id
+ * when adapter is created
+ */
+static int *txa_dev_id_array;
+
+/* Array of pointers to service implementation data */
+static struct txa_service_data **txa_service_data_array;
+
+static int32_t txa_service_func(void *args);
+static int txa_service_adapter_create_ext(uint8_t id,
+			struct rte_eventdev *dev,
+			rte_event_eth_tx_adapter_conf_cb conf_cb,
+			void *conf_arg);
+static int txa_service_queue_del(uint8_t id,
+				const struct rte_eth_dev *dev,
+				int32_t tx_queue_id);
+
+static int
+txa_adapter_exist(uint8_t id)
+{
+	return txa_dev_id_array[id] != TXA_INVALID_DEV_ID;
+}
+
+static inline int
+txa_valid_id(uint8_t id)
+{
+	return id < RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE;
+}
+
+static void *
+txa_memzone_array_get(const char *name, unsigned int elt_size, int nb_elems)
+{
+	const struct rte_memzone *mz;
+	unsigned int sz;
+
+	sz = elt_size * nb_elems;
+	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		mz = rte_memzone_reserve_aligned(name, sz, rte_socket_id(), 0,
+						 RTE_CACHE_LINE_SIZE);
+		if (mz == NULL) {
+			RTE_EDEV_LOG_ERR("failed to reserve memzone"
+					" name = %s err = %"
+					PRId32, name, rte_errno);
+			return NULL;
+		}
+	}
+
+	return  mz->addr;
+}
+
+static int
+txa_dev_id_array_init(void)
+{
+	if (txa_dev_id_array == NULL) {
+		int i;
+
+		txa_dev_id_array = txa_memzone_array_get("txa_adapter_array",
+					sizeof(int),
+					RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE);
+		if (txa_dev_id_array == NULL)
+			return -ENOMEM;
+
+		for (i = 0; i < RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE; i++)
+			txa_dev_id_array[i] = TXA_INVALID_DEV_ID;
+	}
+
+	return 0;
+}
+
+static int
+txa_init(void)
+{
+	return txa_dev_id_array_init();
+}
+
+static int
+txa_service_data_init(void)
+{
+	if (txa_service_data_array == NULL) {
+		txa_service_data_array =
+				txa_memzone_array_get("txa_service_data_array",
+					sizeof(*txa_service_data_array),
+					RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE);
+		if (txa_service_data_array == NULL)
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static inline struct txa_service_data *
+txa_service_id_to_data(uint8_t id)
+{
+	return txa_service_data_array[id];
+}
+
+static inline struct txa_service_queue_info *
+txa_service_queue(struct txa_service_data *txa, uint16_t port_id,
+		uint16_t tx_queue_id)
+{
+	struct txa_service_queue_info *tqi;
+
+	if (unlikely(txa->txa_ethdev == NULL || txa->dev_count < port_id + 1))
+		return NULL;
+
+	tqi = txa->txa_ethdev[port_id].queues;
+
+	return likely(tqi != NULL) ? tqi + tx_queue_id : NULL;
+}
+
+static int
+txa_service_conf_cb(uint8_t __rte_unused id, uint8_t dev_id,
+		struct rte_event_eth_tx_adapter_conf *conf, void *arg)
+{
+	int ret;
+	struct rte_eventdev *dev;
+	struct rte_event_port_conf *pc;
+	struct rte_event_dev_config dev_conf;
+	int started;
+	uint8_t port_id;
+
+	pc = arg;
+	dev = &rte_eventdevs[dev_id];
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret) {
+		RTE_EDEV_LOG_ERR("failed to configure event dev %u",
+						dev_id);
+		if (started) {
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+		}
+		return ret;
+	}
+
+	pc->disable_implicit_release = 0;
+	ret = rte_event_port_setup(dev_id, port_id, pc);
+	if (ret) {
+		RTE_EDEV_LOG_ERR("failed to setup event port %u",
+					port_id);
+		if (started) {
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+		}
+		return ret;
+	}
+
+	conf->event_port_id = port_id;
+	conf->max_nb_tx = TXA_MAX_NB_TX;
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+	return ret;
+}
+
+static int
+txa_service_ethdev_alloc(struct txa_service_data *txa)
+{
+	struct txa_service_ethdev *txa_ethdev;
+	uint16_t i, dev_count;
+
+	dev_count = rte_eth_dev_count_avail();
+	if (txa->txa_ethdev && dev_count == txa->dev_count)
+		return 0;
+
+	txa_ethdev = rte_zmalloc_socket(txa->mem_name,
+					dev_count * sizeof(*txa_ethdev),
+					0,
+					txa->socket_id);
+	if (txa_ethdev == NULL) {
+		RTE_EDEV_LOG_ERR("Failed to alloc txa::txa_ethdev");
+		return -ENOMEM;
+	}
+
+	if (txa->dev_count)
+		memcpy(txa_ethdev, txa->txa_ethdev,
+			txa->dev_count * sizeof(*txa_ethdev));
+
+	RTE_ETH_FOREACH_DEV(i) {
+		if (i == dev_count)
+			break;
+		txa_ethdev[i].dev = &rte_eth_devices[i];
+	}
+
+	txa->txa_ethdev = txa_ethdev;
+	txa->dev_count = dev_count;
+	return 0;
+}
+
+static int
+txa_service_queue_array_alloc(struct txa_service_data *txa,
+			uint16_t port_id)
+{
+	struct txa_service_queue_info *tqi;
+	uint16_t nb_queue;
+	int ret;
+
+	ret = txa_service_ethdev_alloc(txa);
+	if (ret != 0)
+		return ret;
+
+	if (txa->txa_ethdev[port_id].queues)
+		return 0;
+
+	nb_queue = txa->txa_ethdev[port_id].dev->data->nb_tx_queues;
+	tqi = rte_zmalloc_socket(txa->mem_name,
+				nb_queue *
+				sizeof(struct txa_service_queue_info), 0,
+				txa->socket_id);
+	if (tqi == NULL)
+		return -ENOMEM;
+	txa->txa_ethdev[port_id].queues = tqi;
+	return 0;
+}
+
+static void
+txa_service_queue_array_free(struct txa_service_data *txa,
+			uint16_t port_id)
+{
+	struct txa_service_ethdev *txa_ethdev;
+	struct txa_service_queue_info *tqi;
+
+	txa_ethdev = &txa->txa_ethdev[port_id];
+	if (txa->txa_ethdev == NULL || txa_ethdev->nb_queues != 0)
+		return;
+
+	tqi = txa_ethdev->queues;
+	txa_ethdev->queues = NULL;
+	rte_free(tqi);
+
+	if (txa->nb_queues == 0) {
+		rte_free(txa->txa_ethdev);
+		txa->txa_ethdev = NULL;
+	}
+}
+
+static void
+txa_service_unregister(struct txa_service_data *txa)
+{
+	if (txa->service_id != TXA_INVALID_SERVICE_ID) {
+		rte_service_component_runstate_set(txa->service_id, 0);
+		while (rte_service_may_be_active(txa->service_id))
+			rte_pause();
+		rte_service_component_unregister(txa->service_id);
+	}
+	txa->service_id = TXA_INVALID_SERVICE_ID;
+}
+
+static int
+txa_service_register(struct txa_service_data *txa)
+{
+	int ret;
+	struct rte_service_spec service;
+	struct rte_event_eth_tx_adapter_conf conf;
+
+	if (txa->service_id != TXA_INVALID_SERVICE_ID)
+		return 0;
+
+	memset(&service, 0, sizeof(service));
+	snprintf(service.name, TXA_SERVICE_NAME_LEN, "txa_%d", txa->id);
+	service.socket_id = txa->socket_id;
+	service.callback = txa_service_func;
+	service.callback_userdata = txa;
+	service.capabilities = RTE_SERVICE_CAP_MT_SAFE;
+	ret = rte_service_component_register(&service,
+					(uint32_t *)&txa->service_id);
+	if (ret) {
+		RTE_EDEV_LOG_ERR("failed to register service %s err = %"
+				 PRId32, service.name, ret);
+		return ret;
+	}
+
+	ret = txa->conf_cb(txa->id, txa->eventdev_id, &conf, txa->conf_arg);
+	if (ret) {
+		txa_service_unregister(txa);
+		return ret;
+	}
+
+	rte_service_component_runstate_set(txa->service_id, 1);
+	txa->port_id = conf.event_port_id;
+	txa->max_nb_tx = conf.max_nb_tx;
+	return 0;
+}
+
+static struct rte_eth_dev_tx_buffer *
+txa_service_tx_buf_alloc(struct txa_service_data *txa,
+			const struct rte_eth_dev *dev)
+{
+	struct rte_eth_dev_tx_buffer *tb;
+	uint16_t port_id;
+
+	port_id = dev->data->port_id;
+	tb = rte_zmalloc_socket(txa->mem_name,
+				RTE_ETH_TX_BUFFER_SIZE(TXA_BATCH_SIZE),
+				0,
+				rte_eth_dev_socket_id(port_id));
+	if (tb == NULL)
+		RTE_EDEV_LOG_ERR("Failed to allocate memory for tx buffer");
+	return tb;
+}
+
+static int
+txa_service_is_queue_added(struct txa_service_data *txa,
+			const struct rte_eth_dev *dev,
+			uint16_t tx_queue_id)
+{
+	struct txa_service_queue_info *tqi;
+
+	tqi = txa_service_queue(txa, dev->data->port_id, tx_queue_id);
+	return tqi && tqi->added;
+}
+
+static int
+txa_service_ctrl(uint8_t id, int start)
+{
+	int ret;
+	struct txa_service_data *txa;
+
+	txa = txa_service_id_to_data(id);
+	if (txa->service_id == TXA_INVALID_SERVICE_ID)
+		return 0;
+
+	ret = rte_service_runstate_set(txa->service_id, start);
+	if (ret == 0 && !start) {
+		while (rte_service_may_be_active(txa->service_id))
+			rte_pause();
+	}
+	return ret;
+}
+
+static void
+txa_service_buffer_retry(struct rte_mbuf **pkts, uint16_t unsent,
+			void *userdata)
+{
+	struct txa_retry *tr;
+	struct txa_service_data *data;
+	struct rte_event_eth_tx_adapter_stats *stats;
+	uint16_t sent = 0;
+	unsigned int retry = 0;
+	uint16_t i, n;
+
+	tr = (struct txa_retry *)(uintptr_t)userdata;
+	data = txa_service_id_to_data(tr->id);
+	stats = &data->stats;
+
+	do {
+		n = rte_eth_tx_burst(tr->port_id, tr->tx_queue,
+			       &pkts[sent], unsent - sent);
+
+		sent += n;
+	} while (sent != unsent && retry++ < TXA_RETRY_CNT);
+
+	for (i = sent; i < unsent; i++)
+		rte_pktmbuf_free(pkts[i]);
+
+	stats->tx_retry += retry;
+	stats->tx_packets += sent;
+	stats->tx_dropped += unsent - sent;
+}
+
+static void
+txa_service_tx(struct txa_service_data *txa, struct rte_event *ev,
+	uint32_t n)
+{
+	uint32_t i;
+	uint16_t nb_tx;
+	struct rte_event_eth_tx_adapter_stats *stats;
+
+	stats = &txa->stats;
+
+	nb_tx = 0;
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m;
+		uint16_t port;
+		uint16_t queue;
+		struct txa_service_queue_info *tqi;
+
+		m = ev[i].mbuf;
+		port = m->port;
+		queue = rte_event_eth_tx_adapter_txq_get(m);
+
+		tqi = txa_service_queue(txa, port, queue);
+		if (unlikely(tqi == NULL || !tqi->added)) {
+			rte_pktmbuf_free(m);
+			continue;
+		}
+
+		nb_tx += rte_eth_tx_buffer(port, queue, tqi->tx_buf, m);
+	}
+
+	stats->tx_packets += nb_tx;
+}
+
+static int32_t
+txa_service_func(void *args)
+{
+	struct txa_service_data *txa = args;
+	uint8_t dev_id;
+	uint8_t port;
+	uint16_t n;
+	uint32_t nb_tx, max_nb_tx;
+	struct rte_event ev[TXA_BATCH_SIZE];
+
+	dev_id = txa->eventdev_id;
+	max_nb_tx = txa->max_nb_tx;
+	port = txa->port_id;
+
+	if (txa->nb_queues == 0)
+		return 0;
+
+	if (!rte_spinlock_trylock(&txa->tx_lock))
+		return 0;
+
+	for (nb_tx = 0; nb_tx < max_nb_tx; nb_tx += n) {
+
+		n = rte_event_dequeue_burst(dev_id, port, ev, RTE_DIM(ev), 0);
+		if (!n)
+			break;
+		txa_service_tx(txa, ev, n);
+	}
+
+	if ((txa->loop_cnt++ & (TXA_FLUSH_THRESHOLD - 1)) == 0) {
+
+		struct txa_service_ethdev *tdi;
+		struct txa_service_queue_info *tqi;
+		struct rte_eth_dev *dev;
+		uint16_t i;
+
+		tdi = txa->txa_ethdev;
+		nb_tx = 0;
+
+		RTE_ETH_FOREACH_DEV(i) {
+			uint16_t q;
+
+			if (i == txa->dev_count)
+				break;
+
+			dev = tdi[i].dev;
+			if (tdi[i].nb_queues == 0)
+				continue;
+			for (q = 0; q < dev->data->nb_tx_queues; q++) {
+
+				tqi = txa_service_queue(txa, i, q);
+				if (unlikely(tqi == NULL || !tqi->added))
+					continue;
+
+				nb_tx += rte_eth_tx_buffer_flush(i, q,
+							tqi->tx_buf);
+			}
+		}
+
+		txa->stats.tx_packets += nb_tx;
+	}
+	rte_spinlock_unlock(&txa->tx_lock);
+	return 0;
+}
+
+static int
+txa_service_adapter_create(uint8_t id, struct rte_eventdev *dev,
+			struct rte_event_port_conf *port_conf)
+{
+	struct txa_service_data *txa;
+	struct rte_event_port_conf *cb_conf;
+	int ret;
+
+	cb_conf = rte_malloc(NULL, sizeof(*cb_conf), 0);
+	if (cb_conf == NULL)
+		return -ENOMEM;
+
+	*cb_conf = *port_conf;
+	ret = txa_service_adapter_create_ext(id, dev, txa_service_conf_cb,
+					cb_conf);
+	if (ret) {
+		rte_free(cb_conf);
+		return ret;
+	}
+
+	txa = txa_service_id_to_data(id);
+	txa->conf_free = 1;
+	return ret;
+}
+
+static int
+txa_service_adapter_create_ext(uint8_t id, struct rte_eventdev *dev,
+			rte_event_eth_tx_adapter_conf_cb conf_cb,
+			void *conf_arg)
+{
+	struct txa_service_data *txa;
+	int socket_id;
+	char mem_name[TXA_SERVICE_NAME_LEN];
+	int ret;
+
+	if (conf_cb == NULL)
+		return -EINVAL;
+
+	socket_id = dev->data->socket_id;
+	snprintf(mem_name, TXA_MEM_NAME_LEN,
+		"rte_event_eth_txa_%d",
+		id);
+
+	ret = txa_service_data_init();
+	if (ret != 0)
+		return ret;
+
+	txa = rte_zmalloc_socket(mem_name,
+				sizeof(*txa),
+				RTE_CACHE_LINE_SIZE, socket_id);
+	if (txa == NULL) {
+		RTE_EDEV_LOG_ERR("failed to get mem for tx adapter");
+		return -ENOMEM;
+	}
+
+	txa->id = id;
+	txa->eventdev_id = dev->data->dev_id;
+	txa->socket_id = socket_id;
+	strncpy(txa->mem_name, mem_name, TXA_SERVICE_NAME_LEN);
+	txa->conf_cb = conf_cb;
+	txa->conf_arg = conf_arg;
+	txa->service_id = TXA_INVALID_SERVICE_ID;
+	rte_spinlock_init(&txa->tx_lock);
+	txa_service_data_array[id] = txa;
+
+	return 0;
+}
+
+static int
+txa_service_event_port_get(uint8_t id, uint8_t *port)
+{
+	struct txa_service_data *txa;
+
+	txa = txa_service_id_to_data(id);
+	if (txa->service_id == TXA_INVALID_SERVICE_ID)
+		return -ENODEV;
+
+	*port = txa->port_id;
+	return 0;
+}
+
+static int
+txa_service_adapter_free(uint8_t id)
+{
+	struct txa_service_data *txa;
+
+	txa = txa_service_id_to_data(id);
+	if (txa->nb_queues) {
+		RTE_EDEV_LOG_ERR("%" PRIu16 " Tx queues not deleted",
+				txa->nb_queues);
+		return -EBUSY;
+	}
+
+	if (txa->conf_free)
+		rte_free(txa->conf_arg);
+	rte_free(txa);
+	return 0;
+}
+
+static int
+txa_service_queue_add(uint8_t id,
+		__rte_unused struct rte_eventdev *dev,
+		const struct rte_eth_dev *eth_dev,
+		int32_t tx_queue_id)
+{
+	struct txa_service_data *txa;
+	struct txa_service_ethdev *tdi;
+	struct txa_service_queue_info *tqi;
+	struct rte_eth_dev_tx_buffer *tb;
+	struct txa_retry *txa_retry;
+	int ret = 0;
+
+	txa = txa_service_id_to_data(id);
+
+	if (tx_queue_id == -1) {
+		int nb_queues;
+		uint16_t i, j;
+		uint16_t *qdone;
+
+		nb_queues = eth_dev->data->nb_tx_queues;
+		if (txa->dev_count > eth_dev->data->port_id) {
+			tdi = &txa->txa_ethdev[eth_dev->data->port_id];
+			nb_queues -= tdi->nb_queues;
+		}
+
+		qdone = rte_zmalloc(txa->mem_name,
+				nb_queues * sizeof(*qdone), 0);
+		if (nb_queues > 0 && qdone == NULL)
+			return -ENOMEM;
+		j = 0;
+		for (i = 0; i < nb_queues; i++) {
+			if (txa_service_is_queue_added(txa, eth_dev, i))
+				continue;
+			ret = txa_service_queue_add(id, dev, eth_dev, i);
+			if (ret == 0)
+				qdone[j++] = i;
+			else
+				break;
+		}
+
+		if (i != nb_queues) {
+			for (i = 0; i < j; i++)
+				txa_service_queue_del(id, eth_dev, qdone[i]);
+		}
+		rte_free(qdone);
+		return ret;
+	}
+
+	ret = txa_service_register(txa);
+	if (ret)
+		return ret;
+
+	rte_spinlock_lock(&txa->tx_lock);
+
+	if (txa_service_is_queue_added(txa, eth_dev, tx_queue_id)) {
+		rte_spinlock_unlock(&txa->tx_lock);
+		return 0;
+	}
+
+	ret = txa_service_queue_array_alloc(txa, eth_dev->data->port_id);
+	if (ret)
+		goto err_unlock;
+
+	tb = txa_service_tx_buf_alloc(txa, eth_dev);
+	if (tb == NULL) {
+		ret = -ENOMEM;
+		goto err_unlock;
+	}
+
+	tdi = &txa->txa_ethdev[eth_dev->data->port_id];
+	tqi = txa_service_queue(txa, eth_dev->data->port_id, tx_queue_id);
+
+	txa_retry = &tqi->txa_retry;
+	txa_retry->id = txa->id;
+	txa_retry->port_id = eth_dev->data->port_id;
+	txa_retry->tx_queue = tx_queue_id;
+
+	rte_eth_tx_buffer_init(tb, TXA_BATCH_SIZE);
+	rte_eth_tx_buffer_set_err_callback(tb,
+		txa_service_buffer_retry, txa_retry);
+
+	tqi->tx_buf = tb;
+	tqi->added = 1;
+	tdi->nb_queues++;
+	txa->nb_queues++;
+
+err_unlock:
+	if (txa->nb_queues == 0) {
+		txa_service_queue_array_free(txa,
+					eth_dev->data->port_id);
+		txa_service_unregister(txa);
+	}
+
+	rte_spinlock_unlock(&txa->tx_lock);
+	return ret;
+}
+
+static int
+txa_service_queue_del(uint8_t id,
+		const struct rte_eth_dev *dev,
+		int32_t tx_queue_id)
+{
+	struct txa_service_data *txa;
+	struct txa_service_queue_info *tqi;
+	struct rte_eth_dev_tx_buffer *tb;
+	uint16_t port_id;
+
+	if (tx_queue_id == -1) {
+		uint16_t i;
+		int ret = 0;
+
+		for (i = 0; i < dev->data->nb_tx_queues; i++) {
+			ret = txa_service_queue_del(id, dev, i);
+			if (ret != 0)
+				break;
+		}
+		return ret;
+	}
+
+	txa = txa_service_id_to_data(id);
+	port_id = dev->data->port_id;
+
+	tqi = txa_service_queue(txa, port_id, tx_queue_id);
+	if (tqi == NULL || !tqi->added)
+		return 0;
+
+	tb = tqi->tx_buf;
+	tqi->added = 0;
+	tqi->tx_buf = NULL;
+	rte_free(tb);
+	txa->nb_queues--;
+	txa->txa_ethdev[port_id].nb_queues--;
+
+	txa_service_queue_array_free(txa, port_id);
+	return 0;
+}
+
+static int
+txa_service_id_get(uint8_t id, uint32_t *service_id)
+{
+	struct txa_service_data *txa;
+
+	txa = txa_service_id_to_data(id);
+	if (txa->service_id == TXA_INVALID_SERVICE_ID)
+		return -EINVAL;
+
+	*service_id = txa->service_id;
+	return 0;
+}
+
+static int
+txa_service_start(uint8_t id)
+{
+	return txa_service_ctrl(id, 1);
+}
+
+static int
+txa_service_stats_get(uint8_t id,
+		struct rte_event_eth_tx_adapter_stats *stats)
+{
+	struct txa_service_data *txa;
+
+	txa = txa_service_id_to_data(id);
+	*stats = txa->stats;
+	return 0;
+}
+
+static int
+txa_service_stats_reset(uint8_t id)
+{
+	struct txa_service_data *txa;
+
+	txa = txa_service_id_to_data(id);
+	memset(&txa->stats, 0, sizeof(txa->stats));
+	return 0;
+}
+
+static int
+txa_service_stop(uint8_t id)
+{
+	return txa_service_ctrl(id, 0);
+}
+
+
+int __rte_experimental
+rte_event_eth_tx_adapter_create(uint8_t id, uint8_t dev_id,
+				struct rte_event_port_conf *port_conf)
+{
+	struct rte_eventdev *dev;
+	int ret;
+
+	if (port_conf == NULL)
+		return -EINVAL;
+
+	RTE_EVENT_ETH_TX_ADAPTER_ID_VALID_OR_ERR_RET(id, -EINVAL);
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	dev = &rte_eventdevs[dev_id];
+
+	ret = txa_init();
+	if (ret != 0)
+		return ret;
+
+	if (txa_adapter_exist(id))
+		return -EEXIST;
+
+	txa_dev_id_array[id] = dev_id;
+	if (txa_dev_adapter_create(id))
+		ret = txa_dev_adapter_create(id)(id, dev);
+
+	if (ret != 0) {
+		txa_dev_id_array[id] = TXA_INVALID_DEV_ID;
+		return ret;
+	}
+
+	ret = txa_service_adapter_create(id, dev, port_conf);
+	if (ret != 0) {
+		if (txa_dev_adapter_free(id))
+			txa_dev_adapter_free(id)(id, dev);
+		txa_dev_id_array[id] = TXA_INVALID_DEV_ID;
+		return ret;
+	}
+
+	txa_dev_id_array[id] = dev_id;
+	return 0;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_create_ext(uint8_t id, uint8_t dev_id,
+				rte_event_eth_tx_adapter_conf_cb conf_cb,
+				void *conf_arg)
+{
+	struct rte_eventdev *dev;
+	int ret;
+
+	RTE_EVENT_ETH_TX_ADAPTER_ID_VALID_OR_ERR_RET(id, -EINVAL);
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	ret = txa_init();
+	if (ret != 0)
+		return ret;
+
+	if (txa_adapter_exist(id))
+		return -EINVAL;
+
+	dev = &rte_eventdevs[dev_id];
+
+	txa_dev_id_array[id] = dev_id;
+	if (txa_dev_adapter_create_ext(id))
+		ret = txa_dev_adapter_create_ext(id)(id, dev);
+
+	if (ret != 0) {
+		txa_dev_id_array[id] = TXA_INVALID_DEV_ID;
+		return ret;
+	}
+
+	ret = txa_service_adapter_create_ext(id, dev, conf_cb, conf_arg);
+	if (ret != 0) {
+		if (txa_dev_adapter_free(id))
+			txa_dev_adapter_free(id)(id, dev);
+		txa_dev_id_array[id] = TXA_INVALID_DEV_ID;
+		return ret;
+	}
+
+	txa_dev_id_array[id] = dev_id;
+	return 0;
+}
+
+
+int __rte_experimental
+rte_event_eth_tx_adapter_event_port_get(uint8_t id, uint8_t *event_port_id)
+{
+	TXA_CHECK_OR_ERR_RET(id);
+
+	return txa_service_event_port_get(id, event_port_id);
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_free(uint8_t id)
+{
+	int ret;
+
+	TXA_CHECK_OR_ERR_RET(id);
+
+	ret = txa_dev_adapter_free(id) ?
+		txa_dev_adapter_free(id)(id, txa_evdev(id)) :
+		0;
+
+	if (ret == 0)
+		ret = txa_service_adapter_free(id);
+	txa_dev_id_array[id] = TXA_INVALID_DEV_ID;
+
+	return ret;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_queue_add(uint8_t id,
+				uint16_t eth_dev_id,
+				int32_t queue)
+{
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	uint32_t caps;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(eth_dev_id, -EINVAL);
+	TXA_CHECK_OR_ERR_RET(id);
+
+	eth_dev = &rte_eth_devices[eth_dev_id];
+	if (queue != -1 && (uint16_t)queue >= eth_dev->data->nb_tx_queues) {
+		RTE_EDEV_LOG_ERR("Invalid tx queue_id %" PRIu16,
+				(uint16_t)queue);
+		return -EINVAL;
+	}
+
+	caps = 0;
+	if (txa_dev_caps_get(id))
+		txa_dev_caps_get(id)(txa_evdev(id), eth_dev, &caps);
+
+	if (caps & RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT)
+		ret = txa_dev_queue_add(id) ?
+					txa_dev_queue_add(id)(id,
+							txa_evdev(id),
+							eth_dev,
+							queue) : 0;
+	else
+		ret = txa_service_queue_add(id, txa_evdev(id), eth_dev, queue);
+
+	return ret;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_queue_del(uint8_t id,
+				uint16_t eth_dev_id,
+				int32_t queue)
+{
+	struct rte_eth_dev *eth_dev;
+	int ret;
+	uint32_t caps;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(eth_dev_id, -EINVAL);
+	TXA_CHECK_OR_ERR_RET(id);
+
+	eth_dev = &rte_eth_devices[eth_dev_id];
+	if (queue != -1 && (uint16_t)queue >= eth_dev->data->nb_tx_queues) {
+		RTE_EDEV_LOG_ERR("Invalid tx queue_id %" PRIu16,
+				(uint16_t)queue);
+		return -EINVAL;
+	}
+
+	caps = 0;
+
+	if (txa_dev_caps_get(id))
+		txa_dev_caps_get(id)(txa_evdev(id), eth_dev, &caps);
+
+	if (caps & RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT)
+		ret = txa_dev_queue_del(id) ?
+					txa_dev_queue_del(id)(id, txa_evdev(id),
+							eth_dev,
+							queue) : 0;
+	else
+		ret = txa_service_queue_del(id, eth_dev, queue);
+
+	return ret;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_service_id_get(uint8_t id, uint32_t *service_id)
+{
+	TXA_CHECK_OR_ERR_RET(id);
+
+	return txa_service_id_get(id, service_id);
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_start(uint8_t id)
+{
+	int ret;
+
+	TXA_CHECK_OR_ERR_RET(id);
+
+	ret = txa_dev_start(id) ? txa_dev_start(id)(id, txa_evdev(id)) : 0;
+	if (ret == 0)
+		ret = txa_service_start(id);
+	return ret;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_stats_get(uint8_t id,
+				struct rte_event_eth_tx_adapter_stats *stats)
+{
+	int ret;
+
+	TXA_CHECK_OR_ERR_RET(id);
+
+	if (stats == NULL)
+		return -EINVAL;
+
+	ret = txa_dev_stats_get(id) ?
+			txa_dev_stats_get(id)(id, txa_evdev(id), stats) : 0;
+	if (ret == 0)
+		ret = txa_service_stats_get(id, stats);
+	return ret;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_stats_reset(uint8_t id)
+{
+	int ret;
+
+	TXA_CHECK_OR_ERR_RET(id);
+
+	ret = txa_dev_stats_reset(id) ?
+		txa_dev_stats_reset(id)(id, txa_evdev(id)) : 0;
+	if (ret == 0)
+		ret = txa_service_stats_reset(id);
+	return ret;
+}
+
+int __rte_experimental
+rte_event_eth_tx_adapter_stop(uint8_t id)
+{
+	int ret;
+
+	TXA_CHECK_OR_ERR_RET(id);
+
+	ret = txa_dev_stop(id) ? txa_dev_stop(id)(id, txa_evdev(id)) : 0;
+	if (ret == 0)
+		ret = txa_service_stop(id);
+	return ret;
+}
diff --git a/config/common_base b/config/common_base
index 4a51aa9..ffef6f4 100644
--- a/config/common_base
+++ b/config/common_base
@@ -594,7 +594,7 @@ CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
 CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 CONFIG_RTE_EVENT_ETH_INTR_RING_SIZE=1024
 CONFIG_RTE_EVENT_CRYPTO_ADAPTER_MAX_INSTANCE=32
-
+CONFIG_RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE=32
 #
 # Compile PMD for skeleton event device
 #
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..424ff35 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -28,6 +28,7 @@ SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
 SRCS-y += rte_event_timer_adapter.c
 SRCS-y += rte_event_crypto_adapter.c
+SRCS-y += rte_event_eth_tx_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -39,6 +40,7 @@ SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 SYMLINK-y-include += rte_event_crypto_adapter.h
+SYMLINK-y-include += rte_event_eth_tx_adapter.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/meson.build b/lib/librte_eventdev/meson.build
index 3cbaf29..47989e7 100644
--- a/lib/librte_eventdev/meson.build
+++ b/lib/librte_eventdev/meson.build
@@ -14,7 +14,8 @@ sources = files('rte_eventdev.c',
 		'rte_event_ring.c',
 		'rte_event_eth_rx_adapter.c',
 		'rte_event_timer_adapter.c',
-		'rte_event_crypto_adapter.c')
+		'rte_event_crypto_adapter.c',
+		'rte_event_eth_tx_adapter.c')
 headers = files('rte_eventdev.h',
 		'rte_eventdev_pmd.h',
 		'rte_eventdev_pmd_pci.h',
@@ -23,5 +24,6 @@ headers = files('rte_eventdev.h',
 		'rte_event_eth_rx_adapter.h',
 		'rte_event_timer_adapter.h',
 		'rte_event_timer_adapter_pmd.h',
-		'rte_event_crypto_adapter.h')
+		'rte_event_crypto_adapter.h',
+		'rte_event_eth_tx_adapter.h')
 deps += ['ring', 'ethdev', 'hash', 'mempool', 'mbuf', 'timer', 'cryptodev']
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 12835e9..47e2898 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -96,6 +96,18 @@ EXPERIMENTAL {
 	rte_event_crypto_adapter_stats_reset;
 	rte_event_crypto_adapter_stop;
 	rte_event_eth_rx_adapter_cb_register;
+	rte_event_eth_tx_adapter_caps_get;
+	rte_event_eth_tx_adapter_create;
+	rte_event_eth_tx_adapter_create_ext;
+	rte_event_eth_tx_adapter_event_port_get;
+	rte_event_eth_tx_adapter_free;
+	rte_event_eth_tx_adapter_queue_add;
+	rte_event_eth_tx_adapter_queue_del;
+	rte_event_eth_tx_adapter_service_id_get;
+	rte_event_eth_tx_adapter_start;
+	rte_event_eth_tx_adapter_stats_get;
+	rte_event_eth_tx_adapter_stats_reset;
+	rte_event_eth_tx_adapter_stop;
 	rte_event_timer_adapter_caps_get;
 	rte_event_timer_adapter_create;
 	rte_event_timer_adapter_create_ext;
-- 
1.8.3.1
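
The retry path in txa_service_buffer_retry() in the patch above can be modelled outside DPDK. The sketch below is illustrative only: mock_tx_burst(), struct model_stats and MODEL_RETRY_CNT are hypothetical stand-ins (the real TXA_RETRY_CNT is defined outside this excerpt), and the mock NIC simply accepts packets until a budget runs out. It keeps the same loop shape as the patch: resend the unsent tail, give up after a bounded number of retries, and count whatever is left as dropped.

```c
#include <stdint.h>

#define MODEL_RETRY_CNT 3	/* assumed value, for illustration only */

struct model_stats {
	uint64_t tx_retry;
	uint64_t tx_packets;
	uint64_t tx_dropped;
};

/* Hypothetical stand-in for rte_eth_tx_burst(): accepts packets until
 * the remaining budget is exhausted and returns how many it took.
 */
static uint16_t
mock_tx_burst(uint16_t *budget, uint16_t n)
{
	uint16_t take = n < *budget ? n : *budget;

	*budget -= take;
	return take;
}

/* Same loop shape as the patch: retry the unsent tail a bounded number
 * of times, then count the remainder as dropped.
 */
static void
model_buffer_retry(uint16_t *budget, uint16_t unsent, struct model_stats *st)
{
	uint16_t sent = 0;
	unsigned int retry = 0;

	do {
		sent += mock_tx_burst(budget, unsent - sent);
	} while (sent != unsent && retry++ < MODEL_RETRY_CNT);

	st->tx_retry += retry;
	st->tx_packets += sent;
	st->tx_dropped += unsent - sent;
}
```

With a budget of 5 and 8 unsent packets, the model records 5 packets sent and 3 dropped; with a budget of 8 the first burst succeeds and no retries are recorded.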


* Re: [dpdk-dev] [PATCH v14 4/6] drivers/net: enable hotplug on secondary process
  @ 2018-08-15  1:14  3%       ` Zhang, Qi Z
  0 siblings, 0 replies; 200+ results
From: Zhang, Qi Z @ 2018-08-15  1:14 UTC (permalink / raw)
  To: Andrew Rybchenko, thomas, gaetan.rivet, Burakov, Anatoly
  Cc: Ananyev, Konstantin, dev, Richardson, Bruce, Yigit, Ferruh,
	Shelton, Benjamin H, Vangati, Narender



> -----Original Message-----
> From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> Sent: Sunday, August 12, 2018 7:00 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net;
> gaetan.rivet@6wind.com; Burakov, Anatoly <anatoly.burakov@intel.com>;
> arybchenko@solarflare.com
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Richardson, Bruce <bruce.richardson@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v14 4/6] drivers/net: enable hotplug on
> secondary process
> 
> On 10.08.2018 03:42, Qi Zhang wrote:
> > Attaching a port from a secondary process should ignore devargs, since
> > private devices do not need to be supported. Also, detaching a port on
> > a secondary process previously disrupted the primary process, so the
> > same device could not be attached again. A secondary process should use
> > rte_eth_dev_release_port_secondary to release a port.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> 
> For me, it looks like duplication of the same code logic in all vdev drivers. I'd
> say that remove should not be called at all in the case of secondary process.

But with the current framework, rte_eth_dev_release_port_secondary has to be called in the PMD, and as far as I can see driver->remove is the place to call it.

> Also, I'd consider introducing a separate probe callback for the
> secondary process: it would make it clear whether secondary is supported
> and force authors to think about secondary-process specifics on probe. As
> far as I can see, it is always a completely different branch with its own code.

I like this idea. We should give the driver the flexibility to decide whether to expose a device on a secondary process.
This is not only for vdev. Another option may be to add a flag in rte_driver indicating whether it supports secondary processes, so we don't need to add a callback for each sub-bus driver separately; in that case, though, a driver that supports secondary would still have to handle it in the same probe/remove function.

Btw, this looks like it involves a lot of changes and breaks the ABI. It also exceeds the scope of hotplug. I would like to see this in a separate patchset (preferably an RFC first). What do you think?

Regards
Qi
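
The separate secondary-process probe callback discussed above could look roughly like the sketch below. Everything in it is hypothetical: struct sketch_driver, probe_secondary, sketch_bus_probe() and the demo callbacks are illustrative names, not existing DPDK API. A NULL probe_secondary doubles as the "does not support secondary" flag, so the bus layer can refuse the device without entering driver code.

```c
#include <errno.h>
#include <stddef.h>

enum sketch_proc_type {
	SKETCH_PROC_PRIMARY,
	SKETCH_PROC_SECONDARY,
};

/* Hypothetical driver descriptor with a dedicated secondary callback. */
struct sketch_driver {
	const char *name;
	int (*probe)(void *dev);
	int (*probe_secondary)(void *dev);	/* NULL: secondary not supported */
};

/* Dispatch on process type; drivers without a secondary callback are
 * rejected up front instead of branching inside their probe function.
 */
static int
sketch_bus_probe(const struct sketch_driver *drv,
		enum sketch_proc_type type, void *dev)
{
	if (type == SKETCH_PROC_SECONDARY) {
		if (drv->probe_secondary == NULL)
			return -ENOTSUP;
		return drv->probe_secondary(dev);
	}
	return drv->probe(dev);
}

/* Demo callbacks used only to exercise the dispatch above. */
static int
demo_probe(void *dev)
{
	(void)dev;
	return 0;
}

static int
demo_probe_secondary(void *dev)
{
	(void)dev;
	return 1;
}
```

A driver that leaves probe_secondary unset is never exposed in a secondary process, which is the flexibility asked for in the reply above; the cost is one more callback per driver structure, which is the kind of ABI change that would belong in a separate RFC.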




* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
  2018-08-14  3:53  4%   ` Rao, Nikhil
@ 2018-08-14 11:04  4%     ` Neil Horman
  2018-08-15  6:10  2%       ` Nikhil Rao
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-08-14 11:04 UTC (permalink / raw)
  To: Rao, Nikhil
  Cc: dev, thomas, john.mcnamara, bruce.richardson, Ferruh Yigit,
	Stephen Hemminger

On Tue, Aug 14, 2018 at 09:23:59AM +0530, Rao, Nikhil wrote:
> On 6/27/2018 11:31 PM, Neil Horman wrote:
> > diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
> > new file mode 100755
> > index 000000000..17d123cf4
> > --- /dev/null
> > +++ b/devtools/check-symbol-change.sh
> > @@ -0,0 +1,159 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: BSD-3-Clause
> > +# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
> > +
> > +build_map_changes()
> > +{
> > +	local fname=$1
> > +	local mapdb=$2
> > +
> > +	cat $fname | awk '
> > +		# Initialize our variables
> > +		BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
> > +
> > +		# Anything that starts with + or -, followed by an a
> > +		# and ends in the string .map is the name of our map file
> > +		# This may appear multiple times in a patch if multiple
> > +		# map files are altered, and all section/symbol names
> > +		# appearing between a triggering of this rule and the
> > +		# next trigger of this rule are associated with this file
> > +		/[-+] a\/.*\.map/ {map=$2; in_map=1}
> > +
> > +		# Same pattern as above, only it matches on anything that
> > +		# does not end in 'map', indicating we have left the map chunk.
> > +		# When we hit this, turn off the in_map variable, which
> > +		# suppresses the subordinate rules below
> > +		/[-+] a\/.*\.^(map)/ {in_map=0}
> > +
> > +		# Triggering this rule, which starts a line with a + and ends it
> > +		# with a { identifies a versioned section.  The section name is
> > +		# the rest of the line with the + and { symbols removed.
> > +		# Triggering this rule sets in_sec to 1, which activates the
> > +		# symbol rule below
> > +		/+.*{/ {gsub("+","");
> > +			if (in_map == 1) {
> > +				sec=$1; in_sec=1;
> > +			}
> > +		}
> > +
> 
> I am adding a symbol as shown below, however the rule above fails to detect
> that the new symbol is being added to a pre-existing EXPERIMENTAL block
> (picks up the section name as @@ instead).
> 
> Any suggestions?
> 
I was about to say that it's because you've not got enough context to let the awk
file figure out what your section name is, but that doesn't appear to be the
case. Can you provide the exact command line you are running to do your symbol
check, as well as the full patch that you are checking?  I'd like to try to
recreate the issue here.

Best
Neil

> diff --git a/lib/librte_eventdev/rte_eventdev_version.map
> b/lib/librte_eventdev/rte_eventdev_version.map
> index 12835e9..4b8c55d 100644
> --- a/lib/librte_eventdev/rte_eventdev_version.map
> +++ b/lib/librte_eventdev/rte_eventdev_version.map
> @@ -96,6 +96,7 @@ EXPERIMENTAL {
>   	rte_event_crypto_adapter_stats_reset;
>   	rte_event_crypto_adapter_stop;
>   	rte_event_eth_rx_adapter_cb_register;
> +	rte_event_eth_tx_adapter_caps_get;
>   	rte_event_timer_adapter_caps_get;
>   	rte_event_timer_adapter_create;
>   	rte_event_timer_adapter_create_ext;
> 
> Thanks,
> Nikhil
> 
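
One plausible reading of the symptom reported above: the hunk header `@@ -96,6 +96,7 @@ EXPERIMENTAL {` itself contains both a `+` (in `+96,7`) and a `{`, so an unanchored pattern that looks for a `+` followed somewhere by `{` would match it, and the first field of that line is `@@`. The C sketch below shows how a checker could instead take the section name from the hunk header's trailing context. hunk_section() is a hypothetical helper written for illustration; it is not part of check-symbol-change.sh.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If line is a unified-diff hunk header whose trailing context names a
 * section ("@@ -a,b +c,d @@ NAME {"), copy NAME into out and return 1.
 * Return 0 for any other line or if out is too small.
 */
static int
hunk_section(const char *line, char *out, size_t outlen)
{
	const char *p, *end;
	size_t n;

	if (strncmp(line, "@@ ", 3) != 0)
		return 0;

	/* Find the closing "@@" of the range part. */
	p = strstr(line + 3, " @@");
	if (p == NULL)
		return 0;
	p += 3;
	while (*p == ' ' || *p == '\t')
		p++;

	/* The section name runs up to the opening brace. */
	end = strchr(p, '{');
	if (end == NULL)
		return 0;
	while (end > p && isspace((unsigned char)end[-1]))
		end--;

	n = (size_t)(end - p);
	if (n == 0 || n >= outlen)
		return 0;
	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}
```

For the hunk header quoted above this yields "EXPERIMENTAL", while added symbol lines such as `+	rte_event_eth_tx_adapter_caps_get;` are left for the symbol rule to handle.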


* Re: [dpdk-dev] [RFC 0/3] Make device mapping more reliable
  @ 2018-08-14 10:13  3% ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-08-14 10:13 UTC (permalink / raw)
  To: dev
  Cc: thomas, hemant.agrawal, bruce.richardson, ferruh.yigit,
	konstantin.ananyev, jerin.jacob, olivier.matz, stephen, nhorman,
	david.marchand, gowrishankar.m

On 31-May-18 11:57 AM, Anatoly Burakov wrote:
> Currently, memory for device maps is allocated ad-hoc, by calculating
> end of VA space allocated for hugepages and crossing fingers in hopes that
> those addresses will be free in primary and secondary processes. This leads
> to situations such as this:
> 
> EAL: Detected 88 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_178323_8af2229603de4
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: PCI device 0000:81:00.0 on NUMA socket 1
> EAL:   probe driver: 8086:1563 net_ixgbe
> EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:81:00.0/resource0 to address: 0x7ff7f5800000
> EAL: Requested device 0000:81:00.0 cannot be used
> EAL: Error - exiting with code: 1
>    Cause: No Ethernet ports - bye
> 
> As can be seen from the above log, secondary process has initialized
> successfully, but device BAR mapping has failed, which resulted in missing ports
> in the secondary process.
> 
> This patchset is an attempt to fix this problem once and for all, by using
> the same method we use for memory to do device mappings as well. That is,
> by preallocating all of the device memory in advance, so that initialization
> either succeeds and allows for device mappings, or it fails outright (whereas
> currently we may be in an in-between kind of situation, where init has
> succeeded but device mappings have failed).
> 
> This change breaks the ABI, so it is not for this release. However, I'd like
> to hear feedback on the approach and whether there are potential problems with
> other buses/use cases that I didn't think of.
> 

I would like to draw attention to the RFC, now that the release has gone 
out. We have a deprecation notice for ABI break due to external memory 
work, so we now have an opportunity to implement this for 18.11.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
    2018-07-15 23:12  4%   ` Thomas Monjalon
@ 2018-08-14  3:53  4%   ` Rao, Nikhil
  2018-08-14 11:04  4%     ` Neil Horman
  1 sibling, 1 reply; 200+ results
From: Rao, Nikhil @ 2018-08-14  3:53 UTC (permalink / raw)
  To: Neil Horman, dev
  Cc: thomas, john.mcnamara, bruce.richardson, Ferruh Yigit,
	Stephen Hemminger, nikhil.rao

On 6/27/2018 11:31 PM, Neil Horman wrote:
> diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
> new file mode 100755
> index 000000000..17d123cf4
> --- /dev/null
> +++ b/devtools/check-symbol-change.sh
> @@ -0,0 +1,159 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
> +
> +build_map_changes()
> +{
> +	local fname=$1
> +	local mapdb=$2
> +
> +	cat $fname | awk '
> +		# Initialize our variables
> +		BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
> +
> +		# Anything that starts with + or -, followed by an a
> +		# and ends in the string .map is the name of our map file
> +		# This may appear multiple times in a patch if multiple
> +		# map files are altered, and all section/symbol names
> +		# appearing between a triggering of this rule and the
> +		# next trigger of this rule are associated with this file
> +		/[-+] a\/.*\.map/ {map=$2; in_map=1}
> +
> +		# Same pattern as above, only it matches on anything that
> +		# doesnt end in 'map', indicating we have left the map chunk.
> +		# When we hit this, turn off the in_map variable, which
> +		# suppresses the subordinate rules below
> +		/[-+] a\/.*\.^(map)/ {in_map=0}
> +
> +		# Triggering this rule, which starts a line with a + and ends it
> +		# with a { identifies a versioned section.  The section name is
> +		# the rest of the line with the + and { symbols removed.
> +		# Triggering this rule sets in_sec to 1, which activates the
> +		# symbol rule below
> +		/+.*{/ {gsub("+","");
> +			if (in_map == 1) {
> +				sec=$1; in_sec=1;
> +			}
> +		}
> +

I am adding a symbol as shown below. However, the rule above fails to
detect that the new symbol is being added to a pre-existing EXPERIMENTAL
block (it picks up the section name as @@ instead).

Any suggestions?

diff --git a/lib/librte_eventdev/rte_eventdev_version.map 
b/lib/librte_eventdev/rte_eventdev_version.map
index 12835e9..4b8c55d 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -96,6 +96,7 @@ EXPERIMENTAL {
   	rte_event_crypto_adapter_stats_reset;
   	rte_event_crypto_adapter_stop;
   	rte_event_eth_rx_adapter_cb_register;
+	rte_event_eth_tx_adapter_caps_get;
   	rte_event_timer_adapter_caps_get;
   	rte_event_timer_adapter_create;
   	rte_event_timer_adapter_create_ext;
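
One way to look at the mismatch: in the hunk above, the only line that names
the section is the hunk header ("@@ -96,6 +96,7 @@ EXPERIMENTAL {"). It
contains a "+" and ends in "{", so the section rule fires, but the first
field of that line is "@@". The sketch below is not part of the script; it
is a small C illustration, under the assumption that the section name is the
first word after the closing "@@" of a hunk header, or the first word after
a leading "+". The function name section_from_diff_line is made up for this
example.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy the version-map section name found on a diff
 * line into out. Two shapes are recognised:
 *   "+DPDK_18.11 {"                      section added by the patch
 *   "@@ -96,6 +96,7 @@ EXPERIMENTAL {"   hunk header naming the enclosing
 *                                        section
 * Returns 1 if a section name was found, 0 otherwise.
 */
static int section_from_diff_line(const char *line, char *out, size_t outlen)
{
	const char *p = line;
	size_t n;

	assert(line != NULL && out != NULL);

	if (strncmp(p, "@@", 2) == 0) {
		/* Skip the range part up to and including the closing "@@". */
		p = strstr(p + 2, "@@");
		if (p == NULL)
			return 0;
		p += 2;
	} else if (*p == '+') {
		p++;
	} else {
		return 0;
	}

	/* Skip blanks, then take the first word as the candidate name. */
	while (*p == ' ' || *p == '\t')
		p++;
	n = strcspn(p, " \t{\n");
	if (n == 0 || n >= outlen)
		return 0;

	/* Only lines that open a block ("... {") name a section. */
	if (strchr(p + n, '{') == NULL)
		return 0;

	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}
```

With this approach a plain symbol line such as
"+	rte_event_eth_tx_adapter_caps_get;" yields no section name, so the
section recorded from the hunk header stays in effect for it.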

Thanks,
Nikhil

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] ethdev: add tail drop API for traffic management
  @ 2018-08-13 19:23  3% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-08-13 19:23 UTC (permalink / raw)
  To: Rosen Xu
  Cc: dev, cristian.dumitrescu, wenzhuo.lu, jasvinder.singh, ferruh.yigit

On Mon, 13 Aug 2018 15:53:32 +0800
Rosen Xu <rosen.xu@intel.com> wrote:

> @@ -1028,6 +1094,8 @@ enum rte_tm_error_type {
>  	RTE_TM_ERROR_TYPE_WRED_PROFILE_YELLOW,
>  	RTE_TM_ERROR_TYPE_WRED_PROFILE_RED,
>  	RTE_TM_ERROR_TYPE_WRED_PROFILE_ID,
> +	RTE_TM_ERROR_TYPE_TDROP_PROFILE,
> +	RTE_TM_ERROR_TYPE_TDROP_PROFILE_ID,
>  	RTE_TM_ERROR_TYPE_SHARED_WRED_CONTEXT_ID,
>  	RTE_TM_ERROR_TYPE_SHAPER_PROFILE,
>  	RTE_TM_ERROR_TYPE_SHAPER_PROFILE_COMMITTED_RATE,
> @@ -1279,6 +1347,110 @@ struct rte_tm_error {

Be careful: adding a new enum value in the middle of the list can break the ABI,
because every value after the insertion point is renumbered.
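
To illustrate the concern with a minimal sketch: the enums below are
hypothetical, shortened stand-ins for rte_tm_error_type (the ERR_* names are
invented for this example). An application built against the old list keeps
the old numeric values, so it misreads anything the rebuilt library returns
from after the insertion point.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum as an already-built application compiled it. */
enum err_v1 {
	ERR_V1_WRED_PROFILE_ID,        /* 0 */
	ERR_V1_SHARED_WRED_CONTEXT_ID, /* 1 */
	ERR_V1_SHAPER_PROFILE          /* 2 */
};

/* Library rebuilt with a new value inserted in the middle. */
enum err_v2_mid {
	ERR_V2M_WRED_PROFILE_ID,        /* 0 */
	ERR_V2M_TDROP_PROFILE,          /* 1: takes the old slot */
	ERR_V2M_SHARED_WRED_CONTEXT_ID, /* 2: shifted */
	ERR_V2M_SHAPER_PROFILE          /* 3: shifted */
};

/* Library rebuilt with the new value appended instead. */
enum err_v2_end {
	ERR_V2E_WRED_PROFILE_ID,        /* 0 */
	ERR_V2E_SHARED_WRED_CONTEXT_ID, /* 1 */
	ERR_V2E_SHAPER_PROFILE,         /* 2 */
	ERR_V2E_TDROP_PROFILE           /* 3: new, nothing else moves */
};

/* How the old application decodes a value handed back by the library. */
static const char *old_app_decode(int value)
{
	switch (value) {
	case ERR_V1_WRED_PROFILE_ID:
		return "wred_profile_id";
	case ERR_V1_SHARED_WRED_CONTEXT_ID:
		return "shared_wred_context_id";
	case ERR_V1_SHAPER_PROFILE:
		return "shaper_profile";
	default:
		return "unknown";
	}
}
```

With the middle insertion, old_app_decode(ERR_V2M_SHARED_WRED_CONTEXT_ID)
reports "shaper_profile"; with the value appended at the end, the existing
codes decode as before.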

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] version: 18.11-rc0
@ 2018-08-11 22:11  6% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-11 22:11 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, john.mcnamara

Start version numbering for a new release cycle,
and introduce a template file for release notes.

The release notes comments have a new block to suggest
the order of items, inspired by Ferruh's proposal.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 doc/guides/rel_notes/release_18_11.rst      | 207 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_version.h |   6 +-
 meson.build                                 |   2 +-
 3 files changed, 211 insertions(+), 4 deletions(-)
 create mode 100644 doc/guides/rel_notes/release_18_11.rst

diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
new file mode 100644
index 000000000..3ae6b3f58
--- /dev/null
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -0,0 +1,207 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2018 The DPDK contributors
+
+DPDK Release 18.11
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      xdg-open build/doc/html/guides/rel_notes/release_18_11.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     Suggested order in release notes items:
+     * Core libs (EAL, mempool, ring, mbuf, buses)
+     * Device abstraction libs and PMDs
+       - ethdev (lib, PMDs)
+       - cryptodev (lib, PMDs)
+       - eventdev (lib, PMDs)
+       - etc
+     * Other libs
+     * Apps, Examples, Tools (if significant)
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change.
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change
+     that was announced in the previous releases and made in this release.
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release
+   and prepend with a ``+`` sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. Do not overwrite or remove it.
+   =========================================================
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_bbdev.so.1
+     librte_bitratestats.so.2
+     librte_bpf.so.1
+     librte_bus_dpaa.so.1
+     librte_bus_fslmc.so.1
+     librte_bus_pci.so.1
+     librte_bus_vdev.so.1
+     librte_bus_vmbus.so.1
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_common_octeontx.so.1
+     librte_compressdev.so.1
+     librte_cryptodev.so.5
+     librte_distributor.so.1
+     librte_eal.so.8
+     librte_ethdev.so.10
+     librte_eventdev.so.4
+     librte_flow_classify.so.1
+     librte_gro.so.1
+     librte_gso.so.1
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_latencystats.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.4
+     librte_mempool.so.5
+     librte_meter.so.2
+     librte_metrics.so.1
+     librte_net.so.1
+     librte_pci.so.1
+     librte_pdump.so.2
+     librte_pipeline.so.3
+     librte_pmd_bnxt.so.2
+     librte_pmd_bond.so.2
+     librte_pmd_i40e.so.2
+     librte_pmd_ixgbe.so.2
+     librte_pmd_dpaa2_cmdif.so.1
+     librte_pmd_dpaa2_qdma.so.1
+     librte_pmd_ring.so.2
+     librte_pmd_softnic.so.1
+     librte_pmd_vhost.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_rawdev.so.1
+     librte_reorder.so.1
+     librte_ring.so.2
+     librte_sched.so.1
+     librte_security.so.1
+     librte_table.so.3
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue
+     in the present tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+   with this release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
diff --git a/lib/librte_eal/common/include/rte_version.h b/lib/librte_eal/common/include/rte_version.h
index 7c6714a25..8399ebdfe 100644
--- a/lib/librte_eal/common/include/rte_version.h
+++ b/lib/librte_eal/common/include/rte_version.h
@@ -32,7 +32,7 @@ extern "C" {
 /**
  * Minor version/month number i.e. the mm in yy.mm.z
  */
-#define RTE_VER_MONTH 8
+#define RTE_VER_MONTH 11
 
 /**
  * Patch level number i.e. the z in yy.mm.z
@@ -42,14 +42,14 @@ extern "C" {
 /**
  * Extra string to be appended to version number
  */
-#define RTE_VER_SUFFIX ""
+#define RTE_VER_SUFFIX "-rc"
 
 /**
  * Patch release number
  *   0-15 = release candidates
  *   16   = release
  */
-#define RTE_VER_RELEASE 16
+#define RTE_VER_RELEASE 0
 
 /**
  * Macro to compute a version number usable for comparisons
diff --git a/meson.build b/meson.build
index e718972f1..84af32ece 100644
--- a/meson.build
+++ b/meson.build
@@ -2,7 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 project('DPDK', 'C',
-	version: '18.08.0',
+	version: '18.11-rc0',
 	license: 'BSD',
 	default_options: ['buildtype=release', 'default_library=static'],
 	meson_version: '>= 0.41'
-- 
2.17.1
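
As a side note on the rte_version.h hunk above: setting the release field to
0 with the "-rc" suffix makes 18.11-rc0 compare lower than the final 18.11
(release field 16) and higher than the final 18.08. The sketch below shows
one way such a comparable number could be packed. VER_NUM and the helper
names are hypothetical, and the byte layout is an assumption made for
illustration, not a copy of the DPDK macro.

```c
#include <assert.h>

/* Hypothetical packing: one byte each for yy, mm, z and the release field. */
#define VER_NUM(yy, mm, z, rel) \
	(((unsigned)(yy) << 24) | ((unsigned)(mm) << 16) | \
	 ((unsigned)(z) << 8) | (unsigned)(rel))

/* 0-15 are release candidates, 16 is the release (from the comment above). */
#define VER_RELEASE_FINAL 16

static unsigned ver_18_08_final(void)
{
	return VER_NUM(18, 8, 0, VER_RELEASE_FINAL);
}

static unsigned ver_18_11_rc0(void)
{
	return VER_NUM(18, 11, 0, 0);
}

static unsigned ver_18_11_final(void)
{
	return VER_NUM(18, 11, 0, VER_RELEASE_FINAL);
}

/* Returns 1 when 18.08 < 18.11-rc0 < 18.11 holds for the packed values. */
static int ordering_holds(void)
{
	return ver_18_08_final() < ver_18_11_rc0() &&
	       ver_18_11_rc0() < ver_18_11_final();
}

/* Optional sanity check an application could run once at start-up. */
static void version_self_check(void)
{
	assert(ordering_holds());
}
```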

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v14 0/6] enable hotplug on multi-process
                     ` (5 preceding siblings ...)
  2018-07-12  1:18  1% ` [dpdk-dev] [PATCH v13 " Qi Zhang
@ 2018-08-10  0:42  1% ` Qi Zhang
    2018-08-16  3:04  1% ` [dpdk-dev] [PATCH v15 0/7] enable hotplug on multi-process Qi Zhang
  2018-09-28  4:23  1% ` [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process Qi Zhang
  8 siblings, 1 reply; 200+ results
From: Qi Zhang @ 2018-08-10  0:42 UTC (permalink / raw)
  To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v14:
- rebase.
- All changes belong to patch 1/6.
  1) rename rte_eth_dev_release_port_private to rte_eth_dev_release_port_secondary
     since it is only used by secondary processes.
  2) in rte_eth_dev_pci_generic_remove, even on the secondary process,
     I think it's better to call rte_eth_dev_release_port_secondary after
     dev_uninit, since the secondary process may need to release
     some local resources in dev_uninit before releasing the port and returning.
     This also does not break any existing user of rte_eth_dev_pci_generic_remove,
     because no existing dev_uninit has special handling for the secondary
     process.
  3) add rte_eth_dev_release_port_secondary into rte_eth_dev_destroy as a
     general step, so we don't need patches for i40e and ixgbe.
  4) fix missing update on rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
  when detaching a device. A device may be out of sync between processes in
  some situations, so attaching an existing device in the primary still needs
  to sync with the secondaries. Also, it's not necessary to roll back if we fail to
  attach an existing device or detach a nonexistent device on a secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.

v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
  modify the sample code to use rte_eal_hotplug_add and
  rte_eal_hotplug_remove to attach/detach devices.

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error logs in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
  any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process now syncs on
  all processes, it is not necessary to support a private vdev for
  a secondary process, which was identified by a non-NULL devargs in
  "--vdev". So rework all vdev driver changes to simplify the device
  probe scenario on a secondary process; devargs are now ignored on
  secondary processes.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, so all kinds of
  devices are now synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, due to the error that
     /dev/vfio/<group_fd> is still busy. (see patches 3/19 and 4/19)
  2) repeated detach/attach of a device causes "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address comparison.
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue.

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues that are not related to this patchset are expected when
  playing with the hotplug_mp sample, as listed below.

- Attaching a PCI device twice may leave the device unable to be detached;
  the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: the current patchset depends on the IPC fix below, or it may not be able
  to attach a shared vdev.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch a PMD if the PMD uses the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a
  separate thread.

v5:
- since we will keep the mp thread separate from the interrupt thread,
  it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on secondary due to
  previous rework.
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is still
  open and under discussion, see below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the fix in v3
  for the sync IPC deadlock will not work. The new version enables a
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules initialize
  the mp channel during rte_eal_init.
- fix attaching a shared device from secondary.
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process when handling a secondary request to attach a device; the
     solution is for the primary process to issue shared device attach/detach
     in the interrupt thread.
  2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.

Background:
===========

Currently a secondary process only syncs ethdevs from the primary
process at init stage, and is not aware of devices being
attached/detached on the primary process at runtime.

However, some applications use the primary-secondary process model
as follows: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes handle the network processing
with these devices.

Solution:
=========

The original intention was to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in a multi-process situation. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, and then device
information is synchronized on all processes.

Any failure during the attach or detach process will cause an inconsistent
status between processes, so a proper rollback action should be considered.

Scenario for Case 1, 2:

attach device from primary
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
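
The attach-from-primary steps a) through i) above can be sketched as a small
single-process simulation. This is only an illustration of the rollback
ordering, not code from the patch set: struct proc, proc_attach, proc_detach
and attach_from_primary are hypothetical names, and the IPC sync requests are
replaced by direct function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one process's view of the device. */
struct proc {
	bool attached;    /* device currently attached in this process */
	bool fail_attach; /* force the attach step to fail, for illustration */
};

static int proc_attach(struct proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void proc_detach(struct proc *p)
{
	p->attached = false;
}

/* Returns 0 on success (step i) and -1 on failure (step h). */
static int attach_from_primary(struct proc *primary,
			       struct proc *sec, size_t nb_sec)
{
	bool all_ok = true;
	size_t i;

	assert(primary != NULL);

	/* a) the primary attaches first; on failure nothing needs undoing */
	if (proc_attach(primary) != 0)
		return -1;

	/* b), c) "sync request": every secondary tries to attach */
	for (i = 0; i < nb_sec; i++)
		if (proc_attach(&sec[i]) != 0)
			all_ok = false;

	/* d) all replies successful: done */
	if (all_ok)
		return 0;

	/* e), f) rollback request: every secondary detaches */
	for (i = 0; i < nb_sec; i++)
		proc_detach(&sec[i]);

	/* g) the primary detaches last, restoring a consistent state */
	proc_detach(primary);
	return -1;
}
```

The point of the ordering is that after a failed attach no process is left
with the device attached, so primary and secondaries stay consistent.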

detach device from primary
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed

Scenario for case 3, 4:

attach device from secondary:
a) secondary send async request to primary and wait on a condition
   which will be released by matched response from primary.
b) primary receive the request and attach the new device if failed
   goto i).
c) primary forward attach request to all secondary as async request
   (because this in mp thread context, use sync request will deadlock,
    same reason for all following async request.)
d) secondary receive request and attach device and send reply.
e) primary check the reply if all success go to j).
f) primary send attach rollback async request to all secondary.
g) secondary receive the request and detach device and send reply.
h) primary receive the reply and detach device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receive response and return.
 
detach device from secondary:
a) secondary send async request to primary and wait on a condition
   which will be released by matched response from primary.
b) primary receive the request and perform pre-detach check, if device
   is locked, goto j).
c) primary send pre-detach async request to all secondary.
d) secondary perform pre-detach check and send reply.
e) primary check the reply if any fail goto j).
f) primary send detach async request to all secondary
g) secondary detach the device and send reply
h) primary detach the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receive response and return.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.


PMD Impact:
===========

Currently device removal is not handled well in secondary processes by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_secondary, which only resets the ethdev's state to unused
but does not touch shared data, so other processes are not impacted.
Since not all device drivers aim to support the primary-secondary
process model, the patch set only fixes this for PCI devices whose driver uses
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. It can be used as a reference by other
drivers when an equivalent fix is required.

Example:
========

The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach a pci device */
> attach 0000:81:00.0

/* detach the pci device */
> detach 0000:81:00.0

/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0

/* detach the vdev af_packet device */
> detach net_af_packet

Qi Zhang (6):
  ethdev: add function to release port in secondary process
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  drivers/net: enable hotplug on secondary process
  drivers/net: enable device detach on secondary
  examples/multi_process: add hotplug sample

 drivers/net/af_packet/rte_eth_af_packet.c    |   6 +-
 drivers/net/bnxt/bnxt_ethdev.c               |   2 +-
 drivers/net/bonding/rte_eth_bond_pmd.c       |   6 +-
 drivers/net/ena/ena_ethdev.c                 |   2 +-
 drivers/net/kni/rte_eth_kni.c                |   6 +-
 drivers/net/liquidio/lio_ethdev.c            |   2 +-
 drivers/net/null/rte_eth_null.c              |   6 +-
 drivers/net/octeontx/octeontx_ethdev.c       |   8 +
 drivers/net/pcap/rte_eth_pcap.c              |   6 +-
 drivers/net/tap/rte_eth_tap.c                |   8 +-
 drivers/net/vhost/rte_eth_vhost.c            |   6 +-
 drivers/net/virtio/virtio_ethdev.c           |   2 +-
 examples/multi_process/Makefile              |   1 +
 examples/multi_process/hotplug_mp/Makefile   |  23 ++
 examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++++
 examples/multi_process/hotplug_mp/commands.h |  10 +
 examples/multi_process/hotplug_mp/main.c     |  41 ++++
 lib/librte_eal/bsdapp/eal/Makefile           |   1 +
 lib/librte_eal/common/eal_common_dev.c       | 177 +++++++++++++-
 lib/librte_eal/common/eal_private.h          |  37 +++
 lib/librte_eal/common/hotplug_mp.c           | 348 +++++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h           |  48 ++++
 lib/librte_eal/common/include/rte_dev.h      |   6 +
 lib/librte_eal/common/meson.build            |   1 +
 lib/librte_eal/linuxapp/eal/Makefile         |   1 +
 lib/librte_eal/linuxapp/eal/eal.c            |   6 +
 lib/librte_ethdev/rte_ethdev.c               |  21 +-
 lib/librte_ethdev/rte_ethdev_driver.h        |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h           |   9 +-
 lib/librte_ethdev/rte_ethdev_version.map     |   7 +
 30 files changed, 997 insertions(+), 30 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH] doc: move and update experimental API description
  @ 2018-08-09 16:36  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-09 16:36 UTC (permalink / raw)
  To: Shreyansh Jain; +Cc: dev, Ferruh Yigit, Luca Boccassi, nhorman

25/05/2018 17:37, Ferruh Yigit:
> On 5/25/2018 1:22 PM, Luca Boccassi wrote:
> > On Fri, 2018-05-25 at 17:37 +0530, Shreyansh Jain wrote:
> >> Experimental API text has been moved into a sub-section of ABI
> >> Policy.
> >> A paragraph has been added to explain the process for removal of an
> >> experimental tag.
> >>
> >> Signed-off-by: Shreyansh Jain <shreyansh.jain@nxp.com>
> >>
> >> ---
> >> note:
> >>  The movement of text into a sub-section is relevant as the previous
> >> position
> >>  was in middle of a continuous text explaining ABI policy - whereas,
> >>  experimental is not truly an ABI policy.
> >>  No change to the original text has been made, except appending a new
> >>  paragraph. Though, this does spoil the blame/praise.
> >>
> >>  doc/guides/contributing/versioning.rst | 54 +++++++++++++++---------
> >> --
> >>  1 file changed, 31 insertions(+), 23 deletions(-)
> > 
> > Acked-by: Luca Boccassi <bluca@debian.org>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: fix symbol check for dash
  2018-08-09 15:14  0%     ` Neil Horman
@ 2018-08-09 16:13  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-09 16:13 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman, Ferruh Yigit, stephen

09/08/2018 17:14, Neil Horman:
> On Thu, Aug 09, 2018 at 01:14:23PM +0100, Ferruh Yigit wrote:
> > On 8/5/2018 10:38 AM, Thomas Monjalon wrote:
> > > The script check-symbol-change.sh was not running when
> > > /bin/sh redirects to dash.
> > > 
> > > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> > > Cc: nhorman@tuxdriver.com
> > > 
> > > Reported-by: Stephen Hemminger <stephen@networkplumber.org>
> > > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > 
> > Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Applied

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: fix symbol check for dash
  2018-08-09 12:14  0%   ` Ferruh Yigit
@ 2018-08-09 15:14  0%     ` Neil Horman
  2018-08-09 16:13  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-08-09 15:14 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen

On Thu, Aug 09, 2018 at 01:14:23PM +0100, Ferruh Yigit wrote:
> On 8/5/2018 10:38 AM, Thomas Monjalon wrote:
> > The script check-symbol-change.sh was not running when
> > /bin/sh redirects to dash.
> > 
> > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> > Cc: nhorman@tuxdriver.com
> > 
> > Reported-by: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> 
> Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: fix symbol check for dash
  2018-08-05  9:38  3% ` [dpdk-dev] [PATCH] devtools: fix symbol check for dash Thomas Monjalon
@ 2018-08-09 12:14  0%   ` Ferruh Yigit
  2018-08-09 15:14  0%     ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-08-09 12:14 UTC (permalink / raw)
  To: Thomas Monjalon, nhorman; +Cc: dev, stephen

On 8/5/2018 10:38 AM, Thomas Monjalon wrote:
> The script check-symbol-change.sh was not running when
> /bin/sh redirects to dash.
> 
> Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> Cc: nhorman@tuxdriver.com
> 
> Reported-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

Tested-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1] doc: update release notes for 18.08
@ 2018-08-09 12:08 10% John McNamara
  0 siblings, 0 replies; 200+ results
From: John McNamara @ 2018-08-09 12:08 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Fix grammar, spelling and formatting of DPDK 18.08 release notes.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
Note, I removed the following unused sections of the doc:

* ABI Changes
* Removed Items
* Known Issues



 doc/guides/rel_notes/release_18_08.rst | 90 ++++++++++------------------------
 1 file changed, 25 insertions(+), 65 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index cbdf2d9..7d47c24 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -51,6 +51,7 @@ New Features
 
   Flow API support has been added to CXGBE Poll Mode Driver to offload
   flows to Chelsio T5/T6 NICs. Support added for:
+
   * Wildcard (LE-TCAM) and Exact (HASH) match filters.
   * Match items: physical ingress port, IPv4, IPv6, TCP and UDP.
   * Action items: queue, drop, count, and physical egress port redirect.
@@ -65,8 +66,8 @@ New Features
 
 * **Added descriptor status check support for fm10k.**
 
-  ``rte_eth_rx_descritpr_status`` and ``rte_eth_tx_descriptor_status``
-  are supported by fm10K.
+  The ``rte_eth_rx_descriptor_status`` and ``rte_eth_tx_descriptor_status``
+  APIs are now supported by fm10K.
 
 * **Updated the enic driver.**
 
@@ -82,20 +83,20 @@ New Features
 
   * Added port representors support.
   * Added Flow API support for e-switch rules.
-    Supported ACTION_PORT_ID, ACTION_DROP, ACTION_OF_POP_VLAN,
+    Added support for ACTION_PORT_ID, ACTION_DROP, ACTION_OF_POP_VLAN,
     ACTION_OF_PUSH_VLAN, ACTION_OF_SET_VLAN_VID, ACTION_OF_SET_VLAN_PCP
     and ITEM_PORT_ID.
-  * Supported 32-bit compilation.
+  * Added support for 32-bit compilation.
 
-* **Added TSO support for mlx4 driver.**
+* **Added TSO support for the mlx4 driver.**
 
-  The support is from MLNX_OFED_4.4 and above.
+  Added TSO support for the mlx4 driver with MLNX_OFED_4.4 and above.
 
 * **SoftNIC PMD rework.**
 
-  The SoftNIC PMD infrastructure is restructured to use the Packet Framework,
-  which makes it more flexible, modular and easier to add new functionality
-  in future.
+  The SoftNIC PMD infrastructure has been restructured to use the Packet
+  Framework, which makes it more flexible, modular and easier to add new
+  functionality in the future.
 
 * **Updated the AESNI MB PMD.**
 
@@ -127,8 +128,8 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
-* Path to runtime config file has changed. The new path is determined as
-  follows:
+* The path to the runtime config file has changed. The new path is determined
+  as follows:
 
   - If DPDK is running as root, ``/var/run/dpdk/<prefix>/config``
   - If DPDK is not running as root:
@@ -161,24 +162,26 @@ API Changes
   - ``rte_eth_conf.rxmode.ignore_offload_bitfield``
   - ``ETH_TXQ_FLAGS_IGNORE``
 
-* cryptodev: In struct ``struct rte_cryptodev_info``, field ``rte_pci_device *pci_dev``
-  has been replaced with field ``struct rte_device *device``.
-  Value 0 is accepted in ``sym.max_nb_sessions``, meaning that a device
-  supports an unlimited number of sessions.
-  Two new fields of type ``uint16_t`` have been added:
-  ``min_mbuf_headroom_req`` and ``min_mbuf_tailroom_req``.
-  These parameters specify the recommended headroom and tailroom for mbufs
-  to be processed by the PMD.
+* cryptodev: The following API changes have been made in 18.08:
+
+  - In ``struct rte_cryptodev_info``, field ``rte_pci_device *pci_dev``
+    has been replaced with field ``struct rte_device *device``.
+  - Value 0 is accepted in ``sym.max_nb_sessions``, meaning that a device
+    supports an unlimited number of sessions.
+  - Two new fields of type ``uint16_t`` have been added:
+    ``min_mbuf_headroom_req`` and ``min_mbuf_tailroom_req``.  These parameters
+    specify the recommended headroom and tailroom for mbufs to be processed by
+    the PMD.
 
-* cryptodev: Following functions were deprecated and are removed in 18.08:
+* cryptodev: The following functions were deprecated and are removed in 18.08:
 
   - ``rte_cryptodev_queue_pair_start``
   - ``rte_cryptodev_queue_pair_stop``
   - ``rte_cryptodev_queue_pair_attach_sym_session``
   - ``rte_cryptodev_queue_pair_detach_sym_session``
 
-* cryptodev: Following functions were deprecated and are replaced by
-  other functions in 18.08:
+* cryptodev: The following functions were deprecated and are replaced by other
+  functions in 18.08:
 
   - ``rte_cryptodev_get_header_session_size`` is replaced with
     ``rte_cryptodev_sym_get_header_session_size``
@@ -212,34 +215,6 @@ API Changes
   - ``RTE_COMP_FF_OOP_LB_IN_SGL_OUT``
 
 
-ABI Changes
------------
-
-.. This section should contain ABI changes. Sample format:
-
-   * Add a short 1-2 sentence description of the ABI change
-     that was announced in the previous releases and made in this release.
-     Use fixed width quotes for ``function_names`` or ``struct_names``.
-     Use the past tense.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
-Removed Items
--------------
-
-.. This section should contain removed items in this release. Sample format:
-
-   * Add a short 1-2 sentence description of the removed item
-     in the past tense.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
 Shared Library Versions
 -----------------------
 
@@ -314,21 +289,6 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_vhost.so.3
 
 
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue
-     in the present tense. Add information on any known workarounds.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =========================================================
-
-
 Tested Platforms
 ----------------
 
-- 
2.7.5

^ permalink raw reply	[relevance 10%]

* [dpdk-dev] [PATCH v3 1/2] eal: remove deprecated function returning mbuf pool ops name
    2018-07-26 21:42  3%   ` Thomas Monjalon
@ 2018-08-07 21:34  7%   ` Olivier Matz
  1 sibling, 0 replies; 200+ results
From: Olivier Matz @ 2018-08-07 21:34 UTC (permalink / raw)
  To: dev, Hemant Agrawal, John McNamara
  Cc: Thomas Monjalon, Anatoly Burakov, Santosh Shukla, Olivier Matz

From: Olivier Matz <olivier.matz@6wind.com>

rte_eal_mbuf_default_mempool_ops() is replaced by
rte_mbuf_best_mempool_ops().

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
---

v3:
* bump ABI version and update the release notes
v2:
* remove rte_eal_mbuf_user_pool_ops from .map in next patch instead of this

 doc/guides/rel_notes/deprecation.rst    |  9 ---------
 doc/guides/rel_notes/release_18_08.rst  |  6 +++++-
 lib/librte_eal/bsdapp/eal/Makefile      |  2 +-
 lib/librte_eal/bsdapp/eal/eal.c         | 10 ----------
 lib/librte_eal/common/include/rte_eal.h | 11 -----------
 lib/librte_eal/linuxapp/eal/Makefile    |  2 +-
 lib/librte_eal/linuxapp/eal/eal.c       | 10 ----------
 lib/librte_eal/meson.build              |  2 +-
 lib/librte_eal/rte_eal_version.map      |  1 -
 9 files changed, 8 insertions(+), 45 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index bade1e4c4..118f962d9 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -42,15 +42,6 @@ Deprecation Notices
   - ``eal_parse_pci_DomBDF`` replaced by ``rte_pci_addr_parse``
   - ``rte_eal_compare_pci_addr`` replaced by ``rte_pci_addr_cmp``
 
-* eal: a new set of mbuf mempool ops name APIs for user, platform and best
-  mempool names have been defined in ``rte_mbuf`` in v18.02. The uses of
-  ``rte_eal_mbuf_default_mempool_ops`` shall be replaced by
-  ``rte_mbuf_best_mempool_ops``.
-  The following function is deprecated since 18.05, and will be removed
-  in 18.08:
-
-  - ``rte_eal_mbuf_default_mempool_ops``
-
 * mbuf: The opaque ``mbuf->hash.sched`` field will be updated to support generic
   definition in line with the ethdev TM and MTR APIs. Currently, this field
   is defined in librte_sched in a non-generic way. The new generic format
diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 9849fec7d..06cc38e85 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -181,6 +181,10 @@ API Changes
   - ``RTE_COMP_FF_OOP_SGL_IN_LB_OUT``
   - ``RTE_COMP_FF_OOP_LB_IN_SGL_OUT``
 
+* eal: The function ``rte_eal_mbuf_default_mempool_ops`` was deprecated
+  and is removed in 18.08. It shall be replaced by
+  ``rte_mbuf_best_mempool_ops``.
+
 
 ABI Changes
 -----------
@@ -242,7 +246,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_compressdev.so.1
    + librte_cryptodev.so.5
      librte_distributor.so.1
-     librte_eal.so.7
+   + librte_eal.so.8
      librte_ethdev.so.9
      librte_eventdev.so.4
      librte_flow_classify.so.1
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index be684072e..d27da3d15 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 7
+LIBABIVER := 8
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 6a6dd5e85..89e8110a2 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -153,16 +153,6 @@ rte_eal_mbuf_user_pool_ops(void)
 	return internal_config.user_mbuf_pool_ops_name;
 }
 
-/* Return mbuf pool ops name */
-const char *
-rte_eal_mbuf_default_mempool_ops(void)
-{
-	if (internal_config.user_mbuf_pool_ops_name == NULL)
-		return RTE_MBUF_DEFAULT_MEMPOOL_OPS;
-
-	return internal_config.user_mbuf_pool_ops_name;
-}
-
 /* Return a pointer to the configuration structure */
 struct rte_config *
 rte_eal_get_configuration(void)
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 8de5d69e8..0c9c3f13b 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -501,17 +501,6 @@ enum rte_iova_mode rte_eal_iova_mode(void);
 const char * __rte_experimental
 rte_eal_mbuf_user_pool_ops(void);
 
-/**
- * @deprecated
- * Get default pool ops name for mbuf
- *
- * @return
- *   returns default pool ops name.
- */
-__rte_deprecated
-const char *
-rte_eal_mbuf_default_mempool_ops(void);
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index b48352825..fd92c75c2 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 7
+LIBABIVER := 8
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index d2d5aae80..511eb062f 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -161,16 +161,6 @@ rte_eal_mbuf_user_pool_ops(void)
 	return internal_config.user_mbuf_pool_ops_name;
 }
 
-/* Return mbuf pool ops name */
-const char *
-rte_eal_mbuf_default_mempool_ops(void)
-{
-	if (internal_config.user_mbuf_pool_ops_name == NULL)
-		return RTE_MBUF_DEFAULT_MEMPOOL_OPS;
-
-	return internal_config.user_mbuf_pool_ops_name;
-}
-
 /* Return a pointer to the configuration structure */
 struct rte_config *
 rte_eal_get_configuration(void)
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index 98174476c..e1fde15d1 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type "@0@"'.format(host_machine.system()))
 endif
 
-version = 7  # the version of the EAL API
+version = 8  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 deps += 'kvargs'
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f18387137..de9abc812 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -181,7 +181,6 @@ DPDK_17.11 {
 	rte_bus_get_iommu_class;
 	rte_eal_has_pci;
 	rte_eal_iova_mode;
-	rte_eal_mbuf_default_mempool_ops;
 	rte_eal_using_phys_addrs;
 	rte_eal_vfio_intr_mode;
 	rte_lcore_has_role;
-- 
2.11.0
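
Since the function is removed outright, out-of-tree code that still calls it
will fail to build or link against 18.08. As a rough sketch only (not part of
the patch), remaining references in an application tree could be located with
a small helper like the one below. The helper name find_removed_symbol and its
arguments are hypothetical; it assumes a grep that supports -r and -w, as GNU
and BSD grep do.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: list every remaining
# reference to a removed symbol under a source directory.
find_removed_symbol() {
	symbol="$1"
	dir="$2"

	# -r: recurse into the directory
	# -n: prefix each match with its line number
	# -w: match whole words only, so a longer identifier that merely
	#     contains the symbol name is not reported
	grep -rnw -- "$symbol" "$dir"
}
```

A call such as "find_removed_symbol rte_eal_mbuf_default_mempool_ops ./app"
prints one file:line:text entry per hit and returns non-zero when nothing is
found. Each hit is a candidate for switching to rte_mbuf_best_mempool_ops().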

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-02  9:25  0% ` Shreyansh Jain
@ 2018-08-05 23:41  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-05 23:41 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Shreyansh Jain, Neil Horman, John McNamara, Marko Kovacevic,
	keith.wiles

02/08/2018 11:25, Shreyansh Jain:
> On Wednesday 01 August 2018 05:37 PM, Anatoly Burakov wrote:
> > Due to the upcoming external memory support [1], some API and ABI
> > changes will be required. In addition, although the changes called
> > out in the deprecation notice are not yet present in form of code
> > in the published RFC itself, they are based on consensus on the
> > mailing list [2] on how to best implement this feature.
> > 
> > [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> > [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> > 
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> 
> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

Applied, thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] eal: remove deprecated function returning mbuf pool ops name
  2018-07-26 21:42  3%   ` Thomas Monjalon
@ 2018-08-05 21:45  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-05 21:45 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Hemant Agrawal, santosh.shukla, John McNamara

26/07/2018 23:42, Thomas Monjalon:
> 26/06/2018 11:56, Olivier Matz:
> > rte_eal_mbuf_default_mempool_ops() is replaced by
> > rte_mbuf_best_mempool_ops().
> > 
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> > 
> > v2:
> > * remove rte_eal_mbuf_user_pool_ops from .map in next patch instead of this
> > 
> >  doc/guides/rel_notes/deprecation.rst    |  9 ---------
> >  lib/librte_eal/bsdapp/eal/eal.c         | 10 ----------
> >  lib/librte_eal/common/include/rte_eal.h | 11 -----------
> >  lib/librte_eal/linuxapp/eal/eal.c       | 10 ----------
> >  lib/librte_eal/rte_eal_version.map      |  1 -
> >  5 files changed, 41 deletions(-)
> 
> Please bump ABI version and update the release notes.
> Thanks

Olivier, ping!

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] devtools: fix symbol check for dash
  @ 2018-08-05  9:38  3% ` Thomas Monjalon
  2018-08-09 12:14  0%   ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-08-05  9:38 UTC (permalink / raw)
  To: nhorman; +Cc: dev, stephen

The script check-symbol-change.sh was not running when
/bin/sh redirects to dash.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Cc: nhorman@tuxdriver.com

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 devtools/check-symbol-change.sh | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
index 40b72073a..daaf45e14 100755
--- a/devtools/check-symbol-change.sh
+++ b/devtools/check-symbol-change.sh
@@ -4,8 +4,8 @@
 
 build_map_changes()
 {
-	local fname=$1
-	local mapdb=$2
+	local fname="$1"
+	local mapdb="$2"
 
 	cat "$fname" | awk '
 		# Initialize our variables
@@ -80,7 +80,7 @@ build_map_changes()
 
 check_for_rule_violations()
 {
-	local mapdb=$1
+	local mapdb="$1"
 	local mname
 	local symname
 	local secname
@@ -89,10 +89,10 @@ check_for_rule_violations()
 
 	while read mname symname secname ar
 	do
-		if [ "$ar" == "add" ]
+		if [ "$ar" = "add" ]
 		then
 
-			if [ "$secname" == "unknown" ]
+			if [ "$secname" = "unknown" ]
 			then
 				# Just inform the user of this occurrence, but
 				# don't flag it as an error
-- 
2.17.1
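
For readers unfamiliar with the difference, here is a minimal sketch of the
portable form the patch switches to. The function classify_change and the
strings it prints are hypothetical and do not exist in
check-symbol-change.sh; only the comparison style mirrors the patch. The
quoting of "$1" follows the patch as well: as far as I know, dash treats the
arguments of local like those of an ordinary command, so an unquoted
expansion may be split on whitespace.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: decide what to do with one
# map-change record using comparisons that work in both dash and bash.
classify_change() {
	local ar="$1"
	local secname="$2"

	# '=' is the string-equality operator specified by POSIX for '['.
	# '==' is a bash extension that dash's built-in test does not accept.
	if [ "$ar" = "add" ]
	then
		if [ "$secname" = "unknown" ]
		then
			echo "info"
		else
			echo "check"
		fi
	else
		echo "skip"
	fi
}
```

With this shape, "classify_change add unknown" prints "info" under either
shell, whereas the '==' spelling would make dash report an error from '['
instead of evaluating the branch.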

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to dash
  2018-08-03 22:17  0% ` [dpdk-dev] " Stephen Hemminger
@ 2018-08-04  6:42  0%   ` Gavin Hu
  0 siblings, 0 replies; 200+ results
From: Gavin Hu @ 2018-08-04  6:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Honnappa Nagarahalli, stable

Hi Stephen,

I am not sure whether supporting only bash is acceptable. Would it have any impact on FreeBSD?

We should seek wider opinions about this.

I did not hit your problem with either bash or dash. Which shell are you using?

Best Regards,
Gavin

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Saturday, August 4, 2018 6:17 AM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to
> dash
>
> On Wed,  1 Aug 2018 13:22:57 +0800
> Gavin Hu <gavin.hu@arm.com> wrote:
>
> > When running checkpatches.sh, it generates the following error on some
> > Linux distributions (like Debian) with dash as the default shell
> > interpreter:
> > trap: SIGINT: bad trap
> >
> > The fix is to replace SIGINT with INT, which works for both bash
> > and dash.
> >
> > Fixes: 4bec48184e ("devtools: add checks for ABI symbol addition")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@amr.com>
> > ---
> >  devtools/checkpatches.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh index
> > 2509269df..ba795ad1d 100755
> > --- a/devtools/checkpatches.sh
> > +++ b/devtools/checkpatches.sh
> > @@ -29,7 +29,7 @@ clean_tmp_files() {
> >  fi
> >  }
> >
> > -trap "clean_tmp_files" SIGINT
> > +trap "clean_tmp_files" INT
> >
> >  print_usage () {
> >  cat <<- END_OF_HELP
>
> This patch alone is not sufficient to make checkpatch run successfully
>
> ./devtools/checkpatches.sh: 52: read: Illegal option -d
>
> It looks like the -d flag to read is also a bash extension.
>
> I recommend changing both checkpatches.sh and check-symbol-change.sh to
> have #!/bin/bash

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to dash
  2018-08-01  5:22  3% [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to dash Gavin Hu
  2018-08-01 10:40  0% ` [dpdk-dev] [dpdk-stable] " Mcnamara, John
@ 2018-08-03 22:17  0% ` Stephen Hemminger
  2018-08-04  6:42  0%   ` Gavin Hu
  1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-08-03 22:17 UTC (permalink / raw)
  To: Gavin Hu; +Cc: dev, honnappa.nagarahalli, stable

On Wed,  1 Aug 2018 13:22:57 +0800
Gavin Hu <gavin.hu@arm.com> wrote:

> When running checkpatches.sh, it generates the following error
> on some Linux distributions (like Debian) with dash as the
> default shell interpreter:
> trap: SIGINT: bad trap
> 
> The fix is to replace SIGINT with INT, which works for
> both bash and dash.
> 
> Fixes: 4bec48184e ("devtools: add checks for ABI symbol addition")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@amr.com>
> ---
>  devtools/checkpatches.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> index 2509269df..ba795ad1d 100755
> --- a/devtools/checkpatches.sh
> +++ b/devtools/checkpatches.sh
> @@ -29,7 +29,7 @@ clean_tmp_files() {
>  	fi
>  }
>  
> -trap "clean_tmp_files" SIGINT
> +trap "clean_tmp_files" INT
>  
>  print_usage () {
>  	cat <<- END_OF_HELP

This patch alone is not sufficient to make checkpatches.sh run successfully:

./devtools/checkpatches.sh: 52: read: Illegal option -d

It looks like the -d flag to read is also a bash extension.

I recommend changing both checkpatches.sh and check-symbol-change.sh to have #!/bin/bash
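
As an alternative to requiring bash, both constructs can be written portably.
The sketch below is only an illustration under assumptions: it assumes the
read -d call is used to slurp a here-document into a variable, and the
temporary file, the variable names and the two option strings are
placeholders rather than the actual contents of checkpatches.sh.

```shell
#!/bin/sh
# Hypothetical temporary file and cleanup helper, for illustration only.
tmpfile=$(mktemp)

clean_tmp_files() {
	rm -f "$tmpfile"
}

# POSIX specifies signal names without the SIG prefix for trap;
# bash accepts this spelling too, so it works in both shells.
trap "clean_tmp_files" INT

# Portable stand-in for a bash-only 'read -d ""' that loads a whole
# here-document into one variable. The option values are placeholders.
options=$(cat <<'END_OF_OPTIONS'
--no-tree
--max-line-length=80
END_OF_OPTIONS
)
```

Command substitution strips the trailing newline, so "$options" ends up
holding the two lines separated by a single newline, with no dependency on
read -d.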

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter
  @ 2018-08-03 13:36  3% ` Adrien Mazarguil
  2018-08-23 13:48  0%   ` Ferruh Yigit
                     ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Adrien Mazarguil @ 2018-08-03 13:36 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

This is a follow-up to the "Flow API helpers enhancements" series submitted
almost a year ago [1]. The new title is due to the reduced scope of this
version.

rte_flow_conv() is a flexible replacement for rte_flow_copy(), itself a
temporary solution pending something better [2]. It replaces a lot of
duplicated code found in testpmd and removes some of the maintenance burden
that developers (me included) tend to forget when modifying pattern
items or actions, namely updating app/test-pmd/config.c.

This series was unearthed in order to complete the implementation of
RTE_FLOW_ACTION_TYPE_ENCAP_(VXLAN|NVGRE) in testpmd [3] without having to
duplicate existing code once again.

See individual patches for specific changes in this version.

v2 changes:

- rte_flow_copy() is kept, albeit deprecated, no API/ABI impact.
- Updated bonding PMD.
- No more automatic generation of rte_flow_conv.h.

[1] https://mails.dpdk.org/archives/dev/2017-October/077551.html
[2] https://mails.dpdk.org/archives/dev/2017-July/070492.html
[3] Currently the command-line parser (cmdline_flow.c) is aware of these
    actions, however config.c isn't. Flow rules with such actions cannot
    be created and cannot be validated with PMDs that implement them.

Adrien Mazarguil (7):
  ethdev: add flow API object converter
  ethdev: add flow API item/action name conversion
  app/testpmd: rely on flow API conversion function
  net/failsafe: switch to flow API object conversion function
  net/bonding: switch to flow API object conversion function
  ethdev: deprecate rte_flow_copy function
  ethdev: add missing item/actions to flow object converter

 app/test-pmd/config.c                      | 407 +++------------
 app/test-pmd/testpmd.h                     |   7 +-
 doc/guides/prog_guide/rte_flow.rst         |  20 +
 drivers/net/bonding/rte_eth_bond_api.c     |   6 +-
 drivers/net/bonding/rte_eth_bond_flow.c    |  31 +-
 drivers/net/bonding/rte_eth_bond_private.h |   5 +-
 drivers/net/failsafe/failsafe_ether.c      |   6 +-
 drivers/net/failsafe/failsafe_flow.c       |  31 +-
 drivers/net/failsafe/failsafe_private.h    |   5 +-
 lib/librte_ethdev/rte_ethdev_version.map   |   1 +
 lib/librte_ethdev/rte_flow.c               | 666 ++++++++++++++++++------
 lib/librte_ethdev/rte_flow.h               | 230 +++++++-
 12 files changed, 883 insertions(+), 532 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
                   ` (4 preceding siblings ...)
  2018-08-02  7:56  0% ` Maxime Coquelin
@ 2018-08-02  9:25  0% ` Shreyansh Jain
  2018-08-05 23:41  0%   ` Thomas Monjalon
  5 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-08-02  9:25 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, thomas, keith.wiles

On Wednesday 01 August 2018 05:37 PM, Anatoly Burakov wrote:
> Due to the upcoming external memory support [1], some API and ABI
> changes will be required. In addition, although the changes called
> out in the deprecation notice are not yet present in form of code
> in the published RFC itself, they are based on consensus on the
> mailing list [2] on how to best implement this feature.
> 
> [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
                   ` (3 preceding siblings ...)
  2018-08-02  5:58  0% ` Yongseok Koh
@ 2018-08-02  7:56  0% ` Maxime Coquelin
  2018-08-02  9:25  0% ` Shreyansh Jain
  5 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2018-08-02  7:56 UTC (permalink / raw)
  To: Anatoly Burakov, dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, thomas, keith.wiles



On 08/01/2018 02:07 PM, Anatoly Burakov wrote:
> Due to the upcoming external memory support [1], some API and ABI
> changes will be required. In addition, although the changes called
> out in the deprecation notice are not yet present in form of code
> in the published RFC itself, they are based on consensus on the
> mailing list [2] on how to best implement this feature.
> 
> [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>   doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 14714fe94..629154711 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,21 @@ API and ABI deprecation notices are to be posted here.
>   Deprecation Notices
>   -------------------
>   
> +* eal: certain structures will change in EAL on account of upcoming external
> +  memory support. Aside from internal changes leading to an ABI break, the
> +  following externally visible changes will also be implemented:
> +
> +  - ``rte_memseg_list`` will change to include a boolean flag indicating
> +    whether a particular memseg list is externally allocated. This will have
> +    implications for any users of memseg-walk-related functions, as they will
> +    now have to skip externally allocated segments in most cases if the intent
> +    is to only iterate over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> +    as some socket ID's will now be representing externally allocated memory. No
> +    changes will be required for existing code as backwards compatibility will
> +    be kept, and those who do not use this feature will not see these extra
> +    socket ID's.
> +
>   * eal: both declaring and identifying devices will be streamlined in v18.08.
>     New functions will appear to query a specific port from buses, classes of
>     device and device drivers. Device declaration will be made coherent with the
> 

Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
                   ` (2 preceding siblings ...)
  2018-08-02  3:38  0% ` Jerin Jacob
@ 2018-08-02  5:58  0% ` Yongseok Koh
  2018-08-02  7:56  0% ` Maxime Coquelin
  2018-08-02  9:25  0% ` Shreyansh Jain
  5 siblings, 0 replies; 200+ results
From: Yongseok Koh @ 2018-08-02  5:58 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic,
	Thomas Monjalon, keith.wiles


> On Aug 1, 2018, at 5:07 AM, Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> 
> Due to the upcoming external memory support [1], some API and ABI
> changes will be required. In addition, although the changes called
> out in the deprecation notice are not yet present in form of code
> in the published RFC itself, they are based on consensus on the
> mailing list [2] on how to best implement this feature.
> 
> [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 14714fe94..629154711 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,21 @@ API and ABI deprecation notices are to be posted here.
> Deprecation Notices
> -------------------
> 
> +* eal: certain structures will change in EAL on account of upcoming external
> +  memory support. Aside from internal changes leading to an ABI break, the
> +  following externally visible changes will also be implemented:
> +
> +  - ``rte_memseg_list`` will change to include a boolean flag indicating
> +    whether a particular memseg list is externally allocated. This will have
> +    implications for any users of memseg-walk-related functions, as they will
> +    now have to skip externally allocated segments in most cases if the intent
> +    is to only iterate over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> +    as some socket ID's will now be representing externally allocated memory. No
> +    changes will be required for existing code as backwards compatibility will
> +    be kept, and those who do not use this feature will not see these extra
> +    socket ID's.
> +
> * eal: both declaring and identifying devices will be streamlined in v18.08.
>   New functions will appear to query a specific port from buses, classes of
>   device and device drivers. Device declaration will be made coherent with the
> -- 
Acked-by: Yongseok Koh <yskoh@mellanox.com>
 
Thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
  2018-08-01 12:20  0% ` Wiles, Keith
  2018-08-02  2:37  0% ` Wang, Zhihong
@ 2018-08-02  3:38  0% ` Jerin Jacob
  2018-08-02  5:58  0% ` Yongseok Koh
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-08-02  3:38 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, thomas, keith.wiles

-----Original Message-----
> Date: Wed, 1 Aug 2018 13:07:16 +0100
> From: Anatoly Burakov <anatoly.burakov@intel.com>
> To: dev@dpdk.org
> CC: Neil Horman <nhorman@tuxdriver.com>, John McNamara
>  <john.mcnamara@intel.com>, Marko Kovacevic <marko.kovacevic@intel.com>,
>  thomas@monjalon.net, keith.wiles@intel.com
> Subject: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory
>  support
> X-Mailer: git-send-email 1.7.0.7
> 
> Due to the upcoming external memory support [1], some API and ABI
> changes will be required. In addition, although the changes called
> out in the deprecation notice are not yet present in form of code
> in the published RFC itself, they are based on consensus on the
> mailing list [2] on how to best implement this feature.
> 
> [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>


Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

> ---
>  doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 14714fe94..629154711 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,21 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
> 
> +* eal: certain structures will change in EAL on account of upcoming external
> +  memory support. Aside from internal changes leading to an ABI break, the
> +  following externally visible changes will also be implemented:
> +
> +  - ``rte_memseg_list`` will change to include a boolean flag indicating
> +    whether a particular memseg list is externally allocated. This will have
> +    implications for any users of memseg-walk-related functions, as they will
> +    now have to skip externally allocated segments in most cases if the intent
> +    is to only iterate over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> +    as some socket ID's will now be representing externally allocated memory. No
> +    changes will be required for existing code as backwards compatibility will
> +    be kept, and those who do not use this feature will not see these extra
> +    socket ID's.
> +
>  * eal: both declaring and identifying devices will be streamlined in v18.08.
>    New functions will appear to query a specific port from buses, classes of
>    device and device drivers. Device declaration will be made coherent with the
> --
> 2.17.1

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
  2018-08-01 12:20  0% ` Wiles, Keith
@ 2018-08-02  2:37  0% ` Wang, Zhihong
  2018-08-02  3:38  0% ` Jerin Jacob
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Wang, Zhihong @ 2018-08-02  2:37 UTC (permalink / raw)
  To: Burakov, Anatoly, dev
  Cc: Neil Horman, Mcnamara, John, Kovacevic, Marko, thomas, Wiles, Keith



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Anatoly Burakov
> Sent: Wednesday, August 1, 2018 8:07 PM
> To: dev@dpdk.org
> Cc: Neil Horman <nhorman@tuxdriver.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; thomas@monjalon.net; Wiles, Keith
> <keith.wiles@intel.com>
> Subject: [dpdk-dev] [PATCH] doc: add deprecation notice on external
> memory support
> 
> Due to the upcoming external memory support [1], some API and ABI
> changes will be required. In addition, although the changes called
> out in the deprecation notice are not yet present in form of code
> in the published RFC itself, they are based on consensus on the
> mailing list [2] on how to best implement this feature.
> 
> [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 14714fe94..629154711 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,21 @@ API and ABI deprecation notices are to be posted here.
>  Deprecation Notices
>  -------------------
> 
> +* eal: certain structures will change in EAL on account of upcoming external
> +  memory support. Aside from internal changes leading to an ABI break, the
> +  following externally visible changes will also be implemented:
> +
> +  - ``rte_memseg_list`` will change to include a boolean flag indicating
> +    whether a particular memseg list is externally allocated. This will have
> +    implications for any users of memseg-walk-related functions, as they will
> +    now have to skip externally allocated segments in most cases if the intent
> +    is to only iterate over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK will gain additional
> meaning,
> +    as some socket ID's will now be representing externally allocated memory.
> No
> +    changes will be required for existing code as backwards compatibility will
> +    be kept, and those who do not use this feature will not see these extra
> +    socket ID's.
> +
>  * eal: both declaring and identifying devices will be streamlined in v18.08.
>    New functions will appear to query a specific port from buses, classes of
>    device and device drivers. Device declaration will be made coherent with
> the
> --
> 2.17.1

Acked-by: Wang, Zhihong <zhihong.wang@intel.com>

Thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH] devtools: trap SIGINT is not recognizable to dash
  2018-08-01 10:40  0% ` [dpdk-dev] [dpdk-stable] " Mcnamara, John
  2018-08-01 13:09  0%   ` Varghese, Vipin
@ 2018-08-01 14:37  0%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-08-01 14:37 UTC (permalink / raw)
  To: Gavin Hu; +Cc: dev, Mcnamara, John, honnappa.nagarahalli, stable

> > When running checkpatches.sh, the following error is generated on some Linux
> > distributions (like Debian) with dash as the default shell interpreter:
> > trap: SIGINT: bad trap
> > 
> > The fix is to replace SIGINT with the INT signal name, which works for both
> > bash and dash.
> > 
> > Fixes: 4bec48184e ("devtools: add checks for ABI symbol addition")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Acked-by: John McNamara <john.mcnamara@intel.com>

Applied, thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Stable ABI status of rte_meter_[t|s]rtcm_profile_config
  2018-08-01 14:30  4%   ` Dumitrescu, Cristian
@ 2018-08-01 14:36  4%     ` Kevin Traynor
  0 siblings, 0 replies; 200+ results
From: Kevin Traynor @ 2018-08-01 14:36 UTC (permalink / raw)
  To: Dumitrescu, Cristian, dev; +Cc: Andy Green, Singh, Jasvinder

On 08/01/2018 03:30 PM, Dumitrescu, Cristian wrote:
> 
> 
>> -----Original Message-----
>> From: Kevin Traynor [mailto:ktraynor@redhat.com]
>> Sent: Wednesday, August 1, 2018 11:48 AM
>> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Cc: Andy Green <andy@warmcat.com>; Singh, Jasvinder
>> <jasvinder.singh@intel.com>
>> Subject: Re: [dpdk-dev] Stable ABI status of
>> rte_meter_[t|s]rtcm_profile_config
>>
>> On 05/28/2018 04:31 AM, Andy Green wrote:
>>> Hi -
>>>
>>> Between 18.02 and the putative 18.05 there were changes in the way the
>>> meter stuff deals with its config.
>>>
>>> I updated the related code in lagopus, but I get warnings about using
>>> the new APIs (it's the same for rte_meter_trtcm_profile_config())
>>>
>>> ./dpdk/meter.c: In function 'dpdk_register_meter':
>>> ./dpdk/meter.c:119:7: warning: 'rte_meter_srtcm_profile_config' is
>>> deprecated: Symbol is not yet part of stable ABI
>>> [-Wdeprecated-declarations]
>>>        rte_meter_srtcm_profile_config(&lband->sp, &param);
>>>        ^
>>> In file included from ./dpdk/meter.c:27:0:
>>> /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
>>> declared here
>>>  rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
>>>  ^
>>> ./dpdk/meter.c:132:7: warning: 'rte_meter_srtcm_profile_config' is
>>> deprecated: Symbol is not yet part of stable ABI
>>> [-Wdeprecated-declarations]
>>>        rte_meter_srtcm_profile_config(&lband->sp, &param);
>>>        ^
>>> In file included from ./dpdk/meter.c:27:0:
>>> /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
>>> declared here
>>>  rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
>>>
>>
>> Hi Cristian,
>>
>> Are these APIs still to be considered experimental in 18.08, or can the
>> tags be removed?
>>
>> Kevin.
> 
> No, we should remove the experimental tag on these functions.
> 

ok, I just did a quick compile tested patch. Will send.

>>
>>>
>>> As far as I can see this api change is not optional, it changes the
>>> parameters for related apis to require a struct prepared with these new
>>> apis.
>>>
>>> -Andy
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Stable ABI status of rte_meter_[t|s]rtcm_profile_config
  2018-08-01 10:47  4% ` Kevin Traynor
  2018-08-01 11:32  4%   ` Andy Green
@ 2018-08-01 14:30  4%   ` Dumitrescu, Cristian
  2018-08-01 14:36  4%     ` Kevin Traynor
  1 sibling, 1 reply; 200+ results
From: Dumitrescu, Cristian @ 2018-08-01 14:30 UTC (permalink / raw)
  To: Kevin Traynor, dev; +Cc: Andy Green, Singh, Jasvinder



> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor@redhat.com]
> Sent: Wednesday, August 1, 2018 11:48 AM
> To: dev@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Cc: Andy Green <andy@warmcat.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>
> Subject: Re: [dpdk-dev] Stable ABI status of
> rte_meter_[t|s]rtcm_profile_config
> 
> On 05/28/2018 04:31 AM, Andy Green wrote:
> > Hi -
> >
> > Between 18.02 and the putative 18.05 there were changes in the way the
> > meter stuff deals with its config.
> >
> > I updated the related code in lagopus, but I get warnings about using
> > the new APIs (it's the same for rte_meter_trtcm_profile_config())
> >
> > ./dpdk/meter.c: In function 'dpdk_register_meter':
> > ./dpdk/meter.c:119:7: warning: 'rte_meter_srtcm_profile_config' is
> > deprecated: Symbol is not yet part of stable ABI
> > [-Wdeprecated-declarations]
> >        rte_meter_srtcm_profile_config(&lband->sp, &param);
> >        ^
> > In file included from ./dpdk/meter.c:27:0:
> > /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
> > declared here
> >  rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
> >  ^
> > ./dpdk/meter.c:132:7: warning: 'rte_meter_srtcm_profile_config' is
> > deprecated: Symbol is not yet part of stable ABI
> > [-Wdeprecated-declarations]
> >        rte_meter_srtcm_profile_config(&lband->sp, &param);
> >        ^
> > In file included from ./dpdk/meter.c:27:0:
> > /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
> > declared here
> >  rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
> >
> 
> Hi Cristian,
> 
> Are these APIs still to be considered experimental in 18.08, or can the
> tags be removed?
> 
> Kevin.

No, we should remove the experimental tag on these functions.

> 
> >
> > As far as I can see this api change is not optional, it changes the
> > parameters for related apis to require a struct prepared with these new
> > apis.
> >
> > -Andy


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH] devtools: trap SIGINT is not recognizable to dash
  2018-08-01 10:40  0% ` [dpdk-dev] [dpdk-stable] " Mcnamara, John
@ 2018-08-01 13:09  0%   ` Varghese, Vipin
  2018-08-01 14:37  0%   ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Varghese, Vipin @ 2018-08-01 13:09 UTC (permalink / raw)
  To: dev, Gavin Hu, dev; +Cc: honnappa.nagarahalli, stable

Checked with Ubuntu 16.04.4 LTS, it works.

Acked-by: Vipin Varghese <vipin.varghese@intel.com>

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org>
> Sent: Wednesday, August 1, 2018 4:10 PM
> To: Gavin Hu <gavin.hu@arm.com>; dev@dpdk.org
> Cc: honnappa.nagarahalli@arm.com; stable@dpdk.org
> Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] devtools: trap SIGINT is not
> recognizable to dash
> 
> 
> 
> > -----Original Message-----
> > From: stable [mailto:stable-bounces@dpdk.org] On Behalf Of Gavin Hu
> > Sent: Wednesday, August 1, 2018 6:23 AM
> > To: dev@dpdk.org
> > Cc: honnappa.nagarahalli@arm.com; gavin.hu@arm.com; stable@dpdk.org
> > Subject: [dpdk-stable] [PATCH] devtools: trap SIGINT is not
> > recognizable to dash
> >
> > When running checkpatches.sh, the following error is generated on some
> > Linux distributions (like Debian) with dash as the default shell interpreter:
> > trap: SIGINT: bad trap
> >
> > The fix is to replace SIGINT with the INT signal name, which works for
> > both bash and dash.
> >
> > Fixes: 4bec48184e ("devtools: add checks for ABI symbol addition")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 
> Acked-by: John McNamara <john.mcnamara@intel.com>
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
  2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
@ 2018-08-01 12:20  0% ` Wiles, Keith
  2018-08-02  2:37  0% ` Wang, Zhihong
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Wiles, Keith @ 2018-08-01 12:20 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, Neil Horman, Mcnamara, John, Kovacevic, Marko, thomas



> On Aug 1, 2018, at 7:07 AM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
> 
> Due to the upcoming external memory support [1], some API and ABI
> changes will be required. In addition, although the changes called
> out in the deprecation notice are not yet present in form of code
> in the published RFC itself, they are based on consensus on the
> mailing list [2] on how to best implement this feature.
> 
> [1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
> [2] https://mails.dpdk.org/archives/dev/2018-July/108002.html
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 14714fe94..629154711 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -8,6 +8,21 @@ API and ABI deprecation notices are to be posted here.
> Deprecation Notices
> -------------------
> 
> +* eal: certain structures will change in EAL on account of upcoming external
> +  memory support. Aside from internal changes leading to an ABI break, the
> +  following externally visible changes will also be implemented:
> +
> +  - ``rte_memseg_list`` will change to include a boolean flag indicating
> +    whether a particular memseg list is externally allocated. This will have
> +    implications for any users of memseg-walk-related functions, as they will
> +    now have to skip externally allocated segments in most cases if the intent
> +    is to only iterate over internal DPDK memory.
> +  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> +    as some socket ID's will now be representing externally allocated memory. No
> +    changes will be required for existing code as backwards compatibility will
> +    be kept, and those who do not use this feature will not see these extra
> +    socket ID's.
> +
> * eal: both declaring and identifying devices will be streamlined in v18.08.
>   New functions will appear to query a specific port from buses, classes of
>   device and device drivers. Device declaration will be made coherent with the

Acked-by: Keith Wiles <keith.wiles@intel.com>

> -- 
> 2.17.1

Regards,
Keith

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support
@ 2018-08-01 12:07 13% Anatoly Burakov
  2018-08-01 12:20  0% ` Wiles, Keith
                   ` (5 more replies)
  0 siblings, 6 replies; 200+ results
From: Anatoly Burakov @ 2018-08-01 12:07 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman, John McNamara, Marko Kovacevic, thomas, keith.wiles

Due to the upcoming external memory support [1], some API and ABI
changes will be required. In addition, although the changes called
out in the deprecation notice are not yet present in form of code
in the published RFC itself, they are based on consensus on the
mailing list [2] on how to best implement this feature.

[1] http://patches.dpdk.org/project/dpdk/list/?series=453&state=*
[2] https://mails.dpdk.org/archives/dev/2018-July/108002.html

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 14714fe94..629154711 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,21 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* eal: certain structures will change in EAL on account of upcoming external
+  memory support. Aside from internal changes leading to an ABI break, the
+  following externally visible changes will also be implemented:
+
+  - ``rte_memseg_list`` will change to include a boolean flag indicating
+    whether a particular memseg list is externally allocated. This will have
+    implications for any users of memseg-walk-related functions, as they will
+    now have to skip externally allocated segments in most cases if the intent
+    is to only iterate over internal DPDK memory.
+  - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
+    as some socket ID's will now be representing externally allocated memory. No
+    changes will be required for existing code as backwards compatibility will
+    be kept, and those who do not use this feature will not see these extra
+    socket ID's.
+
 * eal: both declaring and identifying devices will be streamlined in v18.08.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
-- 
2.17.1

^ permalink raw reply	[relevance 13%]

* Re: [dpdk-dev] Stable ABI status of rte_meter_[t|s]rtcm_profile_config
  2018-08-01 10:47  4% ` Kevin Traynor
@ 2018-08-01 11:32  4%   ` Andy Green
  2018-08-01 14:30  4%   ` Dumitrescu, Cristian
  1 sibling, 0 replies; 200+ results
From: Andy Green @ 2018-08-01 11:32 UTC (permalink / raw)
  To: Kevin Traynor, dev, Dumitrescu, Cristian; +Cc: Singh, Jasvinder



On 08/01/2018 06:47 PM, Kevin Traynor wrote:
> On 05/28/2018 04:31 AM, Andy Green wrote:
>> Hi -
>>
>> Between 18.02 and the putative 18.05 there were changes in the way the
>> meter stuff deals with its config.
>>
>> I updated the related code in lagopus, but I get warnings about using
>> the new APIs (it's the same for rte_meter_trtcm_profile_config())
>>
>> ./dpdk/meter.c: In function 'dpdk_register_meter':
>> ./dpdk/meter.c:119:7: warning: 'rte_meter_srtcm_profile_config' is
>> deprecated: Symbol is not yet part of stable ABI
>> [-Wdeprecated-declarations]
>>         rte_meter_srtcm_profile_config(&lband->sp, &param);
>>         ^
>> In file included from ./dpdk/meter.c:27:0:
>> /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
>> declared here
>>   rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
>>   ^
>> ./dpdk/meter.c:132:7: warning: 'rte_meter_srtcm_profile_config' is
>> deprecated: Symbol is not yet part of stable ABI
>> [-Wdeprecated-declarations]
>>         rte_meter_srtcm_profile_config(&lband->sp, &param);
>>         ^
>> In file included from ./dpdk/meter.c:27:0:
>> /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
>> declared here
>>   rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
>>
> 
> Hi Cristian,
> 
> Are these APIs still to be considered experimental in 18.08, or can the
> tags be removed?

... to be clear that these apis claimed to be 'experimental' in 18.05 at 
all, when they aren't, is already broken in 18.05.

The only question is whether they want to continue ignoring the breakage 
into 18.08+ so future generations can enjoy it.

-Andy

> Kevin.
> 
>>
>> As far as I can see this api change is not optional, it changes the
>> parameters for related apis to require a struct prepared with these new
>> apis.
>>
>> -Andy
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] Stable ABI status of rte_meter_[t|s]rtcm_profile_config
  @ 2018-08-01 10:47  4% ` Kevin Traynor
  2018-08-01 11:32  4%   ` Andy Green
  2018-08-01 14:30  4%   ` Dumitrescu, Cristian
  0 siblings, 2 replies; 200+ results
From: Kevin Traynor @ 2018-08-01 10:47 UTC (permalink / raw)
  To: dev, Dumitrescu, Cristian; +Cc: Andy Green, Singh, Jasvinder

On 05/28/2018 04:31 AM, Andy Green wrote:
> Hi -
> 
> Between 18.02 and the putative 18.05 there were changes in the way the
> meter stuff deals with its config.
> 
> I updated the related code in lagopus, but I get warnings about using
> the new APIs (it's the same for rte_meter_trtcm_profile_config())
> 
> ./dpdk/meter.c: In function 'dpdk_register_meter':
> ./dpdk/meter.c:119:7: warning: 'rte_meter_srtcm_profile_config' is
> deprecated: Symbol is not yet part of stable ABI
> [-Wdeprecated-declarations]
>        rte_meter_srtcm_profile_config(&lband->sp, &param);
>        ^
> In file included from ./dpdk/meter.c:27:0:
> /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
> declared here
>  rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
>  ^
> ./dpdk/meter.c:132:7: warning: 'rte_meter_srtcm_profile_config' is
> deprecated: Symbol is not yet part of stable ABI
> [-Wdeprecated-declarations]
>        rte_meter_srtcm_profile_config(&lband->sp, &param);
>        ^
> In file included from ./dpdk/meter.c:27:0:
> /home/agreen/lagopus/src/dpdk/build/include/rte_meter.h:86:1: note:
> declared here
>  rte_meter_srtcm_profile_config(struct rte_meter_srtcm_profile *p,
> 

Hi Cristian,

Are these APIs still to be considered experimental in 18.08, or can the
tags be removed?

Kevin.

> 
> As far as I can see this api change is not optional, it changes the
> parameters for related apis to require a struct prepared with these new
> apis.
> 
> -Andy

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH] devtools: trap SIGINT is not recognizable to dash
  2018-08-01  5:22  3% [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to dash Gavin Hu
@ 2018-08-01 10:40  0% ` Mcnamara, John
  2018-08-01 13:09  0%   ` Varghese, Vipin
  2018-08-01 14:37  0%   ` Thomas Monjalon
  2018-08-03 22:17  0% ` [dpdk-dev] " Stephen Hemminger
  1 sibling, 2 replies; 200+ results
From: Mcnamara, John @ 2018-08-01 10:40 UTC (permalink / raw)
  To: Gavin Hu, dev; +Cc: honnappa.nagarahalli, stable



> -----Original Message-----
> From: stable [mailto:stable-bounces@dpdk.org] On Behalf Of Gavin Hu
> Sent: Wednesday, August 1, 2018 6:23 AM
> To: dev@dpdk.org
> Cc: honnappa.nagarahalli@arm.com; gavin.hu@arm.com; stable@dpdk.org
> Subject: [dpdk-stable] [PATCH] devtools: trap SIGINT is not recognizable to
> dash
> 
> When running checkpatches.sh, the following error is generated on some Linux
> distributions (like Debian) with dash as the default shell interpreter:
> trap: SIGINT: bad trap
> 
> The fix is to replace SIGINT with the INT signal name, which works for both
> bash and dash.
> 
> Fixes: 4bec48184e ("devtools: add checks for ABI symbol addition")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to dash
@ 2018-08-01  5:22  3% Gavin Hu
  2018-08-01 10:40  0% ` [dpdk-dev] [dpdk-stable] " Mcnamara, John
  2018-08-03 22:17  0% ` [dpdk-dev] " Stephen Hemminger
  0 siblings, 2 replies; 200+ results
From: Gavin Hu @ 2018-08-01  5:22 UTC (permalink / raw)
  To: dev; +Cc: honnappa.nagarahalli, gavin.hu, stable

When running checkpatches.sh, the following error is generated
on some Linux distributions (like Debian) with dash as the
default shell interpreter:
trap: SIGINT: bad trap

The fix is to replace SIGINT with the INT signal name, which
works for both bash and dash.

Fixes: 4bec48184e ("devtools: add checks for ABI symbol addition")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 devtools/checkpatches.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 2509269df..ba795ad1d 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -29,7 +29,7 @@ clean_tmp_files() {
 	fi
 }
 
-trap "clean_tmp_files" SIGINT
+trap "clean_tmp_files" INT
 
 print_usage () {
 	cat <<- END_OF_HELP
-- 
2.11.0
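
The change above can be illustrated with a minimal sketch, assuming a
hypothetical temporary file in place of the files checkpatches.sh actually
manages. POSIX specifies signal names for trap without the SIG prefix, so
INT should be accepted by both dash and bash, while the SIG-prefixed form
is what produced the "bad trap" error reported above. The variable names
tmpfile and trap_status are illustrative only:

```shell
#!/bin/sh
# Hypothetical temporary file standing in for the script's real temp files.
tmpfile=$(mktemp)

clean_tmp_files() {
	rm -f "$tmpfile"
}

# Use the POSIX signal name (no SIG prefix) so that the trap is accepted
# by both dash and bash.
trap "clean_tmp_files" INT
trap_status=$?
echo "trap registration status: $trap_status"

# Remove the temporary file on the normal path as well.
clean_tmp_files
```

Running the sketch with both "dash ./sketch.sh" and "bash ./sketch.sh" is
one way to check that the trap line is accepted by either interpreter.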

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] devtools: check_symbol_change requires bash
@ 2018-07-31 15:14  3% Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-07-31 15:14 UTC (permalink / raw)
  To: nhorman; +Cc: dev, Stephen Hemminger

check-symbol-change.sh uses some bash-specific syntax.
It does not run correctly on Debian, where /bin/sh is not the
same as /bin/bash.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/check-symbol-change.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
index 40b72073a975..19035a8d40e4 100755
--- a/devtools/check-symbol-change.sh
+++ b/devtools/check-symbol-change.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
 
-- 
2.18.0
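
As a rough illustration of why the interpreter line matters, the sketch
below detects whether a script is running under bash. It is written in
plain POSIX sh so that it runs under either shell; the helper name
running_under_bash and the variable shell_kind are hypothetical and not
part of check-symbol-change.sh:

```shell
#!/bin/sh
# Succeeds only when the running interpreter is bash. BASH_VERSION is set
# by bash itself, including when bash is invoked under the name "sh".
running_under_bash() {
	[ -n "${BASH_VERSION:-}" ]
}

if running_under_bash; then
	shell_kind="bash"
else
	shell_kind="other"
fi
echo "interpreter: $shell_kind"
```

A script that depends on bash-only constructs could use such a check to
fail early with a clear message. The patch takes the simpler route of
requesting bash directly in the interpreter line.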

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] mempool: remove deprecated functions
  2018-07-27 13:45  0%   ` Andrew Rybchenko
@ 2018-07-27 14:38  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-07-27 14:38 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Olivier Matz

27/07/2018 15:45, Andrew Rybchenko:
> On 27.07.2018 00:34, Thomas Monjalon wrote:
> > 11/07/2018 12:59, Andrew Rybchenko:
> >> Functions rte_mempool_populate_phys(), rte_mempool_virt2phy() and
> >> rte_mempool_populate_phys_tab() are just wrappers for corresponding
> >> IOVA functions and were deprecated in v17.11.
> >>
> >> Functions rte_mempool_xmem_create(), rte_mempool_xmem_size(),
> >> rte_mempool_xmem_usage() and rte_mempool_populate_iova_tab() were
> >> deprecated in v18.05 and removal was announced earlier in v18.02.
> >>
> >> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> >> ---
> >>   lib/librte_mempool/Makefile                |   3 -
> >>   lib/librte_mempool/meson.build             |   4 -
> >>   lib/librte_mempool/rte_mempool.c           | 181 +--------------------
> >>   lib/librte_mempool/rte_mempool.h           | 179 --------------------
> >>   lib/librte_mempool/rte_mempool_version.map |   6 -
> >>   5 files changed, 1 insertion(+), 372 deletions(-)
> > Please update the release notes, deprecation notice, and bump ABI version.
> 
> Will do. The deprecation notice that scheduled removal of the xmem functions
> was removed in the previous release, when these functions were deprecated.
> Is that a problem? Should removal of an already deprecated function go
> through the deprecation (removal) announcement procedure once again?

No, it's OK.
We should have left the notice about removal but it's too late :)

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] mempool: remove deprecated functions
  2018-07-26 21:34  3% ` Thomas Monjalon
@ 2018-07-27 13:45  0%   ` Andrew Rybchenko
  2018-07-27 14:38  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-07-27 13:45 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Olivier Matz

On 27.07.2018 00:34, Thomas Monjalon wrote:
> 11/07/2018 12:59, Andrew Rybchenko:
>> Functions rte_mempool_populate_phys(), rte_mempool_virt2phy() and
>> rte_mempool_populate_phys_tab() are just wrappers for corresponding
>> IOVA functions and were deprecated in v17.11.
>>
>> Functions rte_mempool_xmem_create(), rte_mempool_xmem_size(),
>> rte_mempool_xmem_usage() and rte_mempool_populate_iova_tab() were
>> deprecated in v18.05 and removal was announced earlier in v18.02.
>>
>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>>   lib/librte_mempool/Makefile                |   3 -
>>   lib/librte_mempool/meson.build             |   4 -
>>   lib/librte_mempool/rte_mempool.c           | 181 +--------------------
>>   lib/librte_mempool/rte_mempool.h           | 179 --------------------
>>   lib/librte_mempool/rte_mempool_version.map |   6 -
>>   5 files changed, 1 insertion(+), 372 deletions(-)
> Please update the release notes, deprecation notice, and bump ABI version.

Will do. The deprecation notice that scheduled removal of the xmem functions
was removed in the previous release, when these functions were deprecated.
Is that a problem? Should removal of an already deprecated function go
through the deprecation (removal) announcement procedure once again?

Andrew.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/2] mempool: remove deprecated functions
@ 2018-07-27 13:46  2% Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-07-27 13:46 UTC (permalink / raw)
  To: dev; +Cc: Olivier Matz, Thomas Monjalon

Functions rte_mempool_populate_phys(), rte_mempool_virt2phy() and
rte_mempool_populate_phys_tab() are just wrappers for corresponding
IOVA functions and were deprecated in v17.11.

Functions rte_mempool_xmem_create(), rte_mempool_xmem_size(),
rte_mempool_xmem_usage() and rte_mempool_populate_iova_tab() were
deprecated in v18.05 and removal was announced earlier in v18.02.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
 doc/guides/rel_notes/release_18_08.rst     |  12 +-
 lib/librte_mempool/Makefile                |   5 +-
 lib/librte_mempool/meson.build             |   6 +-
 lib/librte_mempool/rte_mempool.c           | 181 +--------------------
 lib/librte_mempool/rte_mempool.h           | 179 --------------------
 lib/librte_mempool/rte_mempool_version.map |   6 -
 6 files changed, 14 insertions(+), 375 deletions(-)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 5f2401401..165e413f0 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -166,6 +166,16 @@ API Changes
   - ``RTE_COMP_FF_OOP_SGL_IN_LB_OUT``
   - ``RTE_COMP_FF_OOP_LB_IN_SGL_OUT``
 
+* mempool: Following functions were deprecated and are removed in 18.08:
+
+  - ``rte_mempool_populate_iova_tab``
+  - ``rte_mempool_populate_phys_tab``
+  - ``rte_mempool_populate_phys`` (``rte_mempool_populate_iova`` should be used)
+  - ``rte_mempool_virt2phy`` (``rte_mempool_virt2iova`` should be used)
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+
 
 ABI Changes
 -----------
@@ -241,7 +251,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.4
-     librte_mempool.so.4
+   + librte_mempool.so.5
      librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index e3c32b14f..20bf63fbc 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -7,15 +7,12 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_mempool.a
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
-# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
-# from earlier deprecated rte_mempool_populate_phys_tab()
-CFLAGS += -Wno-deprecated-declarations
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 4
+LIBABIVER := 5
 
 # memseg walk is not yet part of stable API
 CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index d507e5511..38d7ae890 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -5,17 +5,13 @@ allow_experimental_apis = true
 
 extra_flags = []
 
-# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
-# from earlier deprecated rte_mempool_populate_phys_tab()
-extra_flags += '-Wno-deprecated-declarations'
-
 foreach flag: extra_flags
 	if cc.has_argument(flag)
 		cflags += flag
 	endif
 endforeach
 
-version = 4
+version = 5
 sources = files('rte_mempool.c', 'rte_mempool_ops.c',
 		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 8c8b9f809..d48e53c7e 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -227,9 +227,7 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Internal function to calculate required memory chunk size shared
- * by default implementation of the corresponding callback and
- * deprecated external function.
+ * Internal function to calculate required memory chunk size.
  */
 size_t
 rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
@@ -252,66 +250,6 @@ rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
 	return pg_num << pg_shift;
 }
 
-/*
- * Calculate maximum amount of memory required to store given number of objects.
- */
-size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
-{
-	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
-						pg_shift);
-}
-
-/*
- * Calculate how much memory would be actually required with the
- * given memory footprint to store required number of elements.
- */
-ssize_t
-rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
-	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, __rte_unused unsigned int flags)
-{
-	uint32_t elt_cnt = 0;
-	rte_iova_t start, end;
-	uint32_t iova_idx;
-	size_t pg_sz = (size_t)1 << pg_shift;
-
-	/* if iova is NULL, assume contiguous memory */
-	if (iova == NULL) {
-		start = 0;
-		end = pg_sz * pg_num;
-		iova_idx = pg_num;
-	} else {
-		start = iova[0];
-		end = iova[0] + pg_sz;
-		iova_idx = 1;
-	}
-	while (elt_cnt < elt_num) {
-
-		if (end - start >= total_elt_sz) {
-			/* enough contiguous memory, add an object */
-			start += total_elt_sz;
-			elt_cnt++;
-		} else if (iova_idx < pg_num) {
-			/* no room to store one obj, add a page */
-			if (end == iova[iova_idx]) {
-				end += pg_sz;
-			} else {
-				start = iova[iova_idx];
-				end = iova[iova_idx] + pg_sz;
-			}
-			iova_idx++;
-
-		} else {
-			/* no more page, return how many elements fit */
-			return -(size_t)elt_cnt;
-		}
-	}
-
-	return (size_t)iova_idx << pg_shift;
-}
-
 /* free a memchunk allocated with rte_memzone_reserve() */
 static void
 rte_mempool_memchunk_mz_free(__rte_unused struct rte_mempool_memhdr *memhdr,
@@ -423,63 +361,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	return ret;
 }
 
-int
-rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
-	phys_addr_t paddr, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
-	void *opaque)
-{
-	return rte_mempool_populate_iova(mp, vaddr, paddr, len, free_cb, opaque);
-}
-
-/* Add objects in the pool, using a table of physical pages. Return the
- * number of objects added, or a negative value on error.
- */
-int
-rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr,
-	const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift,
-	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque)
-{
-	uint32_t i, n;
-	int ret, cnt = 0;
-	size_t pg_sz = (size_t)1 << pg_shift;
-
-	/* mempool must not be populated */
-	if (mp->nb_mem_chunks != 0)
-		return -EEXIST;
-
-	if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG)
-		return rte_mempool_populate_iova(mp, vaddr, RTE_BAD_IOVA,
-			pg_num * pg_sz, free_cb, opaque);
-
-	for (i = 0; i < pg_num && mp->populated_size < mp->size; i += n) {
-
-		/* populate with the largest group of contiguous pages */
-		for (n = 1; (i + n) < pg_num &&
-			     iova[i + n - 1] + pg_sz == iova[i + n]; n++)
-			;
-
-		ret = rte_mempool_populate_iova(mp, vaddr + i * pg_sz,
-			iova[i], n * pg_sz, free_cb, opaque);
-		if (ret < 0) {
-			rte_mempool_free_memchunks(mp);
-			return ret;
-		}
-		/* no need to call the free callback for next chunks */
-		free_cb = NULL;
-		cnt += ret;
-	}
-	return cnt;
-}
-
-int
-rte_mempool_populate_phys_tab(struct rte_mempool *mp, char *vaddr,
-	const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
-	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque)
-{
-	return rte_mempool_populate_iova_tab(mp, vaddr, paddr, pg_num, pg_shift,
-			free_cb, opaque);
-}
-
 /* Populate the mempool with a virtual area. Return the number of
  * objects added, or a negative value on error.
  */
@@ -1065,66 +946,6 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 	return NULL;
 }
 
-/*
- * Create the mempool over already allocated chunk of memory.
- * That external memory buffer can consists of physically disjoint pages.
- * Setting vaddr to NULL, makes mempool to fallback to rte_mempool_create()
- * behavior.
- */
-struct rte_mempool *
-rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
-		unsigned cache_size, unsigned private_data_size,
-		rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-		rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-		int socket_id, unsigned flags, void *vaddr,
-		const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift)
-{
-	struct rte_mempool *mp = NULL;
-	int ret;
-
-	/* no virtual address supplied, use rte_mempool_create() */
-	if (vaddr == NULL)
-		return rte_mempool_create(name, n, elt_size, cache_size,
-			private_data_size, mp_init, mp_init_arg,
-			obj_init, obj_init_arg, socket_id, flags);
-
-	/* check that we have both VA and PA */
-	if (iova == NULL) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	/* Check that pg_shift parameter is valid. */
-	if (pg_shift > MEMPOOL_PG_SHIFT_MAX) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
-		private_data_size, socket_id, flags);
-	if (mp == NULL)
-		return NULL;
-
-	/* call the mempool priv initializer */
-	if (mp_init)
-		mp_init(mp, mp_init_arg);
-
-	ret = rte_mempool_populate_iova_tab(mp, vaddr, iova, pg_num, pg_shift,
-		NULL, NULL);
-	if (ret < 0 || ret != (int)mp->size)
-		goto fail;
-
-	/* call the object initializers */
-	if (obj_init)
-		rte_mempool_obj_iter(mp, obj_init, obj_init_arg);
-
-	return mp;
-
- fail:
-	rte_mempool_free(mp);
-	return NULL;
-}
-
 /* Return the number of entries in the mempool */
 unsigned int
 rte_mempool_avail_count(const struct rte_mempool *mp)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 1f59553b3..5d1602555 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -973,74 +973,6 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 		   rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
 		   int socket_id, unsigned flags);
 
-/**
- * @deprecated
- * Create a new mempool named *name* in memory.
- *
- * The pool contains n elements of elt_size. Its size is set to n.
- * This function uses ``memzone_reserve()`` to allocate the mempool header
- * (and the objects if vaddr is NULL).
- * Depending on the input parameters, mempool elements can be either allocated
- * together with the mempool header, or an externally provided memory buffer
- * could be used to store mempool objects. In later case, that external
- * memory buffer can consist of set of disjoint physical pages.
- *
- * @param name
- *   The name of the mempool.
- * @param n
- *   The number of elements in the mempool. The optimum size (in terms of
- *   memory usage) for a mempool is when n is a power of two minus one:
- *   n = (2^q - 1).
- * @param elt_size
- *   The size of each element.
- * @param cache_size
- *   Size of the cache. See rte_mempool_create() for details.
- * @param private_data_size
- *   The size of the private data appended after the mempool
- *   structure. This is useful for storing some private data after the
- *   mempool structure, as is done for rte_mbuf_pool for example.
- * @param mp_init
- *   A function pointer that is called for initialization of the pool,
- *   before object initialization. The user can initialize the private
- *   data in this function if needed. This parameter can be NULL if
- *   not needed.
- * @param mp_init_arg
- *   An opaque pointer to data that can be used in the mempool
- *   constructor function.
- * @param obj_init
- *   A function called for each object at initialization of the pool.
- *   See rte_mempool_create() for details.
- * @param obj_init_arg
- *   An opaque pointer passed to the object constructor function.
- * @param socket_id
- *   The *socket_id* argument is the socket identifier in the case of
- *   NUMA. The value can be *SOCKET_ID_ANY* if there is no NUMA
- *   constraint for the reserved zone.
- * @param flags
- *   Flags controlling the behavior of the mempool. See
- *   rte_mempool_create() for details.
- * @param vaddr
- *   Virtual address of the externally allocated memory buffer.
- *   Will be used to store mempool objects.
- * @param iova
- *   Array of IO addresses of the pages that comprises given memory buffer.
- * @param pg_num
- *   Number of elements in the iova array.
- * @param pg_shift
- *   LOG2 of the physical pages size.
- * @return
- *   The pointer to the new allocated mempool, on success. NULL on error
- *   with rte_errno set appropriately. See rte_mempool_create() for details.
- */
-__rte_deprecated
-struct rte_mempool *
-rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
-		unsigned cache_size, unsigned private_data_size,
-		rte_mempool_ctor_t *mp_init, void *mp_init_arg,
-		rte_mempool_obj_cb_t *obj_init, void *obj_init_arg,
-		int socket_id, unsigned flags, void *vaddr,
-		const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift);
-
 /**
  * Create an empty mempool
  *
@@ -1123,48 +1055,6 @@ int rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque);
 
-__rte_deprecated
-int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
-	phys_addr_t paddr, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
-	void *opaque);
-
-/**
- * @deprecated
- * Add physical memory for objects in the pool at init
- *
- * Add a virtually contiguous memory chunk in the pool where objects can
- * be instantiated. The IO addresses corresponding to the virtual
- * area are described in iova[], pg_num, pg_shift.
- *
- * @param mp
- *   A pointer to the mempool structure.
- * @param vaddr
- *   The virtual address of memory that should be used to store objects.
- * @param iova
- *   An array of IO addresses of each page composing the virtual area.
- * @param pg_num
- *   Number of elements in the iova array.
- * @param pg_shift
- *   LOG2 of the physical pages size.
- * @param free_cb
- *   The callback used to free this chunk when destroying the mempool.
- * @param opaque
- *   An opaque argument passed to free_cb.
- * @return
- *   The number of objects added on success.
- *   On error, the chunks are not added in the memory list of the
- *   mempool and a negative errno is returned.
- */
-__rte_deprecated
-int rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr,
-	const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift,
-	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);
-
-__rte_deprecated
-int rte_mempool_populate_phys_tab(struct rte_mempool *mp, char *vaddr,
-	const phys_addr_t paddr[], uint32_t pg_num, uint32_t pg_shift,
-	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);
-
 /**
  * Add virtually contiguous memory for objects in the pool at init
  *
@@ -1746,13 +1636,6 @@ rte_mempool_virt2iova(const void *elt)
 	return hdr->iova;
 }
 
-__rte_deprecated
-static inline phys_addr_t
-rte_mempool_virt2phy(__rte_unused const struct rte_mempool *mp, const void *elt)
-{
-	return rte_mempool_virt2iova(elt);
-}
-
 /**
  * Check the consistency of mempool objects.
  *
@@ -1821,68 +1704,6 @@ struct rte_mempool *rte_mempool_lookup(const char *name);
 uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 	struct rte_mempool_objsz *sz);
 
-/**
- * @deprecated
- * Get the size of memory required to store mempool elements.
- *
- * Calculate the maximum amount of memory required to store given number
- * of objects. Assume that the memory buffer will be aligned at page
- * boundary.
- *
- * Note that if object size is bigger than page size, then it assumes
- * that pages are grouped in subsets of physically continuous pages big
- * enough to store at least one object.
- *
- * @param elt_num
- *   Number of elements.
- * @param total_elt_sz
- *   The size of each element, including header and trailer, as returned
- *   by rte_mempool_calc_obj_size().
- * @param pg_shift
- *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
- * @param flags
- *  The mempool flags.
- * @return
- *   Required memory size aligned at page boundary.
- */
-__rte_deprecated
-size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
-	uint32_t pg_shift, unsigned int flags);
-
-/**
- * @deprecated
- * Get the size of memory required to store mempool elements.
- *
- * Calculate how much memory would be actually required with the given
- * memory footprint to store required number of objects.
- *
- * @param vaddr
- *   Virtual address of the externally allocated memory buffer.
- *   Will be used to store mempool objects.
- * @param elt_num
- *   Number of elements.
- * @param total_elt_sz
- *   The size of each element, including header and trailer, as returned
- *   by rte_mempool_calc_obj_size().
- * @param iova
- *   Array of IO addresses of the pages that comprises given memory buffer.
- * @param pg_num
- *   Number of elements in the iova array.
- * @param pg_shift
- *   LOG2 of the physical pages size.
- * @param flags
- *  The mempool flags.
- * @return
- *   On success, the number of bytes needed to store given number of
- *   objects, aligned to the given page size. If the provided memory
- *   buffer is too small, return a negative value whose absolute value
- *   is the actual number of elements that can be stored in that buffer.
- */
-__rte_deprecated
-ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
-	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags);
-
 /**
  * Walk list of all memory pools
  *
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 7091b954b..17cbca460 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -8,9 +8,6 @@ DPDK_2.0 {
 	rte_mempool_list_dump;
 	rte_mempool_lookup;
 	rte_mempool_walk;
-	rte_mempool_xmem_create;
-	rte_mempool_xmem_size;
-	rte_mempool_xmem_usage;
 
 	local: *;
 };
@@ -34,8 +31,6 @@ DPDK_16.07 {
 	rte_mempool_ops_table;
 	rte_mempool_populate_anon;
 	rte_mempool_populate_default;
-	rte_mempool_populate_phys;
-	rte_mempool_populate_phys_tab;
 	rte_mempool_populate_virt;
 	rte_mempool_register_ops;
 	rte_mempool_set_ops_byname;
@@ -46,7 +41,6 @@ DPDK_17.11 {
 	global:
 
 	rte_mempool_populate_iova;
-	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
 
-- 
2.17.1

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2] doc: add SPDX and copyright to rel notes
  @ 2018-07-27  4:54  4% ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2018-07-27  4:54 UTC (permalink / raw)
  To: dev; +Cc: thomas, Hemant Agrawal

Use "The DPDK contributors" as the copyright holder, as decided by the
techboard.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 doc/guides/rel_notes/deprecation.rst   | 3 +++
 doc/guides/rel_notes/release_16_04.rst | 3 +++
 doc/guides/rel_notes/release_16_07.rst | 3 +++
 doc/guides/rel_notes/release_16_11.rst | 3 +++
 doc/guides/rel_notes/release_17_02.rst | 3 +++
 doc/guides/rel_notes/release_17_05.rst | 3 +++
 doc/guides/rel_notes/release_17_08.rst | 3 +++
 doc/guides/rel_notes/release_17_11.rst | 3 +++
 doc/guides/rel_notes/release_18_02.rst | 3 +++
 doc/guides/rel_notes/release_18_05.rst | 3 +++
 doc/guides/rel_notes/release_18_08.rst | 3 +++
 doc/guides/rel_notes/release_2_2.rst   | 3 +++
 12 files changed, 36 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 14714fe..1ca93f3 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2018 The DPDK contributors
+
 ABI and API Deprecation
 =======================
 
diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst
index d0a09ef..e9f1e6f 100644
--- a/doc/guides/rel_notes/release_16_04.rst
+++ b/doc/guides/rel_notes/release_16_04.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2016 The DPDK contributors
+
 DPDK Release 16.04
 ==================
 
diff --git a/doc/guides/rel_notes/release_16_07.rst b/doc/guides/rel_notes/release_16_07.rst
index a8a3fc1..2904aac 100644
--- a/doc/guides/rel_notes/release_16_07.rst
+++ b/doc/guides/rel_notes/release_16_07.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2016 The DPDK contributors
+
 DPDK Release 16.07
 ==================
 
diff --git a/doc/guides/rel_notes/release_16_11.rst b/doc/guides/rel_notes/release_16_11.rst
index 8c9ec65..92e0ec6 100644
--- a/doc/guides/rel_notes/release_16_11.rst
+++ b/doc/guides/rel_notes/release_16_11.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2016 The DPDK contributors
+
 DPDK Release 16.11
 ==================
 
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 357965a..d6c1c56 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2017 The DPDK contributors
+
 DPDK Release 17.02
 ==================
 
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 6892284..6418240 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2017 The DPDK contributors
+
 DPDK Release 17.05
 ==================
 
diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
index 0bcdfb7..dc62240 100644
--- a/doc/guides/rel_notes/release_17_08.rst
+++ b/doc/guides/rel_notes/release_17_08.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2017 The DPDK contributors
+
 DPDK Release 17.08
 ==================
 
diff --git a/doc/guides/rel_notes/release_17_11.rst b/doc/guides/rel_notes/release_17_11.rst
index 5176d69..2a93af3 100644
--- a/doc/guides/rel_notes/release_17_11.rst
+++ b/doc/guides/rel_notes/release_17_11.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2017 The DPDK contributors
+
 DPDK Release 17.11
 ==================
 
diff --git a/doc/guides/rel_notes/release_18_02.rst b/doc/guides/rel_notes/release_18_02.rst
index 44b7de5..8e40311 100644
--- a/doc/guides/rel_notes/release_18_02.rst
+++ b/doc/guides/rel_notes/release_18_02.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2018 The DPDK contributors
+
 DPDK Release 18.02
 ==================
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 6b36493..8dc22b0 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2018 The DPDK contributors
+
 DPDK Release 18.05
 ==================
 
diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 5f24014..cf80448 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2018 The DPDK contributors
+
 DPDK Release 18.08
 ==================
 
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index bb7d15a..cea5c87 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -1,3 +1,6 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright 2016 The DPDK contributors
+
 DPDK Release 2.2
 ================
 
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 1/2] eal: remove deprecated function returning mbuf pool ops name
  @ 2018-07-26 21:42  3%   ` Thomas Monjalon
  2018-08-05 21:45  0%     ` Thomas Monjalon
  2018-08-07 21:34  7%   ` [dpdk-dev] [PATCH v3 " Olivier Matz
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-26 21:42 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, Hemant Agrawal, santosh.shukla, John McNamara

26/06/2018 11:56, Olivier Matz:
> rte_eal_mbuf_default_mempool_ops() is replaced by
> rte_mbuf_best_mempool_ops().
> 
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
> 
> v2:
> * remove rte_eal_mbuf_user_pool_ops from .map in next patch instead of this
> 
>  doc/guides/rel_notes/deprecation.rst    |  9 ---------
>  lib/librte_eal/bsdapp/eal/eal.c         | 10 ----------
>  lib/librte_eal/common/include/rte_eal.h | 11 -----------
>  lib/librte_eal/linuxapp/eal/eal.c       | 10 ----------
>  lib/librte_eal/rte_eal_version.map      |  1 -
>  5 files changed, 41 deletions(-)

Please bump ABI version and update the release notes.
Thanks

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] mempool: remove deprecated functions
  @ 2018-07-26 21:34  3% ` Thomas Monjalon
  2018-07-27 13:45  0%   ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-26 21:34 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Olivier Matz

11/07/2018 12:59, Andrew Rybchenko:
> Functions rte_mempool_populate_phys(), rte_mempool_virt2phy() and
> rte_mempool_populate_phys_tab() are just wrappers for corresponding
> IOVA functions and were deprecated in v17.11.
> 
> Functions rte_mempool_xmem_create(), rte_mempool_xmem_size(),
> rte_mempool_xmem_usage() and rte_mempool_populate_iova_tab() were
> deprecated in v18.05 and removal was announced earlier in v18.02.
> 
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>  lib/librte_mempool/Makefile                |   3 -
>  lib/librte_mempool/meson.build             |   4 -
>  lib/librte_mempool/rte_mempool.c           | 181 +--------------------
>  lib/librte_mempool/rte_mempool.h           | 179 --------------------
>  lib/librte_mempool/rte_mempool_version.map |   6 -
>  5 files changed, 1 insertion(+), 372 deletions(-)

Please update the release notes, deprecation notice, and bump ABI version.
Thanks

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-19 10:58  0%     ` László Vadkerti
@ 2018-07-26 13:48  0%       ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-07-26 13:48 UTC (permalink / raw)
  To: László Vadkerti, Wiles, Keith
  Cc: dev, srinath.mannam, scott.branden, ajit.khaparde,
	Thomas Monjalon, Shreyansh Jain, jerin.jacob

On 19-Jul-18 11:58 AM, László Vadkerti wrote:
>> On Jul 13, 2018, at 7:57 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>
>>> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
>>>
>>> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
>>>> This is a proposal to enable using externally allocated memory in
>>>> DPDK.
>>>> In a nutshell, here is what is being done here:
>>>> - Index malloc heaps by NUMA node index, rather than NUMA node itself
>>>> - Add identifier string to malloc heap, to uniquely identify it
>>>> - Allow creating named heaps and add/remove memory to/from those
>>>> heaps
>>>> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>>>>    of externally allocated memory
>>>>    - If IOVA addresses aren't provided, use RTE_BAD_IOVA
>>>> - Allow malloc and memzones to allocate from named heaps
>>>> The responsibility to ensure memory is accessible before using it is on
>>>> the shoulders of the user - there is no checking done with regards to
>>>> validity of the memory (nor could there be...).
>>>> The following limitations are present:
>>>> - No multiprocess support
>>>> - No thread safety
>>>> There is currently no way to allocate memory during initialization
>>>> stage, so even if multiprocess support is added, it is not guaranteed
>>>> to work because of underlying issues with mapping fbarrays in
>>>> secondary processes. This is not an issue in single process scenario,
>>>> but it may be an issue in a multiprocess scenario in case where
>>>> primary doesn't intend to share the externally allocated memory, yet
>>>> adding such memory could fail because some other process failed to
>>>> attach to this shared memory when it wasn't needed.
>>>> Anatoly Burakov (11):
>>>>    mem: allow memseg lists to be marked as external
>>>>    eal: add function to retrieve socket index by socket ID
>>>>    malloc: index heaps using heap ID rather than NUMA node
>>>>    malloc: add name to malloc heaps
>>>>    malloc: enable retrieving statistics from named heaps
>>>>    malloc: enable allocating from named heaps
>>>>    malloc: enable creating new malloc heaps
>>>>    malloc: allow adding memory to named heaps
>>>>    malloc: allow removing memory from named heaps
>>>>    malloc: allow destroying heaps
>>>>    memzone: enable reserving memory from named heaps
>>>>   config/common_base                            |   1 +
>>>>   lib/librte_eal/common/eal_common_lcore.c      |  15 +
>>>>   lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>>>>   lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
>>>>   .../common/include/rte_eal_memconfig.h        |   5 +-
>>>>   lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>>>>   lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>>>>   .../common/include/rte_malloc_heap.h          |   2 +
>>>>   lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>>>>   lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>>>>   lib/librte_eal/common/malloc_heap.h           |  26 ++
>>>>   lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>>>>   lib/librte_eal/rte_eal_version.map            |  10 +
>>>>   13 files changed, 1118 insertions(+), 109 deletions(-)
>>>
>>> So, now that the RFC is out, i would like to ask a general question.
>>>
>>> One other thing that this patchset is missing, is the ability for data
>> structures (e.g. hash, mempool, etc.) to be allocated from external heaps.
>> Currently, we can kinda sorta do that with various _init() API's (initializing a
>> data structure over already allocated memzone), but this is not ideal and is a
>> hassle for anyone using external memory in DPDK.
>>>
>>> There are basically four ways to approach this problem (that i can see).
>>>
>>> First way is to change "socket ID" to mean "heap ID" everywhere. This has
>> an upside of having a consistent API to allocate from internal and external
>> heaps, with little to no API additions, only internal changes to account for the
>> fact that "socket ID" is now "heap ID".
>>>
>>> However, there is a massive downside to this approach: it is a *giant* API
>> change, and it's also a giant *ABI-compatible* API change. Meaning,
>> replacing socket ID with heap ID will not cause compile failures for old code,
>> which would result in many subtle bugs in already existing codebases. So,
>> while in the perfect world this would've been my preferred approach,
>> realistically i think this is a very, very bad idea.
>>>
>>> Second one is to add a separate "heap name" API's to everything. This has
>> an upside of clean separation between allocation from internal and external
>> heaps. (well, whether it's an upside is debatable...) This is the approach i
>> expected to take when i was creating this patchset.
>>>
>>> The downside is that we have to add new API's to every library and every
>> DPDK data structure, to allow explicit allocation from external heaps. We will
>> have to maintain both, and things like hardware drivers will need to have a
>> way to indicate the need to allocate things from a particular external heap.
>>>
>>> The third way is to expose the "heap ID" externally, and allow a single,
>> unified API to reserve memory. That is, create an API that would map either
>> a NUMA node ID or a heap name to an ID, and allow reserving memory
>> through that ID regardless of whether it's internal or external memory. This
>> would also allow to gradually phase out socket-based ID's in favor of heap ID
>> API, should we choose to do so.
>>>
>>> The downside for this is, it adds a layer of indirection between socket ID
>> and reserving memory on a particular NUMA node, and it makes it hard to
>> produce a single value of "heap ID" in such a way as to replicate current
>> functionality of allocating with SOCKET_ID_ANY. Most likely user will have to
>> explicitly try to allocate on all sockets, unless we keep old API's around in
>> parallel.
>>>
>>> Finally, a fourth way would be to abuse the socket ID to also mean
>> something else, which is an approach i've seen numerous times already, and
>> one that i don't like. We could register new heaps as a new, fake socket ID,
>> and use that to address external heaps (each heap would get its own
>> socket). So, keep current socket ID behavior, but for non-existent sockets it
>> would be possible to be registered as a fake socket pointing to an external
>> heap.
>>>
>>> The upside for this approach would be that no API changes are required
>> whatsoever to existing libraries - this scheme is compatible with both internal
>> and external heaps without adding a separate API.
>>>
>>> The downside is bad semantics - "special" sockets, handling of
>>> SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that
>>> happens to correspond to an existing external heap", and many other
>>> things that can be confusing. I don't like this option, but it's an
>>> option :)
>>>
>>> Thoughts? Comments?
>>
>> #1 is super clean, but very disruptive to everyone. Very Bad IMO
>> #2 is also clean, but adds a lot of new APIs that everyone needs to use or at
>> least in the external heap cases.
>> #3 not sure I fully understand it, but reproducing heap IDs for testing is a
>> problem and requires new/old APIs
>>
>> #4 Very easy to add, IMO it is clean and very small disruption to developers.
>> It does require the special handling, but I feel it is OK and can be explained in
>> the docs. Having a socket id as an ‘int’ gives us a lot of room, e.g. id < 64K is
>> normal socket and > 64K is external id.
>>
>> My vote would be #4, as it seems the least risk and work. :-)
>>
> We have been living with #4 (overloaded socket_ids) for about 5 years now. It does generate some confusion and it is a kind of hack, so it may not be the best choice going forward in official releases, but it is certainly the easiest solution and requires the fewest modifications.
> An overloaded socket_id is especially disturbing in the memory config dump, where the user will see multiple socket IDs on a single-socket system, or more socket IDs than there are real sockets. However, this could still be explained in the notes and the documentation.
> The allocation behavior with SOCKET_ID_ANY is also a question. I think it shouldn’t roll over to allocate memory from an external heap, and we deliberately disabled this in our implementation. The reason is that external memory may be a limited resource for which only explicit allocation requests should be allowed. Also, in a multi-process environment we may not want all external heaps to be mapped into the address space of every other process, meaning that not all heaps are accessible from every process (I’m not sure whether this is planned to be supported, but based on our experience it would be an important and useful feature).

Hi Laszlo,

That depends on what you mean by "all other processes". If they are all 
part of the primary-secondary process prefix, then my plan is to enable 
private and shared heaps - i.e. a heap is either available to a single 
process, or it is available to some or all processes within a prefix.

It is also not possible to share the same area with different process 
prefixes (i.e. between two different primaries) because each of the 
processes will think it owns the entire memory and will do with it as it 
pleases. Using the same memory region with two different process 
prefixes will break many assumptions the heap makes - for example, it 
relies on a per-heap lock to control access to the heap, and that will 
not work if you map the same memory area into multiple primary 
processes. I do not foresee a mechanism to fix this problem within DPDK, 
but obviously if you have any suggestions, they will be considered :)

The reason we have to care about private vs. shared heaps is because of 
how DPDK handles memory management. In order for DPDK facilities such as 
rte_mem_virt2iova() or rte_memseg_walk() to work, we need to keep track 
of the pages we use for the heap - i.e. from DPDK's point of view, 
external memory behaves just like regular memory and is tracked using 
the same method of keeping page tables around (see rte_memseg_list).

These page tables need to be shared between all processes that use a 
specific heap. This introduces an inherent point of failure - you may 
mmap() the *area itself* successfully at the same address, but you may 
still fail to *attach to the page tables*, which will cause a particular 
heap to not be available in a process. This is a problem that I do not 
see a solution for at the moment, and it is something that users 
attempting to use external memory in secondary processes will have to 
deal with.

I haven't yet decided whether this should be automatic (i.e. shared 
heaps "automagically" appearing in all processes) or manual (make the 
user explicitly attach to an externally allocated heap in each process 
within the prefix). I would tend to go for the latter as it gives the 
user more control, and it is easier to implement because there's no need 
to engage IPC to make this work.

> Anyway, I think the confusion with this option comes from the misleading “socket_id” name, which would not really mean a socket ID anymore. So we should probably just document it as a pseudo socket_id, and problem solved with #4 :)
> 
> The cleanest solution in my opinion would be #1, which could be combined with #4: the physical socket_id could be passed directly as the heap_id (or rather call it allocation id, or just location?), so that backward compatibility is kept.
> That means applying #1, i.e. changing “socket_id” to “heap_id” (or “alloc_id”?) in all functions that today expect the socket_id to indicate the location of the allocation, while keeping a direct mapping from socket_id to heap_id. For example, as Keith suggested, the lower range of heap_id would be equivalent to the socket_id and the upper range would be the external id. This way even existing applications would work without code changes, just by passing the socket_id in the heap_id parameter. However, it is an open question what would happen to the socket_id stored in data structures such as struct rte_mempool, where socket_id is documented as “Socket id passed at create.”
> SOCKET_ID_ANY would then cover only the lower range of heap_ids (physical socket ids), not the external heaps; if needed, a new HEAP_ID_ANY could be introduced.
> 
> If changing socket_id to heap_id in existing functions is a big issue, then one option would be to keep the original API and introduce new equivalent functions that take the heap_id instead of the socket_id, e.g. rte_mempool_create would have an equivalent function taking a heap_id.
> A socket_id could then be converted to a heap_id with a new function. This should always be possible, and it can still use the direct mapping approach with the lower/upper range convention.
> The socket_id based functions would then just be wrappers that call the heap_id equivalent after converting the socket_id to a heap_id.
> Using socket_id to indicate the location could still be relevant, so the old socket_id based functions may not even need to be deprecated unless they become hard to maintain.
> 
> Irrespective of the chosen option, external heaps should be registered and identified by name, and there could be a function to look up the id (heap_id or pseudo socket_id) by the registered heap name, which could then be used in the related API calls.

So, in other words, the consensus seems to be that we need to stay with 
the old socket_id and just use weird socket IDs for external heaps. 
Okay, so be it. Less work for me implementing it :)

> 
> Updating all the data structures that store the socket_id so that they use it as the location identifier would be another work item. I think a few of them may need to store both the real physical socket_id and the heap_id, e.g. struct lcore_config, where the user may want to know the real physical socket id but also set a specific heap_id as the default allocation location for the given lcore.

I do not see the physical socket ID of externally allocated memory as a 
matter of concern for DPDK. I think this information should be up to the 
user application to handle, not DPDK. From my point of view, we 
shouldn't care where the memory came from, we just facilitate using it. 
If the user chooses to store additional metadata about the memory 
somewhere else - that is their prerogative, but I don't think having a 
provision for "physical socket ID" etc. for external heaps should be in DPDK.

> 
>>>
>>> I myself still favor the second way, however there are good arguments to
>>> be made for each of these options.
>>>
>>> --
>>> Thanks,
>>> Anatoly
>>
>> Regards,
>> Keith
> 
> Thanks,
>   Laszlo
> 


-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH] devtools: fix checkpatch for filename with space
  2018-07-20 18:25  0% ` Neil Horman
@ 2018-07-20 20:56  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-07-20 20:56 UTC (permalink / raw)
  To: dev; +Cc: Neil Horman

20/07/2018 20:25, Neil Horman:
> On Fri, Jul 20, 2018 at 01:41:03PM +0200, Thomas Monjalon wrote:
> > If the patch filename or the temporary file path have a space
> > in their name, the script checkpatches.sh does not work.
> > The variables for the filenames must be enclosed in quotes
> > in order to preserve spaces.
> > 
> > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> > 
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> 
> > ---
> > 
> > Strangely, I did a fix for check-symbol-change.sh and I forgot
> > to fix checkpatches.sh.
> > 
> > ---

Applied


* Re: [dpdk-dev] [PATCH] devtools: fix checkpatch for filename with space
  2018-07-20 11:41  3% [dpdk-dev] [PATCH] devtools: fix checkpatch " Thomas Monjalon
@ 2018-07-20 18:25  0% ` Neil Horman
  2018-07-20 20:56  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-07-20 18:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Fri, Jul 20, 2018 at 01:41:03PM +0200, Thomas Monjalon wrote:
> If the patch filename or the temporary file path have a space
> in their name, the script checkpatches.sh does not work.
> The variables for the filenames must be enclosed in quotes
> in order to preserve spaces.
> 
> Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>

> ---
> 
> Strangely, I did a fix for check-symbol-change.sh and I forgot
> to fix checkpatches.sh.
> 
> ---
>  devtools/checkpatches.sh | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
> index 1439bce94..e97a4f2c9 100755
> --- a/devtools/checkpatches.sh
> +++ b/devtools/checkpatches.sh
> @@ -25,7 +25,7 @@ NEW_TYPEDEFS,COMPARISON_TO_NULL"
>  
>  clean_tmp_files() {
>  	if echo $tmpinput | grep -q '^checkpatches\.' ; then
> -		rm -f $tmpinput
> +		rm -f "$tmpinput"
>  	fi
>  }
>  
> @@ -77,13 +77,13 @@ check () { # <patch> <commit> <title>
>  	elif [ -n "$2" ] ; then
>  		tmpinput=$(mktemp checkpatches.XXXXXX)
>  		git format-patch --find-renames \
> -		--no-stat --stdout -1 $commit > $tmpinput
> +		--no-stat --stdout -1 $commit > "$tmpinput"
>  	else
>  		tmpinput=$(mktemp checkpatches.XXXXXX)
> -		cat > $tmpinput
> +		cat > "$tmpinput"
>  	fi
>  
> -	report=$($DPDK_CHECKPATCH_PATH $options $tmpinput 2>/dev/null)
> +	report=$($DPDK_CHECKPATCH_PATH $options "$tmpinput" 2>/dev/null)
>  	if [ $? -ne 0 ] ; then
>  		$verbose || printf '\n### %s\n\n' "$3"
>  		printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
> -- 
> 2.17.1
> 
> 


* [dpdk-dev] [PATCH] devtools: fix checkpatch for filename with space
@ 2018-07-20 11:41  3% Thomas Monjalon
  2018-07-20 18:25  0% ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-20 11:41 UTC (permalink / raw)
  To: nhorman; +Cc: dev

If the patch filename or the temporary file path have a space
in their name, the script checkpatches.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---

Strangely, I did a fix for check-symbol-change.sh and I forgot
to fix checkpatches.sh.

---
 devtools/checkpatches.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 1439bce94..e97a4f2c9 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -25,7 +25,7 @@ NEW_TYPEDEFS,COMPARISON_TO_NULL"
 
 clean_tmp_files() {
 	if echo $tmpinput | grep -q '^checkpatches\.' ; then
-		rm -f $tmpinput
+		rm -f "$tmpinput"
 	fi
 }
 
@@ -77,13 +77,13 @@ check () { # <patch> <commit> <title>
 	elif [ -n "$2" ] ; then
 		tmpinput=$(mktemp checkpatches.XXXXXX)
 		git format-patch --find-renames \
-		--no-stat --stdout -1 $commit > $tmpinput
+		--no-stat --stdout -1 $commit > "$tmpinput"
 	else
 		tmpinput=$(mktemp checkpatches.XXXXXX)
-		cat > $tmpinput
+		cat > "$tmpinput"
 	fi
 
-	report=$($DPDK_CHECKPATCH_PATH $options $tmpinput 2>/dev/null)
+	report=$($DPDK_CHECKPATCH_PATH $options "$tmpinput" 2>/dev/null)
 	if [ $? -ne 0 ] ; then
 		$verbose || printf '\n### %s\n\n' "$3"
 		printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
-- 
2.17.1


* Re: [dpdk-dev] [PATCH v2] devtools: fix symbol check for filename with space
  2018-07-19 15:37  0%       ` Neil Horman
@ 2018-07-20  9:37  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-07-20  9:37 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

19/07/2018 17:37, Neil Horman:
> On Thu, Jul 19, 2018 at 02:09:47PM +0200, Thomas Monjalon wrote:
> > 19/07/2018 13:14, Neil Horman:
> > > On Wed, Jul 18, 2018 at 11:26:58PM +0200, Thomas Monjalon wrote:
> > > > If the patch filename or the temporary file path have a space
> > > > in their name, the script check-symbol-change.sh does not work.
> > > > The variables for the filenames must be enclosed in quotes
> > > > in order to preserve spaces.
> > > > 
> > > > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> > > > Cc: nhorman@tuxdriver.com
> > > > 
> > > > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > > > ---
> > > > v2: one occurrence of "$mapfile" was missed in v1
> > > I don't have any issue with this change, but the only way I see to introduce a
> > > space into the tempfile name is to set $TMPDIR to '/path/with silly spaces' or
> > > something similar.  I think we discussed this before, but it would almost make
> > > sense, instead of quoting everything, to specify -p ./ to ensure the
> > > tempfile has no spaces.
> > 
> > When I save patches from my inbox, the filename has some spaces.
> > 
> > I think quoting variables is mandatory.
> > 
> > 
> Sure, it doesn't hurt anything really
> 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>

Applied


* Re: [dpdk-dev] [PATCH v2] devtools: fix symbol check for filename with space
  2018-07-19 12:09  0%     ` Thomas Monjalon
@ 2018-07-19 15:37  0%       ` Neil Horman
  2018-07-20  9:37  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-07-19 15:37 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Thu, Jul 19, 2018 at 02:09:47PM +0200, Thomas Monjalon wrote:
> 19/07/2018 13:14, Neil Horman:
> > On Wed, Jul 18, 2018 at 11:26:58PM +0200, Thomas Monjalon wrote:
> > > If the patch filename or the temporary file path have a space
> > > in their name, the script check-symbol-change.sh does not work.
> > > The variables for the filenames must be enclosed in quotes
> > > in order to preserve spaces.
> > > 
> > > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> > > Cc: nhorman@tuxdriver.com
> > > 
> > > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > > ---
> > > v2: one occurrence of "$mapfile" was missed in v1
> > I don't have any issue with this change, but the only way I see to introduce a
> > space into the tempfile name is to set $TMPDIR to '/path/with silly spaces' or
> > something similar.  I think we discussed this before, but it would almost make
> > sense, instead of quoting everything, to specify -p ./ to ensure the
> > tempfile has no spaces.
> 
> When I save patches from my inbox, the filename has some spaces.
> 
> I think quoting variables is mandatory.
> 
> 
> 
> 
Sure, it doesn't hurt anything really

Acked-by: Neil Horman <nhorman@tuxdriver.com>


* Re: [dpdk-dev] [PATCH v2] devtools: fix symbol check for filename with space
  2018-07-19 11:14  0%   ` Neil Horman
@ 2018-07-19 12:09  0%     ` Thomas Monjalon
  2018-07-19 15:37  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-19 12:09 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

19/07/2018 13:14, Neil Horman:
> On Wed, Jul 18, 2018 at 11:26:58PM +0200, Thomas Monjalon wrote:
> > If the patch filename or the temporary file path have a space
> > in their name, the script check-symbol-change.sh does not work.
> > The variables for the filenames must be enclosed in quotes
> > in order to preserve spaces.
> > 
> > Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> > Cc: nhorman@tuxdriver.com
> > 
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> > v2: one occurrence of "$mapfile" was missed in v1
> I don't have any issue with this change, but the only way I see to introduce a
> space into the tempfile name is to set $TMPDIR to '/path/with silly spaces' or
> something similar.  I think we discussed this before, but it would almost make
> sense, instead of quoting everything, to specify -p ./ to ensure the
> tempfile has no spaces.

When I save patches from my inbox, the filename has some spaces.

I think quoting variables is mandatory.
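
To illustrate why the quotes matter, here is a minimal sketch. It is not
taken from the DPDK scripts: count_args and the patch filename are made up
for the example, and a POSIX shell with the default IFS is assumed. It
shows how an unquoted variable holding a filename with spaces is split
into several words before the command sees it:

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Example filename, as a mail client might save a patch.
patch='0001 fix checkpatch.patch'

# Unquoted expansion undergoes word splitting on the spaces.
unquoted=$(count_args $patch)

# Quoted expansion is passed on as one single word.
quoted=$(count_args "$patch")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Under these assumptions the unquoted form hands three separate words to
the command, and only the quoted form passes the filename through intact.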


* Re: [dpdk-dev] [PATCH v2] devtools: fix symbol check for filename with space
  2018-07-18 21:26  3% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
@ 2018-07-19 11:14  0%   ` Neil Horman
  2018-07-19 12:09  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-07-19 11:14 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Wed, Jul 18, 2018 at 11:26:58PM +0200, Thomas Monjalon wrote:
> If the patch filename or the temporary file path have a space
> in their name, the script check-symbol-change.sh does not work.
> The variables for the filenames must be enclosed in quotes
> in order to preserve spaces.
> 
> Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
> Cc: nhorman@tuxdriver.com
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
> v2: one occurrence of "$mapfile" was missed in v1
I don't have any issue with this change, but the only way I see to introduce a
space into the tempfile name is to set $TMPDIR to '/path/with silly spaces' or
something similar.  I think we discussed this before, but it would almost make
sense, instead of quoting everything, to specify -p ./ to ensure the
tempfile has no spaces.

Neil
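
A rough sketch of the trade-off, assuming GNU mktemp; the scratch
directory, the map.XXXXXX template and the patch filename are made up for
illustration and do not come from the script. With -p ./ the temporary
file path is free of spaces, but a patch filename supplied by the user can
still contain them, so an unquoted expansion of that name still fails:

```shell
#!/bin/sh
# Hypothetical scratch directory, for illustration only.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# GNU mktemp: -p DIR creates the file relative to DIR, so the
# resulting path is ./map.XXXXXX and contains no spaces.
mapfile=$(mktemp -p ./ map.XXXXXX) || exit 1

# A patch saved from a mail client may well have spaces in its name.
patch='0001 fix with space.patch'
printf 'dummy patch\n' > "$patch"

# Unquoted: the name is split into several nonexistent files, cat fails.
if cat $patch > "$mapfile" 2>/dev/null; then unquoted=ok; else unquoted=failed; fi

# Quoted: cat receives the single, existing filename.
if cat "$patch" > "$mapfile"; then quoted=ok; else quoted=failed; fi

echo "unquoted: $unquoted, quoted: $quoted"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

So -p ./ only covers the name the script generates itself; names coming
from the command line need the quotes either way.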

> ---
>  devtools/check-symbol-change.sh | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
> index 9952a8d66..40b72073a 100755
> --- a/devtools/check-symbol-change.sh
> +++ b/devtools/check-symbol-change.sh
> @@ -7,7 +7,7 @@ build_map_changes()
>  	local fname=$1
>  	local mapdb=$2
>  
> -	cat $fname | awk '
> +	cat "$fname" | awk '
>  		# Initialize our variables
>  		BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
>  
> @@ -71,10 +71,10 @@ build_map_changes()
>  					print map " " sym " unknown del"
>  				}
>  			}
> -		}' > ./$mapdb
> +		}' > "$mapdb"
>  
> -		sort -u $mapdb > ./$mapdb.2
> -		mv -f $mapdb.2 $mapdb
> +		sort -u "$mapdb" > "$mapdb.2"
> +		mv -f "$mapdb.2" "$mapdb"
>  
>  }
>  
> @@ -111,7 +111,7 @@ check_for_rule_violations()
>  				# to be moving from an already supported
>  				# section or its a violation
>  				grep -q \
> -				"$mname $symname [^EXPERIMENTAL] del" $mapdb
> +				"$mname $symname [^EXPERIMENTAL] del" "$mapdb"
>  				if [ $? -ne 0 ]
>  				then
>  					echo -n "ERROR: symbol $symname "
> @@ -133,7 +133,7 @@ check_for_rule_violations()
>  				echo "gone through the deprecation process"
>  			fi
>  		fi
> -	done < $mapdb
> +	done < "$mapdb"
>  
>  	return $ret
>  }
> @@ -146,14 +146,14 @@ exit_code=1
>  
>  clean_and_exit_on_sig()
>  {
> -	rm -f $mapfile
> +	rm -f "$mapfile"
>  	exit $exit_code
>  }
>  
> -build_map_changes $patch $mapfile
> -check_for_rule_violations $mapfile
> +build_map_changes "$patch" "$mapfile"
> +check_for_rule_violations "$mapfile"
>  exit_code=$?
>  
> -rm -f $mapfile
> +rm -f "$mapfile"
>  
>  exit $exit_code
> -- 
> 2.17.1
> 
> 


* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-13 17:56  0%   ` Wiles, Keith
@ 2018-07-19 10:58  0%     ` László Vadkerti
  2018-07-26 13:48  0%       ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: László Vadkerti @ 2018-07-19 10:58 UTC (permalink / raw)
  To: Wiles, Keith, Burakov, Anatoly
  Cc: dev, srinath.mannam, scott.branden, ajit.khaparde,
	Thomas Monjalon, Shreyansh Jain, jerin.jacob

> On Jul 13, 2018, at 7:57 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>
> > On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
> >
> > On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
> >> This is a proposal to enable using externally allocated memory in
> >> DPDK.
> >>
> >> In a nutshell, here is what is being done here:
> >>
> >> - Index malloc heaps by NUMA node index, rather than NUMA node itself
> >> - Add identifier string to malloc heap, to uniquely identify it
> >> - Allow creating named heaps and add/remove memory to/from those
> >>   heaps
> >> - Allocate memseg lists at runtime, to keep track of IOVA addresses
> >>   of externally allocated memory
> >>   - If IOVA addresses aren't provided, use RTE_BAD_IOVA
> >> - Allow malloc and memzones to allocate from named heaps
> >>
> >> The responsibility to ensure memory is accessible before using it is
> >> on the shoulders of the user - there is no checking done with regards
> >> to validity of the memory (nor could there be...).
> >>
> >> The following limitations are present:
> >>
> >> - No multiprocess support
> >> - No thread safety
> >>
> >> There is currently no way to allocate memory during initialization
> >> stage, so even if multiprocess support is added, it is not guaranteed
> >> to work because of underlying issues with mapping fbarrays in
> >> secondary processes. This is not an issue in single process scenario,
> >> but it may be an issue in a multiprocess scenario in case where
> >> primary doesn't intend to share the externally allocated memory, yet
> >> adding such memory could fail because some other process failed to
> >> attach to this shared memory when it wasn't needed.
> >>
> >> Anatoly Burakov (11):
> >>   mem: allow memseg lists to be marked as external
> >>   eal: add function to rerieve socket index by socket ID
> >>   malloc: index heaps using heap ID rather than NUMA node
> >>   malloc: add name to malloc heaps
> >>   malloc: enable retrieving statistics from named heaps
> >>   malloc: enable allocating from named heaps
> >>   malloc: enable creating new malloc heaps
> >>   malloc: allow adding memory to named heaps
> >>   malloc: allow removing memory from named heaps
> >>   malloc: allow destroying heaps
> >>   memzone: enable reserving memory from named heaps
> >>  config/common_base                            |   1 +
> >>  lib/librte_eal/common/eal_common_lcore.c      |  15 +
> >>  lib/librte_eal/common/eal_common_memory.c     |  51 +++-
> >>  lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
> >>  .../common/include/rte_eal_memconfig.h        |   5 +-
> >>  lib/librte_eal/common/include/rte_lcore.h     |  19 +-
> >>  lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
> >>  .../common/include/rte_malloc_heap.h          |   2 +
> >>  lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
> >>  lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
> >>  lib/librte_eal/common/malloc_heap.h           |  26 ++
> >>  lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
> >>  lib/librte_eal/rte_eal_version.map            |  10 +
> >>  13 files changed, 1118 insertions(+), 109 deletions(-)
> >
> > So, now that the RFC is out, i would like to ask a general question.
> >
> > One other thing that this patchset is missing, is the ability for data
> > structures (e.g. hash, mempool, etc.) to be allocated from external heaps.
> > Currently, we can kinda sorta do that with various _init() API's (initializing a
> > data structure over already allocated memzone), but this is not ideal and is a
> > hassle for anyone using external memory in DPDK.
> >
> > There are basically four ways to approach this problem (that i can see).
> >
> > First way is to change "socket ID" to mean "heap ID" everywhere. This has
> > an upside of having a consistent API to allocate from internal and external
> > heaps, with little to no API additions, only internal changes to account for the
> > fact that "socket ID" is now "heap ID".
> >
> > However, there is a massive downside to this approach: it is a *giant* API
> > change, and it's also a giant *ABI-compatible* API change. Meaning,
> > replacing socket ID with heap ID will not cause compile failures for old code,
> > which would result in many subtle bugs in already existing codebases. So,
> > while in the perfect world this would've been my preferred approach,
> > realistically i think this is a very, very bad idea.
> >
> > Second one is to add a separate "heap name" API's to everything. This has
> > an upside of clean separation between allocation from internal and external
> > heaps. (well, whether it's an upside is debatable...) This is the approach i
> > expected to take when i was creating this patchset.
> >
> > The downside is that we have to add new API's to every library and every
> > DPDK data structure, to allow explicit allocation from external heaps. We will
> > have to maintain both, and things like hardware drivers will need to have a
> > way to indicate the need to allocate things from a particular external heap.
> >
> > The third way is to expose the "heap ID" externally, and allow a single,
> > unified API to reserve memory. That is, create an API that would map either
> > a NUMA node ID or a heap name to an ID, and allow reserving memory
> > through that ID regardless of whether it's internal or external memory. This
> > would also allow to gradually phase out socket-based ID's in favor of heap ID
> > API, should we choose to do so.
> >
> > The downside for this is, it adds a layer of indirection between socket ID
> > and reserving memory on a particular NUMA node, and it makes it hard to
> > produce a single value of "heap ID" in such a way as to replicate current
> > functionality of allocating with SOCKET_ID_ANY. Most likely user will have to
> > explicitly try to allocate on all sockets, unless we keep old API's around in
> > parallel.
> >
> > Finally, a fourth way would be to abuse the socket ID to also mean
> > something else, which is an approach i've seen numerous times already, and
> > one that i don't like. We could register new heaps as a new, fake socket ID,
> > and use that to address external heaps (each heap would get its own
> > socket). So, keep current socket ID behavior, but for non-existent sockets it
> > would be possible to be registered as a fake socket pointing to an external
> > heap.
> >
> > The upside for this approach would be that no API changes are required
> > whatsoever to existing libraries - this scheme is compatible with both internal
> > and external heaps without adding a separate API.
> >
> > The downside is bad semantics - "special" sockets, handling of
> > SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that
> > happens to correspond to an existing external heap", and many other
> > things that can be confusing. I don't like this option, but it's an
> > option :)
> >
> > Thoughts? Comments?
> 
> #1 is super clean, but very disruptive to everyone. Very Bad IMO
> #2 is also clean, but adds a lot of new APIs that everyone needs to use or at
> least in the external heap cases.
> #3 not sure I fully understand it, but reproducing heap IDs for testing is a
> problem and requires new/old APIs
> 
> #4 Very easy to add, IMO it is clean and very small disruption to developers.
> It does require the special handling, but I feel it is OK and can be explained in
> the docs. Having a socket id as an ‘int’ gives us a lot of room, e.g. id < 64K is
> normal socket and > 64K is external id.
> 
> My vote would be #4, as it seems the least risk and work. :-)
> 
We have been living with #4 (overloaded socket_ids) for about 5 years now. It does generate some confusion and it is a kind of hack, so it may not be the best choice going forward in official releases, but it is certainly the easiest solution and requires the fewest modifications.
An overloaded socket_id is especially disturbing in the memory config dump, where the user will see multiple socket IDs on a single-socket system, or more socket IDs than there are real sockets. However, this could still be explained in the notes and the documentation.
The allocation behavior with SOCKET_ID_ANY is also a question. I think it shouldn’t roll over to allocate memory from an external heap, and we deliberately disabled this in our implementation. The reason is that external memory may be a limited resource for which only explicit allocation requests should be allowed. Also, in a multi-process environment we may not want all external heaps to be mapped into the address space of every other process, meaning that not all heaps are accessible from every process (I’m not sure whether this is planned to be supported, but based on our experience it would be an important and useful feature).
Anyway, I think the confusion with this option comes from the misleading “socket_id” name, which would not really mean a socket ID anymore. So we should probably just document it as a pseudo socket_id, and problem solved with #4 :)

The cleanest solution in my opinion would be #1, which could be combined with #4: the physical socket_id could be passed directly as the heap_id (or rather call it allocation id, or just location?), so that backward compatibility is kept.
That means applying #1, i.e. changing “socket_id” to “heap_id” (or “alloc_id”?) in all functions that today expect the socket_id to indicate the location of the allocation, while keeping a direct mapping from socket_id to heap_id. For example, as Keith suggested, the lower range of heap_id would be equivalent to the socket_id and the upper range would be the external id. This way even existing applications would work without code changes, just by passing the socket_id in the heap_id parameter. However, it is an open question what would happen to the socket_id stored in data structures such as struct rte_mempool, where socket_id is documented as “Socket id passed at create.”
SOCKET_ID_ANY would then cover only the lower range of heap_ids (physical socket ids), not the external heaps; if needed, a new HEAP_ID_ANY could be introduced.

If changing socket_id to heap_id in existing functions is a big issue, then one option would be to keep the original API and introduce new equivalent functions that take the heap_id instead of the socket_id, e.g. rte_mempool_create would have an equivalent function taking a heap_id.
A socket_id could then be converted to a heap_id with a new function. This should always be possible, and it can still use the direct mapping approach with the lower/upper range convention.
The socket_id based functions would then just be wrappers that call the heap_id equivalent after converting the socket_id to a heap_id.
Using socket_id to indicate the location could still be relevant, so the old socket_id based functions may not even need to be deprecated unless they become hard to maintain.

Irrespective of the chosen option, external heaps should be registered/identified by name and there could be a function to fetch/lookup the id (heap_id or pseudo socket_id) by registered heap name which then could be used in the related API calls.

It would also be a separate work item to update all the data structures which store the socket_id and use it as the location identifier. I think a few of them may need to store both the real physical socket_id and the heap_id, e.g. struct lcore_config, where the user may want to know the real physical socket id but set a specific heap_id as the default allocation location for the given lcore.

> >
> > I myself still favor the second way, however there are good arguments to
> be made for each of these options.
> >
> > --
> > Thanks,
> > Anatoly
> 
> Regards,
> Keith

Thanks,
 Laszlo

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] devtools: fix symbol check for filename with space
  2018-07-18 21:06  3% [dpdk-dev] [PATCH] devtools: fix symbol check for filename with space Thomas Monjalon
@ 2018-07-18 21:26  3% ` Thomas Monjalon
  2018-07-19 11:14  0%   ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-18 21:26 UTC (permalink / raw)
  To: dev; +Cc: nhorman

If the patch filename or the temporary file path have a space
in their name, the script check-symbol-change.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Cc: nhorman@tuxdriver.com

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
v2: one occurrence of "$mapfile" was missed in v1
---
 devtools/check-symbol-change.sh | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
index 9952a8d66..40b72073a 100755
--- a/devtools/check-symbol-change.sh
+++ b/devtools/check-symbol-change.sh
@@ -7,7 +7,7 @@ build_map_changes()
 	local fname=$1
 	local mapdb=$2
 
-	cat $fname | awk '
+	cat "$fname" | awk '
 		# Initialize our variables
 		BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
 
@@ -71,10 +71,10 @@ build_map_changes()
 					print map " " sym " unknown del"
 				}
 			}
-		}' > ./$mapdb
+		}' > "$mapdb"
 
-		sort -u $mapdb > ./$mapdb.2
-		mv -f $mapdb.2 $mapdb
+		sort -u "$mapdb" > "$mapdb.2"
+		mv -f "$mapdb.2" "$mapdb"
 
 }
 
@@ -111,7 +111,7 @@ check_for_rule_violations()
 				# to be moving from an already supported
 				# section or its a violation
 				grep -q \
-				"$mname $symname [^EXPERIMENTAL] del" $mapdb
+				"$mname $symname [^EXPERIMENTAL] del" "$mapdb"
 				if [ $? -ne 0 ]
 				then
 					echo -n "ERROR: symbol $symname "
@@ -133,7 +133,7 @@ check_for_rule_violations()
 				echo "gone through the deprecation process"
 			fi
 		fi
-	done < $mapdb
+	done < "$mapdb"
 
 	return $ret
 }
@@ -146,14 +146,14 @@ exit_code=1
 
 clean_and_exit_on_sig()
 {
-	rm -f $mapfile
+	rm -f "$mapfile"
 	exit $exit_code
 }
 
-build_map_changes $patch $mapfile
-check_for_rule_violations $mapfile
+build_map_changes "$patch" "$mapfile"
+check_for_rule_violations "$mapfile"
 exit_code=$?
 
-rm -f $mapfile
+rm -f "$mapfile"
 
 exit $exit_code
-- 
2.17.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] devtools: fix symbol check for filename with space
@ 2018-07-18 21:06  3% Thomas Monjalon
  2018-07-18 21:26  3% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-18 21:06 UTC (permalink / raw)
  To: dev; +Cc: nhorman

If the patch filename or the temporary file path have a space
in their name, the script check-symbol-change.sh does not work.
The variables for the filenames must be enclosed in quotes
in order to preserve spaces.

Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Cc: nhorman@tuxdriver.com

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 devtools/check-symbol-change.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
index 9952a8d66..69b874ace 100755
--- a/devtools/check-symbol-change.sh
+++ b/devtools/check-symbol-change.sh
@@ -7,7 +7,7 @@ build_map_changes()
 	local fname=$1
 	local mapdb=$2
 
-	cat $fname | awk '
+	cat "$fname" | awk '
 		# Initialize our variables
 		BEGIN {map="";sym="";ar="";sec=""; in_sec=0; in_map=0}
 
@@ -71,10 +71,10 @@ build_map_changes()
 					print map " " sym " unknown del"
 				}
 			}
-		}' > ./$mapdb
+		}' > "$mapdb"
 
-		sort -u $mapdb > ./$mapdb.2
-		mv -f $mapdb.2 $mapdb
+		sort -u "$mapdb" > "$mapdb.2"
+		mv -f "$mapdb.2" "$mapdb"
 
 }
 
@@ -111,7 +111,7 @@ check_for_rule_violations()
 				# to be moving from an already supported
 				# section or its a violation
 				grep -q \
-				"$mname $symname [^EXPERIMENTAL] del" $mapdb
+				"$mname $symname [^EXPERIMENTAL] del" "$mapdb"
 				if [ $? -ne 0 ]
 				then
 					echo -n "ERROR: symbol $symname "
@@ -133,7 +133,7 @@ check_for_rule_violations()
 				echo "gone through the deprecation process"
 			fi
 		fi
-	done < $mapdb
+	done < "$mapdb"
 
 	return $ret
 }
@@ -150,10 +150,10 @@ clean_and_exit_on_sig()
 	exit $exit_code
 }
 
-build_map_changes $patch $mapfile
-check_for_rule_violations $mapfile
+build_map_changes "$patch" "$mapfile"
+check_for_rule_violations "$mapfile"
 exit_code=$?
 
-rm -f $mapfile
+rm -f "$mapfile"
 
 exit $exit_code
-- 
2.17.1

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/4] eventdev: add eth Tx adapter APIs
  2018-07-10 12:17  3% ` Jerin Jacob
@ 2018-07-16  8:34  0%   ` Rao, Nikhil
  0 siblings, 0 replies; 200+ results
From: Rao, Nikhil @ 2018-07-16  8:34 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: olivier.matz, dev, anoob.joseph

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, July 10, 2018 5:48 PM
> To: Rao, Nikhil <nikhil.rao@intel.com>
> Cc: olivier.matz@6wind.com; dev@dpdk.org; anoob.joseph@cavium.com
> Subject: Re: [PATCH 1/4] eventdev: add eth Tx adapter APIs
> 
> ---
> 
> 1) Update doc/api/doxy-api-index.md
OK.

> 2) Update lib/librte_eventdev/Makefile
> +SYMLINK-y-include += rte_event_eth_tx_adapter.h
> 
This is done in patch 3 of this series.

> 
> I think the following work is _pending_
> 
> 1) Update app/test-eventdev/ for Tx adapter
> 2) Update examples/eventdev_pipeline/ for Tx adapter
> 3) Add Tx adapter documentation
> 4) Add Tx adapter ops for octeontx driver
> 5) Add Tx adapter ops for dpaa driver(if need)
> 
> Nikhil,
> If you are OK then Cavium would like to take up (1), (2) and (4) activities.
> 
> Let me know your thoughts.
> 
Fine with me.

> Since this patch set has already missed the RC1 deadline, we will complete all
> the _pending_ work and push it to the next-eventdev tree at the very beginning
> of v18.11, so that Anoob's adapter helper function work can be added in v18.11.
> 
> 
> >
> > This patch series adds the event ethernet Tx adapter which is based on
> > a previous RFC
> >  * RFCv1 - http://mails.dpdk.org/archives/dev/2018-May/102936.html
> >  * RFCv2 - http://mails.dpdk.org/archives/dev/2018-June/104075.html
> >
> > RFC -> V1:
> > =========
> >
> > * Move port and tx queue id to mbuf from mbuf private area. (Jerin
> > Jacob)
> >
> > * Support for PMD transmit function. (Jerin Jacob)
> >
> > * mbuf change has been replaced with
> rte_event_eth_tx_adapter_txq_set().
> > The goal is to align with the mbuf change for a qid field.
> > (http://mails.dpdk.org/archives/dev/2018-February/090651.html). Once
> > the mbuf change is available, the function can be replaced with a
> > macro with no impact to applications.
> >
> > * Various cleanups (Jerin Jacob)
> >
> >  lib/librte_eventdev/rte_event_eth_tx_adapter.h | 497
> +++++++++++++++++++++++++
> >  lib/librte_mbuf/rte_mbuf.h                     |   4 +-
> >  MAINTAINERS                                    |   5 +
> >  3 files changed, 505 insertions(+), 1 deletion(-)  create mode 100644
> > lib/librte_eventdev/rte_event_eth_tx_adapter.h
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * A structure used to retrieve statistics for an eth tx adapter instance.
> > + */
> > +struct rte_event_eth_tx_adapter_stats {
> > +       uint64_t tx_retry;
> > +       /**< Number of transmit retries */
> > +       uint64_t tx_packets;
> > +       /**< Number of packets transmitted */
> > +       uint64_t tx_dropped;
> > +       /**< Number of packets dropped */ };
> > +
> > +/** Event Eth Tx Adapter Structure */ struct rte_event_eth_tx_adapter
> > +{
> > +       uint8_t id;
> > +       /**< Adapter Identifier */
> > +       uint8_t eventdev_id;
> > +       /**< Max mbufs processed in any service function invocation */
> > +       uint32_t max_nb_tx;
> > +       /**< The adapter can return early if it has processed at least
> > +        * max_nb_tx mbufs. This isn't treated as a requirement; batching
> may
> > +        * cause the adapter to process more than max_nb_tx mbufs.
> > +        */
> > +       uint32_t nb_queues;
> > +       /**< Number of Tx queues in adapter */
> > +       int socket_id;
> > +       /**< socket id */
> > +       rte_spinlock_t tx_lock;
> > +       /**<  Synchronization with data path */
> > +       void *dev_private;
> > +       /**< PMD private data */
> > +       char
> mem_name[RTE_EVENT_ETH_TX_ADAPTER_SERVICE_NAME_LEN];
> > +       /**< Memory allocation name */
> > +       rte_event_eth_tx_adapter_conf_cb conf_cb;
> > +       /** Configuration callback */
> > +       void *conf_arg;
> > +       /**< Configuration callback argument */
> > +       uint16_t dev_count;
> > +       /**< Highest port id supported + 1 */
> > +       struct rte_event_eth_tx_adapter_ethdev *txa_ethdev;
> > +       /**< Per ethernet device structure */
> > +       struct rte_event_eth_tx_adapter_stats stats; }
> > +__rte_cache_aligned;
> 
> Can you move this structure to the .c file as an implementation detail? The reasons are:
> a) It should not be subject to ABI deprecation.
> b) An INTERNAL_PORT based adapter may have different values, i.e. the above
> structure is implementation defined.
>
> > +
> > +struct rte_event_eth_tx_adapters {
> > +       struct rte_event_eth_tx_adapter **data; };
> > +
> 
> same as above
> 
> > +/* Per eth device structure */
> > +struct rte_event_eth_tx_adapter_ethdev {
> > +       /* Pointer to ethernet device */
> > +       struct rte_eth_dev *dev;
> > +       /* Number of queues added */
> > +       uint16_t nb_queues;
> > +       /* PMD specific queue data */
> > +       void *queues;
> > +};
> 
> same as above
> 
> > +
> > +extern struct rte_event_eth_tx_adapters rte_event_eth_tx_adapters;
> > +
> 
> same as above
>
OK, if these fields are not going to be used within the other adapter, I will move these to the .c file.
 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a new event ethernet Tx adapter with the specified identifier.
> > + *
> > + * @param id
> > + *  The identifier of the event ethernet Tx adapter.
> > + * @param dev_id
> > + *  The event device identifier.
> > + * @param port_config
> > + *  Event port configuration, the adapter uses this configuration to
> > + *  create an event port if needed.
> > + * @return
> > + *   - 0: Success
> > + *   - <0: Error code on failure
> > + */
> > +int __rte_experimental
> > +rte_event_eth_tx_adapter_create(uint8_t id, uint8_t dev_id,
> > +                               struct rte_event_port_conf
> > +*port_config);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Create a new event ethernet Tx adapter with the specified identifier.
> > + *
> > + * @param id
> > + *  The identifier of the event ethernet Tx adapter.
> > + * @param dev_id
> > + *  The event device identifier.
> > + * @param conf_cb
> > + *  Callback function that initalizes members of the
> 
> s/initalizes/initializes
> 
> > + *  struct rte_event_eth_tx_adapter_conf struct passed into
> > + *  it.
> > + * @param conf_arg
> > + *  Argument that is passed to the conf_cb function.
> > + * @return
> > + *   - 0: Success
> > + *   - <0: Error code on failure
> > + */
> > +int __rte_experimental
> > +rte_event_eth_tx_adapter_create_ext(uint8_t id, uint8_t dev_id,
> > +                               rte_event_eth_tx_adapter_conf_cb conf_cb,
> > +                               void *conf_arg);
> > +
> > +/**
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Add a Tx queue to the adapter.
> > + * A queue value of -1 is used to indicate all
> > + * queues within the device.
> > + *
> > + * @param id
> > + *  Adapter identifier.
> > + * @param eth_dev_id
> > + *  Ethernet Port Identifier.
> > + * @param queue
> > + *  Tx queue index.
> > + * @return
> > + *  - 0: Success, Queues added succcessfully.
> 
> s/succcessfully/successfully
> 
> 
> > + *  - <0: Error code on failure.
> > + */
> > +int __rte_experimental
> > +rte_event_eth_tx_adapter_queue_add(uint8_t id,
> > +                               uint16_t eth_dev_id,
> > +                               int32_t queue);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + *
> > + * Set Tx queue in the mbuf.
> > + *
> > + * @param pkt
> > + *  Pointer to the mbuf.
> > + * @param queue
> > + *  Tx queue index.
> > + */
> > +void __rte_experimental
> > +rte_event_eth_tx_adapter_txq_set(struct rte_mbuf *pkt, uint16_t
> > +queue);
> 
> 1) Can you make this static inline for better performance (as it is just an
> mbuf field access)?
OK.
This would also move the private definition of struct txa_mbuf_txq_id to the adapter header file; that definition would need to be deprecated once the field is
available in rte_mbuf.h.

> 
> 2) Please add a _get function. It will be useful for applications and for the
> Tx adapter op implementation.
> 
> 
OK.
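
A minimal sketch of what the inline set/get pair could look like. The struct below is a stand-in, not the real rte_mbuf, and the field name txq_id and the ex_ function names are assumptions for illustration only; the actual location of the queue id in the mbuf is not decided by this sketch.

```c
#include <stdint.h>

/* Stand-in for struct rte_mbuf; only the field this sketch needs. */
struct ex_mbuf {
	uint16_t txq_id;	/* hypothetical location of the Tx queue id */
};

/* Store the Tx queue index in the packet; a plain field write. */
static inline void
ex_txq_set(struct ex_mbuf *pkt, uint16_t queue)
{
	pkt->txq_id = queue;
}

/* Read back the Tx queue index previously stored in the packet. */
static inline uint16_t
ex_txq_get(const struct ex_mbuf *pkt)
{
	return pkt->txq_id;
}
```

Since both helpers are a single field access, keeping them static inline in the header avoids a function call per packet; the cost is that the field layout becomes visible to applications.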

> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Retrieve the adapter event port. The adapter creates an event port
> > +if
> > + * the RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT is not set in
> the
> > + * eth Tx capabilities of the event device.
> > + *
> > + * @param id
> > + *  Adapter Identifier.
> > + * @param[out] event_port_id
> > + *  Event port pointer.
> > + * @return
> > + *   - 0: Success.
> > + *   - <0: Error code on failure.
> > + */
> > +int __rte_experimental
> > +rte_event_eth_tx_adapter_event_port_get(uint8_t id, uint8_t
> > +*event_port_id);
> > +
> > +static __rte_always_inline uint16_t __rte_experimental
> > +__rte_event_eth_tx_adapter_enqueue(uint8_t id, uint8_t dev_id,
> uint8_t port_id,
> > +                               struct rte_event ev[],
> > +                               uint16_t nb_events,
> > +                               const event_tx_adapter_enqueue fn) {
> > +       const struct rte_eventdev *dev = &rte_eventdevs[dev_id];
> 
> Access to *dev twice(see below rte_event_eth_tx_adapter_enqueue())
> 
> > +       struct rte_event_eth_tx_adapter *txa =
> > +
> > + rte_event_eth_tx_adapters.data[id];
> 
> Just like the common Tx adapter implementation, we can manage the ethdev
> queue to adapter mapping internally, so this dereference is not required in
> the fast path.
> 
> Please simply call the following, just like the other eventdev ops:
> fn(dev->data->ports[port_id], ev, nb_events)
> 
> 
OK.

> > +
> > +#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
> > +       if (id >= RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE ||
> > +               dev_id >= RTE_EVENT_MAX_DEVS ||
> > +               !rte_eventdevs[dev_id].attached) {
> > +               rte_errno = -EINVAL;
> > +               return 0;
> > +       }
> > +
> > +       if (port_id >= dev->data->nb_ports) {
> > +               rte_errno = -EINVAL;
> > +               return 0;
> > +       }
> > +#endif
> > +       return fn((void *)txa, dev, dev->data->ports[port_id], ev,
> > +nb_events); }
> > +
> > +/**
> > + * Enqueue a burst of events objects or an event object supplied in
> > +*rte_event*
> > + * structure on an  event device designated by its *dev_id* through
> > +the event
> > + * port specified by *port_id*. This function is supported if the
> > +eventdev PMD
> > + * has the RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT
> capability flag set.
> > + *
> > + * The *nb_events* parameter is the number of event objects to
> > +enqueue which are
> > + * supplied in the *ev* array of *rte_event* structure.
> > + *
> > + * The rte_event_eth_tx_adapter_enqueue() function returns the
> number
> > +of
> > + * events objects it actually enqueued. A return value equal to
> > +*nb_events*
> > + * means that all event objects have been enqueued.
> > + *
> > + * @param id
> > + *  The identifier of the tx adapter.
> > + * @param dev_id
> > + *  The identifier of the device.
> > + * @param port_id
> > + *  The identifier of the event port.
> > + * @param ev
> > + *  Points to an array of *nb_events* objects of type *rte_event*
> > +structure
> > + *  which contain the event object enqueue operations to be processed.
> > + * @param nb_events
> > + *  The number of event objects to enqueue, typically number of
> > + *  rte_event_port_enqueue_depth() available for this port.
> > + *
> > + * @return
> > + *   The number of event objects actually enqueued on the event device.
> The
> > + *   return value can be less than the value of the *nb_events*
> parameter when
> > + *   the event devices queue is full or if invalid parameters are specified
> in a
> > + *   *rte_event*. If the return value is less than *nb_events*, the
> remaining
> > + *   events at the end of ev[] are not consumed and the caller has to take
> care
> > + *   of them, and rte_errno is set accordingly. Possible errno values
> include:
> > + *   - -EINVAL  The port ID is invalid, device ID is invalid, an event's queue
> > + *              ID is invalid, or an event's sched type doesn't match the
> > + *              capabilities of the destination queue.
> > + *   - -ENOSPC  The event port was backpressured and unable to enqueue
> > + *              one or more events. This error code is only applicable to
> > + *              closed systems.
> > + */
> > +static inline uint16_t __rte_experimental
> > +rte_event_eth_tx_adapter_enqueue(uint8_t id, uint8_t dev_id,
> > +                               uint8_t port_id,
> > +                               struct rte_event ev[],
> > +                               uint16_t nb_events) {
> > +       const struct rte_eventdev *dev = &rte_eventdevs[dev_id];
> > +
> > +       return __rte_event_eth_tx_adapter_enqueue(id, dev_id, port_id,
> ev,
> > +                                               nb_events,
> > +                                               dev->txa_enqueue);
> 
> As per the above, since the function call logic is simplified, you can fold
> the above function's logic in here.
> 
OK, I will also delete the id parameter.
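
Roughly, the simplified fast path would then reduce to a single indirect call, without the id parameter and without the adapter lookup. The sketch below uses stand-in types only (everything prefixed ex_ is hypothetical and not the eventdev definitions), and ex_accept_all is a dummy driver op included just so the call can be exercised.

```c
#include <stdint.h>

/* Stand-ins for the eventdev types; illustration only. */
struct ex_event {
	void *mbuf;
};

typedef uint16_t (*ex_txa_enqueue_t)(void *port, struct ex_event ev[],
				     uint16_t nb_events);

struct ex_dev_data {
	void **ports;		/* per-port driver private data */
	uint8_t nb_ports;
};

struct ex_eventdev {
	ex_txa_enqueue_t txa_enqueue;	/* driver supplied Tx adapter op */
	struct ex_dev_data *data;
};

/* Dummy driver op that accepts every event it is given. */
static uint16_t
ex_accept_all(void *port, struct ex_event ev[], uint16_t nb_events)
{
	(void)port;
	(void)ev;
	return nb_events;
}

/* Fast path: one device lookup by the caller, one call into the driver. */
static inline uint16_t
ex_tx_adapter_enqueue(const struct ex_eventdev *dev, uint8_t port_id,
		      struct ex_event ev[], uint16_t nb_events)
{
	return dev->txa_enqueue(dev->data->ports[port_id], ev, nb_events);
}
```

The debug-only parameter checks from the patch are left out here for brevity.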

> > +}
> > +
> > index dabb12d..ab23503 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -388,6 +388,11 @@ F: lib/librte_eventdev/*crypto_adapter*
> >  F: test/test/test_event_crypto_adapter.c
> >  F: doc/guides/prog_guide/event_crypto_adapter.rst
> >
> > +Eventdev Ethdev Tx Adapter API - EXPERIMENTAL
> > +M: Nikhil Rao <nikhil.rao@intel.com>
> > +T: git://dpdk.org/next/dpdk-next-eventdev
> > +F: lib/librte_eventdev/*eth_tx_adapter*
> 
> Add the testcase also.
> 
I have made that update in patch 4 of this series.

> Overall it looks good. No more comments on specification.
> 

Thanks for the review,
Nikhil

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v9] checkpatches.sh: Add checks for ABI symbol addition
  @ 2018-07-15 23:12  4%   ` Thomas Monjalon
  2018-08-14  3:53  4%   ` Rao, Nikhil
  1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-07-15 23:12 UTC (permalink / raw)
  To: Neil Horman
  Cc: dev, john.mcnamara, bruce.richardson, Ferruh Yigit, Stephen Hemminger

27/06/2018 20:01, Neil Horman:
> Recently, some additional patches were added to allow for programmatic
> marking of C symbols as experimental.  The addition of these markers is
> dependent on the manual addition of exported symbols to the EXPERIMENTAL
> section of the corresponding libraries version map file.  The consensus
> on review is that, in addition to mandating the addition of symbols to
> the EXPERIMENTAL version in the map, we need a mechanism to enforce our
> documented process of mandating that addition when they are introduced.
> To that end, I am proposing this change.  It is an addition to the
> checkpatches script, which scans incoming patches for additions and
> removals of symbols to the map file, and warns the user appropriately.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: thomas@monjalon.net
> CC: john.mcnamara@intel.com
> CC: bruce.richardson@intel.com
> CC: Ferruh Yigit <ferruh.yigit@intel.com>
> CC: Stephen Hemminger <stephen@networkplumber.org>
> 
> ---
> +		tmpinput=$(mktemp checkpatches.XXXXXX)
> +		git format-patch --find-renames \
> +		--no-stat --stdout -1 $commit > ./$tmpinput

In case $tmpinput is an absolute path (like in /tmp),
we must not prepend it with ./
I fix it when applying.

Applied, thanks

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] DPDK 18.05 only works with up to 4 NUMAs systems
  @ 2018-07-14  9:44  0%       ` Kumar, Ravi1
  0 siblings, 0 replies; 200+ results
From: Kumar, Ravi1 @ 2018-07-14  9:44 UTC (permalink / raw)
  To: Burakov, Anatoly, dev

>On 28-Jun-18 8:03 AM, Kumar, Ravi1 wrote:
>>> On 22-Jun-18 5:37 PM, Kumar, Ravi1 wrote:
>>>> Hi,
>>>>
>>>> As the memory subsystem in DPDK 18.05 is reworked, it has introduced a problem for AMD EPYC 2P platforms.
>>>> The issue is that DPDK 18.05 only works with up to 4 NUMAs. For AMD EPYC 2P platforms, DPDK now only works with P0 (NUMA 0-3) and does not work with P1 (NUMA 4-7).
>>>>
>>>> The problem can be fixed by reducing some of the default settings of the memory subsystem.
>>>>
>>>> To solve this issue:
>>>> - We can create our own config file for our integrated 10G NIC, that is for amd_xgbe PMD. This will make amd_xgbe immune to this problem.
>>>> - However, when any other NIC (Intel, Mellanox, Cavium or Broadcom etc.) is plugged into NUMA 4-7, the problem will still be exposed.
>>>> - If we only fix it in "config/common_base", it will cover all cases.
>>>>
>>>> Our current workaround is:
>>>> Edit config file "./config/common_base" and change the following line
>>>>                   CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
>>>> TO
>>>>                   CONFIG_RTE_MAX_MEM_MB_PER_TYPE=65536
>>>>
>>>> Any better solution for this issue is welcome.
>>>>
>>>> We would appreciate if this issue can be fixed in the next release (18.08) so the STOCK version of DPDK works on AMD EPYC 2P platforms.
>>>>
>>>> Regards,
>>>> Ravi
>>>>
>>>
>>> Hi Ravi,
>>>
>>> What is the reason behind this limitation? Is it too much virtual memory being preallocated?
>>>
>>> --
>>> Thanks,
>>> Anatoly
>>>
>> Hi Anatoly,
>>
>> We believe this is true.  By default, too much virtual memory is being preallocated. The result is it can only support up to 4 NUMAs.
>>
>> Our workaround is to reduce the amount of preallocated virtual memory by half, so to support up to 8 NUMAs.
>>
>> Regards,
>> Ravi
>>
>
>I assume you see a bunch of failed mmap() calls with ENOMEM?
>
>In general, changing base config that way is an OK change, and it won't even be an ABI break since this memory is allocated at runtime. I just want to make sure that we fix the underlying problem, rather than the symptom.
>
>--
>Thanks,
>Anatoly

Hi Anatoly,

Sorry for the late reply. I have been away, and it took me some time to get the logs.

Here are some more details.

Dpdk-18.05/config/common_base contains the constants used to configure the memory subsystem.

CONFIG_RTE_MAX_NUMA_NODES=8
CONFIG_RTE_MAX_MEMSEG_LISTS=64
# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
# or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
CONFIG_RTE_MAX_MEM_MB_PER_LIST=32768
# a "type" is a combination of page size and NUMA node. total number of memseg
# lists per type will be limited to either RTE_MAX_MEMSEG_PER_TYPE pages (split
# over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or
# RTE_MAX_MEM_MB_PER_TYPE megabytes of memory (split over multiple lists of
# RTE_MAX_MEM_MB_PER_LIST), whichever is smaller
CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072
# global maximum usable amount of VA, in megabytes
CONFIG_RTE_MAX_MEM_MB=524288

From the documentation (Dpdk-18.05/doc/guides/prog_guide/env_abstraction_layer.rst):

All possible virtual memory space that can ever be used for hugepage mapping in a DPDK process is preallocated at startup, thereby placing an upper limit on how much memory a DPDK application can have. DPDK memory is stored in segment lists, each segment is strictly one physical page. It is possible to change the amount of virtual memory being preallocated at startup by editing the following config variables:

* ``CONFIG_RTE_MAX_MEMSEG_LISTS`` controls how many segment lists DPDK can have
* ``CONFIG_RTE_MAX_MEM_MB_PER_LIST`` controls how many megabytes of memory each segment list can address
* ``CONFIG_RTE_MAX_MEMSEG_PER_LIST`` controls how many segments each segment list can have
* ``CONFIG_RTE_MAX_MEMSEG_PER_TYPE`` controls how many segments each memory type can have (where "type" is defined as "page size + NUMA node" combination)
* ``CONFIG_RTE_MAX_MEM_MB_PER_TYPE`` controls how many megabytes of memory each memory type can address
* ``CONFIG_RTE_MAX_MEM_MB`` places a global maximum on the amount of memory DPDK can reserve

Normally, these options do not need to be changed.

.. note::

    Preallocated virtual memory is not to be confused with preallocated hugepage memory! All DPDK processes preallocate virtual memory at startup. Hugepages can later be mapped into that preallocated VA space (if dynamic memory mode is enabled), and can optionally be mapped into it at startup.


Memory setup with 2M pages works with the default configuration. With the default configuration and 2M hugepages:

1. The total amount of memory for each NUMA zone does not exceed 128G (CONFIG_RTE_MAX_MEM_MB_PER_TYPE).
2. The total number of segments per NUMA zone is limited to 32768 (CONFIG_RTE_MAX_MEMSEG_PER_TYPE). This limit is reached for each NUMA zone, and it is the limiting factor for memory per NUMA zone with 2M hugepages and the default configuration.
3. The data structures are capable of supporting 64G of memory for each NUMA zone (32768 segments * 2M hugepage size).
4. 8 NUMA zones * 64G = 512G. Therefore the total for all NUMA zones does not exceed 512G (CONFIG_RTE_MAX_MEM_MB).
5. Resources are capable of allocating up to 64G per NUMA zone. Things will work as long as there are enough 2M hugepages to cover the memory needs of the DPDK applications AND no memory zone needs more than 64G.

With the default configuration and 1G hugepages:

1. The total amount of memory for each NUMA zone is limited to 128G (CONFIG_RTE_MAX_MEM_MB_PER_TYPE). This limit is reached for each NUMA zone, and it is the limiting factor for memory per NUMA zone.
2. The total number of segments (128) does not exceed 32768 (CONFIG_RTE_MAX_MEMSEG_PER_TYPE). There are 128 segments per NUMA zone.
3. The data structures are capable of supporting 128G of memory for each NUMA zone (128 segments * 1G hugepage size). However, only the first four NUMA zones get initialized before we hit CONFIG_RTE_MAX_MEM_MB (512G).
4. The total for all NUMA zones is limited to 512G (CONFIG_RTE_MAX_MEM_MB). This limit is hit after configuring the first four NUMA zones (4 x 128G = 512G). The rest of the NUMA zones cannot allocate memory.

Apparently, it is intended to support a maximum of 8 NUMA nodes by default (CONFIG_RTE_MAX_NUMA_NODES=8), but when 1G hugepages are used, it can only support up to 4.

Possible workarounds when using 1G hugepages:

1. Decrease CONFIG_RTE_MAX_MEM_MB_PER_TYPE to 65536 (limit of 64G per NUMA zone). This is probably the best option unless you need a lot of memory in any given NUMA zone.
2. Or, increase CONFIG_RTE_MAX_MEM_MB to 1048576.
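
The arithmetic above can be checked with a small sketch. It is a simplified model of the limits, not the actual EAL code: it assumes a single hugepage size and that each NUMA node reserves its full per-type amount in order until the global limit is exhausted. The function names are made up for this illustration.

```c
#include <stdint.h>

static inline uint64_t
ex_min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * VA reserved for one memory type (page size + NUMA node), in MB:
 * the smaller of the per-type MB cap and the per-type segment cap
 * multiplied by the page size. page_mb must be non-zero.
 */
static inline uint64_t
ex_mem_mb_per_type(uint64_t page_mb, uint64_t max_memseg_per_type,
		   uint64_t max_mem_mb_per_type)
{
	return ex_min_u64(max_mem_mb_per_type, max_memseg_per_type * page_mb);
}

/* Number of NUMA nodes that fit under the global VA limit (simplified). */
static inline unsigned int
ex_numa_nodes_covered(uint64_t page_mb, uint64_t max_memseg_per_type,
		      uint64_t max_mem_mb_per_type, uint64_t max_mem_mb,
		      unsigned int max_numa_nodes)
{
	uint64_t per_type = ex_mem_mb_per_type(page_mb, max_memseg_per_type,
					       max_mem_mb_per_type);
	uint64_t fit = max_mem_mb / per_type;

	return fit < max_numa_nodes ? (unsigned int)fit : max_numa_nodes;
}
```

With the defaults (32768 segments, 131072 MB per type, 524288 MB total, 8 nodes) this gives 8 nodes for 2M pages and 4 nodes for 1G pages, and either workaround brings the 1G case back to 8.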



With default settings, I got the following errors.

amd@amd:~/dpdk/app/test-pmd$ sudo -E ./testpmd -c 0x7 -n 4 -- -i  --portmask=0x3 --nb-cores=2
EAL: Detected 64 lcore(s)
EAL: Detected 8 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:13:00.0 on NUMA socket 1
EAL:   probe driver: 8086:10d3 net_e1000_em
EAL: PCI device 0000:31:00.0 on NUMA socket 3
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:31:00.1 on NUMA socket 3
EAL:   probe driver: 8086:1563 net_ixgbe
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=163456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_2>: n=163456, size=2176, socket=2
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_3>: n=163456, size=2176, socket=3
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_4>: n=163456, size=2176, socket=4
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Error - exiting with code: 1
  Cause: Creation of mbuf pool for socket 4 failed: Cannot allocate memory
amd@amd:~/dpdk/app/test-pmd$


After changing CONFIG_RTE_MAX_MEM_MB_PER_TYPE=131072 to CONFIG_RTE_MAX_MEM_MB_PER_TYPE=65536, the error is gone.

amd@amd:~/dpdk/app/test-pmd$ sudo -E ./testpmd -c 0x7 -n 4 -- -i  --portmask=0x3 --nb-cores=2
EAL: Detected 64 lcore(s)
EAL: Detected 8 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:13:00.0 on NUMA socket 1
EAL:   probe driver: 8086:10d3 net_e1000_em
EAL: PCI device 0000:31:00.0 on NUMA socket 3
EAL:   probe driver: 8086:1563 net_ixgbe
EAL: PCI device 0000:31:00.1 on NUMA socket 3
EAL:   probe driver: 8086:1563 net_ixgbe
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_1>: n=163456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_2>: n=163456, size=2176, socket=2
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_3>: n=163456, size=2176, socket=3
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_4>: n=163456, size=2176, socket=4
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_5>: n=163456, size=2176, socket=5
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_6>: n=163456, size=2176, socket=6
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mbuf_pool_socket_7>: n=163456, size=2176, socket=7
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: A0:36:9F:F7:94:68
Configuring Port 1 (socket 0)
Port 1: A0:36:9F:F7:94:69
Checking link statuses...
Done
testpmd>

Regards,
Ravi

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  2018-07-13 17:10  2% ` Burakov, Anatoly
@ 2018-07-13 17:56  0%   ` Wiles, Keith
  2018-07-19 10:58  0%     ` László Vadkerti
  0 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2018-07-13 17:56 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: dev, srinath.mannam, scott.branden, ajit.khaparde,
	Thomas Monjalon, Shreyansh Jain, jerin.jacob



> On Jul 13, 2018, at 12:10 PM, Burakov, Anatoly <anatoly.burakov@intel.com> wrote:
> 
> On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
>> This is a proposal to enable using externally allocated memory
>> in DPDK.
>> In a nutshell, here is what is being done here:
>> - Index malloc heaps by NUMA node index, rather than NUMA node itself
>> - Add identifier string to malloc heap, to uniquely identify it
>> - Allow creating named heaps and add/remove memory to/from those heaps
>> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>>   of externally allocated memory
>>   - If IOVA addresses aren't provided, use RTE_BAD_IOVA
>> - Allow malloc and memzones to allocate from named heaps
>> The responsibility to ensure memory is accessible before using it is
>> on the shoulders of the user - there is no checking done with regards
>> to validity of the memory (nor could there be...).
>> The following limitations are present:
>> - No multiprocess support
>> - No thread safety
>> There is currently no way to allocate memory during initialization
>> stage, so even if multiprocess support is added, it is not guaranteed
>> to work because of underlying issues with mapping fbarrays in
>> secondary processes. This is not an issue in single process scenario,
>> but it may be an issue in a multiprocess scenario in case where
>> primary doesn't intend to share the externally allocated memory, yet
>> adding such memory could fail because some other process failed to
>> attach to this shared memory when it wasn't needed.
>> Anatoly Burakov (11):
>>   mem: allow memseg lists to be marked as external
>>   eal: add function to rerieve socket index by socket ID
>>   malloc: index heaps using heap ID rather than NUMA node
>>   malloc: add name to malloc heaps
>>   malloc: enable retrieving statistics from named heaps
>>   malloc: enable allocating from named heaps
>>   malloc: enable creating new malloc heaps
>>   malloc: allow adding memory to named heaps
>>   malloc: allow removing memory from named heaps
>>   malloc: allow destroying heaps
>>   memzone: enable reserving memory from named heaps
>>  config/common_base                            |   1 +
>>  lib/librte_eal/common/eal_common_lcore.c      |  15 +
>>  lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>>  lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
>>  .../common/include/rte_eal_memconfig.h        |   5 +-
>>  lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>>  lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>>  .../common/include/rte_malloc_heap.h          |   2 +
>>  lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>>  lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>>  lib/librte_eal/common/malloc_heap.h           |  26 ++
>>  lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>>  lib/librte_eal/rte_eal_version.map            |  10 +
>>  13 files changed, 1118 insertions(+), 109 deletions(-)
> 
> So, now that the RFC is out, i would like to ask a general question.
> 
> One other thing that this patchset is missing, is the ability for data structures (e.g. hash, mempool, etc.) to be allocated from external heaps. Currently, we can kinda sorta do that with various _init() API's (initializing a data structure over already allocated memzone), but this is not ideal and is a hassle for anyone using external memory in DPDK.
> 
> There are basically four ways to approach this problem (that i can see).
> 
> First way is to change "socket ID" to mean "heap ID" everywhere. This has an upside of having a consistent API to allocate from internal and external heaps, with little to no API additions, only internal changes to account for the fact that "socket ID" is now "heap ID".
> 
> However, there is a massive downside to this approach: it is a *giant* API change, and it's also a giant *ABI-compatible* API change. Meaning, replacing socket ID with heap ID will not cause compile failures for old code, which would result in many subtle bugs in already existing codebases. So, while in the perfect world this would've been my preferred approach, realistically i think this is a very, very bad idea.
> 
> Second one is to add a separate "heap name" API's to everything. This has an upside of clean separation between allocation from internal and external heaps. (well, whether it's an upside is debatable...) This is the approach i expected to take when i was creating this patchset.
> 
> The downside is that we have to add new API's to every library and every DPDK data structure, to allow explicit allocation from external heaps. We will have to maintain both, and things like hardware drivers will need to have a way to indicate the need to allocate things from a particular external heap.
> 
> The third way is to expose the "heap ID" externally, and allow a single, unified API to reserve memory. That is, create an API that would map either a NUMA node ID or a heap name to an ID, and allow reserving memory through that ID regardless of whether it's internal or external memory. This would also allow to gradually phase out socket-based ID's in favor of heap ID API, should we choose to do so.
> 
> The downside for this is, it adds a layer of indirection between socket ID and reserving memory on a particular NUMA node, and it makes it hard to produce a single value of "heap ID" in such a way as to replicate current functionality of allocating with SOCKET_ID_ANY. Most likely user will have to explicitly try to allocate on all sockets, unless we keep old API's around in parallel.
> 
> Finally, a fourth way would be to abuse the socket ID to also mean something else, which is an approach i've seen numerous times already, and one that i don't like. We could register new heaps as a new, fake socket ID, and use that to address external heaps (each heap would get its own socket). So, keep current socket ID behavior, but for non-existent sockets it would be possible to be registered as a fake socket pointing to an external heap.
> 
> The upside for this approach would be that no API changes are required whatsoever to existing libraries - this scheme is compatible with both internal and external heaps without adding a separate API.
> 
> The downside is bad semantics - "special" sockets, handling of SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that happens to correspond to an existing external heap", and many other things that can be confusing. I don't like this option, but it's an option :)
> 
> Thoughts? Comments?

#1 is super clean, but very disruptive to everyone. Very Bad IMO
#2 is also clean, but adds a lot of new APIs that everyone needs to use or at least in the external heap cases.
#3 not sure I fully understand it, but reproducing heap IDs for testing is a problem and requires new/old APIs

#4 Very easy to add, IMO it is clean and very small disruption to developers. It does require the special handling, but I feel it is OK and can be explained in the docs. Having a socket id as an ‘int’ gives us a lot of room, e.g. id < 64K is a normal socket and id > 64K is an external id.

My vote would be #4, as it seems the least risk and work. :-)
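
For illustration only, the range split suggested for #4 could look roughly
like the sketch below. The 64K boundary is the figure from the paragraph
above; EXT_SOCKET_ID_BASE, classify_socket_id() and ext_heap_index() are
hypothetical names, not existing DPDK API, and treating exactly 64K as
external is an assumption.

```c
#include <assert.h>

/* Hypothetical boundary: IDs below it are NUMA sockets, IDs at or above
 * it are external heaps. Not a DPDK constant. */
#define EXT_SOCKET_ID_BASE 65536

/* Same value as DPDK's SOCKET_ID_ANY; redefined so the sketch is standalone. */
#define SKETCH_SOCKET_ID_ANY (-1)

enum socket_kind {
	SOCKET_KIND_ANY,      /* caller does not care where memory comes from */
	SOCKET_KIND_NUMA,     /* a real NUMA socket */
	SOCKET_KIND_EXTERNAL, /* a fake socket that stands for an external heap */
	SOCKET_KIND_INVALID   /* any other negative value */
};

static enum socket_kind
classify_socket_id(int socket_id)
{
	if (socket_id == SKETCH_SOCKET_ID_ANY)
		return SOCKET_KIND_ANY;
	if (socket_id < 0)
		return SOCKET_KIND_INVALID;
	if (socket_id < EXT_SOCKET_ID_BASE)
		return SOCKET_KIND_NUMA;
	return SOCKET_KIND_EXTERNAL;
}

/* Map a fake socket ID to a zero-based external heap index. */
static int
ext_heap_index(int socket_id)
{
	assert(classify_socket_id(socket_id) == SOCKET_KIND_EXTERNAL);
	return socket_id - EXT_SOCKET_ID_BASE;
}
```

This still leaves the special cases (SOCKET_ID_ANY, or an ID in the external
range with no heap registered behind it) to be defined and documented.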

> 
> I myself still favor the second way, however there are good arguments to be made for each of these options.
> 
> -- 
> Thanks,
> Anatoly

Regards,
Keith


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK
  @ 2018-07-13 17:10  2% ` Burakov, Anatoly
  2018-07-13 17:56  0%   ` Wiles, Keith
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-07-13 17:10 UTC (permalink / raw)
  To: dev
  Cc: srinath.mannam, scott.branden, ajit.khaparde, Thomas Monjalon,
	Shreyansh Jain, jerin.jacob, Keith Wiles

On 06-Jul-18 2:17 PM, Anatoly Burakov wrote:
> This is a proposal to enable using externally allocated memory
> in DPDK.
> 
> In a nutshell, here is what is being done here:
> 
> - Index malloc heaps by NUMA node index, rather than NUMA node itself
> - Add identifier string to malloc heap, to uniquely identify it
> - Allow creating named heaps and add/remove memory to/from those heaps
> - Allocate memseg lists at runtime, to keep track of IOVA addresses
>    of externally allocated memory
>    - If IOVA addresses aren't provided, use RTE_BAD_IOVA
> - Allow malloc and memzones to allocate from named heaps
> 
> The responsibility to ensure memory is accessible before using it is
> on the shoulders of the user - there is no checking done with regards
> to validity of the memory (nor could there be...).
> 
> The following limitations are present:
> 
> - No multiprocess support
> - No thread safety
> 
> There is currently no way to allocate memory during initialization
> stage, so even if multiprocess support is added, it is not guaranteed
> to work because of underlying issues with mapping fbarrays in
> secondary processes. This is not an issue in single process scenario,
> but it may be an issue in a multiprocess scenario in case where
> primary doesn't intend to share the externally allocated memory, yet
> adding such memory could fail because some other process failed to
> attach to this shared memory when it wasn't needed.
> 
> Anatoly Burakov (11):
>    mem: allow memseg lists to be marked as external
>    eal: add function to rerieve socket index by socket ID
>    malloc: index heaps using heap ID rather than NUMA node
>    malloc: add name to malloc heaps
>    malloc: enable retrieving statistics from named heaps
>    malloc: enable allocating from named heaps
>    malloc: enable creating new malloc heaps
>    malloc: allow adding memory to named heaps
>    malloc: allow removing memory from named heaps
>    malloc: allow destroying heaps
>    memzone: enable reserving memory from named heaps
> 
>   config/common_base                            |   1 +
>   lib/librte_eal/common/eal_common_lcore.c      |  15 +
>   lib/librte_eal/common/eal_common_memory.c     |  51 +++-
>   lib/librte_eal/common/eal_common_memzone.c    | 283 ++++++++++++++----
>   .../common/include/rte_eal_memconfig.h        |   5 +-
>   lib/librte_eal/common/include/rte_lcore.h     |  19 +-
>   lib/librte_eal/common/include/rte_malloc.h    | 158 +++++++++-
>   .../common/include/rte_malloc_heap.h          |   2 +
>   lib/librte_eal/common/include/rte_memzone.h   | 183 +++++++++++
>   lib/librte_eal/common/malloc_heap.c           | 277 +++++++++++++++--
>   lib/librte_eal/common/malloc_heap.h           |  26 ++
>   lib/librte_eal/common/rte_malloc.c            | 197 +++++++++++-
>   lib/librte_eal/rte_eal_version.map            |  10 +
>   13 files changed, 1118 insertions(+), 109 deletions(-)
> 

So, now that the RFC is out, i would like to ask a general question.

One other thing that this patchset is missing, is the ability for data 
structures (e.g. hash, mempool, etc.) to be allocated from external 
heaps. Currently, we can kinda sorta do that with various _init() API's 
(initializing a data structure over already allocated memzone), but this 
is not ideal and is a hassle for anyone using external memory in DPDK.

There are basically four ways to approach this problem (that i can see).

First way is to change "socket ID" to mean "heap ID" everywhere. This 
has an upside of having a consistent API to allocate from internal and 
external heaps, with little to no API additions, only internal changes 
to account for the fact that "socket ID" is now "heap ID".

However, there is a massive downside to this approach: it is a *giant* 
API change, and it's also a giant *ABI-compatible* API change. Meaning, 
replacing socket ID with heap ID will not cause compile failures for old 
code, which would result in many subtle bugs in already existing 
codebases. So, while in the perfect world this would've been my 
preferred approach, realistically i think this is a very, very bad idea.

Second one is to add a separate "heap name" API's to everything. This 
has an upside of clean separation between allocation from internal and 
external heaps. (well, whether it's an upside is debatable...) This is 
the approach i expected to take when i was creating this patchset.

The downside is that we have to add new API's to every library and every 
DPDK data structure, to allow explicit allocation from external heaps. 
We will have to maintain both, and things like hardware drivers will 
need to have a way to indicate the need to allocate things from a 
particular external heap.

The third way is to expose the "heap ID" externally, and allow a single, 
unified API to reserve memory. That is, create an API that would map 
either a NUMA node ID or a heap name to an ID, and allow reserving 
memory through that ID regardless of whether it's internal or external 
memory. This would also allow to gradually phase out socket-based ID's 
in favor of heap ID API, should we choose to do so.

The downside for this is, it adds a layer of indirection between socket 
ID and reserving memory on a particular NUMA node, and it makes it hard 
to produce a single value of "heap ID" in such a way as to replicate 
current functionality of allocating with SOCKET_ID_ANY. Most likely user 
will have to explicitly try to allocate on all sockets, unless we keep 
old API's around in parallel.
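
As a rough sketch of what the third way implies (not part of this
patchset), the mapping could look like the code below. The table contents
and the names heap_id_from_socket() and heap_id_from_name() are made up
for illustration; the only point is that both lookups land in the same
"heap ID" space, and that SOCKET_ID_ANY (-1) has no single heap ID to
map to.

```c
#include <stddef.h>
#include <string.h>

struct heap_entry {
	const char *name; /* identifier string of the heap */
	int socket_id;    /* NUMA node for internal heaps, -1 for external ones */
};

/* Example table: two internal heaps and one external heap. */
static const struct heap_entry heaps[] = {
	{ "socket_0", 0 },
	{ "socket_1", 1 },
	{ "ext_heap", -1 },
};
#define NUM_HEAPS (sizeof(heaps) / sizeof(heaps[0]))

/* Return the heap ID backing a NUMA node, or -1 if there is none. */
static int
heap_id_from_socket(int socket_id)
{
	size_t i;

	/* Negative IDs (including SOCKET_ID_ANY) never name one heap. */
	if (socket_id < 0)
		return -1;
	for (i = 0; i < NUM_HEAPS; i++)
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	return -1;
}

/* Return the heap ID for a heap name, or -1 if the name is unknown. */
static int
heap_id_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < NUM_HEAPS; i++)
		if (strcmp(heaps[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

A caller wanting the old SOCKET_ID_ANY behaviour would have to loop over
the socket-backed heap IDs itself, which is the downside described above.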

Finally, a fourth way would be to abuse the socket ID to also mean 
something else, which is an approach i've seen numerous times already, 
and one that i don't like. We could register new heaps as a new, fake 
socket ID, and use that to address external heaps (each heap would get 
its own socket). So, keep current socket ID behavior, but for 
non-existent sockets it would be possible to be registered as a fake 
socket pointing to an external heap.

The upside for this approach would be that no API changes are required 
whatsoever to existing libraries - this scheme is compatible with both 
internal and external heaps without adding a separate API.

The downside is bad semantics - "special" sockets, handling of 
SOCKET_ID_ANY, handling of "invalid socket" vs. "invalid socket that 
happens to correspond to an existing external heap", and many other 
things that can be confusing. I don't like this option, but it's an 
option :)

Thoughts? Comments?

I myself still favor the second way, however there are good arguments to 
be made for each of these options.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v2] eal: move runtime config file to new location
  @ 2018-07-13 10:44  9% ` Anatoly Burakov
  0 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-07-13 10:44 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, thomas,
	bruce.richardson, harry.van.haaren

As per deprecation notice [1], move DPDK runtime config to default
DPDK runtime data location. Also, remove the deprecation notice and
update release notes to indicate the changes.

[1] http://dpdk.org/dev/patchwork/patch/40418/
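
For reference, the path selection rules described in the release note
hunk of this patch can be sketched as a standalone function. This is an
illustration under assumptions, not the EAL implementation: the function
name runtime_config_path() and its parameters are hypothetical, and the
root check and environment lookup are passed in as arguments.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build the runtime config path into buf.
 *  - root:                          /var/run/dpdk/<prefix>/config
 *  - non-root, XDG_RUNTIME_DIR set: $XDG_RUNTIME_DIR/dpdk/<prefix>/config
 *  - non-root, otherwise:           /tmp/dpdk/<prefix>/config
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int
runtime_config_path(char *buf, size_t len, int is_root,
		    const char *xdg_runtime_dir, const char *prefix)
{
	const char *base;
	int n;

	if (is_root)
		base = "/var/run";
	else if (xdg_runtime_dir != NULL)
		base = xdg_runtime_dir;
	else
		base = "/tmp";

	n = snprintf(buf, len, "%s/dpdk/%s/config", base, prefix);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

A real caller would typically pass getuid() == 0 and
getenv("XDG_RUNTIME_DIR") for the two inputs.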

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/rel_notes/deprecation.rst   | 10 ----------
 doc/guides/rel_notes/release_18_08.rst |  7 +++++++
 lib/librte_eal/common/eal_filesystem.h | 10 +++-------
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5de59833d..14714fe94 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,16 +8,6 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
-* eal: DPDK runtime configuration file (located at
-  ``/var/run/.<prefix>_config``) will be moved. The new path will be as follows:
-
-  - if DPDK is running as root, path will be set to
-    ``/var/run/dpdk/<prefix>/config``
-  - if DPDK is not running as root and $XDG_RUNTIME_DIR is set, path will be set
-    to ``$XDG_RUNTIME_DIR/dpdk/<prefix>/config``
-  - if DPDK is not running as root and $XDG_RUNTIME_DIR is not set, path will be
-    set to ``/tmp/dpdk/<prefix>/config``
-
 * eal: both declaring and identifying devices will be streamlined in v18.08.
   New functions will appear to query a specific port from buses, classes of
   device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index d41546c27..ebe7e6bd6 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -129,6 +129,13 @@ API Changes
   - ``RTE_COMP_FF_OOP_SGL_IN_LB_OUT``
   - ``RTE_COMP_FF_OOP_LB_IN_SGL_OUT``
 
+* Path to runtime config file has changed. The new path is determined as
+  follows:
+  - If DPDK is running as root, ``/var/run/dpdk/<prefix>/config``
+  - If DPDK is not running as root:
+    - If ``$XDG_RUNTIME_DIR`` is set, ``${XDG_RUNTIME_DIR}/dpdk/<prefix>/config``
+    - Otherwise, ``/tmp/dpdk/<prefix>/config``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_eal/common/eal_filesystem.h b/lib/librte_eal/common/eal_filesystem.h
index 364f38d13..de05febf4 100644
--- a/lib/librte_eal/common/eal_filesystem.h
+++ b/lib/librte_eal/common/eal_filesystem.h
@@ -12,7 +12,6 @@
 #define EAL_FILESYSTEM_H
 
 /** Path of rte config file. */
-#define RUNTIME_CONFIG_FMT "%s/.%s_config"
 
 #include <stdint.h>
 #include <limits.h>
@@ -30,17 +29,14 @@ eal_create_runtime_dir(void);
 const char *
 eal_get_runtime_dir(void);
 
+#define RUNTIME_CONFIG_FNAME "config"
 static inline const char *
 eal_runtime_config_path(void)
 {
 	static char buffer[PATH_MAX]; /* static so auto-zeroed */
-	const char *directory = "/var/run";
-	const char *home_dir = getenv("HOME");
 
-	if (getuid() != 0 && home_dir != NULL)
-		directory = home_dir;
-	snprintf(buffer, sizeof(buffer) - 1, RUNTIME_CONFIG_FMT, directory,
-			internal_config.hugefile_prefix);
+	snprintf(buffer, sizeof(buffer) - 1, "%s/%s", eal_get_runtime_dir(),
+			RUNTIME_CONFIG_FNAME);
 	return buffer;
 }
 
-- 
2.17.1

^ permalink raw reply	[relevance 9%]

* [dpdk-dev] [PATCH v2 8/9] doc: add deprecation notice for EAL command line options
  @ 2018-07-13 10:27  5%   ` Anatoly Burakov
  0 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-07-13 10:27 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, ray.kinsella,
	kuralamudhan.ramakrishnan, louise.m.daly, bruce.richardson,
	ferruh.yigit, konstantin.ananyev, thomas

Options --no-shconf and --huge-unlink will be removed, and
replaced with --in-memory option, which will be a superset
of these two, and an officially supported method to run DPDK
entirely in memory.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    RFC->v1:
    - Add this patch

 doc/guides/rel_notes/deprecation.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5de59833d..dd1b5c5d8 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -8,6 +8,11 @@ API and ABI deprecation notices are to be posted here.
 Deprecation Notices
 -------------------
 
+* eal: command-line options ``--no-shconf`` and ``--huge-unlink`` will be
+    removed, and replaced with a single option ``--in-memory``, which will
+    enable DPDK to operate entirely in memory, without creating any files on any
+    filesystems.
+
 * eal: DPDK runtime configuration file (located at
   ``/var/run/.<prefix>_config``) will be moved. The new path will be as follows:
 
-- 
2.17.1

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH 1/2] examples/ethtool: add to meson build
  2018-07-12  7:54  3%   ` Thomas Monjalon
@ 2018-07-12 10:46  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-07-12 10:46 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Thu, Jul 12, 2018 at 09:54:32AM +0200, Thomas Monjalon wrote:
> 29/03/2018 16:04, Bruce Richardson:
> > Add the ethtool example to the meson build. This example is more
> > complicated than the previously added ones as it has files in two
> > subdirectories. An ethtool "wrapper lib" in one, used by the actual
> > example "ethtool app" in the other.
> > 
> > Rather than using recursive operation, like is done with the makefiles,
> > we instead can just special-case the building of the library from the
> > single .c file, and then use that as a dependency when building the app
> > proper.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> It does not compile because of experimental function:
> examples/ethtool/lib/rte_ethtool.c:186:2: error:
> ‘rte_eth_dev_get_module_info’ is deprecated: Symbol is not yet part of stable ABI
> 
Ok. This set is fairly old, and I think I've found other issues with it
since. I suggest we drop this set for 18.08 consideration.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] examples/ethtool: add to meson build
  @ 2018-07-12  7:54  3%   ` Thomas Monjalon
  2018-07-12 10:46  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-07-12  7:54 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

29/03/2018 16:04, Bruce Richardson:
> Add the ethtool example to the meson build. This example is more
> complicated than the previously added ones as it has files in two
> subdirectories. An ethtool "wrapper lib" in one, used by the actual
> example "ethtool app" in the other.
> 
> Rather than using recursive operation, like is done with the makefiles,
> we instead can just special-case the building of the library from the
> single .c file, and then use that as a dependency when building the app
> proper.
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

It does not compile because of experimental function:
examples/ethtool/lib/rte_ethtool.c:186:2: error:
‘rte_eth_dev_get_module_info’ is deprecated: Symbol is not yet part of stable ABI

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v13 00/19] enable hotplug on multi-process
                     ` (4 preceding siblings ...)
  2018-07-12  1:18  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
@ 2018-07-12  1:18  1% ` Qi Zhang
  2018-08-10  0:42  1% ` [dpdk-dev] [PATCH v14 0/6] " Qi Zhang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-07-12  1:18 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
  modify the sample code to use rte_eal_hotplug_add and
  rte_eal_hotplug_remove to attach/detach devices.

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error log in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
  any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process is now
  synced on all processes, it is not necessary to support a private
  vdev for a secondary process, which is identified by a non-NULL
  devargs in "--vdev". So rework all vdev driver changes to simplify
  the device probe scenario on a secondary process; devargs will be
  ignored on a secondary process now.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
  devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, due to the error that
     /dev/vfio/<group_fd> is still busy. (see patches 3/19 and 4/19)
  2) Repeated detach/attach of a device causes "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address comparison.
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues that are not related to this patchset are expected when
  playing with the hotplug_mp sample, as listed below.

- Attaching a PCI device twice may leave the device undetachable;
  the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: current patchset depends on the IPC fix below, or it may not be
  able to attach a shared vdev.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch the PMD if it uses the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request
  in a separate thread.

v5:
- since we will keep mp thread separate from interrupt thread,
  it is not necessary to use temporary thread, we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that lock does not take effect on secondary due to
  previous re-work
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
  still open and under discussion, see below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the fix in
  v3 for the sync IPC deadlock will not work. The new version enables a
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules initialize
  the mp channel during rte_eal_init
- fixes for attaching a shared device from secondary.
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process when handling a secondary request to attach a device; the
     solution is for the primary process to issue shared device
     attach/detach in the interrupt thread.
  2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling 
- fix meson.build.
- improve comments. 

Background:
===========

Currently a secondary process only syncs ethdev from the primary
process at init stage, but it will not be aware if a device
is attached/detached on the primary process at runtime.

Meanwhile, there is a requirement from applications that use the
primary-secondary process model: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes handle the networking work
with these devices.

Solution:
=========

So the original intention is to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in a multi-process situation. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any
process is broadcast to all other processes through the mp channel, so
that device information is synchronized on all processes.

Any failure during the attach or detach process will cause inconsistent
status between processes, so proper rollback action should be considered.

Scenario for Case 1, 2:

attach device from primary
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
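
The attach-from-primary steps above can be modelled with a small
standalone simulation. This is only a sketch of the control flow with
rollback, assuming each process is a plain struct and IPC is a function
call; struct attach_proc and attach_from_primary() are illustrative
names, not code from this patchset.

```c
#include <stdbool.h>
#include <stddef.h>

struct attach_proc {
	bool attached;    /* device is attached in this process */
	bool fail_attach; /* simulate an attach failure in this process */
};

/* Returns 0 on success; on failure (-1) no process keeps the device. */
static int
attach_from_primary(struct attach_proc *primary,
		    struct attach_proc *secondaries, size_t n)
{
	size_t i;
	bool all_ok = true;

	/* a) primary attaches first; failure goes straight to h) */
	if (primary->fail_attach)
		return -1;
	primary->attached = true;

	/* b), c) every secondary handles the attach request and replies */
	for (i = 0; i < n; i++) {
		if (secondaries[i].fail_attach)
			all_ok = false;
		else
			secondaries[i].attached = true;
	}

	/* d) all replies successful: i) attach success */
	if (all_ok)
		return 0;

	/* e), f) roll back on every secondary */
	for (i = 0; i < n; i++)
		secondaries[i].attached = false;

	/* g) primary rolls back its own attach, then h) attach fail */
	primary->attached = false;
	return -1;
}
```

The invariant being illustrated is that after a failure, no process is
left with the device attached.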

detach device from primary
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed
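
The detach-from-primary steps above are a check phase followed by a
commit phase. A minimal standalone model, assuming processes are plain
structs and a "locked" flag stands in for whatever makes the pre-detach
check fail (struct detach_proc and detach_from_primary() are
illustrative names only):

```c
#include <stdbool.h>
#include <stddef.h>

struct detach_proc {
	bool attached; /* device is currently attached in this process */
	bool locked;   /* device is in use, so the pre-detach check fails */
};

/* Returns 0 on success; on failure (-1) nothing has been detached. */
static int
detach_from_primary(struct detach_proc *primary,
		    struct detach_proc *secondaries, size_t n)
{
	size_t i;

	/* a) primary pre-detach check; a locked device means i) failed */
	if (primary->locked)
		return -1;

	/* b)-d) every secondary runs the pre-detach check and replies */
	for (i = 0; i < n; i++)
		if (secondaries[i].locked)
			return -1;

	/* e), f) all checks passed: secondaries detach (assumed no failure) */
	for (i = 0; i < n; i++)
		secondaries[i].attached = false;

	/* g) primary detaches last, h) detach success */
	primary->attached = false;
	return 0;
}
```

Because nothing is detached until every check has passed, no rollback
path is needed here, unlike the attach case.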

Scenario for case 3, 4:

attach device from secondary:
a) secondary send async request to primary and wait on a condition
   which will be released by matched response from primary.
b) primary receive the request and attach the new device if failed
   goto i).
c) primary forward attach request to all secondary as async request
   (because this in mp thread context, use sync request will deadlock,
    same reason for all following async request.)
d) secondary receive request and attach device and send reply.
e) primary check the reply if all success go to j).
f) primary send attach rollback async request to all secondary.
g) secondary receive the request and detach device and send reply.
h) primary receive the reply and detach device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receive response and return.
 
Detach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and performs a pre-detach check; if the
   device is locked, goto j).
c) primary sends a pre-detach async request to all secondaries.
d) each secondary performs a pre-detach check and sends a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach async request to all secondaries.
g) each secondary detaches the device and sends a reply.
h) primary detaches the device.
i) primary sends a success response to the requesting secondary, goto k).
j) primary sends a fail response to the requesting secondary.
k) the requesting secondary receives the response and returns.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.


PMD Impact:
===========

Currently device removal is not handled well in the secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_local, which only resets the ethdev's state to unused
and does not touch shared data, so other processes are not impacted.
Since not all device drivers are targeted to support the primary-secondary
process model, the patch set only fixes this on all Intel devices and
vdevs; it can be used as a reference by other drivers when an equivalent
fix is required.

Example:
========

The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach a pci device */
> attach 0000:81:00.0

/* detach the pci device */
> detach 0000:81:00.0

/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0

/* detach the vdev af_packet device */
> detach net_af_packet

Qi Zhang (19):
  ethdev: add function to release port in local process
  bus/pci: fix PCI address compare
  bus/pci: enable vfio unmap resource for secondary
  vfio: remove uneccessary IPC for group fd clear
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  net/i40e: enable hotplug on secondary process
  net/ixgbe: enable hotplug on secondary process
  net/af_packet: enable hotplug on secondary process
  net/bonding: enable hotplug on secondary process
  net/kni: enable hotplug on secondary process
  net/null: enable hotplug on secondary process
  net/octeontx: enable hotplug on secondary process
  net/pcap: enable hotplug on secondary process
  net/softnic: enable hotplug on secondary process
  net/tap: enable hotplug on secondary process
  net/vhost: enable hotplug on secondary process
  examples/multi_process: add hotplug sample
  doc: update release notes for multi process hotplug

 doc/guides/rel_notes/release_18_08.rst         |  11 +
 drivers/bus/pci/linux/pci_vfio.c               | 129 +++++++--
 drivers/net/af_packet/rte_eth_af_packet.c      |   7 +-
 drivers/net/bonding/rte_eth_bond_pmd.c         |   7 +-
 drivers/net/i40e/i40e_ethdev.c                 |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c               |   3 +
 drivers/net/kni/rte_eth_kni.c                  |   7 +-
 drivers/net/null/rte_eth_null.c                |  12 +-
 drivers/net/octeontx/octeontx_ethdev.c         |   9 +
 drivers/net/pcap/rte_eth_pcap.c                |  11 +-
 drivers/net/softnic/rte_eth_softnic.c          |  15 +-
 drivers/net/tap/rte_eth_tap.c                  |  13 +-
 drivers/net/vhost/rte_eth_vhost.c              |   7 +-
 examples/multi_process/Makefile                |   1 +
 examples/multi_process/hotplug_mp/Makefile     |  23 ++
 examples/multi_process/hotplug_mp/commands.c   | 214 +++++++++++++++
 examples/multi_process/hotplug_mp/commands.h   |  10 +
 examples/multi_process/hotplug_mp/main.c       |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile             |   1 +
 lib/librte_eal/common/eal_common_dev.c         | 177 ++++++++++++-
 lib/librte_eal/common/eal_private.h            |  37 +++
 lib/librte_eal/common/hotplug_mp.c             | 348 +++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h             |  48 ++++
 lib/librte_eal/common/include/rte_dev.h        |   6 +
 lib/librte_eal/common/meson.build              |   1 +
 lib/librte_eal/linuxapp/eal/Makefile           |   1 +
 lib/librte_eal/linuxapp/eal/eal.c              |   6 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  45 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |   1 -
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c |   8 -
 lib/librte_ethdev/rte_ethdev.c                 |  12 +
 lib/librte_ethdev/rte_ethdev_driver.h          |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h             |   8 +
 33 files changed, 1134 insertions(+), 103 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process
                     ` (3 preceding siblings ...)
  2018-07-12  1:14  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
@ 2018-07-12  1:18  1% ` Qi Zhang
  2018-07-12  1:18  1% ` [dpdk-dev] [PATCH v13 " Qi Zhang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-07-12  1:18 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
  modify the sample code to use rte_eal_hotplug_add and
  rte_eal_hotplug_remove to attach/detach devices.

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error log in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
  any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process is now
  synced on all processes, it is no longer necessary to support a
  private vdev for a secondary process, which was identified by a
  non-NULL devargs in "--vdev". So rework all vdev driver changes to
  simplify the device probe scenario on a secondary process; devargs
  are now ignored on a secondary process.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
  devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, because
     /dev/vfio/<group_fd> is still busy (see patches 3/19 and 4/19).
  2) Repeated detach/attach of a device causes "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address compare
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues that are not related to this patchset are expected when
  playing with the hotplug_mp sample, as listed below.

- Attaching a PCI device twice may leave the device unable to be
  detached; the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks failed.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: the current patchset depends on the IPC fix below; without it,
  attaching a shared vdev may fail.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch a PMD if the PMD uses the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a
  separate thread.

v5:
- since we will keep the mp thread separate from the interrupt thread,
  it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on secondary due to
  the previous rework.
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
  still open and under discussion, see below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the fix in v3
  for the sync IPC deadlock will not work. The new version enables a
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules initialize
  the mp channel during rte_eal_init.
- fix attaching a shared device from secondary.
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process when handling a secondary request to attach a device; the
     solution is for the primary process to issue the shared device
     attach/detach in the interrupt thread.
  2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.

Background:
===========

Currently a secondary process only syncs ethdevs from the primary
process at the init stage; it is not aware of devices that are
attached or detached on the primary process at runtime.

However, there is a requirement from applications that use the
primary-secondary process model: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes handle the networking
work on these devices.

Solution:
=========

The original intention was to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in a multi-process situation. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.

Any failure during the attach or detach process will cause inconsistent
status between processes, so proper rollback actions should be considered.

Scenario for cases 1 and 2:

Attach device from primary:
a) primary attaches the new device; on failure, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach failed.
i) attach succeeded.
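
The attach-with-rollback flow from primary can be sketched roughly as
below. This is only an illustration of the control flow, not code from
the patch set: the IPC round trips are replaced by direct function calls,
and struct proc, proc_attach, proc_detach and the fail_attach knob are
hypothetical names made up for the sketch.

```c
#include <stdbool.h>

/* Hypothetical per-process view of one device (not a DPDK type). */
struct proc {
	bool attached;
	bool fail_attach; /* knob for the sketch: force attach to fail */
};

static int proc_attach(struct proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void proc_detach(struct proc *p)
{
	p->attached = false;
}

/* Walks steps a) to i) of the attach-from-primary list; 0 on success. */
static int attach_from_primary(struct proc *primary, struct proc *sec, int n)
{
	int i, failed = 0;

	if (proc_attach(primary) != 0)	/* a) */
		return -1;		/* h) attach failed */

	for (i = 0; i < n; i++)		/* b), c): request and reply */
		if (proc_attach(&sec[i]) != 0)
			failed++;

	if (failed == 0)		/* d) */
		return 0;		/* i) attach succeeded */

	for (i = 0; i < n; i++)		/* e), f): rollback on secondaries */
		proc_detach(&sec[i]);
	proc_detach(primary);		/* g) rollback on primary */
	return -1;			/* h) attach failed */
}
```

With one secondary forced to fail, the call returns -1 and no process is
left with the device attached, which is the consistency property the
rollback steps are there to provide.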

Detach device from primary:
a) primary performs a pre-detach check; if the device is locked, goto i).
b) primary sends a pre-detach sync request to all secondaries.
c) each secondary performs a pre-detach check and sends a reply.
d) primary checks the replies; if any failed, goto i).
e) primary sends a detach sync request to all secondaries.
f) each secondary detaches the device and sends a reply (assumed not to fail).
g) primary detaches the device.
h) detach succeeded.
i) detach failed.
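
The detach flow is a two-phase sequence: every process is first asked
whether it can detach, and the device is removed only if all of them
agree. A minimal sketch of that idea follows, again with the IPC replaced
by direct calls. The names struct proc_state, busy and pre_detach_ok are
assumptions for the illustration and do not come from the patch set.

```c
#include <stdbool.h>

/* Hypothetical per-process view of one device (not a DPDK type). */
struct proc_state {
	bool attached;
	bool busy; /* sketch only: pre-detach check fails while set */
};

static bool pre_detach_ok(const struct proc_state *p)
{
	return p->attached && !p->busy;
}

/* Walks steps a) to i) of the detach-from-primary list; 0 on success. */
static int detach_from_primary(struct proc_state *primary,
			       struct proc_state *sec, int n)
{
	int i;

	if (!pre_detach_ok(primary))	/* a) */
		return -1;		/* i) detach failed */

	for (i = 0; i < n; i++)		/* b), c), d): pre-detach phase */
		if (!pre_detach_ok(&sec[i]))
			return -1;	/* i) nothing has been changed yet */

	for (i = 0; i < n; i++)		/* e), f): detach phase */
		sec[i].attached = false;
	primary->attached = false;	/* g) */
	return 0;			/* h) detach succeeded */
}
```

Because nothing is modified during the first phase, a refusal from any
process needs no rollback, which is why the detach list has no rollback
steps.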

Scenario for cases 3 and 4:

Attach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and attaches the new device; on failure,
   goto i).
c) primary forwards the attach request to all secondaries as an async request
   (because this runs in mp thread context, using a sync request would
    deadlock; the same reason applies to all following async requests).
d) each secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback async request to all secondaries.
g) each secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as rollback action.
i) primary sends a fail response to the requesting secondary, goto k).
j) primary sends a success response to the requesting secondary.
k) the requesting secondary receives the response and returns.
 
Detach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and performs a pre-detach check; if the
   device is locked, goto j).
c) primary sends a pre-detach async request to all secondaries.
d) each secondary performs a pre-detach check and sends a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach async request to all secondaries.
g) each secondary detaches the device and sends a reply.
h) primary detaches the device.
i) primary sends a success response to the requesting secondary, goto k).
j) primary sends a fail response to the requesting secondary.
k) the requesting secondary receives the response and returns.
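
Steps a) and k) in both lists rely on the requesting secondary blocking
until a response that matches its request arrives. One common way to
build such a wait is a mutex plus condition variable keyed by a request
id, sketched below with plain pthreads. This is an assumption about how
such a wait could look, not the implementation in hotplug_mp.c; struct
pending_req and the pending_* helpers are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical record for one outstanding request (not a DPDK type). */
struct pending_req {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int id;	/* id the response must carry to match */
	bool done;
	int result;
};

static void pending_init(struct pending_req *r, unsigned int id)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->cond, NULL);
	r->id = id;
	r->done = false;
	r->result = 0;
}

/* Requester side: block until a matching response has been recorded. */
static int pending_wait(struct pending_req *r)
{
	int result;

	pthread_mutex_lock(&r->lock);
	while (!r->done)	/* loop guards against spurious wakeups */
		pthread_cond_wait(&r->cond, &r->lock);
	result = r->result;
	pthread_mutex_unlock(&r->lock);
	return result;
}

/* Response handler side: release the waiter only if the id matches. */
static bool pending_complete(struct pending_req *r, unsigned int id,
			     int result)
{
	bool matched = false;

	pthread_mutex_lock(&r->lock);
	if (!r->done && r->id == id) {
		r->result = result;
		r->done = true;
		matched = true;
		pthread_cond_signal(&r->cond);
	}
	pthread_mutex_unlock(&r->lock);
	return matched;
}
```

In this sketch the requester would call pending_wait() after sending its
request, while the thread that receives the response calls
pending_complete(); a response with a different id leaves the waiter
blocked.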

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.
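
As noted in the v12 changelog, a failure caused by an IPC error is
reported as -ENOMSG rather than -1. A caller that wants to tell an IPC
problem apart from a device-level failure could classify the return
value along these lines. The enum and the function name are made up for
the sketch, and it assumes the usual convention that 0 means success and
any other negative value is some other failure.

```c
#include <errno.h>

/* Hypothetical classification of a hotplug return code. */
enum hotplug_outcome {
	HOTPLUG_OK,		/* attach/detach done on all processes */
	HOTPLUG_IPC_ERROR,	/* -ENOMSG: request or reply was lost */
	HOTPLUG_DEVICE_ERROR	/* any other failure */
};

/* rc is the value returned by rte_eal_hotplug_add()/_remove(). */
static enum hotplug_outcome classify_hotplug_rc(int rc)
{
	if (rc == 0)
		return HOTPLUG_OK;
	if (rc == -ENOMSG)
		return HOTPLUG_IPC_ERROR;
	return HOTPLUG_DEVICE_ERROR;
}
```

An application might, for example, retry or log differently for
HOTPLUG_IPC_ERROR than for HOTPLUG_DEVICE_ERROR; that policy is left to
the caller.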


PMD Impact:
===========

Currently device removal is not handled well in the secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_local, which only resets the ethdev's state to unused
and does not touch shared data, so other processes are not impacted.
Since not all device drivers are targeted to support the primary-secondary
process model, the patch set only fixes this on all Intel devices and
vdevs; it can be used as a reference by other drivers when an equivalent
fix is required.

Example:
========

The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach a pci device */
> attach 0000:81:00.0

/* detach the pci device */
> detach 0000:81:00.0

/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0

/* detach the vdev af_packet device */
> detach net_af_packet

Qi Zhang (19):
  ethdev: add function to release port in local process
  bus/pci: fix PCI address compare
  bus/pci: enable vfio unmap resource for secondary
  vfio: remove uneccessary IPC for group fd clear
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  net/i40e: enable hotplug on secondary process
  net/ixgbe: enable hotplug on secondary process
  net/af_packet: enable hotplug on secondary process
  net/bonding: enable hotplug on secondary process
  net/kni: enable hotplug on secondary process
  net/null: enable hotplug on secondary process
  net/octeontx: enable hotplug on secondary process
  net/pcap: enable hotplug on secondary process
  net/softnic: enable hotplug on secondary process
  net/tap: enable hotplug on secondary process
  net/vhost: enable hotplug on secondary process
  examples/multi_process: add hotplug sample
  doc: update release notes for multi process hotplug

 doc/guides/rel_notes/release_18_08.rst         |  11 +
 drivers/bus/pci/linux/pci_vfio.c               | 129 +++++++--
 drivers/net/af_packet/rte_eth_af_packet.c      |   7 +-
 drivers/net/bonding/rte_eth_bond_pmd.c         |   7 +-
 drivers/net/i40e/i40e_ethdev.c                 |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c               |   3 +
 drivers/net/kni/rte_eth_kni.c                  |   7 +-
 drivers/net/null/rte_eth_null.c                |  12 +-
 drivers/net/octeontx/octeontx_ethdev.c         |   9 +
 drivers/net/pcap/rte_eth_pcap.c                |  11 +-
 drivers/net/softnic/rte_eth_softnic.c          |  15 +-
 drivers/net/tap/rte_eth_tap.c                  |  13 +-
 drivers/net/vhost/rte_eth_vhost.c              |   7 +-
 examples/multi_process/Makefile                |   1 +
 examples/multi_process/hotplug_mp/Makefile     |  23 ++
 examples/multi_process/hotplug_mp/commands.c   | 214 +++++++++++++++
 examples/multi_process/hotplug_mp/commands.h   |  10 +
 examples/multi_process/hotplug_mp/main.c       |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile             |   1 +
 lib/librte_eal/common/eal_common_dev.c         | 177 ++++++++++++-
 lib/librte_eal/common/eal_private.h            |  37 +++
 lib/librte_eal/common/hotplug_mp.c             | 348 +++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h             |  48 ++++
 lib/librte_eal/common/include/rte_dev.h        |   6 +
 lib/librte_eal/common/meson.build              |   1 +
 lib/librte_eal/linuxapp/eal/Makefile           |   1 +
 lib/librte_eal/linuxapp/eal/eal.c              |   6 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  45 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |   1 -
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c |   8 -
 lib/librte_ethdev/rte_ethdev.c                 |  12 +
 lib/librte_ethdev/rte_ethdev_driver.h          |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h             |   8 +
 33 files changed, 1134 insertions(+), 103 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v13 19/19] doc: update release notes for multi process hotplug
  2018-07-12  1:14  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
@ 2018-07-12  1:15  4%   ` Qi Zhang
  0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-07-12  1:15 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

Update release notes for the new multi-process hotplug feature.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/rel_notes/release_18_08.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index bc0124295..1251e4b5b 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -46,6 +46,12 @@ New Features
   Flow API support has been added to CXGBE Poll Mode Driver to offload
   flows to Chelsio T5/T6 NICs.
 
+* **Support device multi-process hotplug.**
+
+  Hotplug and hot-unplug for devices will now be supported in multiprocessing
+  scenario. Any ethdev devices created in the primary process will be regarded
+  as shared and will be available for all DPDK processes. Synchronization between
+  processes will be done using DPDK IPC.
 
 API Changes
 -----------
@@ -60,6 +66,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+  In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+  that device be attached on all processes, while ``rte_eal_hotplug_remove``
+  will guarantee device be detached on all processes.
 
 ABI Changes
 -----------
-- 
2.13.6

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process
                     ` (2 preceding siblings ...)
  2018-07-11 13:47  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
@ 2018-07-12  1:14  1% ` Qi Zhang
  2018-07-12  1:15  4%   ` [dpdk-dev] [PATCH v13 19/19] doc: update release notes for multi process hotplug Qi Zhang
  2018-07-12  1:18  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Qi Zhang @ 2018-07-12  1:14 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
  modify the sample code to use rte_eal_hotplug_add and
  rte_eal_hotplug_remove to attach/detach devices.

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error log in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
  any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process is now
  synced on all processes, it is no longer necessary to support a
  private vdev for a secondary process, which was identified by a
  non-NULL devargs in "--vdev". So rework all vdev driver changes to
  simplify the device probe scenario on a secondary process; devargs
  are now ignored on a secondary process.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
  devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, because
     /dev/vfio/<group_fd> is still busy (see patches 3/19 and 4/19).
  2) Repeated detach/attach of a device causes "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address compare
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues that are not related to this patchset are expected when
  playing with the hotplug_mp sample, as listed below.

- Attaching a PCI device twice may leave the device unable to be
  detached; the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks failed.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: the current patchset depends on the IPC fix below; without it,
  attaching a shared vdev may fail.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch a PMD if the PMD uses the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a
  separate thread.

v5:
- since we will keep the mp thread separate from the interrupt thread,
  it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port from being called after
  rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on secondary due to
  the previous rework.
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
  still open and under discussion, see below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since the mp thread will be merged into the interrupt thread, the fix in v3
  for the sync IPC deadlock will not work. The new version enables a
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback registration to help non-EAL modules initialize
  the mp channel during rte_eal_init.
- fix attaching a shared device from secondary.
  1) deadlock due to sync IPC being invoked in rte_malloc in the primary
     process when handling a secondary request to attach a device; the
     solution is for the primary process to issue the shared device
     attach/detach in the interrupt thread.
  2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.

Background:
===========

Currently a secondary process only syncs ethdevs from the primary
process at the init stage; it is not aware of devices that are
attached or detached on the primary process at runtime.

However, there is a requirement from applications that use the
primary-secondary process model: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes handle the networking
work on these devices.

Solution:
=========

The original intention was to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in a multi-process situation. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.

Any failure during the attach or detach process will cause inconsistent
status between processes, so proper rollback actions should be considered.

Scenario for cases 1 and 2:

Attach device from primary:
a) primary attaches the new device; on failure, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach failed.
i) attach succeeded.

Detach device from primary:
a) primary performs a pre-detach check; if the device is locked, goto i).
b) primary sends a pre-detach sync request to all secondaries.
c) each secondary performs a pre-detach check and sends a reply.
d) primary checks the replies; if any failed, goto i).
e) primary sends a detach sync request to all secondaries.
f) each secondary detaches the device and sends a reply (assumed not to fail).
g) primary detaches the device.
h) detach succeeded.
i) detach failed.

Scenario for cases 3 and 4:

Attach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and attaches the new device; on failure,
   goto i).
c) primary forwards the attach request to all secondaries as an async request
   (because this runs in mp thread context, using a sync request would
    deadlock; the same reason applies to all following async requests).
d) each secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback async request to all secondaries.
g) each secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as rollback action.
i) primary sends a fail response to the requesting secondary, goto k).
j) primary sends a success response to the requesting secondary.
k) the requesting secondary receives the response and returns.
 
Detach device from secondary:
a) secondary sends an async request to primary and waits on a condition
   which will be released by the matching response from primary.
b) primary receives the request and performs a pre-detach check; if the
   device is locked, goto j).
c) primary sends a pre-detach async request to all secondaries.
d) each secondary performs a pre-detach check and sends a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach async request to all secondaries.
g) each secondary detaches the device and sends a reply.
h) primary detaches the device.
i) primary sends a success response to the requesting secondary, goto k).
j) primary sends a fail response to the requesting secondary.
k) the requesting secondary receives the response and returns.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.


PMD Impact:
===========

Currently device removal is not handled well in a secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the primary
process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_local, which only resets the ethdev's state to
unused and does not touch shared data, so other processes are not impacted.
Since not every device driver is meant to support the primary-secondary
process model, the patch set only fixes this on all Intel devices and
vdevs; it can be referenced by other drivers when an equivalent fix is
required.
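
To make the difference between the two release paths concrete, here is a
minimal sketch. It does not use the ethdev structures; struct shared_data,
struct local_dev and both functions are hypothetical stand-ins that only
model "one shared copy, one private state per process".

```c
#include <assert.h>
#include <string.h>

enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Hypothetical stand-in for data living in shared memory: one copy,
 * visible to the primary and to every secondary. */
struct shared_data {
	char name[32];
};

/* Hypothetical stand-in for a process-private ethdev entry. */
struct local_dev {
	enum dev_state state;
	struct shared_data *shared;	/* points into shared memory */
};

/* Full release: also wipes the shared data, which every process sees. */
static void release_port(struct local_dev *dev)
{
	assert(dev != NULL && dev->shared != NULL);
	dev->state = DEV_UNUSED;
	memset(dev->shared, 0, sizeof(*dev->shared));
}

/* Local release: only this process's private state changes. */
static void release_port_local(struct local_dev *dev)
{
	assert(dev != NULL);
	dev->state = DEV_UNUSED;
}
```

With two local_dev entries pointing at the same shared_data, calling
release_port_local() on the "secondary" entry leaves the shared name
intact for the "primary", while release_port() clears it for both.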

Example:
========

The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach a pci device */
> attach 0000:81:00.0

/* detach the pci device */
> detach 0000:81:00.0

/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0

/* detach the vdev af_packet device */
> detach net_af_packet
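
The attach commands above pass a single identifier such as
"net_af_packet,iface=eth0", while rte_eal_hotplug_add takes the device
name and the device arguments separately. How the sample splits the string
is not shown in this cover letter; the sketch below assumes a split at the
first comma, and split_identifier() is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Split "name[,args]" at the first comma. Returns 0 on success, -1 if the
 * name is empty or either part does not fit its buffer. */
static int split_identifier(const char *id,
			    char *name, size_t name_len,
			    char *args, size_t args_len)
{
	const char *comma;
	size_t n;

	assert(id != NULL && name != NULL && args != NULL);

	comma = strchr(id, ',');
	n = comma != NULL ? (size_t)(comma - id) : strlen(id);
	if (n == 0 || n >= name_len)
		return -1;
	memcpy(name, id, n);
	name[n] = '\0';

	if (comma == NULL) {
		/* no arguments given, e.g. a bare PCI address */
		if (args_len == 0)
			return -1;
		args[0] = '\0';
		return 0;
	}
	if (strlen(comma + 1) >= args_len)
		return -1;
	strcpy(args, comma + 1);
	return 0;
}
```

A bare PCI address such as 0000:81:00.0 contains no comma, so it comes
back as the name with an empty argument string.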

Qi Zhang (19):
  ethdev: add function to release port in local process
  bus/pci: fix PCI address compare
  bus/pci: enable vfio unmap resource for secondary
  vfio: remove uneccessary IPC for group fd clear
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  net/i40e: enable hotplug on secondary process
  net/ixgbe: enable hotplug on secondary process
  net/af_packet: enable hotplug on secondary process
  net/bonding: enable hotplug on secondary process
  net/kni: enable hotplug on secondary process
  net/null: enable hotplug on secondary process
  net/octeontx: enable hotplug on secondary process
  net/pcap: enable hotplug on secondary process
  net/softnic: enable hotplug on secondary process
  net/tap: enable hotplug on secondary process
  net/vhost: enable hotplug on secondary process
  examples/multi_process: add hotplug sample
  doc: update release notes for multi process hotplug

 doc/guides/rel_notes/release_18_08.rst         |  11 +
 drivers/bus/pci/linux/pci_vfio.c               | 129 +++++++--
 drivers/net/af_packet/rte_eth_af_packet.c      |   7 +-
 drivers/net/bonding/rte_eth_bond_pmd.c         |   7 +-
 drivers/net/i40e/i40e_ethdev.c                 |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c               |   3 +
 drivers/net/kni/rte_eth_kni.c                  |   7 +-
 drivers/net/null/rte_eth_null.c                |  12 +-
 drivers/net/octeontx/octeontx_ethdev.c         |   9 +
 drivers/net/pcap/rte_eth_pcap.c                |  11 +-
 drivers/net/softnic/rte_eth_softnic.c          |  15 +-
 drivers/net/tap/rte_eth_tap.c                  |  13 +-
 drivers/net/vhost/rte_eth_vhost.c              |   7 +-
 examples/multi_process/Makefile                |   1 +
 examples/multi_process/hotplug_mp/Makefile     |  23 ++
 examples/multi_process/hotplug_mp/commands.c   | 214 +++++++++++++++
 examples/multi_process/hotplug_mp/commands.h   |  10 +
 examples/multi_process/hotplug_mp/main.c       |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile             |   1 +
 lib/librte_eal/common/eal_common_dev.c         | 177 ++++++++++++-
 lib/librte_eal/common/eal_private.h            |  37 +++
 lib/librte_eal/common/hotplug_mp.c             | 348 +++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h             |  48 ++++
 lib/librte_eal/common/include/rte_dev.h        |   6 +
 lib/librte_eal/common/meson.build              |   1 +
 lib/librte_eal/linuxapp/eal/Makefile           |   1 +
 lib/librte_eal/linuxapp/eal/eal.c              |   6 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  45 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |   1 -
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c |   8 -
 lib/librte_ethdev/rte_ethdev.c                 |  12 +
 lib/librte_ethdev/rte_ethdev_driver.h          |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h             |   8 +
 33 files changed, 1134 insertions(+), 103 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v12 19/19] doc: update release notes for multi process hotplug
  2018-07-11 13:47  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
@ 2018-07-11 13:48  4%   ` Qi Zhang
  0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-07-11 13:48 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

Update release notes for the new multi-process hotplug feature.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/rel_notes/release_18_08.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index bc0124295..1251e4b5b 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -46,6 +46,12 @@ New Features
   Flow API support has been added to CXGBE Poll Mode Driver to offload
   flows to Chelsio T5/T6 NICs.
 
+* **Support device multi-process hotplug.**
+
+  Hotplug and hot-unplug for devices will now be supported in multiprocessing
+  scenario. Any ethdev devices created in the primary process will be regarded
+  as shared and will be available for all DPDK processes. Synchronization between
+  processes will be done using DPDK IPC.
 
 API Changes
 -----------
@@ -60,6 +66,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+  In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+  that device be attached on all processes, while ``rte_eal_hotplug_remove``
+  will guarantee device be detached on all processes.
 
 ABI Changes
 -----------
-- 
2.13.6

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process
      2018-07-11  3:08  1% ` [dpdk-dev] [PATCH v11 00/19] " Qi Zhang
@ 2018-07-11 13:47  1% ` Qi Zhang
  2018-07-11 13:48  4%   ` [dpdk-dev] [PATCH v12 19/19] doc: update release notes for multi process hotplug Qi Zhang
  2018-07-12  1:14  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Qi Zhang @ 2018-07-11 13:47 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error log in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove
  any failure due to IPC error will return -ENOMSG, but not -1.
- remove unnecessary changes from previous rework.

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process will now sync
  on all processes, it is not necessary to support a private vdev for
  a secondary process, which is identified by a non-NULL devargs in
  "--vdev". So rework all vdev driver changes to simplify the device
  probe scenario on a secondary process; devargs will be ignored on
  a secondary process now.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
  devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, due to the error that
     /dev/vfio/<group_fd> is still busy. (see patch 3/19 and 4/19)
  2) Repeated detach/attach of a device will cause "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address compare.
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues which are not related to this patchset are expected when
  playing with the hotplug_mp sample, as below.

- Attaching a PCI device twice may leave the device unable to be detached;
  the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by use goto.
- add comments to explain why need to call rte_eal_alarm_set.
- add error log when process_mp_init_callbacks failed.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: current patchset depends on below IPC fix, or it may not be able
  to attach a shared vdev.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch on PMD if PMD use the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as workaround for replying to a sync request in a separate
  thread.

v5:
- since we will keep mp thread separate from interrupt thread,
  it is not necessary to use temporary thread, we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port be called after
  rte_eth_dev_release_port_private.
- fix the issue that lock does not take effect on secondary due to
  previous re-work
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- work around for replying to a sync request in a separate thread; this is
  still an open issue under discussion, as below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since mp thread will be merged to interrupt thread, the fix on v3
  for sync IPC deadlock will not work. The new version enables the
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback register to help non-eal module to initialize
  mp channel during rte_eal_init
- fix when attach share device from secondary.
  1) dead lock due to sync IPC be invoked in rte_malloc in primary
     process when handle secondary request to attach device, the
     solution is primary process to issue share device attach/detach
     in interrupt thread.
  2) return port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling 
- fix meson.build.
- improve comments. 

Background:
===========

Currently a secondary process will only sync ethdev from the primary
process at init stage, but it will not be aware if a device
is attached/detached on the primary process at runtime.

However, there is a requirement from applications that use the
primary-secondary process model: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes deal with the network stuff
with these devices.

Solution:
=========

So the original intention is to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in the multi-process situation. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
will be broadcast to all other processes through the mp channel, and then
device information will be synchronized on all processes.

Any failure during the attach or detach process will cause inconsistent
status between processes, so proper rollback action should be considered.

Scenario for Case 1, 2:

attach device from primary
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
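
The attach-from-primary steps above can be sketched as a broadcast with
rollback. This is a single-process model under stated assumptions, not the
code in hotplug_mp.c: struct proc, proc_attach(), proc_detach() and
primary_attach() are hypothetical names, and the sync IPC requests are
replaced by direct calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process: whether the device is attached
 * there, and whether an attach attempt there should fail (test knob). */
struct proc {
	bool attached;
	bool fail_attach;
};

static int proc_attach(struct proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void proc_detach(struct proc *p)
{
	p->attached = false;
}

/* Steps a) to i). Returns 0 on "attach success", -1 on "attach fail";
 * on failure no process is left with the device attached. */
static int primary_attach(struct proc *primary, struct proc *sec, size_t n)
{
	size_t i;
	bool all_ok = true;

	assert(primary != NULL && (sec != NULL || n == 0));

	/* a) primary attaches first; nothing to roll back if this fails */
	if (proc_attach(primary) != 0)
		return -1;

	/* b), c) every secondary gets the request, attaches and replies */
	for (i = 0; i < n; i++)
		if (proc_attach(&sec[i]) != 0)
			all_ok = false;

	/* d) all replies report success */
	if (all_ok)
		return 0;

	/* e), f) rollback request: every secondary detaches */
	for (i = 0; i < n; i++)
		proc_detach(&sec[i]);

	/* g) primary detaches as the last rollback action */
	proc_detach(primary);
	return -1;
}
```

The rollback is sent to all secondaries, including those whose attach
succeeded, so that a single failing secondary cannot leave the device
attached on only some of the processes.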

detach device from primary
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed

Scenario for case 3, 4:

attach device from secondary:
a) secondary send async request to primary and wait on a condition
   which will be released by matched response from primary.
b) primary receive the request and attach the new device if failed
   goto i).
c) primary forward attach request to all secondary as async request
   (because this in mp thread context, use sync request will deadlock,
    same reason for all following async request.)
d) secondary receive request and attach device and send reply.
e) primary check the reply if all success go to j).
f) primary send attach rollback async request to all secondary.
g) secondary receive the request and detach device and send reply.
h) primary receive the reply and detach device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receive response and return.
 
detach device from secondary:
a) secondary send async request to primary and wait on a condition
   which will be released by matched response from primary.
b) primary receive the request and  perform pre-detach check, if device
   is locked, goto j).
c) primary send pre-detach async request to all secondary.
d) secondary perform pre-detach check and send reply.
e) primary check the reply if any fail goto j).
f) primary send detach async request to all secondary
g) secondary detach the device and send reply
h) primary detach the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receive response and return.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.


PMD Impact:
===========

Currently device removal is not handled well in a secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the primary
process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_local, which only resets the ethdev's state to
unused and does not touch shared data, so other processes are not impacted.
Since not every device driver is meant to support the primary-secondary
process model, the patch set only fixes this on all Intel devices and
vdevs; it can be referenced by other drivers when an equivalent fix is
required.

Example:
========

The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach a af_packet vdev */
>attach net_af_packet,iface=eth0

/* detach port 0 */
>detach 0

Qi Zhang (19):
  ethdev: add function to release port in local process
  bus/pci: fix PCI address compare
  bus/pci: enable vfio unmap resource for secondary
  vfio: remove uneccessary IPC for group fd clear
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  net/i40e: enable hotplug on secondary process
  net/ixgbe: enable hotplug on secondary process
  net/af_packet: enable hotplug on secondary process
  net/bonding: enable hotplug on secondary process
  net/kni: enable hotplug on secondary process
  net/null: enable hotplug on secondary process
  net/octeontx: enable hotplug on secondary process
  net/pcap: enable hotplug on secondary process
  net/softnic: enable hotplug on secondary process
  net/tap: enable hotplug on secondary process
  net/vhost: enable hotplug on secondary process
  examples/multi_process: add hotplug sample
  doc: update release notes for multi process hotplug

 doc/guides/rel_notes/release_18_08.rst         |  11 +
 drivers/bus/pci/linux/pci_vfio.c               | 129 +++++++--
 drivers/net/af_packet/rte_eth_af_packet.c      |   7 +-
 drivers/net/bonding/rte_eth_bond_pmd.c         |   7 +-
 drivers/net/i40e/i40e_ethdev.c                 |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c               |   3 +
 drivers/net/kni/rte_eth_kni.c                  |   7 +-
 drivers/net/null/rte_eth_null.c                |  12 +-
 drivers/net/octeontx/octeontx_ethdev.c         |   9 +
 drivers/net/pcap/rte_eth_pcap.c                |  11 +-
 drivers/net/softnic/rte_eth_softnic.c          |  15 +-
 drivers/net/tap/rte_eth_tap.c                  |  13 +-
 drivers/net/vhost/rte_eth_vhost.c              |   7 +-
 examples/multi_process/Makefile                |   1 +
 examples/multi_process/hotplug_mp/Makefile     |  23 ++
 examples/multi_process/hotplug_mp/commands.c   | 197 ++++++++++++++
 examples/multi_process/hotplug_mp/commands.h   |  10 +
 examples/multi_process/hotplug_mp/main.c       |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile             |   1 +
 lib/librte_eal/common/eal_common_dev.c         | 177 ++++++++++++-
 lib/librte_eal/common/eal_private.h            |  37 +++
 lib/librte_eal/common/hotplug_mp.c             | 348 +++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h             |  48 ++++
 lib/librte_eal/common/include/rte_dev.h        |   6 +
 lib/librte_eal/common/meson.build              |   1 +
 lib/librte_eal/linuxapp/eal/Makefile           |   1 +
 lib/librte_eal/linuxapp/eal/eal.c              |   6 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  45 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |   1 -
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c |   8 -
 lib/librte_ethdev/rte_ethdev.c                 |  12 +
 lib/librte_ethdev/rte_ethdev_driver.h          |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h             |   8 +
 33 files changed, 1117 insertions(+), 103 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v10 05/19] eal: enable hotplug on multi-process
  @ 2018-07-11  8:39  3%           ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-07-11  8:39 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas
  Cc: Ananyev, Konstantin, dev, Richardson, Bruce, Yigit, Ferruh,
	Shelton, Benjamin H, Vangati, Narender

On 11-Jul-18 3:11 AM, Zhang, Qi Z wrote:
>>>> +/* Max length for a bus name */
>>>> +#define RTE_BUS_NAME_MAX_LEN 32
>>>
>>> Is this enforced anywhere in the bus codebase? Can we guarantee that
>>> bus name will never be bigger than this?
>>
>> I think 32 should be enough for a bus name even in future.
> 
> Sorry, I missed your point, I think it is not enforced, we still can add a new bus exceed 32,
> but for RTE_DEV_NAME_MAX_LEN which is used in rte_devargs to enforce all device name not exceed 64.
> So, it's better to move RTE_BUS_NAME_MAX_LEN into hotplug_mp as internal , and this can be regarded as a limitation for hotplug so far, though it should be enough for all exist cases.
> And same for RTE_DEV_ARGS_MAX_LEN.

Can we fix it in this patchset, or would that involve an ABI break of 
some sort?
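
As long as the limit is not enforced by the bus code itself, the hotplug
request path can at least reject names that do not fit instead of
truncating them. A minimal sketch of such a check, assuming a hypothetical
fixed-size request struct; only the value 32 is taken from the quoted
patch, while struct hotplug_req, req_set_busname() and the choice of
-ENAMETOOLONG are illustrative assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define BUS_NAME_MAX_LEN 32	/* same value as RTE_BUS_NAME_MAX_LEN above */

/* Hypothetical fixed-size IPC payload; only the bus name field is shown. */
struct hotplug_req {
	char busname[BUS_NAME_MAX_LEN];
};

/* Copy the bus name into the request, rejecting names that would be
 * truncated instead of silently cutting them. */
static int req_set_busname(struct hotplug_req *req, const char *busname)
{
	size_t len;

	assert(req != NULL && busname != NULL);
	len = strlen(busname);
	if (len >= sizeof(req->busname))
		return -ENAMETOOLONG;
	memcpy(req->busname, busname, len + 1);	/* includes the '\0' */
	return 0;
}
```

This keeps the limit a hotplug-only restriction, as suggested in the
quoted text, without touching any public structure.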

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v11 19/19] doc: update release notes for multi process hotplug
  2018-07-11  3:08  1% ` [dpdk-dev] [PATCH v11 00/19] " Qi Zhang
@ 2018-07-11  3:09  4%   ` Qi Zhang
  0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-07-11  3:09 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

Update release notes for the new multi-process hotplug feature.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 doc/guides/rel_notes/release_18_08.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index bc0124295..1251e4b5b 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -46,6 +46,12 @@ New Features
   Flow API support has been added to CXGBE Poll Mode Driver to offload
   flows to Chelsio T5/T6 NICs.
 
+* **Support device multi-process hotplug.**
+
+  Hotplug and hot-unplug for devices will now be supported in multiprocessing
+  scenario. Any ethdev devices created in the primary process will be regarded
+  as shared and will be available for all DPDK processes. Synchronization between
+  processes will be done using DPDK IPC.
 
 API Changes
 -----------
@@ -60,6 +66,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+  In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+  that device be attached on all processes, while ``rte_eal_hotplug_remove``
+  will guarantee device be detached on all processes.
 
 ABI Changes
 -----------
-- 
2.13.6

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v11 00/19] enable hotplug on multi-process
    @ 2018-07-11  3:08  1% ` Qi Zhang
  2018-07-11  3:09  4%   ` [dpdk-dev] [PATCH v11 19/19] doc: update release notes for multi process hotplug Qi Zhang
  2018-07-11 13:47  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 200+ results
From: Qi Zhang @ 2018-07-11  3:08 UTC (permalink / raw)
  To: thomas, anatoly.burakov
  Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
	benjamin.h.shelton, narender.vangati, Qi Zhang

v11:
- move out common code from pci_vfio_unmap_secondary and
  pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
  to patch 5/19 to improve code readability.

v10:
- Since hotplug add/remove of a vdev on a secondary process will now sync
  on all processes, it is not necessary to support a private vdev for
  a secondary process, which is identified by a non-NULL devargs in
  "--vdev". So rework all vdev driver changes to simplify the device
  probe scenario on a secondary process; devargs will be ignored on
  a secondary process now.
- fix license header in example/multi-process/hotplug_mp/Makefile.

v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
  eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
  devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
  1) The device can't be detached cleanly in a secondary process, which
     also means it can't be attached again, due to the error that
     /dev/vfio/<group_fd> is still busy. (see patch 3/19 and 4/19)
  2) Repeated detach/attach of a device will cause "cannot find TAILQ entry
     for PCI device" due to an incorrect PCI address compare.
     (see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue

v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.

NOTE:
  Some issues which are not related to this patchset are expected when
  playing with the hotplug_mp sample, as below.

- Attaching a PCI device twice may leave the device unable to be detached;
  the fix below is required:
  https://patches.dpdk.org/patch/42030/

- An ixgbe device can't be detached; the fix below is required:
  https://patches.dpdk.org/patch/42031/

v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by use goto.
- add comments to explain why need to call rte_eal_alarm_set.
- add error log when process_mp_init_callbacks failed.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.

  NOTE: current patchset depends on below IPC fix, or it may not be able
  to attach a shared vdev.
  https://patches.dpdk.org/patch/41647/

v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
  secondary process, so we don't need to patch on PMD if PMD use the
  default remove function.
- add release notes update.
- agreed to use strdup(peer) as workaround for replying to a sync request in a separate
  thread.

v5:
- since we will keep mp thread separate from interrupt thread,
  it is not necessary to use temporary thread, we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
  way to prevent rte_eth_dev_release_port be called after
  rte_eth_dev_release_port_private.
- fix the issue that lock does not take effect on secondary due to
  previous re-work
- fix the issue when the first attached device is a private device from
  secondary. (patch 8/24)
- work around for replying to a sync request in a separate thread; this is
  still an open issue under discussion, as below.
  https://mails.dpdk.org/archives/dev/2018-June/105359.html

v4:
- since mp thread will be merged to interrupt thread, the fix on v3
  for sync IPC deadlock will not work. The new version enables the
  mechanism to invoke an mp action callback in a temporary thread to
  avoid the IPC deadlock. With this, the secondary-to-primary request
  implementation is also simplified, since we can use a sync request
  directly in a separate thread.

v3:
- enable mp init callback register to help non-eal module to initialize
  mp channel during rte_eal_init
- fix when attach share device from secondary.
  1) dead lock due to sync IPC be invoked in rte_malloc in primary
     process when handle secondary request to attach device, the
     solution is primary process to issue share device attach/detach
     in interrupt thread.
  2) return port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
  1) rename sample to "hotplug_mp", and move to example/multi-process.
  2) cleanup header include.
  3) call rte_eal_cleanup before exit.

v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
  rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling 
- fix meson.build.
- improve comments. 

Background:
===========

Currently a secondary process will only sync ethdev from the primary
process at init stage, but it will not be aware if a device
is attached/detached on the primary process at runtime.

However, there is a requirement from applications that use the
primary-secondary process model: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes deal with the network stuff
with these devices.

Solution:
=========

So the original intention is to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in the multi-process situation. It covers the
scenarios below:

1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary

In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
will be broadcast to all other processes through the mp channel, and then
device information will be synchronized on all processes.

Any failure during the attach or detach process will cause inconsistent
status between processes, so proper rollback action should be considered.

Scenario for Case 1, 2:

attach device from primary:
a) primary attaches the new device; on failure goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach failed
i) attach success
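
The attach-with-rollback steps above can be sketched in plain C. This is
only a toy model under stated assumptions: `struct proc`, `proc_attach()`,
`proc_detach()` and `primary_attach()` are hypothetical names, the mp
channel is replaced by direct function calls, and none of this is the
code from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process; not a DPDK structure. */
struct proc {
	bool attached;     /* device currently attached in this process */
	bool fail_attach;  /* knob to simulate an attach failure */
};

/* Try to attach in one process; returns 0 on success, -1 on failure. */
static int proc_attach(struct proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void proc_detach(struct proc *p)
{
	p->attached = false;
}

/*
 * Steps a) to i): attach on the primary, then on every secondary.
 * If any secondary fails, detach everywhere so that all processes
 * end up in the same state again.
 */
static int primary_attach(struct proc *primary, struct proc *sec, size_t n)
{
	size_t i;
	int failed = 0;

	assert(primary != NULL);
	if (proc_attach(primary) != 0)   /* a) */
		return -1;               /* h) */

	for (i = 0; i < n; i++)          /* b), c) */
		if (proc_attach(&sec[i]) != 0)
			failed = 1;

	if (!failed)                     /* d) */
		return 0;                /* i) */

	for (i = 0; i < n; i++)          /* e), f) */
		proc_detach(&sec[i]);
	proc_detach(primary);            /* g) */
	return -1;                       /* h) */
}
```

The point of the sketch is the ordering: the rollback request goes to
every secondary, including those that attached successfully, before the
primary undoes its own attach.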

detach device from primary:
a) primary performs the pre-detach check; if the device is locked, goto i).
b) primary sends a pre-detach sync request to all secondaries.
c) each secondary performs the pre-detach check and sends a reply.
d) primary checks the replies; if any failed, goto i).
e) primary sends a detach sync request to all secondaries.
f) each secondary detaches the device and sends a reply (assumed not to fail).
g) primary detaches the device.
h) detach success
i) detach failed
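
The detach flow is a two-phase operation: first ask everybody whether
the device may go away, and only then remove it. A minimal sketch,
assuming a hypothetical `struct proc_state` with a `locked` flag and
direct calls instead of mp channel messages:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of one device; not a DPDK structure. */
struct proc_state {
	bool attached;
	bool locked;   /* device is in use, so the pre-detach check fails */
};

/*
 * Phase 1 (steps a to d): pre-detach check on the primary and on every
 * secondary. Nothing is modified if any process refuses.
 * Phase 2 (steps e to g): detach on the secondaries, then the primary.
 */
static int primary_detach(struct proc_state *primary,
			  struct proc_state *sec, size_t n)
{
	size_t i;

	assert(primary != NULL);
	if (primary->locked)             /* a) */
		return -1;               /* i) */

	for (i = 0; i < n; i++)          /* b), c), d) */
		if (sec[i].locked)
			return -1;       /* i) */

	for (i = 0; i < n; i++)          /* e), f) */
		sec[i].attached = false;
	primary->attached = false;       /* g) */
	return 0;                        /* h) */
}
```

Because the check phase has no side effects, a refused detach needs no
rollback, unlike the attach case.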

Scenario for case 3, 4:

attach device from secondary:
a) secondary sends an async request to the primary and waits on a
   condition which is released by the matching response from the primary.
b) primary receives the request and attaches the new device; on failure
   goto i).
c) primary forwards the attach request to all secondaries as an async
   request (this runs in the mp thread context, where a sync request
   would deadlock; the same applies to all following async requests).
d) each secondary receives the request, attaches the device and sends a reply.
e) primary checks the replies; if all succeeded, goto j).
f) primary sends an attach rollback async request to all secondaries.
g) each secondary receives the request, detaches the device and sends a reply.
h) primary receives the replies and detaches the device as rollback action.
i) send fail response to the secondary, goto k).
j) send success response to the secondary.
k) secondary process receives the response and returns.
 
detach device from secondary:
a) secondary sends an async request to the primary and waits on a
   condition which is released by the matching response from the primary.
b) primary receives the request and performs the pre-detach check; if the
   device is locked, goto j).
c) primary sends a pre-detach async request to all secondaries.
d) each secondary performs the pre-detach check and sends a reply.
e) primary checks the replies; if any failed, goto j).
f) primary sends a detach async request to all secondaries.
g) each secondary detaches the device and sends a reply.
h) primary detaches the device.
i) send success response to the secondary, goto k).
j) send fail response to the secondary.
k) secondary process receives the response and returns.

API changes:
============

The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add will guarantee
that the device is attached on all processes, while rte_eal_hotplug_remove
will guarantee that the device is detached on all processes.


PMD Impact:
===========

Currently device removal is not handled well in the secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_local, which only resets the ethdev's state to unused
and does not touch shared data, so other processes are not impacted.
Since not all device drivers target the primary-secondary process
model, the patch set only fixes this on all Intel devices and
vdev. It can be used as a reference by other drivers when an equivalent
fix is required.
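
To illustrate the difference between the two release paths, here is a
small sketch. All names in it (`struct shared_port_data`,
`struct local_port`, `release_port_shared()`, `release_port_local()`)
are hypothetical stand-ins chosen for the example; they only mirror the
behaviour described above and are not the ethdev implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum port_state { PORT_UNUSED = 0, PORT_ATTACHED };

/* Lives in shared memory, visible to primary and secondary. */
struct shared_port_data {
	char name[32];
	unsigned int nb_rx_queues;
};

/* Per-process view of the port. */
struct local_port {
	enum port_state state;
	struct shared_port_data *data;
};

/* Full release: wipes the shared data, so every process is affected. */
static void release_port_shared(struct local_port *p)
{
	assert(p != NULL && p->data != NULL);
	memset(p->data, 0, sizeof(*p->data));
	p->state = PORT_UNUSED;
}

/* Local release: only this process forgets the port. */
static void release_port_local(struct local_port *p)
{
	assert(p != NULL);
	p->state = PORT_UNUSED;
}
```

With the local variant a secondary can drop its view of the port while
the primary keeps working on intact shared data.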

Example:
========

The patchset also contains an example that demonstrates device hotplug
in the multi-process model. Detailed instructions are below.

/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto

Command Line Example:

>help
>list

/* attach an af_packet vdev */
>attach net_af_packet,iface=eth0

/* detach port 0 */
>detach 0

Qi Zhang (19):
  ethdev: add function to release port in local process
  bus/pci: fix PCI address compare
  bus/pci: enable vfio unmap resource for secondary
  vfio: remove uneccessary IPC for group fd clear
  eal: enable hotplug on multi-process
  eal: support attach or detach share device from  secondary
  net/i40e: enable hotplug on secondary process
  net/ixgbe: enable hotplug on secondary process
  net/af_packet: enable hotplug on secondary process
  net/bonding: enable hotplug on secondary process
  net/kni: enable hotplug on secondary process
  net/null: enable hotplug on secondary process
  net/octeontx: enable hotplug on secondary process
  net/pcap: enable hotplug on secondary process
  net/softnic: enable hotplug on secondary process
  net/tap: enable hotplug on secondary process
  net/vhost: enable hotplug on secondary process
  examples/multi_process: add hotplug sample
  doc: update release notes for multi process hotplug

 doc/guides/rel_notes/release_18_08.rst         |  11 +
 drivers/bus/pci/linux/pci_vfio.c               | 129 +++++++--
 drivers/net/af_packet/rte_eth_af_packet.c      |   7 +-
 drivers/net/bonding/rte_eth_bond_pmd.c         |   7 +-
 drivers/net/i40e/i40e_ethdev.c                 |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c               |   3 +
 drivers/net/kni/rte_eth_kni.c                  |   7 +-
 drivers/net/null/rte_eth_null.c                |  12 +-
 drivers/net/octeontx/octeontx_ethdev.c         |   9 +
 drivers/net/pcap/rte_eth_pcap.c                |  11 +-
 drivers/net/softnic/rte_eth_softnic.c          |  15 +-
 drivers/net/tap/rte_eth_tap.c                  |  13 +-
 drivers/net/vhost/rte_eth_vhost.c              |   7 +-
 examples/multi_process/Makefile                |   1 +
 examples/multi_process/hotplug_mp/Makefile     |  23 ++
 examples/multi_process/hotplug_mp/commands.c   | 197 ++++++++++++++
 examples/multi_process/hotplug_mp/commands.h   |  10 +
 examples/multi_process/hotplug_mp/main.c       |  41 +++
 lib/librte_eal/bsdapp/eal/Makefile             |   1 +
 lib/librte_eal/common/eal_common_dev.c         | 184 ++++++++++++-
 lib/librte_eal/common/eal_private.h            |  37 +++
 lib/librte_eal/common/hotplug_mp.c             | 346 +++++++++++++++++++++++++
 lib/librte_eal/common/hotplug_mp.h             |  48 ++++
 lib/librte_eal/common/include/rte_dev.h        |   6 +
 lib/librte_eal/common/meson.build              |   1 +
 lib/librte_eal/linuxapp/eal/Makefile           |   1 +
 lib/librte_eal/linuxapp/eal/eal.c              |   6 +
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  45 +---
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |   1 -
 lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c |   8 -
 lib/librte_ethdev/rte_ethdev.c                 |  12 +
 lib/librte_ethdev/rte_ethdev_driver.h          |  16 +-
 lib/librte_ethdev/rte_ethdev_pci.h             |   8 +
 33 files changed, 1122 insertions(+), 103 deletions(-)
 create mode 100644 examples/multi_process/hotplug_mp/Makefile
 create mode 100644 examples/multi_process/hotplug_mp/commands.c
 create mode 100644 examples/multi_process/hotplug_mp/commands.h
 create mode 100644 examples/multi_process/hotplug_mp/main.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.c
 create mode 100644 lib/librte_eal/common/hotplug_mp.h

-- 
2.13.6

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v2 1/4] cryptodev: add min headroom and tailroom requirement
  @ 2018-07-10 14:42  5%     ` Anoob Joseph
  0 siblings, 0 replies; 200+ results
From: Anoob Joseph @ 2018-07-10 14:42 UTC (permalink / raw)
  To: Declan Doherty, Pablo de Lara
  Cc: Anoob Joseph, Akhil Goyal, Ankur Dwivedi, Jerin Jacob,
	Narayana Prasad, dev

Enable crypto devs to specify the minimum headroom and tailroom they
expect in the mbuf. For net PMDs, the standard headroom has to be honoured
by applications, but this is not strictly followed for crypto devs. This
prevents crypto devs from using free space in the mbuf (available as
head/tailroom) for internal requirements in crypto operations. Adding
head/tailroom requirements lets PMDs communicate such needs to the
application.

The availability and use of head/tailroom is an optimization for
hardware that supports using head/tailroom for crypto-op info. Devices
that do not support using the head/tailroom can continue to operate
without any performance drop.
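
For illustration, an application-side check against the two new fields
could look roughly like the sketch below. `struct toy_mbuf` and
`struct toy_cryptodev_info` are hypothetical stand-ins so that the
example builds without DPDK; only the two `min_mbuf_*_req` field names
come from this patch. A real application would presumably read the
fields from `struct rte_cryptodev_info` and use rte_pktmbuf_headroom()
and rte_pktmbuf_tailroom() on the mbuf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified mbuf: one segment, no private area. */
struct toy_mbuf {
	uint16_t buf_len;   /* total buffer size */
	uint16_t data_off;  /* offset of the first data byte */
	uint16_t data_len;  /* number of data bytes */
};

/* Only the two fields added by this patch. */
struct toy_cryptodev_info {
	uint16_t min_mbuf_headroom_req;
	uint16_t min_mbuf_tailroom_req;
};

static uint16_t toy_headroom(const struct toy_mbuf *m)
{
	return m->data_off;
}

static uint16_t toy_tailroom(const struct toy_mbuf *m)
{
	return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}

/* True if the mbuf leaves at least the room the device asks for. */
static bool mbuf_meets_room_req(const struct toy_mbuf *m,
				const struct toy_cryptodev_info *info)
{
	assert(m->data_off + m->data_len <= m->buf_len);
	return toy_headroom(m) >= info->min_mbuf_headroom_req &&
	       toy_tailroom(m) >= info->min_mbuf_tailroom_req;
}
```

Since the requirement is described as an optimization, a failed check
would be a hint to expect lower performance rather than a reason to
drop the packet.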

Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
---
v2:
* No change

v1:
* Removed deprecation notice
* Updated release note
* Renamed new fields to have 'mbuf' in the name
* Changed the type of new fields to uint16_t (instead of uint32_t)

 doc/guides/rel_notes/release_18_08.rst | 6 ++++++
 lib/librte_cryptodev/rte_cryptodev.h   | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
index 5bc23c5..fae0d26 100644
--- a/doc/guides/rel_notes/release_18_08.rst
+++ b/doc/guides/rel_notes/release_18_08.rst
@@ -70,6 +70,12 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* cryptodev: Additional fields in rte_cryptodev_info.
+
+  Two new fields of type ``uint16_t`` added in ``rte_cryptodev_info``
+  structure: ``min_mbuf_headroom_req`` and ``min_mbuf_tailroom_req``. These
+  parameters specify the recommended headroom and tailroom for mbufs to be
+  processed by the PMD.
 
 Removed Items
 -------------
diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
index 92ce6d4..4e5b5b4 100644
--- a/lib/librte_cryptodev/rte_cryptodev.h
+++ b/lib/librte_cryptodev/rte_cryptodev.h
@@ -382,6 +382,12 @@ struct rte_cryptodev_info {
 	unsigned max_nb_queue_pairs;
 	/**< Maximum number of queues pairs supported by device. */
 
+	uint16_t min_mbuf_headroom_req;
+	/**< Minimum mbuf headroom required by device */
+
+	uint16_t min_mbuf_tailroom_req;
+	/**< Minimum mbuf tailroom required by device */
+
 	struct {
 		unsigned max_nb_sessions;
 		/**< Maximum number of sessions supported by device. */
-- 
2.7.4

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH 1/4] eventdev: add eth Tx adapter APIs
  @ 2018-07-10 12:17  3% ` Jerin Jacob
  2018-07-16  8:34  0%   ` Rao, Nikhil
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-07-10 12:17 UTC (permalink / raw)
  To: Nikhil Rao; +Cc: olivier.matz, dev, anoob.joseph

-----Original Message-----
> Date: Fri, 6 Jul 2018 12:12:06 +0530
> From: Nikhil Rao <nikhil.rao@intel.com>
> To: jerin.jacob@caviumnetworks.com, olivier.matz@6wind.com
> CC: nikhil.rao@intel.com, dev@dpdk.org
> Subject: [PATCH 1/4] eventdev: add eth Tx adapter APIs
> X-Mailer: git-send-email 1.8.3.1
> 
> 
> The ethernet Tx adapter abstracts the transmit stage of an
> event driven packet processing application. The transmit
> stage may be implemented with eventdev PMD support or use a
> rte_service function implemented in the adapter. These APIs
> provide a common configuration and control interface and
> an transmit API for the eventdev PMD implementation.
> 
> The transmit port is specified using mbuf::port. The transmit
> queue is specified using the rte_event_eth_tx_adapter_txq_set()
> function. The mbuf will specify a queue ID in the future
> (http://mails.dpdk.org/archives/dev/2018-February/090651.html)
> at which point this function will be replaced with a macro.
> 
> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> ---

1) Update doc/api/doxy-api-index.md
2) Update lib/librte_eventdev/Makefile
+SYMLINK-y-include += rte_event_eth_tx_adapter.h


I think the following work is _pending_:

1) Update app/test-eventdev/ for Tx adapter
2) Update examples/eventdev_pipeline/ for Tx adapter
3) Add Tx adapter documentation
4) Add Tx adapter ops for octeontx driver
5) Add Tx adapter ops for dpaa driver (if needed)

Nikhil,
If you are OK with it, Cavium would like to take up activities (1), (2) and (4).

Let me know your thoughts.

This patch set has already missed the RC1 deadline, so we will complete
all the _pending_ work and push it to the next-eventdev tree at the very beginning of
v18.11, so that Anoob's adapter helper function work can be added in v18.11.


> 
> This patch series adds the event ethernet Tx adapter which is
> based on a previous RFC
>  * RFCv1 - http://mails.dpdk.org/archives/dev/2018-May/102936.html
>  * RFCv2 - http://mails.dpdk.org/archives/dev/2018-June/104075.html
> 
> RFC -> V1:
> =========
> 
> * Move port and tx queue id to mbuf from mbuf private area. (Jerin Jacob)
> 
> * Support for PMD transmit function. (Jerin Jacob)
> 
> * mbuf change has been replaced with rte_event_eth_tx_adapter_txq_set().
> The goal is to align with the mbuf change for a qid field.
> (http://mails.dpdk.org/archives/dev/2018-February/090651.html). Once the mbuf
> change is available, the function can be replaced with a macro with no impact
> to applications.
> 
> * Various cleanups (Jerin Jacob)
> 
>  lib/librte_eventdev/rte_event_eth_tx_adapter.h | 497 +++++++++++++++++++++++++
>  lib/librte_mbuf/rte_mbuf.h                     |   4 +-
>  MAINTAINERS                                    |   5 +
>  3 files changed, 505 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_eventdev/rte_event_eth_tx_adapter.h
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * A structure used to retrieve statistics for an eth tx adapter instance.
> + */
> +struct rte_event_eth_tx_adapter_stats {
> +       uint64_t tx_retry;
> +       /**< Number of transmit retries */
> +       uint64_t tx_packets;
> +       /**< Number of packets transmitted */
> +       uint64_t tx_dropped;
> +       /**< Number of packets dropped */
> +};
> +
> +/** Event Eth Tx Adapter Structure */
> +struct rte_event_eth_tx_adapter {
> +       uint8_t id;
> +       /**< Adapter Identifier */
> +       uint8_t eventdev_id;
> +       /**< Max mbufs processed in any service function invocation */
> +       uint32_t max_nb_tx;
> +       /**< The adapter can return early if it has processed at least
> +        * max_nb_tx mbufs. This isn't treated as a requirement; batching may
> +        * cause the adapter to process more than max_nb_tx mbufs.
> +        */
> +       uint32_t nb_queues;
> +       /**< Number of Tx queues in adapter */
> +       int socket_id;
> +       /**< socket id */
> +       rte_spinlock_t tx_lock;
> +       /**<  Synchronization with data path */
> +       void *dev_private;
> +       /**< PMD private data */
> +       char mem_name[RTE_EVENT_ETH_TX_ADAPTER_SERVICE_NAME_LEN];
> +       /**< Memory allocation name */
> +       rte_event_eth_tx_adapter_conf_cb conf_cb;
> +       /** Configuration callback */
> +       void *conf_arg;
> +       /**< Configuration callback argument */
> +       uint16_t dev_count;
> +       /**< Highest port id supported + 1 */
> +       struct rte_event_eth_tx_adapter_ethdev *txa_ethdev;
> +       /**< Per ethernet device structure */
> +       struct rte_event_eth_tx_adapter_stats stats;
> +} __rte_cache_aligned;

Can you move this structure to the .c file as an implementation detail? Reasons:
a) It should not be subject to the ABI deprecation process.
b) An INTERNAL_PORT based adapter may have different values, i.e. the above
structure is implementation defined.
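
What I mean is the usual opaque-type pattern. A rough sketch with
made-up names (`struct txa`, `txa_create()`, `txa_max_nb_tx()`,
`txa_free()` are hypothetical, not proposed API names), with the header
part and the .c part shown in one block for brevity:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- what the public header would expose ---- */
struct txa;                             /* opaque to applications */
struct txa *txa_create(uint8_t id, uint32_t max_nb_tx);
uint32_t txa_max_nb_tx(const struct txa *a);
void txa_free(struct txa *a);

/* ---- what would live in the .c file ---- */
struct txa {
	uint8_t id;
	uint32_t max_nb_tx;
	/* Fields can be added or reordered here without touching the ABI. */
};

struct txa *txa_create(uint8_t id, uint32_t max_nb_tx)
{
	struct txa *a = calloc(1, sizeof(*a));

	if (a == NULL)
		return NULL;
	a->id = id;
	a->max_nb_tx = max_nb_tx;
	return a;
}

uint32_t txa_max_nb_tx(const struct txa *a)
{
	assert(a != NULL);
	return a->max_nb_tx;
}

void txa_free(struct txa *a)
{
	free(a);
}
```

Applications only ever hold a pointer, so the layout can differ between
the service-based and the INTERNAL_PORT based implementation.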

> +
> +struct rte_event_eth_tx_adapters {
> +       struct rte_event_eth_tx_adapter **data;
> +};
> +

same as above

> +/* Per eth device structure */
> +struct rte_event_eth_tx_adapter_ethdev {
> +       /* Pointer to ethernet device */
> +       struct rte_eth_dev *dev;
> +       /* Number of queues added */
> +       uint16_t nb_queues;
> +       /* PMD specific queue data */
> +       void *queues;
> +};

same as above

> +
> +extern struct rte_event_eth_tx_adapters rte_event_eth_tx_adapters;
> +

same as above

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a new event ethernet Tx adapter with the specified identifier.
> + *
> + * @param id
> + *  The identifier of the event ethernet Tx adapter.
> + * @param dev_id
> + *  The event device identifier.
> + * @param port_config
> + *  Event port configuration, the adapter uses this configuration to
> + *  create an event port if needed.
> + * @return
> + *   - 0: Success
> + *   - <0: Error code on failure
> + */
> +int __rte_experimental
> +rte_event_eth_tx_adapter_create(uint8_t id, uint8_t dev_id,
> +                               struct rte_event_port_conf *port_config);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Create a new event ethernet Tx adapter with the specified identifier.
> + *
> + * @param id
> + *  The identifier of the event ethernet Tx adapter.
> + * @param dev_id
> + *  The event device identifier.
> + * @param conf_cb
> + *  Callback function that initalizes members of the

s/initalizes/initializes

> + *  struct rte_event_eth_tx_adapter_conf struct passed into
> + *  it.
> + * @param conf_arg
> + *  Argument that is passed to the conf_cb function.
> + * @return
> + *   - 0: Success
> + *   - <0: Error code on failure
> + */
> +int __rte_experimental
> +rte_event_eth_tx_adapter_create_ext(uint8_t id, uint8_t dev_id,
> +                               rte_event_eth_tx_adapter_conf_cb conf_cb,
> +                               void *conf_arg);
> +
> +/**
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Add a Tx queue to the adapter.
> + * A queue value of -1 is used to indicate all
> + * queues within the device.
> + *
> + * @param id
> + *  Adapter identifier.
> + * @param eth_dev_id
> + *  Ethernet Port Identifier.
> + * @param queue
> + *  Tx queue index.
> + * @return
> + *  - 0: Success, Queues added succcessfully.

s/succcessfully/successfully


> + *  - <0: Error code on failure.
> + */
> +int __rte_experimental
> +rte_event_eth_tx_adapter_queue_add(uint8_t id,
> +                               uint16_t eth_dev_id,
> +                               int32_t queue);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + *
> + * Set Tx queue in the mbuf.
> + *
> + * @param pkt
> + *  Pointer to the mbuf.
> + * @param queue
> + *  Tx queue index.
> + */
> +void __rte_experimental
> +rte_event_eth_tx_adapter_txq_set(struct rte_mbuf *pkt, uint16_t queue);

1) Can you make this a static inline function for better performance (as it is
just an mbuf field access)?

2) Please add a _get function. It will be useful for the application and the
Tx adapter op implementation.
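
Something along these lines, as a sketch only. `struct toy_pkt` and its
`txq` field are hypothetical placeholders, since where the queue id ends
up in rte_mbuf is still open:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the mbuf field that carries the Tx queue. */
struct toy_pkt {
	uint16_t port;
	uint16_t txq;
};

/* Store the Tx queue index in the packet. */
static inline void toy_txq_set(struct toy_pkt *pkt, uint16_t queue)
{
	assert(pkt != NULL);
	pkt->txq = queue;
}

/* Read back the Tx queue index stored by toy_txq_set(). */
static inline uint16_t toy_txq_get(const struct toy_pkt *pkt)
{
	assert(pkt != NULL);
	return pkt->txq;
}
```

With both helpers inline in the header, swapping them for macros later
would not change application code.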


> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Retrieve the adapter event port. The adapter creates an event port if
> + * the RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT is not set in the
> + * eth Tx capabilities of the event device.
> + *
> + * @param id
> + *  Adapter Identifier.
> + * @param[out] event_port_id
> + *  Event port pointer.
> + * @return
> + *   - 0: Success.
> + *   - <0: Error code on failure.
> + */
> +int __rte_experimental
> +rte_event_eth_tx_adapter_event_port_get(uint8_t id, uint8_t *event_port_id);
> +
> +static __rte_always_inline uint16_t __rte_experimental
> +__rte_event_eth_tx_adapter_enqueue(uint8_t id, uint8_t dev_id, uint8_t port_id,
> +                               struct rte_event ev[],
> +                               uint16_t nb_events,
> +                               const event_tx_adapter_enqueue fn)
> +{
> +       const struct rte_eventdev *dev = &rte_eventdevs[dev_id];

*dev is accessed twice (see rte_event_eth_tx_adapter_enqueue() below).

> +       struct rte_event_eth_tx_adapter *txa =
> +                                       rte_event_eth_tx_adapters.data[id];

Just like the common Tx adapter implementation, we can manage the ethdev queue
to adapter mapping internally, so this dereference is not required in the fastpath.

Please simply call the following, just like the other eventdev ops:
fn(dev->data->ports[port_id], ev, nb_events)
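
A self-contained sketch of that simplified dispatch, using toy types
(`struct toy_eventdev`, `struct toy_event`, `toy_pmd_enqueue()` and
`toy_tx_adapter_enqueue()` are all hypothetical and only mimic the
shape of the eventdev structures):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 4

struct toy_event {
	uint32_t payload;
};

/* Same shape as the other enqueue ops: port, events, count. */
typedef uint16_t (*toy_txa_enqueue_t)(void *port, struct toy_event ev[],
				      uint16_t nb_events);

struct toy_eventdev {
	toy_txa_enqueue_t txa_enqueue;
	void *ports[TOY_MAX_PORTS];
};

/* Example PMD op: accepts at most two events per call and counts them. */
static uint16_t toy_pmd_enqueue(void *port, struct toy_event ev[],
				uint16_t nb_events)
{
	uint16_t *sent = port;
	uint16_t n = (uint16_t)(nb_events < 2 ? nb_events : 2);

	(void)ev;
	*sent = (uint16_t)(*sent + n);
	return n;
}

/* Fastpath wrapper: one device lookup, no adapter dereference. */
static inline uint16_t
toy_tx_adapter_enqueue(const struct toy_eventdev *dev, uint8_t port_id,
		       struct toy_event ev[], uint16_t nb_events)
{
	assert(port_id < TOY_MAX_PORTS);
	return dev->txa_enqueue(dev->ports[port_id], ev, nb_events);
}
```

Any adapter state the PMD needs can hang off the port pointer, which is
why the extra txa argument is not required.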


> +
> +#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
> +       if (id >= RTE_EVENT_ETH_TX_ADAPTER_MAX_INSTANCE ||
> +               dev_id >= RTE_EVENT_MAX_DEVS ||
> +               !rte_eventdevs[dev_id].attached) {
> +               rte_errno = -EINVAL;
> +               return 0;
> +       }
> +
> +       if (port_id >= dev->data->nb_ports) {
> +               rte_errno = -EINVAL;
> +               return 0;
> +       }
> +#endif
> +       return fn((void *)txa, dev, dev->data->ports[port_id], ev, nb_events);
> +}
> +
> +/**
> + * Enqueue a burst of events objects or an event object supplied in *rte_event*
> + * structure on an  event device designated by its *dev_id* through the event
> + * port specified by *port_id*. This function is supported if the eventdev PMD
> + * has the RTE_EVENT_ETH_TX_ADAPTER_CAP_INTERNAL_PORT capability flag set.
> + *
> + * The *nb_events* parameter is the number of event objects to enqueue which are
> + * supplied in the *ev* array of *rte_event* structure.
> + *
> + * The rte_event_eth_tx_adapter_enqueue() function returns the number of
> + * events objects it actually enqueued. A return value equal to *nb_events*
> + * means that all event objects have been enqueued.
> + *
> + * @param id
> + *  The identifier of the tx adapter.
> + * @param dev_id
> + *  The identifier of the device.
> + * @param port_id
> + *  The identifier of the event port.
> + * @param ev
> + *  Points to an array of *nb_events* objects of type *rte_event* structure
> + *  which contain the event object enqueue operations to be processed.
> + * @param nb_events
> + *  The number of event objects to enqueue, typically number of
> + *  rte_event_port_enqueue_depth() available for this port.
> + *
> + * @return
> + *   The number of event objects actually enqueued on the event device. The
> + *   return value can be less than the value of the *nb_events* parameter when
> + *   the event devices queue is full or if invalid parameters are specified in a
> + *   *rte_event*. If the return value is less than *nb_events*, the remaining
> + *   events at the end of ev[] are not consumed and the caller has to take care
> + *   of them, and rte_errno is set accordingly. Possible errno values include:
> + *   - -EINVAL  The port ID is invalid, device ID is invalid, an event's queue
> + *              ID is invalid, or an event's sched type doesn't match the
> + *              capabilities of the destination queue.
> + *   - -ENOSPC  The event port was backpressured and unable to enqueue
> + *              one or more events. This error code is only applicable to
> + *              closed systems.
> + */
> +static inline uint16_t __rte_experimental
> +rte_event_eth_tx_adapter_enqueue(uint8_t id, uint8_t dev_id,
> +                               uint8_t port_id,
> +                               struct rte_event ev[],
> +                               uint16_t nb_events)
> +{
> +       const struct rte_eventdev *dev = &rte_eventdevs[dev_id];
> +
> +       return __rte_event_eth_tx_adapter_enqueue(id, dev_id, port_id, ev,
> +                                               nb_events,
> +                                               dev->txa_enqueue);

As per the above, since the function call logic is simplified, you can fold
the logic of the above function in here.

> +}
> +
> index dabb12d..ab23503 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -388,6 +388,11 @@ F: lib/librte_eventdev/*crypto_adapter*
>  F: test/test/test_event_crypto_adapter.c
>  F: doc/guides/prog_guide/event_crypto_adapter.rst
> 
> +Eventdev Ethdev Tx Adapter API - EXPERIMENTAL
> +M: Nikhil Rao <nikhil.rao@intel.com>
> +T: git://dpdk.org/next/dpdk-next-eventdev
> +F: lib/librte_eventdev/*eth_tx_adapter*

Add the testcase also.

Overall it looks good. No more comments on specification.

> +
>  Raw device API - EXPERIMENTAL
>  M: Shreyansh Jain <shreyansh.jain@nxp.com>
>  M: Hemant Agrawal <hemant.agrawal@nxp.com>
> --
> 1.8.3.1
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/3] cryptodev: add min headroom and tailroom requirement
  2018-07-10 10:26  0%     ` De Lara Guarch, Pablo
@ 2018-07-10 10:50  0%       ` Anoob Joseph
  0 siblings, 0 replies; 200+ results
From: Anoob Joseph @ 2018-07-10 10:50 UTC (permalink / raw)
  To: De Lara Guarch, Pablo, Doherty, Declan
  Cc: Akhil Goyal, Ankur Dwivedi, Jerin Jacob, Narayana Prasad, dev

Hi Pablo,

I'll look into this and will give you an updated patch.

Thanks,
Anoob

On 10-07-2018 15:56, De Lara Guarch, Pablo wrote:
> Hi Anoob,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Anoob Joseph
>> Sent: Wednesday, July 4, 2018 2:56 PM
>> To: Doherty, Declan <declan.doherty@intel.com>; De Lara Guarch, Pablo
>> <pablo.de.lara.guarch@intel.com>
>> Cc: Anoob Joseph <anoob.joseph@caviumnetworks.com>; Akhil Goyal
>> <akhil.goyal@nxp.com>; Ankur Dwivedi
>> <ankur.dwivedi@caviumnetworks.com>; Jerin Jacob
>> <jerin.jacob@caviumnetworks.com>; Narayana Prasad
>> <narayanaprasad.athreya@caviumnetworks.com>; dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH v1 1/3] cryptodev: add min headroom and tailroom
>> requirement
>>
>> Enabling crypto devs to specify the minimum headroom and tailroom it expects
>> in the mbuf. For net PMDs, standard headroom has to be honoured by
>> applications, which is not strictly followed for crypto devs. This prevents crypto
>> devs from using free space in mbuf (available as
>> head/tailroom) for internal requirements in crypto operations. Addition of
>> head/tailroom requirement will help PMDs to communicate such requirements
>> to the application.
>>
>> The availability and use of head/tailroom is an optimization if the hardware
>> supports use of head/tailroom for crypto-op info. For devices that do not
>> support using the head/tailroom, they can continue to operate without any
>> performance-drop.
>>
>> Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
>> ---
>> v1:
>> * Removed deprecation notice
>> * Updated release note
>> * Renamed new fields to have 'mbuf' in the name
>> * Changed the type of new fields to uint16_t (instead of uint32_t)
>>
>>   doc/guides/rel_notes/release_18_08.rst | 6 ++++++
>>   lib/librte_cryptodev/rte_cryptodev.h   | 6 ++++++
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/release_18_08.rst
>> b/doc/guides/rel_notes/release_18_08.rst
>> index 5bc23c5..fae0d26 100644
>> --- a/doc/guides/rel_notes/release_18_08.rst
>> +++ b/doc/guides/rel_notes/release_18_08.rst
>> @@ -70,6 +70,12 @@ ABI Changes
>>      Also, make sure to start the actual text at the margin.
>>      =========================================================
>>
>> +* cryptodev: Additional fields in rte_cryptodev_info.
>> +
>> +  Two new fields of type ``uint16_t`` added in ``rte_cryptodev_info``
>> +  structure: ``min_mbuf_headroom_req`` and ``min_mbuf_tailroom_req``.
>> + These  parameters specify the recommended headroom and tailroom for
>> + mbufs to be  processed by the PMD.
> I think the "cryptodev scheduler PMD" needs changes to take these new parameters into consideration.
> Scheduler_pmd_info_get should return the maximum number of these two fields on all the slaves
> (like what's done with max number of sessions).
>
> We need to close the subtree today, with all API changes done. Will you have time to make this change today?
>
> Thanks!
> Pablo
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 1/3] cryptodev: add min headroom and tailroom requirement
  @ 2018-07-10 10:26  0%     ` De Lara Guarch, Pablo
  2018-07-10 10:50  0%       ` Anoob Joseph
  0 siblings, 1 reply; 200+ results
From: De Lara Guarch, Pablo @ 2018-07-10 10:26 UTC (permalink / raw)
  To: Anoob Joseph, Doherty, Declan
  Cc: Akhil Goyal, Ankur Dwivedi, Jerin Jacob, Narayana Prasad, dev

Hi Anoob,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Anoob Joseph
> Sent: Wednesday, July 4, 2018 2:56 PM
> To: Doherty, Declan <declan.doherty@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: Anoob Joseph <anoob.joseph@caviumnetworks.com>; Akhil Goyal
> <akhil.goyal@nxp.com>; Ankur Dwivedi
> <ankur.dwivedi@caviumnetworks.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>; Narayana Prasad
> <narayanaprasad.athreya@caviumnetworks.com>; dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v1 1/3] cryptodev: add min headroom and tailroom
> requirement
> 
> Enabling crypto devs to specify the minimum headroom and tailroom it expects
> in the mbuf. For net PMDs, standard headroom has to be honoured by
> applications, which is not strictly followed for crypto devs. This prevents crypto
> devs from using free space in mbuf (available as
> head/tailroom) for internal requirements in crypto operations. Addition of
> head/tailroom requirement will help PMDs to communicate such requirements
> to the application.
> 
> The availability and use of head/tailroom is an optimization if the hardware
> supports use of head/tailroom for crypto-op info. For devices that do not
> support using the head/tailroom, they can continue to operate without any
> performance-drop.
> 
> Signed-off-by: Anoob Joseph <anoob.joseph@caviumnetworks.com>
> ---
> v1:
> * Removed deprecation notice
> * Updated release note
> * Renamed new fields to have 'mbuf' in the name
> * Changed the type of new fields to uint16_t (instead of uint32_t)
> 
>  doc/guides/rel_notes/release_18_08.rst | 6 ++++++
>  lib/librte_cryptodev/rte_cryptodev.h   | 6 ++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_18_08.rst
> b/doc/guides/rel_notes/release_18_08.rst
> index 5bc23c5..fae0d26 100644
> --- a/doc/guides/rel_notes/release_18_08.rst
> +++ b/doc/guides/rel_notes/release_18_08.rst
> @@ -70,6 +70,12 @@ ABI Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
> 
> +* cryptodev: Additional fields in rte_cryptodev_info.
> +
> +  Two new fields of type ``uint16_t`` added in ``rte_cryptodev_info``
> +  structure: ``min_mbuf_headroom_req`` and ``min_mbuf_tailroom_req``.
> + These  parameters specify the recommended headroom and tailroom for
> + mbufs to be  processed by the PMD.

I think the "cryptodev scheduler PMD" needs changes to take these new parameters into account.
Scheduler_pmd_info_get should return the maximum value of each of these two fields across all the slaves
(like what's done with max number of sessions).
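
Roughly what I have in mind, as a standalone sketch. The names
`struct toy_room_req` and `scheduler_room_req()` are hypothetical and
this is not the scheduler PMD code, just the max-over-slaves idea:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair mirroring the two new rte_cryptodev_info fields. */
struct toy_room_req {
	uint16_t headroom;
	uint16_t tailroom;
};

/*
 * The scheduler has to satisfy every slave, so it reports the largest
 * headroom and the largest tailroom seen on any of them. The two
 * maxima may come from different slaves.
 */
static struct toy_room_req
scheduler_room_req(const struct toy_room_req *slaves, size_t nb_slaves)
{
	struct toy_room_req out = { 0, 0 };
	size_t i;

	assert(slaves != NULL || nb_slaves == 0);
	for (i = 0; i < nb_slaves; i++) {
		if (slaves[i].headroom > out.headroom)
			out.headroom = slaves[i].headroom;
		if (slaves[i].tailroom > out.tailroom)
			out.tailroom = slaves[i].tailroom;
	}
	return out;
}
```

With no slaves attached the sketch reports zero for both fields, i.e.
no requirement.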

We need to close the subtree today, with all API changes done. Will you have time to make this change today?

Thanks!
Pablo

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] cryptodev: rename experimental private data APIs
  @ 2018-07-10  6:29  0% ` Gujjar, Abhinandan S
  0 siblings, 0 replies; 200+ results
From: Gujjar, Abhinandan S @ 2018-07-10  6:29 UTC (permalink / raw)
  To: Trahe, Fiona, dev
  Cc: De Lara Guarch, Pablo, jerin.jacob, Vangati, Narender, Akhil Goyal

Acked-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>


> -----Original Message-----
> From: Trahe, Fiona
> Sent: Friday, July 6, 2018 7:10 PM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Trahe, Fiona
> <fiona.trahe@intel.com>; Gujjar, Abhinandan S <abhinandan.gujjar@intel.com>
> Subject: [PATCH] cryptodev: rename experimental private data APIs
> 
> The name private_data is confusing in these APIs:
> rte_cryptodev_sym_session_set_private_data()
> rte_cryptodev_sym_session_get_private_data()
> It refers to data added at the end of the session hdr for use by the application.
> The session already contains sess_private_data[index] which is used to store
> private pmd data and most references to private data refer to that.
> e.g. external apis
> rte_cryptodev_sym_get_private_session_size() and internal
> set/get_session_private_data() refer to sess_private_data[].
> 
> So rename to user_data, i.e.
> rte_cryptodev_sym_session_set_user_data()
> rte_cryptodev_sym_session_get_user_data()
> 
> Refers to changes introduced here:
> https://patches.dpdk.org/patch/38172/
> 
> Signed-off-by: Fiona Trahe <fiona.trahe@intel.com>
> ---
>  doc/guides/prog_guide/cryptodev_lib.rst        | 14 +++++++-------
>  doc/guides/prog_guide/event_crypto_adapter.rst |  6 +++---
>  doc/guides/rel_notes/release_18_08.rst         |  8 ++++++++
>  lib/librte_cryptodev/rte_cryptodev.c           | 16 ++++++++--------
>  lib/librte_cryptodev/rte_cryptodev.h           | 14 +++++++-------
>  lib/librte_cryptodev/rte_cryptodev_version.map |  4 ++--
>  lib/librte_eventdev/rte_event_crypto_adapter.c |  4 ++--
>  test/test/test_event_crypto_adapter.c          |  8 ++++----
>  8 files changed, 41 insertions(+), 33 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/cryptodev_lib.rst b/doc/guides/prog_guide/cryptodev_lib.rst
> index 30f0bcf7a..3dbf4dde6 100644
> --- a/doc/guides/prog_guide/cryptodev_lib.rst
> +++ b/doc/guides/prog_guide/cryptodev_lib.rst
> @@ -302,24 +302,24 @@ enqueue call.
>  Private data
>  ~~~~~~~~~~~~
>  For session-based operations, the set and get API provides a mechanism for an
> -application to store and retrieve the private data information stored along with
> -the crypto session.
> +application to store and retrieve the private user data information
> +stored along with the crypto session.
> 
>  For example, suppose an application is submitting a crypto operation with a session
> -associated and wants to indicate private data information which is required to be
> +associated and wants to indicate private user data information which is
> +required to be
>  used after completion of the crypto operation. In this case, the application can use
> -the set API to set the private data and retrieve it using get API.
> +the set API to set the user data and retrieve it using get API.
> 
>  .. code-block:: c
> 
> -	int rte_cryptodev_sym_session_set_private_data(
> +	int rte_cryptodev_sym_session_set_user_data(
>  		struct rte_cryptodev_sym_session *sess,	void *data, uint16_t size);
> 
> -	void * rte_cryptodev_sym_session_get_private_data(
> +	void * rte_cryptodev_sym_session_get_user_data(
>  		struct rte_cryptodev_sym_session *sess);
> 
> 
> -For session-less mode, the private data information can be placed along with the
> +For session-less mode, the private user data information can be placed
> +along with the
>  ``struct rte_crypto_op``. The ``rte_crypto_op::private_data_offset`` indicates the
>  start of private data information. The offset is counted from the start of the
>  rte_crypto_op including other crypto information such as the IVs (since there can
> diff --git a/doc/guides/prog_guide/event_crypto_adapter.rst b/doc/guides/prog_guide/event_crypto_adapter.rst
> index 5c1354dec..9fe09c805 100644
> --- a/doc/guides/prog_guide/event_crypto_adapter.rst
> +++ b/doc/guides/prog_guide/event_crypto_adapter.rst
> @@ -223,9 +223,9 @@ crypto security session or at an offset in the ``struct rte_crypto_op``.
>  The ``rte_crypto_op::private_data_offset`` is used to locate the request/
>  response in the ``rte_crypto_op``.
> 
> -For crypto session, ``rte_cryptodev_sym_session_set_private_data()`` API
> +For crypto session, ``rte_cryptodev_sym_session_set_user_data()`` API
>  will be used to set request/response data. The same data will be obtained
> -by ``rte_cryptodev_sym_session_get_private_data()`` API.  The
> +by ``rte_cryptodev_sym_session_get_user_data()`` API.  The
>  RTE_EVENT_CRYPTO_ADAPTER_CAP_SESSION_PRIVATE_DATA capability indicates
>  whether HW or SW supports this feature.
> 
> @@ -257,7 +257,7 @@ the ``rte_crypto_op``.
>                  m_data.request_info.cdev_id = cdev_id;
>                  m_data.request_info.queue_pair_id = qp_id;
>                  /* Call set API to store private data information */
> -                rte_cryptodev_sym_session_set_private_data(
> +                rte_cryptodev_sym_session_set_user_data(
>                          op->sym->session,
>                          &m_data,
>                          sizeof(m_data));
> diff --git a/doc/guides/rel_notes/release_18_08.rst b/doc/guides/rel_notes/release_18_08.rst
> index bc0124295..8f84a088c 100644
> --- a/doc/guides/rel_notes/release_18_08.rst
> +++ b/doc/guides/rel_notes/release_18_08.rst
> @@ -60,6 +60,14 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
> 
> +* **Renamed cryptodev experimental APIs.**
> +
> +  Used user_data instead of private_data in following APIs to avoid confusion
> +  with the existing session parameter ``sess_private_data[]`` and related APIs.
> +  ``rte_cryptodev_sym_session_set_private_data()`` changed to
> +  ``rte_cryptodev_sym_session_set_user_data()``
> +  ``rte_cryptodev_sym_session_get_private_data()`` changed to
> +  ``rte_cryptodev_sym_session_get_user_data()``
> 
>  ABI Changes
>  -----------
> diff --git a/lib/librte_cryptodev/rte_cryptodev.c b/lib/librte_cryptodev/rte_cryptodev.c
> index 7e5821246..88f4af5f6 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.c
> +++ b/lib/librte_cryptodev/rte_cryptodev.c
> @@ -1123,7 +1123,7 @@ rte_cryptodev_sym_session_create(struct rte_mempool *mp)
>  	}
> 
>  	/* Clear device session pointer.
> -	 * Include the flag indicating presence of private data
> +	 * Include the flag indicating presence of user data
>  	 */
>  	memset(sess, 0, (sizeof(void *) * nb_drivers) + sizeof(uint8_t));
> 
> @@ -1236,7 +1236,7 @@ rte_cryptodev_sym_get_header_session_size(void)
>  	/*
>  	 * Header contains pointers to the private data
>  	 * of all registered drivers, and a flag which
> -	 * indicates presence of private data
> +	 * indicates presence of user data
>  	 */
>  	return ((sizeof(void *) * nb_drivers) + sizeof(uint8_t));
>  }
> @@ -1277,31 +1277,31 @@ rte_cryptodev_sym_get_private_session_size(uint8_t dev_id)
>  }
> 
>  int __rte_experimental
> -rte_cryptodev_sym_session_set_private_data(
> +rte_cryptodev_sym_session_set_user_data(
>  					struct rte_cryptodev_sym_session *sess,
>  					void *data,
>  					uint16_t size)
>  {
>  	uint16_t off_set = sizeof(void *) * nb_drivers;
> -	uint8_t *private_data_present = (uint8_t *)sess + off_set;
> +	uint8_t *user_data_present = (uint8_t *)sess + off_set;
> 
>  	if (sess == NULL)
>  		return -EINVAL;
> 
> -	*private_data_present = 1;
> +	*user_data_present = 1;
>  	off_set += sizeof(uint8_t);
>  	rte_memcpy((uint8_t *)sess + off_set, data, size);
>  	return 0;
>  }
> 
>  void * __rte_experimental
> -rte_cryptodev_sym_session_get_private_data(
> +rte_cryptodev_sym_session_get_user_data(
>  					struct rte_cryptodev_sym_session *sess)
>  {
>  	uint16_t off_set = sizeof(void *) * nb_drivers;
> -	uint8_t *private_data_present = (uint8_t *)sess + off_set;
> +	uint8_t *user_data_present = (uint8_t *)sess + off_set;
> 
> -	if (sess == NULL || !*private_data_present)
> +	if (sess == NULL || !*user_data_present)
>  		return NULL;
> 
>  	off_set += sizeof(uint8_t);
> diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h
> index ccc0f73fd..5d4e690c2 100644
> --- a/lib/librte_cryptodev/rte_cryptodev.h
> +++ b/lib/librte_cryptodev/rte_cryptodev.h
> @@ -1041,35 +1041,35 @@ int rte_cryptodev_driver_id_get(const char *name);
>  const char *rte_cryptodev_driver_name_get(uint8_t driver_id);
> 
>  /**
> - * Set private data for a session.
> + * Store user data in a session.
>   *
>   * @param	sess		Session pointer allocated by
>   *				*rte_cryptodev_sym_session_create*.
> - * @param	data		Pointer to the private data.
> - * @param	size		Size of the private data.
> + * @param	data		Pointer to the user data.
> + * @param	size		Size of the user data.
>   *
>   * @return
>   *  - On success, zero.
>   *  - On failure, a negative value.
>   */
>  int __rte_experimental
> -rte_cryptodev_sym_session_set_private_data(
> +rte_cryptodev_sym_session_set_user_data(
>  					struct rte_cryptodev_sym_session *sess,
>  					void *data,
>  					uint16_t size);
> 
>  /**
> - * Get private data of a session.
> + * Get user data stored in a session.
>   *
>   * @param	sess		Session pointer allocated by
>   *				*rte_cryptodev_sym_session_create*.
>   *
>   * @return
> - *  - On success return pointer to private data.
> + *  - On success return pointer to user data.
>   *  - On failure returns NULL.
>   */
>  void * __rte_experimental
> -rte_cryptodev_sym_session_get_private_data(
> +rte_cryptodev_sym_session_get_user_data(
>  					struct rte_cryptodev_sym_session *sess);
> 
>  #ifdef __cplusplus
> diff --git a/lib/librte_cryptodev/rte_cryptodev_version.map b/lib/librte_cryptodev/rte_cryptodev_version.map
> index be8f4c1a7..c0ea9c875 100644
> --- a/lib/librte_cryptodev/rte_cryptodev_version.map
> +++ b/lib/librte_cryptodev/rte_cryptodev_version.map
> @@ -97,6 +97,6 @@ DPDK_18.05 {
>  EXPERIMENTAL {
>          global:
> 
> -	rte_cryptodev_sym_session_get_private_data;
> -	rte_cryptodev_sym_session_set_private_data;
> +	rte_cryptodev_sym_session_get_user_data;
> +	rte_cryptodev_sym_session_set_user_data;
>  };
> diff --git a/lib/librte_eventdev/rte_event_crypto_adapter.c b/lib/librte_eventdev/rte_event_crypto_adapter.c
> index ba63a87b7..11b28ca9b 100644
> --- a/lib/librte_eventdev/rte_event_crypto_adapter.c
> +++ b/lib/librte_eventdev/rte_event_crypto_adapter.c
> @@ -342,7 +342,7 @@ eca_enq_to_cryptodev(struct rte_event_crypto_adapter *adapter,
>  		if (crypto_op == NULL)
>  			continue;
>  		if (crypto_op->sess_type == RTE_CRYPTO_OP_WITH_SESSION) {
> -			m_data = rte_cryptodev_sym_session_get_private_data(
> +			m_data = rte_cryptodev_sym_session_get_user_data(
>  					crypto_op->sym->session);
>  			if (m_data == NULL) {
>  				rte_pktmbuf_free(crypto_op->sym->m_src);
> @@ -512,7 +512,7 @@ eca_ops_enqueue_burst(struct rte_event_crypto_adapter *adapter,
>  	for (i = 0; i < num; i++) {
>  		struct rte_event *ev = &events[nb_ev++];
>  		if (ops[i]->sess_type == RTE_CRYPTO_OP_WITH_SESSION) {
> -			m_data = rte_cryptodev_sym_session_get_private_data(
> +			m_data = rte_cryptodev_sym_session_get_user_data(
>  					ops[i]->sym->session);
>  		} else if (ops[i]->sess_type == RTE_CRYPTO_OP_SESSIONLESS &&
>  				ops[i]->private_data_offset) {
> diff --git a/test/test/test_event_crypto_adapter.c b/test/test/test_event_crypto_adapter.c
> index 066b0adef..de258c346 100644
> --- a/test/test/test_event_crypto_adapter.c
> +++ b/test/test/test_event_crypto_adapter.c
> @@ -205,12 +205,12 @@ test_op_forward_mode(uint8_t session_less)
>  		TEST_ASSERT_SUCCESS(ret, "Failed to get adapter capabilities\n");
> 
>  		if (cap & RTE_EVENT_CRYPTO_ADAPTER_CAP_SESSION_PRIVATE_DATA) {
> -			/* Fill in private date information */
> +			/* Fill in private user data information */
>  			rte_memcpy(&m_data.response_info, &response_info,
>  				sizeof(response_info));
>  			rte_memcpy(&m_data.request_info, &request_info,
>  				sizeof(request_info));
> -			rte_cryptodev_sym_session_set_private_data(sess,
> +			rte_cryptodev_sym_session_set_user_data(sess,
>  						&m_data, sizeof(m_data));
>  		}
> 
> @@ -389,10 +389,10 @@ test_op_new_mode(uint8_t session_less)
>  		TEST_ASSERT_SUCCESS(ret, "Failed to get adapter capabilities\n");
> 
>  		if (cap & RTE_EVENT_CRYPTO_ADAPTER_CAP_SESSION_PRIVATE_DATA) {
> -			/* Fill in private data information */
> +			/* Fill in private user data information */
>  			rte_memcpy(&m_data.response_info, &response_info,
>  				   sizeof(m_data));
> -			rte_cryptodev_sym_session_set_private_data(sess,
> +			rte_cryptodev_sym_session_set_user_data(sess,
>  						&m_data, sizeof(m_data));
>  		}
>  		rte_cryptodev_sym_session_init(TEST_CDEV_ID, sess,
> --
> 2.13.6
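
For readers following the rename, here is a minimal, self-contained sketch of the
session layout that the quoted rte_cryptodev.c hunks imply: an array of per-driver
pointers, then a one-byte "user data present" flag, then the user data itself. It
is not DPDK code and does not link against DPDK. NB_DRIVERS, struct app_meta and
every model_* function are hypothetical names made up for illustration, and the
sketch assumes the caller allocated enough room after the header for the user data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for the library's internal nb_drivers counter. */
#define NB_DRIVERS 2

/* Hypothetical application metadata, loosely modelled on m_data in the patch. */
struct app_meta {
	uint8_t cdev_id;
	uint16_t queue_pair_id;
};

/* Offset of the one-byte "user data present" flag inside the model session. */
static size_t flag_offset(void)
{
	return sizeof(void *) * NB_DRIVERS;
}

/* Allocate a zeroed model session with room for user_size bytes of user data. */
static void *model_session_create(uint16_t user_size)
{
	return calloc(1, flag_offset() + sizeof(uint8_t) + user_size);
}

/* Mark user data as present and copy it in right after the flag byte. */
static int model_set_user_data(void *sess, const void *data, uint16_t size)
{
	uint8_t *flag;

	if (sess == NULL)
		return -EINVAL;

	flag = (uint8_t *)sess + flag_offset();
	*flag = 1;
	memcpy(flag + sizeof(uint8_t), data, size);
	return 0;
}

/* Return a pointer to the stored user data, or NULL if none was ever set. */
static void *model_get_user_data(void *sess)
{
	uint8_t *flag;

	if (sess == NULL)
		return NULL;

	flag = (uint8_t *)sess + flag_offset();
	if (!*flag)
		return NULL;

	return flag + sizeof(uint8_t);
}

/* Round trip: store metadata in the session, read it back, compare fields. */
static int demo_roundtrip(void)
{
	struct app_meta in, out;
	void *sess, *p;

	memset(&in, 0, sizeof(in));
	in.cdev_id = 1;
	in.queue_pair_id = 7;

	sess = model_session_create(sizeof(in));
	if (sess == NULL)
		return -ENOMEM;

	/* Nothing stored yet, so the getter reports no user data. */
	assert(model_get_user_data(sess) == NULL);

	if (model_set_user_data(sess, &in, sizeof(in)) != 0) {
		free(sess);
		return -1;
	}

	p = model_get_user_data(sess);
	if (p == NULL) {
		free(sess);
		return -1;
	}

	/* Copy out instead of casting: the data sits at an unaligned offset. */
	memcpy(&out, p, sizeof(out));
	free(sess);

	return (out.cdev_id == 1 && out.queue_pair_id == 7) ? 0 : -1;
}
```

Calling demo_roundtrip() from a main() should return 0 under these assumptions.
The sketch checks sess for NULL before computing the flag address, which differs
from the ordering in the quoted hunk; that is a choice made for the model only.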


* Re: [dpdk-dev] [PATCH] eal: add request to map reserved physical memory
  @ 2018-07-09 20:44  0%               ` Scott Branden
  0 siblings, 0 replies; 200+ results
From: Scott Branden @ 2018-07-09 20:44 UTC (permalink / raw)
  To: Burakov, Anatoly, Srinath Mannam; +Cc: Ajit Khaparde, dev



On 18-07-09 09:02 AM, Burakov, Anatoly wrote:
> On 07-Jun-18 1:15 PM, Burakov, Anatoly wrote:
>> On 06-Jun-18 1:18 AM, Scott Branden wrote:
>>> Hi Anatoly,
>>>
>>>
>>> On 18-04-27 09:49 AM, Burakov, Anatoly wrote:
>>>> On 27-Apr-18 5:30 PM, Scott Branden wrote:
>>>>> Hi Anatoly,
>>>>>
>>>>> We'd appreciate your input so we can come to a solution of 
>>>>> supporting the necessary memory allocations?
>>>>>
>>>>
>>>> Hi Scott,
>>>>
>>>> I'm currently starting to work on a prototype that will be at least 
>>>> RFC'd (if not v1'd) during 18.08 timeframe. Basically, the idea is 
>>>> to create/destroy named malloc heaps dynamically, and allow user to 
>>>> request memory from them. You may then mmap() whatever you want and 
>>>> create a malloc heap out of it.
>>>>
>>>> Does that sound reasonable?
>>>>
>>> Is the plan still to have a patch for 18.08?
>>>
>>> Thanks,
>>>   Scott
>>>
>> Hi Scott,
>>
>> The plan is still to submit an RFC during 18.08 timeframe, but since 
>> it will be an ABI break, it will only be integrated in the next 
>> (18.11) release.
>>
> Hi Scott,
>
> You're welcome to offer feedback on the proposal :)
>
> http://patches.dpdk.org/project/dpdk/list/?series=453
Thanks, Srinath is looking into it.

Scott


-- links below jump to the message on this page --
2017-10-05  9:49     [dpdk-dev] [PATCH v1 0/7] Flow API helpers enhancements Adrien Mazarguil
2018-08-03 13:36  3% ` [dpdk-dev] [PATCH v2 0/7] ethdev: add flow API object converter Adrien Mazarguil
2018-08-23 13:48  0%   ` Ferruh Yigit
2018-08-27 15:14  0%     ` Adrien Mazarguil
2018-08-24 10:58  0%   ` Ferruh Yigit
2018-08-27 14:12  0%     ` Adrien Mazarguil
2018-08-31  9:00  3%   ` [dpdk-dev] [PATCH v3 " Adrien Mazarguil
2018-08-31 11:32  0%     ` Nélio Laranjeiro
2018-01-15 19:05     [dpdk-dev] [PATCH] checkpatches.sh: Add checks for ABI symbol addition Neil Horman
2018-06-27 18:01     ` [dpdk-dev] [PATCH v9] " Neil Horman
2018-07-15 23:12  4%   ` Thomas Monjalon
2018-08-14  3:53  4%   ` Rao, Nikhil
2018-08-14 11:04  4%     ` Neil Horman
2018-08-15  6:10  2%       ` Nikhil Rao
2018-08-15 10:48  4%         ` Neil Horman
2018-08-16  6:19  4%           ` Rao, Nikhil
2018-08-16 10:42  4%             ` Neil Horman
2018-03-28  4:51     [dpdk-dev] [PATCH] eal: add request to map reserved physical memory Ajit Khaparde
2018-04-12 14:35     ` Burakov, Anatoly
2018-04-23  9:23       ` Srinath Mannam
2018-04-27 16:30         ` Scott Branden
2018-04-27 16:49           ` Burakov, Anatoly
2018-06-06  0:18             ` Scott Branden
2018-06-07 12:15               ` Burakov, Anatoly
2018-07-09 16:02                 ` Burakov, Anatoly
2018-07-09 20:44  0%               ` Scott Branden
2018-03-29 14:04     [dpdk-dev] [PATCH 0/2] support building ethtool example using meson Bruce Richardson
2018-03-29 14:04     ` [dpdk-dev] [PATCH 1/2] examples/ethtool: add to meson build Bruce Richardson
2018-07-12  7:54  3%   ` Thomas Monjalon
2018-07-12 10:46  0%     ` Bruce Richardson
2018-05-25 12:07     [dpdk-dev] [PATCH] doc: move and update experimental API description Shreyansh Jain
2018-05-25 12:22     ` Luca Boccassi
2018-05-25 15:37       ` Ferruh Yigit
2018-08-09 16:36  0%     ` Thomas Monjalon
2018-05-28  3:31     [dpdk-dev] Stable ABI status of rte_meter_[t|s]rtcm_profile_config Andy Green
2018-08-01 10:47  4% ` Kevin Traynor
2018-08-01 11:32  4%   ` Andy Green
2018-08-01 14:30  4%   ` Dumitrescu, Cristian
2018-08-01 14:36  4%     ` Kevin Traynor
2018-05-31 10:57     [dpdk-dev] [RFC 0/3] Make device mapping more reliable Anatoly Burakov
2018-08-14 10:13  3% ` Burakov, Anatoly
2018-05-31 15:35     [dpdk-dev] [PATCH] eal: move runtime config file to new location Anatoly Burakov
2018-07-13 10:44  9% ` [dpdk-dev] [PATCH v2] " Anatoly Burakov
2018-06-05  7:50     [dpdk-dev] [PATCH] doc: add SPDX and copyright to rel notes Hemant Agrawal
2018-07-27  4:54  4% ` [dpdk-dev] [PATCH v2] " Hemant Agrawal
2018-06-07 12:38     [dpdk-dev] [PATCH 00/22] enable hotplug on multi-process Qi Zhang
2018-07-09  3:36     ` [dpdk-dev] [PATCH v10 00/19] " Qi Zhang
2018-07-09  3:36       ` [dpdk-dev] [PATCH v10 05/19] eal: " Qi Zhang
2018-07-10 14:00         ` Burakov, Anatoly
2018-07-11  1:25           ` Zhang, Qi Z
2018-07-11  2:11             ` Zhang, Qi Z
2018-07-11  8:39  3%           ` Burakov, Anatoly
2018-07-11  3:08  1% ` [dpdk-dev] [PATCH v11 00/19] " Qi Zhang
2018-07-11  3:09  4%   ` [dpdk-dev] [PATCH v11 19/19] doc: update release notes for multi process hotplug Qi Zhang
2018-07-11 13:47  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
2018-07-11 13:48  4%   ` [dpdk-dev] [PATCH v12 19/19] doc: update release notes for multi process hotplug Qi Zhang
2018-07-12  1:14  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
2018-07-12  1:15  4%   ` [dpdk-dev] [PATCH v13 19/19] doc: update release notes for multi process hotplug Qi Zhang
2018-07-12  1:18  1% ` [dpdk-dev] [PATCH v12 00/19] enable hotplug on multi-process Qi Zhang
2018-07-12  1:18  1% ` [dpdk-dev] [PATCH v13 " Qi Zhang
2018-08-10  0:42  1% ` [dpdk-dev] [PATCH v14 0/6] " Qi Zhang
2018-08-10  0:42       ` [dpdk-dev] [PATCH v14 4/6] drivers/net: enable hotplug on secondary process Qi Zhang
2018-08-12 10:59         ` Andrew Rybchenko
2018-08-15  1:14  3%       ` Zhang, Qi Z
2018-08-16  3:04  1% ` [dpdk-dev] [PATCH v15 0/7] enable hotplug on multi-process Qi Zhang
2018-08-16  3:04  4%   ` [dpdk-dev] [PATCH v15 7/7] doc: update release notes for mulit-process hotplug Qi Zhang
2018-09-28  4:23  1% ` [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process Qi Zhang
2018-09-28  4:23  2%   ` [dpdk-dev] [PATCH v16 2/6] eal: " Qi Zhang
2018-06-19  6:26     [dpdk-dev] [PATCH 0/2] add head/tailroom requirement for crypto PMDs Anoob Joseph
2018-07-04 13:55     ` [dpdk-dev] [PATCH v1 0/3] " Anoob Joseph
2018-07-04 13:55       ` [dpdk-dev] [PATCH v1 1/3] cryptodev: add min headroom and tailroom requirement Anoob Joseph
2018-07-10 10:26  0%     ` De Lara Guarch, Pablo
2018-07-10 10:50  0%       ` Anoob Joseph
2018-07-10 14:42       ` [dpdk-dev] [PATCH v2 0/4] add head/tailroom requirement for crypto PMDs Anoob Joseph
2018-07-10 14:42  5%     ` [dpdk-dev] [PATCH v2 1/4] cryptodev: add min headroom and tailroom requirement Anoob Joseph
2018-06-22 16:37     [dpdk-dev] DPDK 18.05 only works with up to 4 NUMAs systems Kumar, Ravi1
2018-06-25 16:16     ` Burakov, Anatoly
2018-06-28  7:03       ` Kumar, Ravi1
2018-06-28  8:42         ` Burakov, Anatoly
2018-07-14  9:44  0%       ` Kumar, Ravi1
2018-06-26  9:12     [dpdk-dev] [PATCH 1/2] eal: remove deprecated function returning mbuf pool ops name Olivier Matz
2018-06-26  9:56     ` [dpdk-dev] [PATCH v2 " Olivier Matz
2018-07-26 21:42  3%   ` Thomas Monjalon
2018-08-05 21:45  0%     ` Thomas Monjalon
2018-08-07 21:34  7%   ` [dpdk-dev] [PATCH v3 " Olivier Matz
2018-06-28 22:45     [dpdk-dev] [PATCH 00/10] kni: Interface detach and link status fixes Dan Gora
2018-06-29  1:54     ` Dan Gora
2018-06-29  1:55       ` [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface Dan Gora
2018-08-29 15:54         ` Stephen Hemminger
2018-08-29 21:02           ` Dan Gora
2018-08-29 22:00             ` Stephen Hemminger
2018-08-29 22:12               ` Dan Gora
2018-08-29 22:41                 ` Dan Gora
2018-08-29 23:10                   ` Stephen Hemminger
2018-08-30  9:49                     ` Igor Ryzhov
2018-08-30 21:41  3%                   ` Dan Gora
2018-08-30 22:09  3%                     ` Stephen Hemminger
2018-08-30 22:11  0%                       ` Dan Gora
2018-09-04  0:47  0%                         ` Dan Gora
2018-09-05 12:57  0%                           ` Stephen Hemminger
2018-09-11 21:45                                 ` Dan Gora
2018-09-11 21:52                                   ` Stephen Hemminger
2018-09-11 22:07                                     ` Dan Gora
2018-09-11 23:14  3%                                   ` Stephen Hemminger
2018-09-12  4:02  0%                                     ` Jason Wang
2018-07-06  6:42     [dpdk-dev] [PATCH 1/4] eventdev: add eth Tx adapter APIs Nikhil Rao
2018-07-10 12:17  3% ` Jerin Jacob
2018-07-16  8:34  0%   ` Rao, Nikhil
2018-07-06 13:17     [dpdk-dev] [RFC 00/11] Support externally allocated memory in DPDK Anatoly Burakov
2018-07-13 17:10  2% ` Burakov, Anatoly
2018-07-13 17:56  0%   ` Wiles, Keith
2018-07-19 10:58  0%     ` László Vadkerti
2018-07-26 13:48  0%       ` Burakov, Anatoly
2018-07-06 13:39     [dpdk-dev] [PATCH] cryptodev: rename experimental private data APIs Fiona Trahe
2018-07-10  6:29  0% ` Gujjar, Abhinandan S
2018-07-11 10:59     [dpdk-dev] [PATCH 1/2] mempool: remove deprecated functions Andrew Rybchenko
2018-07-26 21:34  3% ` Thomas Monjalon
2018-07-27 13:45  0%   ` Andrew Rybchenko
2018-07-27 14:38  0%     ` Thomas Monjalon
2018-07-13 10:27     [dpdk-dev] [PATCH v2 0/9] Support running DPDK without hugetlbfs mountpoint Anatoly Burakov
2018-06-01 17:15     ` [dpdk-dev] [PATCH " Anatoly Burakov
2018-07-13 10:27  5%   ` [dpdk-dev] [PATCH v2 8/9] doc: add deprecation notice for EAL command line options Anatoly Burakov
2018-07-18 21:06  3% [dpdk-dev] [PATCH] devtools: fix symbol check for filename with space Thomas Monjalon
2018-07-18 21:26  3% ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
2018-07-19 11:14  0%   ` Neil Horman
2018-07-19 12:09  0%     ` Thomas Monjalon
2018-07-19 15:37  0%       ` Neil Horman
2018-07-20  9:37  0%         ` Thomas Monjalon
2018-07-20 11:41  3% [dpdk-dev] [PATCH] devtools: fix checkpatch " Thomas Monjalon
2018-07-20 18:25  0% ` Neil Horman
2018-07-20 20:56  0%   ` Thomas Monjalon
2018-07-27 13:46  2% [dpdk-dev] [PATCH v2 1/2] mempool: remove deprecated functions Andrew Rybchenko
2018-07-31 15:14  3% [dpdk-dev] [PATCH] devtools: check_symbol_change requires bash Stephen Hemminger
2018-08-01  5:22  3% [dpdk-dev] [PATCH] devtools: trap SIGINT is not recognizable to dash Gavin Hu
2018-08-01 10:40  0% ` [dpdk-dev] [dpdk-stable] " Mcnamara, John
2018-08-01 13:09  0%   ` Varghese, Vipin
2018-08-01 14:37  0%   ` Thomas Monjalon
2018-08-03 22:17  0% ` [dpdk-dev] " Stephen Hemminger
2018-08-04  6:42  0%   ` Gavin Hu
2018-08-01 12:07 13% [dpdk-dev] [PATCH] doc: add deprecation notice on external memory support Anatoly Burakov
2018-08-01 12:20  0% ` Wiles, Keith
2018-08-02  2:37  0% ` Wang, Zhihong
2018-08-02  3:38  0% ` Jerin Jacob
2018-08-02  5:58  0% ` Yongseok Koh
2018-08-02  7:56  0% ` Maxime Coquelin
2018-08-02  9:25  0% ` Shreyansh Jain
2018-08-05 23:41  0%   ` Thomas Monjalon
2018-08-01 16:06     [dpdk-dev] [PATCH] devtools: check_symbol_change requires bash Thomas Monjalon
2018-08-05  9:38  3% ` [dpdk-dev] [PATCH] devtools: fix symbol check for dash Thomas Monjalon
2018-08-09 12:14  0%   ` Ferruh Yigit
2018-08-09 15:14  0%     ` Neil Horman
2018-08-09 16:13  0%       ` Thomas Monjalon
2018-08-02 14:25     [dpdk-dev] [PATCH] kni: dynamically allocate memory for each KNI Igor Ryzhov
2018-08-27 17:06  4% ` Ferruh Yigit
2018-08-29  9:52  0%   ` Igor Ryzhov
2018-08-30 10:55  0%     ` Ferruh Yigit
2018-08-09 10:23     [dpdk-dev] [PATCH] kni: fix kni rx fifo producer synchronization Kiran Kumar
2018-08-16  9:55     ` [dpdk-dev] [PATCH v2] kni: fix kni Rx " Kiran Kumar
2018-08-27 14:07       ` Ferruh Yigit
2018-08-27 15:40         ` Gavin Hu
2018-08-28 10:43           ` Kokkilagadda, Kiran
2018-08-28 19:30             ` Gavin Hu
2018-08-29  4:59               ` Honnappa Nagarahalli
2018-08-29  5:49                 ` Kokkilagadda, Kiran
2018-08-29  7:34                   ` Ola Liljedahl
2018-08-29  8:28  4%                 ` Jerin Jacob
2018-08-29  8:47  3%                   ` Ola Liljedahl
2018-08-29  8:57  0%                     ` Jerin Jacob
2018-09-13 17:40  0%                       ` Honnappa Nagarahalli
2018-09-13 17:51  0%                         ` Jerin Jacob
2018-09-13 23:45  0%                           ` Honnappa Nagarahalli
2018-09-14  2:45  0%                             ` Jerin Jacob
2018-09-18 15:53  0%                               ` Ferruh Yigit
2018-09-19  5:37  0%                                 ` Honnappa Nagarahalli
2018-08-09 12:08 10% [dpdk-dev] [PATCH v1] doc: update release notes for 18.08 John McNamara
2018-08-11 22:11  6% [dpdk-dev] [PATCH] version: 18.11-rc0 Thomas Monjalon
2018-08-13  7:53     [dpdk-dev] [RFC] ethdev: add tail drop API for traffic management Rosen Xu
2018-08-13 19:23  3% ` Stephen Hemminger
2018-08-14  0:19     [dpdk-dev] [PATCH] ethdev: fix rte_eth_dev_owner_unset Stephen Hemminger
2018-08-16 22:44     ` [dpdk-dev] [PATCH 0/2] ethdev: minor ownership changes Stephen Hemminger
2018-08-16 22:44       ` [dpdk-dev] [PATCH 2/2] ethdev: make rte_eth_is_valid_owner_id return bool Stephen Hemminger
2018-08-21 10:20         ` Matan Azrad
2018-08-21 15:06  3%       ` Stephen Hemminger
2018-08-21 15:48  0%         ` Matan Azrad
2018-08-21 18:31  0%           ` Stephen Hemminger
2018-08-26  7:49  0%             ` Matan Azrad
2018-08-16 18:18     [dpdk-dev] 17.11.4 patches review and test Yongseok Koh
2018-08-21 10:07     ` Alejandro Lucero
2018-08-23  0:19  3%   ` Yongseok Koh
2018-08-23  1:23  0%     ` Yongseok Koh
2018-08-23 11:29  3%     ` Thomas Monjalon
2018-08-23 16:18  0%     ` Yongseok Koh
2018-08-24  8:51  0%       ` Alejandro Lucero
2018-08-24 14:31  0%         ` Yongseok Koh
2018-08-24 15:00  0%           ` Alejandro Lucero
2018-08-24 15:10  0%             ` Yongseok Koh
2018-08-17 10:51     [dpdk-dev] [PATCH v1 0/5] Enable hotplug in vfio Jeff Guo
2018-08-17 10:51     ` [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device Jeff Guo
2018-09-26 12:22  3%   ` Burakov, Anatoly
2018-08-22  7:25  2% [dpdk-dev] 18.05.1 patches review and test Christian Ehrhardt
2018-08-27  9:29  0% ` Christian Ehrhardt
2018-08-27 10:30  0%   ` [dpdk-dev] [dpdk-stable] " Marco Varlese
2018-08-27 10:31  0%   ` Marco Varlese
2018-09-05 14:41  0%   ` [dpdk-dev] " Christian Ehrhardt
2018-08-24 11:15     [dpdk-dev] support MAC changes when no live changes allowed Alejandro Lucero
2018-08-24 11:15  4% ` [dpdk-dev] [PATCH v2 3/3] doc: comment rte_eth_dev_start change Alejandro Lucero
2018-08-24 14:25     [dpdk-dev] [PATCH v3 0/2] support MAC changes when no live changes allowed Alejandro Lucero
2018-08-24 14:25  4% ` [dpdk-dev] [PATCH v3 1/2] ethdev: fix MAC changes when live change not supported Alejandro Lucero
2018-08-24 14:51 11% [dpdk-dev] [PATCH] ethdev: deprecate DEFERRED device state Ferruh Yigit
2018-08-27 15:00  0% ` Andrew Rybchenko
2018-08-24 16:47     [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority Konstantin Ananyev
2018-09-16  9:56     ` Thomas Monjalon
2018-09-25 12:22  3%   ` Luca Boccassi
2018-09-25 12:57  3%     ` Thomas Monjalon
2018-09-25 14:34  0%     ` Ananyev, Konstantin
2018-08-24 17:48  1% [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session Konstantin Ananyev
2018-08-27 12:24  3% [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries Anatoly Burakov
2018-09-19  8:56  3% ` Thomas Monjalon
2018-09-20 15:41 17% ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
2018-08-31 16:51     [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries Qiaobin Fu
2018-09-02 22:05  2% ` Honnappa Nagarahalli
2018-09-04 19:36  4%   ` Michel Machado
2018-09-05 22:13  4%     ` Honnappa Nagarahalli
2018-09-06 14:28  3%       ` Michel Machado
2018-09-12 20:37  2%         ` Honnappa Nagarahalli
2018-09-20 19:50  0%           ` Michel Machado
2018-09-04 18:55     ` Wang, Yipeng1
2018-09-04 19:07       ` Michel Machado
2018-09-04 19:51         ` Wang, Yipeng1
2018-09-04 20:26           ` Michel Machado
2018-09-04 20:57             ` Wang, Yipeng1
2018-09-05 17:52               ` Michel Machado
2018-09-05 20:27  4%             ` Wang, Yipeng1
2018-09-06 13:34  4%               ` Michel Machado
2018-09-04 10:12     [dpdk-dev] [PATCH v2] ethdev: make default behavior CRC strip on Rx Ferruh Yigit
2018-09-24 17:31  4% ` [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes Ferruh Yigit
2018-09-24 17:12  0%   ` David Marchand
2018-09-05 12:21     [dpdk-dev] [PATCH 1/2] eventdev: fix port id argument in Rx adapter caps API Nikhil Rao
2018-09-25  8:49  4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
2018-09-25  9:15  0%   ` Jerin Jacob
2018-09-25  9:50  0%     ` Thomas Monjalon
2018-09-25  9:56  0%       ` Jerin Jacob
2018-09-25  9:49  4% ` [dpdk-dev] [PATCH v3] " Nikhil Rao
2018-09-05 14:44  1% [dpdk-dev] [dpdk-announce] DPDK 18.05.1 released Christian Ehrhardt
2018-09-05 16:41     [dpdk-dev] [RFC] ethdev: add min/max MTU to device info Stephen Hemminger
2018-09-06  6:29  3% ` Andrew Rybchenko
2018-09-06 10:52  3%   ` Stephen Hemminger
2018-09-06 17:12     [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-06 17:12     ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
2018-09-28  1:00  3%   ` Wang, Yipeng1
2018-09-28  8:26  4%     ` Bruce Richardson
2018-09-28  8:55  4%       ` Van Haaren, Harry
2018-09-10  5:18  3% [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions David Marchand
2018-09-10  8:06  0% ` Andrew Rybchenko
2018-09-16  9:39  0%   ` Thomas Monjalon
2018-09-17  7:07  0%     ` Olivier Matz
2018-09-17 12:45  8% ` [dpdk-dev] [PATCH v2] " David Marchand
2018-09-19  8:34  0%   ` Thomas Monjalon
2018-09-10 20:04     [dpdk-dev] [PATCH 00/15] rename PMDs map files to match library name and add Meson files Luca Boccassi
2018-09-10 20:04     ` [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name Luca Boccassi
2018-09-11 13:06       ` Bruce Richardson
2018-09-11 13:09         ` Luca Boccassi
2018-09-11 13:30           ` Bruce Richardson
2018-09-11 13:38  3%         ` Luca Boccassi
2018-09-11 13:32  4%       ` Bruce Richardson
2018-09-11 13:41  4%         ` Luca Boccassi
2018-09-11 14:06  0%           ` Bruce Richardson
2018-09-11 16:05  0%             ` Luca Boccassi
2018-09-17 10:36     [dpdk-dev] [PATCH 00/11] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-09-26 18:04  2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04  2%   ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-09-21 16:13     [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-20 11:36     ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-09-19 13:56       ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
2018-09-04 13:11         ` [dpdk-dev] [PATCH 00/16] " Anatoly Burakov
2018-09-19 13:56 16%       ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-20  9:30  0%         ` Andrew Rybchenko
2018-09-20  9:54  0%           ` Burakov, Anatoly
2018-09-19 13:56  4%       ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-20 11:36 16%     ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-20 11:36  4%     ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-21 16:13 16%   ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-21 16:13  4%   ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:21  2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40  2%   ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-09-27 10:40 16%   ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-27 11:03  0%     ` Shreyansh Jain
2018-09-27 11:08  0%       ` Burakov, Anatoly
2018-09-27 11:12  0%         ` Shreyansh Jain
2018-09-27 11:29  0%           ` Burakov, Anatoly
2018-09-29  0:09  0%     ` Yongseok Koh
2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-27 13:14  0%     ` Alejandro Lucero
2018-09-27 13:21  0%       ` Burakov, Anatoly
2018-09-27 13:42  0%         ` Alejandro Lucero
2018-09-27 14:04  0%           ` Burakov, Anatoly
2018-09-27 10:41  9%   ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-27 10:41  4%   ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-26 11:22  4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:22  9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-26 11:22  4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
2018-09-25 15:25  3% [dpdk-dev] [PATCH v1] doc: remove unused release note file John McNamara
