DPDK patches and discussions
 help / color / mirror / Atom feed
From: Andrew Rybchenko <arybchenko@solarflare.com>
To: <xiangxia.m.yue@gmail.com>, <olivier.matz@6wind.com>,
	<gage.eads@intel.com>, <artem.andreev@oktetlabs.ru>,
	<jerinj@marvell.com>, <ndabilpuram@marvell.com>,
	<vattunuru@marvell.com>, <hemant.agrawal@nxp.com>,
	 <david.marchand@redhat.com>, <anatoly.burakov@intel.com>,
	<bruce.richardson@intel.com>
Cc: <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH dpdk-dev v3 2/2] mempool: use shared memzone for rte_mempool_ops
Date: Thu, 23 Apr 2020 16:38:44 +0300	[thread overview]
Message-ID: <3b1cb032-a1c2-6dee-af72-8ed690a9ad8c@solarflare.com> (raw)
In-Reply-To: <1586787714-13890-2-git-send-email-xiangxia.m.yue@gmail.com>

On 4/13/20 5:21 PM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> The order of mempool initiation affects mempool index in the
> rte_mempool_ops_table. For example, when building APPs with:
> 
> $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> 
> The "bucket" mempool will be registered firstly, and its index
> in table is 0 while the index of "ring" mempool is 1. DPDK
> uses the mk/rte.app.mk to build APPs, and others, for example,
> Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> The mempool lib linked in dpdk and Open vSwitch is different.
> 
> The mempool can be used between primary and secondary process,
> such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> ring which index in table is 0, but the index of "bucket" ring
> is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> mempool ops and malloc memory from mempool. The crash will occur:
> 
>     bucket_dequeue (access null and crash)
>     rte_mempool_get_ops (should get "ring_mp_mc",
>                          but get "bucket" mempool)
>     rte_mempool_ops_dequeue_bulk
>     ...
>     rte_pktmbuf_alloc
>     rte_pktmbuf_copy
>     pdump_copy
>     pdump_rx
>     rte_eth_rx_burst
> 
> To avoid the crash, there are some solution:
> * constructor priority: Different mempool uses different
>   priority in RTE_INIT, but it's not easy to maintain.
> 
> * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
>   be same as libdpdk.a/libdpdk.so, but when adding a new mempool
>   driver in future, we must make sure the order.
> 
> * register mempool orderly: Sort the mempool when registering,
>   so the lib linked will not affect the index in mempool table.
>   but the number of mempool libraries may be different.
> 
> * shared memzone: The primary process allocates a struct in
>   shared memory named memzone, When we register a mempool ops,
>   we first get a name and id from the shared struct: with the lock held,
>   lookup for the registered name and return its index, else
>   get the last id and copy the name in the struct.
> 
> Previous discussion: https://mails.dpdk.org/archives/dev/2020-March/159354.html
> 
> Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> Suggested-by: Jerin Jacob <jerinj@marvell.com>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
> v2:
> * fix checkpatch warning
> ---
>  lib/librte_mempool/rte_mempool.h     | 28 +++++++++++-
>  lib/librte_mempool/rte_mempool_ops.c | 89 ++++++++++++++++++++++++++++--------
>  2 files changed, 96 insertions(+), 21 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index c90cf31467b2..2709b9e1d51b 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -50,6 +50,7 @@
>  #include <rte_ring.h>
>  #include <rte_memcpy.h>
>  #include <rte_common.h>
> +#include <rte_init.h>
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -678,7 +679,6 @@ struct rte_mempool_ops {
>   */
>  struct rte_mempool_ops_table {
>  	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
> -	uint32_t num_ops;      /**< Number of used ops structs in the table. */
>  	/**
>  	 * Storage for all possible ops structs.
>  	 */
> @@ -910,6 +910,30 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
>   */
>  int rte_mempool_register_ops(const struct rte_mempool_ops *ops);
>  
> +struct rte_mempool_shared_ops {
> +	size_t num_mempool_ops;

Is there any specific reason to change type from uint32_t used
above to size_t? I think that uint32_t is better here since
it is just a number, not a size of memory or related value.

> +	struct {
> +		char name[RTE_MEMPOOL_OPS_NAMESIZE];
> +	} mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> +
> +	rte_spinlock_t mempool;
> +};
> +
> +static inline int
> +mempool_ops_register_cb(const void *arg)
> +{
> +	const struct rte_mempool_ops *h = (const struct rte_mempool_ops *)arg;
> +
> +	return rte_mempool_register_ops(h);
> +}
> +
> +static inline void
> +mempool_ops_register(const struct rte_mempool_ops *ops)
> +{
> +	rte_init_register(mempool_ops_register_cb, (const void *)ops,
> +			  RTE_INIT_PRE);
> +}
> +
>  /**
>   * Macro to statically register the ops of a mempool handler.
>   * Note that the rte_mempool_register_ops fails silently here when
> @@ -918,7 +942,7 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
>  #define MEMPOOL_REGISTER_OPS(ops)				\
>  	RTE_INIT(mp_hdlr_init_##ops)				\
>  	{							\
> -		rte_mempool_register_ops(&ops);			\
> +		mempool_ops_register(&ops);			\
>  	}
>  
>  /**
> diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
> index 22c5251eb068..b10fda662db6 100644
> --- a/lib/librte_mempool/rte_mempool_ops.c
> +++ b/lib/librte_mempool/rte_mempool_ops.c
> @@ -14,43 +14,95 @@
>  /* indirect jump table to support external memory pools. */
>  struct rte_mempool_ops_table rte_mempool_ops_table = {
>  	.sl =  RTE_SPINLOCK_INITIALIZER,
> -	.num_ops = 0
>  };
>  
> -/* add a new ops struct in rte_mempool_ops_table, return its index. */
> -int
> -rte_mempool_register_ops(const struct rte_mempool_ops *h)
> +static int
> +rte_mempool_register_shared_ops(const char *name)
>  {
> -	struct rte_mempool_ops *ops;
> -	int16_t ops_index;
> +	static bool mempool_shared_ops_inited;
> +	struct rte_mempool_shared_ops *shared_ops;
> +	const struct rte_memzone *mz;
> +	uint32_t ops_index = 0;
> +

I think we should sanity check 'name' here to be not
empty string (see review notes below).

> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
> +	    !mempool_shared_ops_inited) {
> +
> +		mz = rte_memzone_reserve("mempool_ops_shared",
> +					 sizeof(*shared_ops), 0, 0);
> +		if (mz == NULL)
> +			return -ENOMEM;
> +
> +		shared_ops = mz->addr;
> +		shared_ops->num_mempool_ops = 0;
> +		rte_spinlock_init(&shared_ops->mempool);
> +
> +		mempool_shared_ops_inited = true;
> +	} else {
> +		mz = rte_memzone_lookup("mempool_ops_shared");
> +		if (mz == NULL)
> +			return -ENOMEM;
> +
> +		shared_ops = mz->addr;
> +	}
>  
> -	rte_spinlock_lock(&rte_mempool_ops_table.sl);
> +	rte_spinlock_lock(&shared_ops->mempool);
>  
> -	if (rte_mempool_ops_table.num_ops >=
> -			RTE_MEMPOOL_MAX_OPS_IDX) {
> -		rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> +	if (shared_ops->num_mempool_ops >= RTE_MEMPOOL_MAX_OPS_IDX) {
> +		rte_spinlock_unlock(&shared_ops->mempool);
>  		RTE_LOG(ERR, MEMPOOL,
>  			"Maximum number of mempool ops structs exceeded\n");
>  		return -ENOSPC;
>  	}
>  
> +	while (shared_ops->mempool_ops[ops_index].name[0]) {

Please, compare with '\0' as DPDK style guide says.

> +		if (!strcmp(name, shared_ops->mempool_ops[ops_index].name)) {
> +			rte_spinlock_unlock(&shared_ops->mempool);
> +			return ops_index;
> +		}
> +
> +		ops_index++;
> +	}
> +
> +	strlcpy(shared_ops->mempool_ops[ops_index].name, name,
> +		sizeof(shared_ops->mempool_ops[0].name));
> +
> +	shared_ops->num_mempool_ops++;
> +
> +	rte_spinlock_unlock(&shared_ops->mempool);
> +	return ops_index;
> +}
> +
> +/* add a new ops struct in rte_mempool_ops_table, return its index. */
> +int
> +rte_mempool_register_ops(const struct rte_mempool_ops *h)
> +{
> +	struct rte_mempool_ops *ops;
> +	int16_t ops_index;
> +
>  	if (h->alloc == NULL || h->enqueue == NULL ||
> -			h->dequeue == NULL || h->get_count == NULL) {
> -		rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> +	    h->dequeue == NULL || h->get_count == NULL) {

Changing formatting just makes review a bit more harder.

>  		RTE_LOG(ERR, MEMPOOL,
>  			"Missing callback while registering mempool ops\n");
> +		rte_errno = EINVAL;

Why is it done in the patch? For me it looks like logically
different change if it is really required (properly motivated).

>  		return -EINVAL;
>  	}
>  
>  	if (strlen(h->name) >= sizeof(ops->name) - 1) {
> -		rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> -		RTE_LOG(DEBUG, EAL, "%s(): mempool_ops <%s>: name too long\n",
> -				__func__, h->name);
> +		RTE_LOG(ERR, MEMPOOL,
> +			"The registering  mempool_ops <%s>: name too long\n",
> +			h->name);

Why do you change from DEBUG to ERR here? It it not
directly related to the purpose of the patch.

>  		rte_errno = EEXIST;
>  		return -EEXIST;
>  	}
>  
> -	ops_index = rte_mempool_ops_table.num_ops++;
> +	ops_index = rte_mempool_register_shared_ops(h->name);
> +	if (ops_index < 0) {
> +		rte_errno = -ops_index;
> +		return ops_index;
> +	}
> +
> +	rte_spinlock_lock(&rte_mempool_ops_table.sl);
> +
>  	ops = &rte_mempool_ops_table.ops[ops_index];
>  	strlcpy(ops->name, h->name, sizeof(ops->name));
>  	ops->alloc = h->alloc;
> @@ -165,9 +217,8 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
>  	if (mp->flags & MEMPOOL_F_POOL_CREATED)
>  		return -EEXIST;
>  
> -	for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
> -		if (!strcmp(name,
> -				rte_mempool_ops_table.ops[i].name)) {
> +	for (i = 0; i < RTE_MEMPOOL_MAX_OPS_IDX; i++) {
> +		if (!strcmp(name, rte_mempool_ops_table.ops[i].name)) {

Since rte_mempool_ops_table is filled in which zeros,
name string is empty by default. So, request with empty name
will match the first unused entry. I guess it is not what we
want here. I think we should handle empty string before the
loop and return -EINVAL.

>  			ops = &rte_mempool_ops_table.ops[i];
>  			break;
>  		}
> 


  parent reply	other threads:[~2020-04-23 13:39 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-02  1:57 [dpdk-dev] [PATCH] mempool: sort the rte_mempool_ops by name xiangxia.m.yue
2020-03-02 13:45 ` Jerin Jacob
2020-03-04 13:17   ` Tonghao Zhang
2020-03-04 13:33     ` Jerin Jacob
2020-03-04 14:46       ` Tonghao Zhang
2020-03-04 15:14         ` Jerin Jacob
2020-03-04 15:25           ` Tonghao Zhang
2020-03-05  8:20 ` [dpdk-dev] [PATCH dpdk-dev v2] " xiangxia.m.yue
2020-03-05 16:57   ` Olivier Matz
2020-03-06 13:36 ` [dpdk-dev] [PATCH dpdk-dev v3] " xiangxia.m.yue
2020-03-06 13:37   ` Jerin Jacob
2020-03-07 12:51     ` Andrew Rybchenko
2020-03-07 12:54       ` Andrew Rybchenko
2020-03-09  3:01         ` Tonghao Zhang
2020-03-09  8:27           ` Olivier Matz
2020-03-09  8:55             ` Tonghao Zhang
2020-03-09  9:05               ` Olivier Matz
2020-03-09 13:15               ` David Marchand
2020-03-16  7:43                 ` Tonghao Zhang
2020-03-16  7:55                   ` Olivier Matz
2020-03-24  9:35             ` Andrew Rybchenko
2020-03-24 12:41               ` Tonghao Zhang
2020-04-09 10:52 ` [dpdk-dev] [PATCH dpdk-dev 1/2] eal: introduce last-init queue for libraries initialization xiangxia.m.yue
2020-04-09 10:53   ` [dpdk-dev] [PATCH dpdk-dev 2/2] mempool: use shared memzone for rte_mempool_ops xiangxia.m.yue
2020-04-09 11:31   ` [dpdk-dev] [PATCH dpdk-dev 1/2] eal: introduce last-init queue for libraries initialization Jerin Jacob
2020-04-09 15:04     ` Tonghao Zhang
2020-04-09 15:02 ` [dpdk-dev] [PATCH dpdk-dev v2 1/2] eal: introduce rte-init " xiangxia.m.yue
2020-04-09 15:02   ` [dpdk-dev] [PATCH dpdk-dev v2 2/2] mempool: use shared memzone for rte_mempool_ops xiangxia.m.yue
2020-04-10  6:18   ` [dpdk-dev] [PATCH dpdk-dev v2 1/2] eal: introduce rte-init queue for libraries initialization Jerin Jacob
2020-04-10 13:11     ` Jerin Jacob
2020-04-12  3:20       ` Tonghao Zhang
2020-04-12  3:32         ` Tonghao Zhang
2020-04-13 11:32           ` Jerin Jacob
2020-04-13 14:21 ` [dpdk-dev] [PATCH dpdk-dev v3 " xiangxia.m.yue
2020-04-13 14:21   ` [dpdk-dev] [PATCH dpdk-dev v3 2/2] mempool: use shared memzone for rte_mempool_ops xiangxia.m.yue
2020-04-16 22:27     ` Thomas Monjalon
2020-04-27  8:03       ` Tonghao Zhang
2020-04-27 11:40         ` Thomas Monjalon
2020-04-27 12:51           ` Tonghao Zhang
2020-04-28 13:22             ` Tonghao Zhang
2020-05-04  7:42               ` Olivier Matz
2021-03-25 14:24                 ` David Marchand
2020-04-23 13:38     ` Andrew Rybchenko [this message]
2020-04-27  5:23       ` Tonghao Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3b1cb032-a1c2-6dee-af72-8ed690a9ad8c@solarflare.com \
    --to=arybchenko@solarflare.com \
    --cc=anatoly.burakov@intel.com \
    --cc=artem.andreev@oktetlabs.ru \
    --cc=bruce.richardson@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=gage.eads@intel.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=jerinj@marvell.com \
    --cc=ndabilpuram@marvell.com \
    --cc=olivier.matz@6wind.com \
    --cc=vattunuru@marvell.com \
    --cc=xiangxia.m.yue@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).