From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
To: Andrew Rybchenko <arybchenko@solarflare.com>
Cc: Olivier Matz <olivier.matz@6wind.com>,
Gage Eads <gage.eads@intel.com>,
"Artem V. Andreev" <artem.andreev@oktetlabs.ru>,
Jerin Jacob <jerinj@marvell.com>,
Nithin Dabilpuram <ndabilpuram@marvell.com>,
Vamsi Attunuru <vattunuru@marvell.com>,
Hemant Agrawal <hemant.agrawal@nxp.com>,
David Marchand <david.marchand@redhat.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>,
Bruce Richardson <bruce.richardson@intel.com>,
dpdk-dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH dpdk-dev v3 2/2] mempool: use shared memzone for rte_mempool_ops
Date: Mon, 27 Apr 2020 13:23:17 +0800 [thread overview]
Message-ID: <CAMDZJNW-YCgPk5N5uaULX1yfaCWb5qMkVV3L5raKjAhU8NNvOw@mail.gmail.com> (raw)
In-Reply-To: <3b1cb032-a1c2-6dee-af72-8ed690a9ad8c@solarflare.com>
On Thu, Apr 23, 2020 at 9:38 PM Andrew Rybchenko
<arybchenko@solarflare.com> wrote:
>
> On 4/13/20 5:21 PM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > The order of mempool initiation affects mempool index in the
> > rte_mempool_ops_table. For example, when building APPs with:
> >
> > $ gcc -lrte_mempool_bucket -lrte_mempool_ring ...
> >
> > The "bucket" mempool will be registered firstly, and its index
> > in table is 0 while the index of "ring" mempool is 1. DPDK
> > uses the mk/rte.app.mk to build APPs, and others, for example,
> > Open vSwitch, use the libdpdk.a or libdpdk.so to build it.
> > The mempool lib linked in dpdk and Open vSwitch is different.
> >
> > The mempool can be used between primary and secondary process,
> > such as dpdk-pdump and pdump-pmd/Open vSwitch(pdump enabled).
> > There will be a crash because dpdk-pdump creates the "ring_mp_mc"
> > ring which index in table is 0, but the index of "bucket" ring
> > is 0 in Open vSwitch. If Open vSwitch use the index 0 to get
> > mempool ops and malloc memory from mempool. The crash will occur:
> >
> > bucket_dequeue (access null and crash)
> > rte_mempool_get_ops (should get "ring_mp_mc",
> > but get "bucket" mempool)
> > rte_mempool_ops_dequeue_bulk
> > ...
> > rte_pktmbuf_alloc
> > rte_pktmbuf_copy
> > pdump_copy
> > pdump_rx
> > rte_eth_rx_burst
> >
> > To avoid the crash, there are some solution:
> > * constructor priority: Different mempool uses different
> > priority in RTE_INIT, but it's not easy to maintain.
> >
> > * change mk/rte.app.mk: Change the order in mk/rte.app.mk to
> > be same as libdpdk.a/libdpdk.so, but when adding a new mempool
> > driver in future, we must make sure the order.
> >
> > * register mempool orderly: Sort the mempool when registering,
> > so the lib linked will not affect the index in mempool table.
> > but the number of mempool libraries may be different.
> >
> > * shared memzone: The primary process allocates a struct in
> > shared memory named memzone, When we register a mempool ops,
> > we first get a name and id from the shared struct: with the lock held,
> > lookup for the registered name and return its index, else
> > get the last id and copy the name in the struct.
> >
> > Previous discussion: https://mails.dpdk.org/archives/dev/2020-March/159354.html
> >
> > Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> > Suggested-by: Jerin Jacob <jerinj@marvell.com>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > ---
> > v2:
> > * fix checkpatch warning
> > ---
> > lib/librte_mempool/rte_mempool.h | 28 +++++++++++-
> > lib/librte_mempool/rte_mempool_ops.c | 89 ++++++++++++++++++++++++++++--------
> > 2 files changed, 96 insertions(+), 21 deletions(-)
> >
> > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > index c90cf31467b2..2709b9e1d51b 100644
> > --- a/lib/librte_mempool/rte_mempool.h
> > +++ b/lib/librte_mempool/rte_mempool.h
> > @@ -50,6 +50,7 @@
> > #include <rte_ring.h>
> > #include <rte_memcpy.h>
> > #include <rte_common.h>
> > +#include <rte_init.h>
> >
> > #ifdef __cplusplus
> > extern "C" {
> > @@ -678,7 +679,6 @@ struct rte_mempool_ops {
> > */
> > struct rte_mempool_ops_table {
> > rte_spinlock_t sl; /**< Spinlock for add/delete. */
> > - uint32_t num_ops; /**< Number of used ops structs in the table. */
> > /**
> > * Storage for all possible ops structs.
> > */
> > @@ -910,6 +910,30 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
> > */
> > int rte_mempool_register_ops(const struct rte_mempool_ops *ops);
> >
> > +struct rte_mempool_shared_ops {
> > + size_t num_mempool_ops;
>
> Is there any specific reason to change type from uint32_t used
> above to size_t? I think that uint32_t is better here since
> it is just a number, not a size of memory or related value.
Thanks for you review. busy to commit other patch, so that be delayed.
I will change that in next version.
> > + struct {
> > + char name[RTE_MEMPOOL_OPS_NAMESIZE];
> > + } mempool_ops[RTE_MEMPOOL_MAX_OPS_IDX];
> > +
> > + rte_spinlock_t mempool;
> > +};
> > +
> > +static inline int
> > +mempool_ops_register_cb(const void *arg)
> > +{
> > + const struct rte_mempool_ops *h = (const struct rte_mempool_ops *)arg;
> > +
> > + return rte_mempool_register_ops(h);
> > +}
> > +
> > +static inline void
> > +mempool_ops_register(const struct rte_mempool_ops *ops)
> > +{
> > + rte_init_register(mempool_ops_register_cb, (const void *)ops,
> > + RTE_INIT_PRE);
> > +}
> > +
> > /**
> > * Macro to statically register the ops of a mempool handler.
> > * Note that the rte_mempool_register_ops fails silently here when
> > @@ -918,7 +942,7 @@ int rte_mempool_ops_get_info(const struct rte_mempool *mp,
> > #define MEMPOOL_REGISTER_OPS(ops) \
> > RTE_INIT(mp_hdlr_init_##ops) \
> > { \
> > - rte_mempool_register_ops(&ops); \
> > + mempool_ops_register(&ops); \
> > }
> >
> > /**
> > diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
> > index 22c5251eb068..b10fda662db6 100644
> > --- a/lib/librte_mempool/rte_mempool_ops.c
> > +++ b/lib/librte_mempool/rte_mempool_ops.c
> > @@ -14,43 +14,95 @@
> > /* indirect jump table to support external memory pools. */
> > struct rte_mempool_ops_table rte_mempool_ops_table = {
> > .sl = RTE_SPINLOCK_INITIALIZER,
> > - .num_ops = 0
> > };
> >
> > -/* add a new ops struct in rte_mempool_ops_table, return its index. */
> > -int
> > -rte_mempool_register_ops(const struct rte_mempool_ops *h)
> > +static int
> > +rte_mempool_register_shared_ops(const char *name)
> > {
> > - struct rte_mempool_ops *ops;
> > - int16_t ops_index;
> > + static bool mempool_shared_ops_inited;
> > + struct rte_mempool_shared_ops *shared_ops;
> > + const struct rte_memzone *mz;
> > + uint32_t ops_index = 0;
> > +
>
> I think we should sanity check 'name' here to be not
> empty string (see review notes below).
OK
> > + if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
> > + !mempool_shared_ops_inited) {
> > +
> > + mz = rte_memzone_reserve("mempool_ops_shared",
> > + sizeof(*shared_ops), 0, 0);
> > + if (mz == NULL)
> > + return -ENOMEM;
> > +
> > + shared_ops = mz->addr;
> > + shared_ops->num_mempool_ops = 0;
> > + rte_spinlock_init(&shared_ops->mempool);
> > +
> > + mempool_shared_ops_inited = true;
> > + } else {
> > + mz = rte_memzone_lookup("mempool_ops_shared");
> > + if (mz == NULL)
> > + return -ENOMEM;
> > +
> > + shared_ops = mz->addr;
> > + }
> >
> > - rte_spinlock_lock(&rte_mempool_ops_table.sl);
> > + rte_spinlock_lock(&shared_ops->mempool);
> >
> > - if (rte_mempool_ops_table.num_ops >=
> > - RTE_MEMPOOL_MAX_OPS_IDX) {
> > - rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> > + if (shared_ops->num_mempool_ops >= RTE_MEMPOOL_MAX_OPS_IDX) {
> > + rte_spinlock_unlock(&shared_ops->mempool);
> > RTE_LOG(ERR, MEMPOOL,
> > "Maximum number of mempool ops structs exceeded\n");
> > return -ENOSPC;
> > }
> >
> > + while (shared_ops->mempool_ops[ops_index].name[0]) {
>
> Please, compare with '\0' as DPDK style guide says.
>
> > + if (!strcmp(name, shared_ops->mempool_ops[ops_index].name)) {
> > + rte_spinlock_unlock(&shared_ops->mempool);
> > + return ops_index;
> > + }
> > +
> > + ops_index++;
> > + }
> > +
> > + strlcpy(shared_ops->mempool_ops[ops_index].name, name,
> > + sizeof(shared_ops->mempool_ops[0].name));
> > +
> > + shared_ops->num_mempool_ops++;
> > +
> > + rte_spinlock_unlock(&shared_ops->mempool);
> > + return ops_index;
> > +}
> > +
> > +/* add a new ops struct in rte_mempool_ops_table, return its index. */
> > +int
> > +rte_mempool_register_ops(const struct rte_mempool_ops *h)
> > +{
> > + struct rte_mempool_ops *ops;
> > + int16_t ops_index;
> > +
> > if (h->alloc == NULL || h->enqueue == NULL ||
> > - h->dequeue == NULL || h->get_count == NULL) {
> > - rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> > + h->dequeue == NULL || h->get_count == NULL) {
>
> Changing formatting just makes review a bit more harder.
>
> > RTE_LOG(ERR, MEMPOOL,
> > "Missing callback while registering mempool ops\n");
> > + rte_errno = EINVAL;
>
> Why is it done in the patch? For me it looks like logically
> different change if it is really required (properly motivated).
I guess that should add the err code to rte_rrno ? but I am fine, not
to do that.
> > return -EINVAL;
> > }
> >
> > if (strlen(h->name) >= sizeof(ops->name) - 1) {
> > - rte_spinlock_unlock(&rte_mempool_ops_table.sl);
> > - RTE_LOG(DEBUG, EAL, "%s(): mempool_ops <%s>: name too long\n",
> > - __func__, h->name);
> > + RTE_LOG(ERR, MEMPOOL,
> > + "The registering mempool_ops <%s>: name too long\n",
> > + h->name);
>
> Why do you change from DEBUG to ERR here? It it not
> directly related to the purpose of the patch.
Yes, because it is an error, so change that type. I change it in
separate patch ?
> > rte_errno = EEXIST;
> > return -EEXIST;
> > }
> >
> > - ops_index = rte_mempool_ops_table.num_ops++;
> > + ops_index = rte_mempool_register_shared_ops(h->name);
> > + if (ops_index < 0) {
> > + rte_errno = -ops_index;
> > + return ops_index;
> > + }
> > +
> > + rte_spinlock_lock(&rte_mempool_ops_table.sl);
> > +
> > ops = &rte_mempool_ops_table.ops[ops_index];
> > strlcpy(ops->name, h->name, sizeof(ops->name));
> > ops->alloc = h->alloc;
> > @@ -165,9 +217,8 @@ struct rte_mempool_ops_table rte_mempool_ops_table = {
> > if (mp->flags & MEMPOOL_F_POOL_CREATED)
> > return -EEXIST;
> >
> > - for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
> > - if (!strcmp(name,
> > - rte_mempool_ops_table.ops[i].name)) {
> > + for (i = 0; i < RTE_MEMPOOL_MAX_OPS_IDX; i++) {
> > + if (!strcmp(name, rte_mempool_ops_table.ops[i].name)) {
>
> Since rte_mempool_ops_table is filled in which zeros,
> name string is empty by default. So, request with empty name
> will match the first unused entry. I guess it is not what we
> want here. I think we should handle empty string before the
> loop and return -EINVAL.
OK, thanks.
> > ops = &rte_mempool_ops_table.ops[i];
> > break;
> > }
> >
>
--
Best regards, Tonghao
prev parent reply other threads:[~2020-04-27 5:23 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-02 1:57 [dpdk-dev] [PATCH] mempool: sort the rte_mempool_ops by name xiangxia.m.yue
2020-03-02 13:45 ` Jerin Jacob
2020-03-04 13:17 ` Tonghao Zhang
2020-03-04 13:33 ` Jerin Jacob
2020-03-04 14:46 ` Tonghao Zhang
2020-03-04 15:14 ` Jerin Jacob
2020-03-04 15:25 ` Tonghao Zhang
2020-03-05 8:20 ` [dpdk-dev] [PATCH dpdk-dev v2] " xiangxia.m.yue
2020-03-05 16:57 ` Olivier Matz
2020-03-06 13:36 ` [dpdk-dev] [PATCH dpdk-dev v3] " xiangxia.m.yue
2020-03-06 13:37 ` Jerin Jacob
2020-03-07 12:51 ` Andrew Rybchenko
2020-03-07 12:54 ` Andrew Rybchenko
2020-03-09 3:01 ` Tonghao Zhang
2020-03-09 8:27 ` Olivier Matz
2020-03-09 8:55 ` Tonghao Zhang
2020-03-09 9:05 ` Olivier Matz
2020-03-09 13:15 ` David Marchand
2020-03-16 7:43 ` Tonghao Zhang
2020-03-16 7:55 ` Olivier Matz
2020-03-24 9:35 ` Andrew Rybchenko
2020-03-24 12:41 ` Tonghao Zhang
2020-04-09 10:52 ` [dpdk-dev] [PATCH dpdk-dev 1/2] eal: introduce last-init queue for libraries initialization xiangxia.m.yue
2020-04-09 10:53 ` [dpdk-dev] [PATCH dpdk-dev 2/2] mempool: use shared memzone for rte_mempool_ops xiangxia.m.yue
2020-04-09 11:31 ` [dpdk-dev] [PATCH dpdk-dev 1/2] eal: introduce last-init queue for libraries initialization Jerin Jacob
2020-04-09 15:04 ` Tonghao Zhang
2020-04-09 15:02 ` [dpdk-dev] [PATCH dpdk-dev v2 1/2] eal: introduce rte-init " xiangxia.m.yue
2020-04-09 15:02 ` [dpdk-dev] [PATCH dpdk-dev v2 2/2] mempool: use shared memzone for rte_mempool_ops xiangxia.m.yue
2020-04-10 6:18 ` [dpdk-dev] [PATCH dpdk-dev v2 1/2] eal: introduce rte-init queue for libraries initialization Jerin Jacob
2020-04-10 13:11 ` Jerin Jacob
2020-04-12 3:20 ` Tonghao Zhang
2020-04-12 3:32 ` Tonghao Zhang
2020-04-13 11:32 ` Jerin Jacob
2020-04-13 14:21 ` [dpdk-dev] [PATCH dpdk-dev v3 " xiangxia.m.yue
2020-04-13 14:21 ` [dpdk-dev] [PATCH dpdk-dev v3 2/2] mempool: use shared memzone for rte_mempool_ops xiangxia.m.yue
2020-04-16 22:27 ` Thomas Monjalon
2020-04-27 8:03 ` Tonghao Zhang
2020-04-27 11:40 ` Thomas Monjalon
2020-04-27 12:51 ` Tonghao Zhang
2020-04-28 13:22 ` Tonghao Zhang
2020-05-04 7:42 ` Olivier Matz
2021-03-25 14:24 ` David Marchand
2020-04-23 13:38 ` Andrew Rybchenko
2020-04-27 5:23 ` Tonghao Zhang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMDZJNW-YCgPk5N5uaULX1yfaCWb5qMkVV3L5raKjAhU8NNvOw@mail.gmail.com \
--to=xiangxia.m.yue@gmail.com \
--cc=anatoly.burakov@intel.com \
--cc=artem.andreev@oktetlabs.ru \
--cc=arybchenko@solarflare.com \
--cc=bruce.richardson@intel.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=gage.eads@intel.com \
--cc=hemant.agrawal@nxp.com \
--cc=jerinj@marvell.com \
--cc=ndabilpuram@marvell.com \
--cc=olivier.matz@6wind.com \
--cc=vattunuru@marvell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).