From: "Lilijun (Jerry)" <jerry.lilijun@huawei.com>
To: "Wang, Yipeng1" <yipeng1.wang@intel.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
"'dev@dpdk.org'" <dev@dpdk.org>
Cc: wangyunjian <wangyunjian@huawei.com>,
xudingke <xudingke@huawei.com>,
"'stable@dpdk.org'" <stable@dpdk.org>, nd <nd@arm.com>,
nd <nd@arm.com>
Subject: [dpdk-dev] 答复: [PATCH] lib/librte_hash: add rte_hash_del_key_fixed without compact
Date: Wed, 13 May 2020 01:28:27 +0000 [thread overview]
Message-ID: <40280F65B1B0B44E8089ED31C01616EBA49C65F4@dggeml529-mbx.china.huawei.com> (raw)
In-Reply-To: <MW3PR11MB46846D04E1223D7708388898C3BE0@MW3PR11MB4684.namprd11.prod.outlook.com>
> -----邮件原件-----
> 发件人: Wang, Yipeng1 [mailto:yipeng1.wang@intel.com]
> 发送时间: 2020年5月13日 7:41
> 收件人: Lilijun (Jerry) <jerry.lilijun@huawei.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; 'dev@dpdk.org' <dev@dpdk.org>
> 抄送: wangyunjian <wangyunjian@huawei.com>; xudingke
> <xudingke@huawei.com>; 'stable@dpdk.org' <stable@dpdk.org>; nd
> <nd@arm.com>; nd <nd@arm.com>
> 主题: RE: [dpdk-dev] [PATCH] lib/librte_hash: add rte_hash_del_key_fixed
> without compact
>
> > -----Original Message-----
> > From: Lilijun (Jerry) <jerry.lilijun@huawei.com>
> > Sent: Tuesday, May 5, 2020 6:10 PM
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > 'dev@dpdk.org' <dev@dpdk.org>
> > Cc: wangyunjian <wangyunjian@huawei.com>; xudingke
> > <xudingke@huawei.com>; 'stable@dpdk.org' <stable@dpdk.org>; nd
> > <nd@arm.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>; nd
> <nd@arm.com>
> > Subject: 答复: [dpdk-dev] [PATCH] lib/librte_hash: add
> > rte_hash_del_key_fixed without compact
> >
> > > -----邮件原件-----
> > > 发件人: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > > 发送时间: 2020年5月6日 7:18
> > > 收件人: Lilijun (Jerry) <jerry.lilijun@huawei.com>; 'dev@dpdk.org'
> > > <dev@dpdk.org>
> > > 抄送: wangyunjian <wangyunjian@huawei.com>; xudingke
> > > <xudingke@huawei.com>; 'stable@dpdk.org' <stable@dpdk.org>; nd
> > > <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>;
> > > yipeng1.wang@intel.com; nd <nd@arm.com>
> > > 主题: RE: [dpdk-dev] [PATCH] lib/librte_hash: add
> > > rte_hash_del_key_fixed without compact
> > >
> > > <snip>
> > >
> > > Adding Yipeng, maintainer for hash library
> > >
> > > >
> > > > Thanks for your reply.
> > > >
> > > > Using rte_hash iterate and delete keys is to free the related
> > > > data's
> > memory.
> > > > There are two reasons why rte_hash_reset() is not properly:
> > > > 1) the reset function just clear all keys, the key's related data are
> leaked.
> > > That is a good point. I think this should be documented in the API.
>
> [Yipeng]
> By leaking, do you mean that you keep data in separate places and the
> pointers to them are missed from table after reset? Can you keep data in an
> array and iterate that array instead?
[Lilijun (Jerry)]
Yes, the data pointers in rte hash table are missed after reset.
The solution using an external array to keep all data can avoid this problem, but it may introduce some extra memories cost.
It's better rte_hash can support iterate when inserting or deleting.
> > >
> > > > 2) In some cases, I don't need delete all keys. Just some
> > > > selected keys and data are deleted and released.
> [Yipeng]
> Could you keep a candidate list of keys you want to delete in another data
> structure so you don’t need to iterate the whole hash table?
>
[Lilijun (Jerry)]
I need iterate the hash table at first and decide which key are candidate, then delete the key/data instantly.
Of course, here I can keep those candidate keys and datas in a temporary list and delete them after the iterate is finished.
This may be a second choice for me although it became more complicated.
> > > I understand the problem you have pointed out and understand how to
> > > reproduce it. But, the use case is not clear to me. Can you please
> > > explain the use case?
> > [Lilijun (Jerry)]
> >
> > As you know, the dpdk rte_hash use a fixed size table to store all
> keys/datas.
> > The memory used by hash table is related with this fixed size.
> > In my case, normally the count of keys is about 100,000 but sometimes
> > the count may burst up to 30,000,000.
> > In order to save memory usage, I create a small hash table with
> > 100,000 size and replace to a bigger one with 30,000,000 size when
> > there are more keys to be stored. Also when the key's count reduced to
> > less than 100,000, I replace the hash table with a small one to save the
> memory.
> [Yipeng]
> Could you tell me more on the use case? Since insertion would also invalidate
> the Iterator, do you insert keys only to new table during resizing?
>
[Lilijun (Jerry)]
Yes, Insert only to new table. Because the resize process need take a write lock and the old table's key insertion are prevented by the lock now.
Do you mean the key insertion may change other key's position by cuckoo hash algorithm and invalidate the iterator?
That maybe a new question I haven't met yet.
> >
> > >
> > > >
> > > > Jerry.
> > > >
> > > > -----邮件原件-----
> > > > 发件人: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> > > > 发送时间: 2020年4月29日 4:46
> > > > 收件人: Lilijun (Jerry) <jerry.lilijun@huawei.com>; 'dev@dpdk.org'
> > > > <dev@dpdk.org>
> > > > 抄送: wangyunjian <wangyunjian@huawei.com>; xudingke
> > > > <xudingke@huawei.com>; 'stable@dpdk.org' <stable@dpdk.org>; nd
> > > > <nd@arm.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>;
> > > nd
> > > > <nd@arm.com>
> > > > 主题: RE: [dpdk-dev] [PATCH] lib/librte_hash: add
> > > > rte_hash_del_key_fixed without compact
> > > >
> > > > <snip>
> > > >
> > > > Hi Jerry,
> > > > Few questions inline.
> > > >
> > > > > Subject: [dpdk-dev] [PATCH] lib/librte_hash: add
> > > > > rte_hash_del_key_fixed without compact
> > > > >
> > > > > The keys idx are stored in rte_hash main bucket key slots and
> > > > > extend bucket key stots.
> > > > > We iterate every no empty Keys in h->buckets and h->buckets_ext
> > > > > from start to last.
> > > > > When deleting keys the function __rte_hash_compact_ll() may move
> > > > > last_bkt's key to previous bucket in order to compact extend bucket
> list.
> > > > > If the previous bucket has been iterated, the moved key may be
> > > > > missed for users.
> > > > > Then those missed keys are leaked and rte_hash table can't be
> cleanup.
> > > > > So we add a new API rte_hash_del_key_fixed() used in iterate
> > > > > loop to avoid this bugs.
> > > > >
> > > > > ---
> > > > > lib/librte_hash/rte_cuckoo_hash.c | 19 ++++++++++++++-----
> > > > > lib/librte_hash/rte_hash.h | 5 +++++
> > > > > lib/librte_hash/rte_hash_version.map | 1 +
> > > > > 3 files changed, 20 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_hash/rte_cuckoo_hash.c
> > > > > b/lib/librte_hash/rte_cuckoo_hash.c
> > > > > index b52630b..2da3c1d 100644
> > > > > --- a/lib/librte_hash/rte_cuckoo_hash.c
> > > > > +++ b/lib/librte_hash/rte_cuckoo_hash.c
> > > > > @@ -1523,7 +1523,7 @@ search_and_remove(const struct rte_hash
> > > > > *h, const void *key,
> > > > >
> > > > > static inline int32_t
> > > > > __rte_hash_del_key_with_hash(const struct rte_hash *h, const
> > > > > void
> > > *key,
> > > > > - hash_sig_t sig)
> > > > > + hash_sig_t sig,
> uint8_t
> > > > > compact)
> > > > > {
> > > > > uint32_t prim_bucket_idx, sec_bucket_idx;
> > > > > struct rte_hash_bucket *prim_bkt, *sec_bkt, *prev_bkt,
> > > > > *last_bkt;
> > > > @@
> > > > > -1541,7 +1541,8 @@ __rte_hash_del_key_with_hash(const struct
> > > > > rte_hash *h, const void *key,
> > > > > /* look for key in primary bucket */
> > > > > ret = search_and_remove(h, key, prim_bkt, short_sig, &pos);
> > > > > if (ret != -1) {
> > > > > - __rte_hash_compact_ll(h, prim_bkt, pos);
> > > > > + if (compact)
> > > > > + __rte_hash_compact_ll(h, prim_bkt, pos);
> > > > > last_bkt = prim_bkt->next;
> > > > > prev_bkt = prim_bkt;
> > > > > goto return_bkt;
> > > > > @@ -1553,7 +1554,8 @@ __rte_hash_del_key_with_hash(const
> struct
> > > > > rte_hash *h, const void *key,
> > > > > FOR_EACH_BUCKET(cur_bkt, sec_bkt) {
> > > > > ret = search_and_remove(h, key, cur_bkt, short_sig,
> &pos);
> > > > > if (ret != -1) {
> > > > > - __rte_hash_compact_ll(h, cur_bkt, pos);
> > > > > + if (compact)
> > > > > + __rte_hash_compact_ll(h, cur_bkt,
> pos);
> > > > > last_bkt = sec_bkt->next;
> > > > > prev_bkt = sec_bkt;
> > > > > goto return_bkt;
> > > > > @@ -1607,14 +1609,21 @@ rte_hash_del_key_with_hash(const
> struct
> > > > > rte_hash *h,
> > > > > const void *key, hash_sig_t sig) {
> > > > > RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> > > > > - return __rte_hash_del_key_with_hash(h, key, sig);
> > > > > + return __rte_hash_del_key_with_hash(h, key, sig, 1);
> > > > > }
> > > > >
> > > > > int32_t
> > > > > rte_hash_del_key(const struct rte_hash *h, const void *key) {
> > > > > RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> > > > > - return __rte_hash_del_key_with_hash(h, key,
> rte_hash_hash(h,
> > > key));
> > > > > + return __rte_hash_del_key_with_hash(h, key,
> rte_hash_hash(h,
> > > key),
> > > > > 1);
> > > > > +}
> > > > > +
> > > > > +int32_t
> > > > > +rte_hash_del_key_fixed(const struct rte_hash *h, const void *key)
> {
> > > > > + RETURN_IF_TRUE(((h == NULL) || (key == NULL)), -EINVAL);
> > > > > + return __rte_hash_del_key_with_hash(h, key,
> rte_hash_hash(h,
> > > key),
> > > > > 0);
> > > > > }
> > > > >
> > > > > int
> > > > > diff --git a/lib/librte_hash/rte_hash.h
> > > > > b/lib/librte_hash/rte_hash.h index eceb365..9b71d8a 100644
> > > > > --- a/lib/librte_hash/rte_hash.h
> > > > > +++ b/lib/librte_hash/rte_hash.h
> > > > > @@ -297,6 +297,11 @@ rte_hash_add_key_with_hash(const struct
> > > > rte_hash
> > > > > *h, const void *key, hash_sig_t int32_t rte_hash_del_key(const
> > > > > struct rte_hash *h, const void *key);
> > > > >
> > > > > +
> > > > > +/* for without compact */
> > > > > +int32_t
> > > > > +rte_hash_del_key_fixed(const struct rte_hash *h, const void
> > > > > +*key);
> > > > > +
> > > > > /**
> > > > > * Remove a key from an existing hash table.
> > > > > * This operation is not multi-thread safe diff --git
> > > > > a/lib/librte_hash/rte_hash_version.map
> > > > > b/lib/librte_hash/rte_hash_version.map
> > > > > index 30cc086..1941d17 100644
> > > > > --- a/lib/librte_hash/rte_hash_version.map
> > > > > +++ b/lib/librte_hash/rte_hash_version.map
> > > > > @@ -11,6 +11,7 @@ DPDK_20.0 {
> > > > > rte_hash_count;
> > > > > rte_hash_create;
> > > > > rte_hash_del_key;
> > > > > + rte_hash_del_key_fixed;
> > > > > rte_hash_del_key_with_hash;
> > > > > rte_hash_find_existing;
> > > > > rte_hash_free;
> > > > > --
> > > > > 2.19.1
> > > > >
> > > > > -----邮件原件-----
> > > > > 发件人: Lilijun (Jerry)
> > > > > 发送时间: 2020年4月18日 18:00
> > > > > 收件人: 'dev@dpdk.org' <dev@dpdk.org>; 'stable@dpdk.org'
> > > > > <stable@dpdk.org>
> > > > > 主题: rte_hash bug: can't iterate all entries when deleting keys
> > > > > in rte_hash iterate loop.
> > > > >
> > > > > Hi all,
> > > > >
> > > > > In my test, entries can't be cleanup in rte_hash table when
> > > > > deleting keys in rte_hash iterate loop. The test steps:
> > > > > 1. create a hash table table1 with limit 30000, ext bucket
> > > > > enabled, and insert 30000 entries into this hash table.
> > > > > 2. create a larger hash table table2 with limit 60000, ,
> > > > > ext bucket
> > > > enabled.
> > > > > 3. iterate all entries of table1 and insert them to the table2.
> > > > > Insert new
> > > > > 10000 entries to this table2.
> > > > > 4. Then flush all entries from table2 by deleting keys in
> > > > > rte_hash iterate loop. But there are still some keys leaked in table2.
> > > > Is there any reason for flushing table2 in this manner?
> > > > Is it possible to use 'rte_hash_reset' instead?
> > > >
> > > > >
> > > > > From my analysis, the keys idx are stored in rte_hash main
> > > > > bucket key slots and extend bucket key stots.
> > > > > We iterate every no empty Keys in h->buckets and
> > > > > h->buckets_ext from start to last.
> > > > > When deleting keys the function __rte_hash_compact_ll() may
> > > > > move last_bkt's key to previous bucket in order to compact
> > > > > extend
> > bucket list.
> > > > > If the previous bucket has been iterated, the moved key may
> > > > > be missed for users.
> > > > > Then those missed keys are leaked and rte_hash table can't
> > > > > be
> > cleanup.
> > > > >
> > > > > Now I retry the iterate and delete keys, that can avoid this bug.
> > > > >
> > > > > Is there any ideas or solutions on this bug? Thanks.
> > > > >
> > > > > Jerry.
> [Wang, Yipeng]
> Thanks Honnappa for adding me to the recipient list, I am sorry that I missed
> this patch previously.
> In future you could add me to the recipient list for hash related
> patches/questions! Thanks!
>
> Although it is not a bug to me, I think this is a valid issue and thanks for
> bringing it up.
> I see the iterate function as like the iterator from C++ stl. Any
> insertion/delete should invalidate the current iterator (insertion too, since
> insertion could move things around in cuckoo path).
> From this point of view, by design there is no guarantee that the "next"
> pointer is still valid after any insertion/deletion during the iteration. We at
> least should clarify this in the documentation.
>
> That said, in c++ stl there is indeed a way to iterate and delete since it
> provides an erase function returning the next valid iterator.
> I think we don’t have such capability in rte_hash now as you pointed out, and
> this is a gap you want to fill.
>
> However, the proposed API " rte_hash_del_key_fixed" exposes the
> internals of the implementation.
> The internal of hash lib is supposed to be a black box to the user. Although
> the issue is caused by "compaction", the API should not expose that. I am
> thinking a function like the "erase" from stl which returns the next valid
> iterator should be a better way. What do you think?
>
[Lilijun (Jerry)]
Yes, I am agree with your views. Some "erase" function may be better for this cases.
> Resizing is an important feature that we always think to add into rte_hash.
> Feel free to propose resizing Features too into rte_hash as you are already
> doing similar thing right now.
>
[Lilijun (Jerry)]
Yes, the memory cost by rte hash may be waste when the keys usage is not high in common.
Resize features can figure out that problem in rte_hash. Thanks for your advice.
> BTW, you can add a [RFC] prefix to the subject line for future RFC patches.
>
> Other questions/comments are inlined.
>
next prev parent reply other threads:[~2020-05-13 1:28 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-27 2:28 [dpdk-dev] " Lilijun (Jerry)
2020-04-28 20:46 ` Honnappa Nagarahalli
2020-04-29 1:07 ` [dpdk-dev] 答复: " Lilijun (Jerry)
2020-05-05 23:17 ` [dpdk-dev] " Honnappa Nagarahalli
2020-05-06 1:09 ` [dpdk-dev] 答复: " Lilijun (Jerry)
2020-05-12 23:41 ` [dpdk-dev] " Wang, Yipeng1
2020-05-13 1:28 ` Lilijun (Jerry) [this message]
2020-05-14 17:44 ` Wang, Yipeng1
2021-03-24 21:33 ` [dpdk-dev] [dpdk-stable] 答复: " Thomas Monjalon
2021-03-24 23:25 ` Wang, Yipeng1
2021-03-25 8:04 ` Thomas Monjalon
2020-05-13 19:27 ` [dpdk-dev] " Honnappa Nagarahalli
2020-05-14 0:55 ` [dpdk-dev] 答复: " Lilijun (Jerry)
2020-05-14 1:21 ` [dpdk-dev] " Honnappa Nagarahalli
2020-05-14 1:51 ` [dpdk-dev] 答复: " Lilijun (Jerry)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=40280F65B1B0B44E8089ED31C01616EBA49C65F4@dggeml529-mbx.china.huawei.com \
--to=jerry.lilijun@huawei.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=dev@dpdk.org \
--cc=nd@arm.com \
--cc=stable@dpdk.org \
--cc=wangyunjian@huawei.com \
--cc=xudingke@huawei.com \
--cc=yipeng1.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).