From: Yongseok Koh <yskoh@mellanox.com>
To: Ferruh Yigit <ferruh.yigit@intel.com>
Cc: "Adrien Mazarguil" <adrien.mazarguil@6wind.com>,
"Nélio Laranjeiro" <nelio.laranjeiro@6wind.com>,
"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 4/4] net/mlx4: add new Memory Region support
Date: Thu, 10 May 2018 03:00:44 +0000
Message-ID: <2DDD3185-6284-4BA3-A187-AED4C96407EB@mellanox.com>
In-Reply-To: <274843e9-ed82-9576-baf8-a704babf64c5@intel.com>
> On May 9, 2018, at 4:12 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> On 5/9/2018 12:09 PM, Yongseok Koh wrote:
>> This is the new design of Memory Region (MR) for mlx PMD, in order to:
>> - Accommodate the new memory hotplug model.
>> - Support non-contiguous Mempool.
>>
>> There are multiple layers for MR search.
>>
>> L0 looks up the last-hit entry, which is pointed to by mr_ctrl->mru (Most
>> Recently Used). If L0 misses, L1 looks up the address in a fixed-size
>> array by linear search. L0 and L1 are both in an inline function -
>> mlx4_mr_lookup_cache().
>>
>> If L1 misses, the bottom-half function is called to look up the address
>> from the bigger local cache of the queue. This is L2 - mlx4_mr_addr2mr_bh()
>> and it is not an inline function. Data structure for L2 is the Binary Tree.
>>
>> If L2 misses, the search falls into the slowest path which takes locks in
>> order to access global device cache (priv->mr.cache) which is also a B-tree
>> and caches the original MR list (priv->mr.mr_list) of the device. Unless
>> the global cache is overflowed, it is all-inclusive of the MR list. This is
>> L3 - mlx4_mr_lookup_dev(). The size of the L3 cache table is limited and
>> can't be expanded on the fly due to deadlock. Refer to the comments in the
>> code for the details - mr_lookup_dev(). If L3 is overflowed, the list will
>> have to be searched directly bypassing the cache although it is slower.
>>
>> If L3 misses, a new MR for the address should be created -
>> mlx4_mr_create(). When it creates a new MR, it tries to register adjacent
>> memsegs as much as possible which are virtually contiguous around the
>> address. This must take two locks - memory_hotplug_lock and
>> priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
>> allocation/free of memory inside.
>>
>> In the free callback of the memory hotplug event, the freed space is looked
>> up in the MR list and the corresponding bits are cleared from the bitmap of
>> MRs. This can fragment an MR, and the MR will then have multiple search
>> entries in the caches. Once there's a change by the event, the global cache
>> must be rebuilt and all the per-queue caches will be flushed as well. If
>> memory is frequently freed at run-time, that may cause jitter on dataplane
>> processing in the worst case by incurring MR cache flush and rebuild. But
>> this should be the least probable scenario.
>>
>> For the best performance, it is highly recommended to use the EAL option
>> '--socket-mem'. Then, the reserved memory will be pinned and won't be
>> freed dynamically. It is also recommended to configure the per-lcore cache
>> of Mempool. Even if there are many MRs for a device or the MRs are highly
>> fragmented, the Mempool cache will help considerably to reduce misses on
>> per-queue caches.
>>
>> '--legacy-mem' is also supported.
>>
>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>
> <...>
>
>> +/**
>> + * Insert an entry to B-tree lookup table.
>> + *
>> + * @param bt
>> + * Pointer to B-tree structure.
>> + * @param entry
>> + * Pointer to new entry to insert.
>> + *
>> + * @return
>> + * 0 on success, -1 on failure.
>> + */
>> +static int
>> +mr_btree_insert(struct mlx4_mr_btree *bt, struct mlx4_mr_cache *entry)
>> +{
>> + struct mlx4_mr_cache *lkp_tbl;
>> + uint16_t idx = 0;
>> + size_t shift;
>> +
>> + assert(bt != NULL);
>> + assert(bt->len <= bt->size);
>> + assert(bt->len > 0);
>> + lkp_tbl = *bt->table;
>> + /* Find out the slot for insertion. */
>> + if (mr_btree_lookup(bt, &idx, entry->start) != UINT32_MAX) {
>> + DEBUG("abort insertion to B-tree(%p):"
>> + " already exist at idx=%u [0x%lx, 0x%lx) lkey=0x%x",
>> + (void *)bt, idx, entry->start, entry->end, entry->lkey);
>
> This and various other logs cause 32-bit build errors because of %lx usage.
> Can you please check them?
>
> I feel sad complaining about a patch like this just because of a log format
> issue. We should find a solution to this as a community, either checkpatch
> checks or automated 32-bit builds, I don't know.
Bummer. I have to change my bad habit of using %lx. We will also add a 32-bit
build check to our internal system to catch this kind of mistake beforehand.
I will work with Shahaf to fix it and rebase next-net-mlx.
Thanks,
Yongseok
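
For reference, the 32-bit build issue discussed above comes from printing
pointer-sized integers with %lx, which only matches when long is as wide as
the value. A minimal sketch of a portable alternative follows, using the
format macros from <inttypes.h>. It assumes that the start/end fields are
uintptr_t and that lkey is uint32_t (the quoted hunk does not show the field
types), and format_mr_entry() is a hypothetical helper written for
illustration only; it is not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): format an MR cache entry range
 * the same way the DEBUG() message does, but with width-correct conversion
 * macros so the format string is valid on both 32-bit and 64-bit targets.
 *
 * Assumption: start/end are uintptr_t and lkey is uint32_t.
 *
 * Returns the number of characters that would have been written,
 * as snprintf() does.
 */
static int
format_mr_entry(char *buf, size_t len, uintptr_t start, uintptr_t end,
		uint32_t lkey)
{
	assert(buf != NULL);
	/* PRIxPTR expands to the right length modifier for uintptr_t. */
	return snprintf(buf, len,
			"[0x%" PRIxPTR ", 0x%" PRIxPTR ") lkey=0x%" PRIx32,
			start, end, lkey);
}
```

With these macros the compiler's format checking passes regardless of the
width of long, so the same log line should build cleanly for i686 and x86_64.
Casting to unsigned long long and using %llx would be another option, at the
cost of a cast at every call site.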