From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: "Lilijun (Jerry, Cloud Networking)" <jerry.lilijun@huawei.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "jerry.zhang@intel.com" <jerry.zhang@intel.com>,
"ian.stokes@intel.com" <ian.stokes@intel.com>
Subject: Re: [dpdk-dev] [PATCH] eal: unmap unneeded dpdk VA spaces for legacy mem
Date: Tue, 12 Mar 2019 11:02:09 +0000
Message-ID: <58a8100f-4ea1-88e5-c8dd-76ae57c09b12@intel.com>
In-Reply-To: <40280F65B1B0B44E8089ED31C01616EBA43F936C@dggeml529-mbx.china.huawei.com>
On 12-Mar-19 1:47 AM, Lilijun (Jerry, Cloud Networking) wrote:
> Hi Anatoly,
>
>> -----Original Message-----
>> From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
>> Sent: Friday, March 08, 2019 5:38 PM
>> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>;
>> dev@dpdk.org
>> Cc: jerry.zhang@intel.com; ian.stokes@intel.com
>> Subject: Re: [dpdk-dev] [PATCH] eal: unmap unneeded dpdk VA spaces
>> for legacy mem
>>
>> On 08-Mar-19 5:38 AM, Lilijun wrote:
>>> Compared to DPDK 16.11, the DPDK app process's VA space has grown to
>>> more than 30G. Here we can unmap the unneeded VA space in
>>> rte_memseg_list.
>>>
>>> Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
>>> ---
>>> lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
>>> 1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> index 32feb41..56abdd2 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> @@ -1626,8 +1626,19 @@ void numa_error(char *where)
>>>  		if (msl->base_va == NULL)
>>>  			continue;
>>>  		/* skip lists where there is at least one page allocated */
>>> -		if (msl->memseg_arr.count > 0)
>>> +		if (msl->memseg_arr.count > 0) {
>>> +			if (internal_config.legacy_mem) {
>>> +				struct rte_fbarray *arr = &msl->memseg_arr;
>>> +				int idx = rte_fbarray_find_next_free(arr, 0);
>>> +
>>> +				while (idx >= 0) {
>>> +					void *va = (void *)((char *)msl->base_va +
>>> +							idx * msl->page_sz);
>>> +					munmap(va, msl->page_sz);
>>> +					idx = rte_fbarray_find_next_free(arr, idx + 1);
>>> +				}
>>
>> I am not entirely convinced this change is safe to do. Technically, this space
>> is marked as free, so correctly written code should not attempt to access it.
>> However, it is still potentially dangerous to have a memory area that is
>> supposed to be allocated (according to the data structures' parameters) but
>> isn't.
>>
>> If you are deallocating the VA space, ideally you should also resize the
>> memseg list (as in, change its length), because that leftover memory area is
>> no longer valid. However, this then presents us with a mismatch between
>> (va_start + len) and (va_start + page_sz * memseg_arr.len), which may
>> break things further.
>
> Yes, you're right, we need to resize the memseg list length here. I will update it if this patch is needed.
Resizing the memseg list is not the best course of action, because the
fbarray itself doesn't support resizing: you would end up with a
mismatch between the length of the memory and the length of the fbarray
backing the memseg list. See the suggestion further below for how to
implement this properly.
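To put the mismatch in code, a hedged sketch (this helper is not real
EAL code; field names follow struct rte_memseg_list and struct
rte_fbarray as they exist in current headers):

#include <stdbool.h>
#include <stddef.h>

#include <rte_eal_memconfig.h>
#include <rte_fbarray.h>

/* The invariant the rest of EAL implicitly relies on: the VA span in
 * the list header equals the span implied by the backing fbarray.
 * Shrinking msl->len without shrinking the fbarray breaks it. */
static bool
msl_spans_consistent(const struct rte_memseg_list *msl)
{
	size_t hdr_span = msl->len;
	size_t arr_span = (size_t)msl->page_sz * msl->memseg_arr.len;

	return hdr_span == arr_span;
}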
>>
>> May I ask what the purpose of this change is? I mean, I understand the part
>> about unused VA space sitting there, but what is the consequence of that?
>> This isn't the 32-bit codepath, and in 64-bit there's plenty of address space
>> to go around. This memory doesn't take up any system resources anyway,
>> because it is read-only anonymous memory and is therefore backed by the
>> zero page instead of real pages. So, what's wrong with just leaving it there?
>
> This unused VA space causes an issue: when a DPDK app crashes, the coredump file becomes too large.
>
> Thanks.
You must have different default coredump settings than I do, because I
haven't seen Linux attempt to dump the entire address space before (I
have seen FreeBSD do that, mind you...).
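That said, if the pain point really is core dump size rather than the
reservation itself, a gentler option may be to keep the VA mapped but
tell the kernel to skip it when dumping. A standalone sketch using
plain madvise(2) with MADV_DONTDUMP (Linux-specific; this is not what
the patch does, just an alternative worth weighing):

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* a large read-only anonymous span: backed by the shared zero
	 * page, so it costs address space but no RAM; 32G mirrors the
	 * growth reported above */
	size_t sz = 32UL << 30;
	void *va = mmap(NULL, sz, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (va == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* exclude the range from any core file the process produces */
	if (madvise(va, sz, MADV_DONTDUMP) != 0)
		perror("madvise");

	munmap(va, sz);
	return 0;
}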
>
>>
>> I don't see any advantage to this change, and I see plenty of disadvantages,
>> so for now I'm inclined to NACK this particular patch.
>>
>> _However_, I should note that if you feel this is a very important feature to
>> have and would still like to implement it, my advice would be to look at how
>> the 32-bit code works and model the 64-bit implementation after it, because
>> the 32-bit codepath does exactly what you propose and doesn't leave unused
>> address space behind.
The above is the way to go for implementing this particular feature: it
has to be done at memseg list allocation time, not after the fact, when
the memseg lists have already been allocated.
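Concretely, "at allocation time" means sizing the reservation from the
pages you know you will have, so there is never an unused tail to
unmap. A rough sketch with a hypothetical helper (the real 32-bit
codepath in eal_memory.c is considerably more involved):

#include <stddef.h>
#include <sys/mman.h>

/* Reserve VA for exactly n_pages hugepages up front; returns NULL on
 * failure. The caller would size the memseg list from this instead of
 * reserving the maximum possible span. */
static void *
reserve_msl_va(size_t n_pages, size_t page_sz)
{
	size_t sz = n_pages * page_sz;
	void *va = mmap(NULL, sz, PROT_READ,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return va == MAP_FAILED ? NULL : va;
}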
>>
>>> +			}
>>>  			continue;
>>> +		}
>>>  		/* this is an unused list, deallocate it */
>>>  		mem_sz = msl->len;
>>>  		munmap(msl->base_va, mem_sz);
>>>
>>
>>
>> --
>> Thanks,
>> Anatoly
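One more aside, in case some form of this loop does return in a future
revision: munmap()ing one page at a time can get slow when many slots
are free. rte_fbarray_find_contig_free() would let you release whole
free runs at once; a sketch under the same assumptions as the patch
(error handling omitted):

#include <stddef.h>
#include <sys/mman.h>

#include <rte_common.h>
#include <rte_eal_memconfig.h>
#include <rte_fbarray.h>

static void
unmap_free_runs(struct rte_memseg_list *msl)
{
	struct rte_fbarray *arr = &msl->memseg_arr;
	int idx = rte_fbarray_find_next_free(arr, 0);

	while (idx >= 0) {
		/* number of contiguous free slots starting at idx */
		int n = rte_fbarray_find_contig_free(arr, idx);
		void *va = RTE_PTR_ADD(msl->base_va,
				(size_t)idx * msl->page_sz);

		munmap(va, (size_t)n * msl->page_sz);
		idx = rte_fbarray_find_next_free(arr, idx + n);
	}
}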
--
Thanks,
Anatoly