DPDK patches and discussions
* [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for legacy mem
@ 2019-03-08  5:38 Lilijun
  2019-03-08  9:37 ` Burakov, Anatoly
  0 siblings, 1 reply; 4+ messages in thread
From: Lilijun @ 2019-03-08  5:38 UTC (permalink / raw)
  To: dev; +Cc: jerry.zhang, ian.stokes, Lilijun

Compared with DPDK 16.11, the VA space of a DPDK application process has grown to more than 30 GB.
This patch unmaps the unneeded VA space in rte_memseg_list when legacy memory mode is used.

Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 32feb41..56abdd2 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -1626,8 +1626,19 @@ void numa_error(char *where)
 		if (msl->base_va == NULL)
 			continue;
 		/* skip lists where there is at least one page allocated */
-		if (msl->memseg_arr.count > 0)
+		if (msl->memseg_arr.count > 0) {
+			if (internal_config.legacy_mem) {
+				struct rte_fbarray *arr = &msl->memseg_arr;
+				int idx = rte_fbarray_find_next_free(arr, 0);
+
+				while (idx >= 0) {
+					void *va = (void*)((char*)msl->base_va + idx * msl->page_sz);
+					munmap(va, msl->page_sz);
+					idx = rte_fbarray_find_next_free(arr, idx + 1);
+				}
+			}
 			continue;
+		}
 		/* this is an unused list, deallocate it */
 		mem_sz = msl->len;
 		munmap(msl->base_va, mem_sz);
-- 
1.8.3.1


* Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for legacy mem
  2019-03-08  5:38 [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for legacy mem Lilijun
@ 2019-03-08  9:37 ` Burakov, Anatoly
  2019-03-12  1:47   ` Lilijun (Jerry, Cloud Networking)
  0 siblings, 1 reply; 4+ messages in thread
From: Burakov, Anatoly @ 2019-03-08  9:37 UTC (permalink / raw)
  To: Lilijun, dev; +Cc: jerry.zhang, ian.stokes

On 08-Mar-19 5:38 AM, Lilijun wrote:
> Compared with DPDK 16.11, the VA space of a DPDK application process has grown to more than 30 GB.
> This patch unmaps the unneeded VA space in rte_memseg_list when legacy memory mode is used.
> 
> Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> ---
>   lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 32feb41..56abdd2 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1626,8 +1626,19 @@ void numa_error(char *where)
>   		if (msl->base_va == NULL)
>   			continue;
>   		/* skip lists where there is at least one page allocated */
> -		if (msl->memseg_arr.count > 0)
> +		if (msl->memseg_arr.count > 0) {
> +			if (internal_config.legacy_mem) {
> +				struct rte_fbarray *arr = &msl->memseg_arr;
> +				int idx = rte_fbarray_find_next_free(arr, 0);
> +
> +				while (idx >= 0) {
> +					void *va = (void*)((char*)msl->base_va + idx * msl->page_sz);
> +					munmap(va, msl->page_sz);
> +					idx = rte_fbarray_find_next_free(arr, idx + 1);
> +				}

I am not entirely convinced this change is safe. Technically, this
space is marked as free, so correctly written code should not attempt to
access it; however, it is still potentially dangerous to have a memory area
that is supposed to be allocated (according to the data structures'
parameters) but isn't.

If you are deallocating the VA space, ideally you should also resize the
memseg list (that is, change its length), because the leftover memory
area is no longer valid. However, this then creates a mismatch
between (va_start + len) and (va_start + page_sz * memseg_arr.len),
which may break things further.

May I ask what the purpose of this change is? I understand the
part about unused VA space sitting there, but what is the consequence of
that? This isn't the 32-bit codepath, and on 64-bit there's plenty of
address space to go around. This memory doesn't take up any system
resources anyway, because it is read-only anonymous memory and is
therefore backed by the zero page instead of real pages. So, what's wrong
with just leaving it there?

I don't see any advantage to this change, and I see plenty of
disadvantages, so for now I'm inclined to NACK this particular patch.

_However_, I should note that if you feel this is a very important feature
to have and would still like to implement it, my advice would be to look
at how the 32-bit code works and model the 64-bit implementation after
it, because the 32-bit codepath does exactly what you propose and doesn't
leave unused address space.

> +			}
>   			continue;
> +		}
>   		/* this is an unused list, deallocate it */
>   		mem_sz = msl->len;
>   		munmap(msl->base_va, mem_sz);
> 


-- 
Thanks,
Anatoly


* Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for legacy mem
  2019-03-08  9:37 ` Burakov, Anatoly
@ 2019-03-12  1:47   ` Lilijun (Jerry, Cloud Networking)
  2019-03-12 11:02     ` Burakov, Anatoly
  0 siblings, 1 reply; 4+ messages in thread
From: Lilijun (Jerry, Cloud Networking) @ 2019-03-12  1:47 UTC (permalink / raw)
  To: Burakov, Anatoly, dev; +Cc: jerry.zhang, ian.stokes

Hi Anatoly,

> -----Original Message-----
> From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
> Sent: Friday, March 08, 2019 5:38 PM
> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>;
> dev@dpdk.org
> Cc: jerry.zhang@intel.com; ian.stokes@intel.com
> Subject: Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for
> legacy mem
> 
> On 08-Mar-19 5:38 AM, Lilijun wrote:
> > Compared with DPDK 16.11, the VA space of a DPDK application process has
> > grown to more than 30 GB.
> > This patch unmaps the unneeded VA space in rte_memseg_list when legacy memory mode is used.
> >
> > Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
> > ---
> >   lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
> >   1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > index 32feb41..56abdd2 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> > @@ -1626,8 +1626,19 @@ void numa_error(char *where)
> >   		if (msl->base_va == NULL)
> >   			continue;
> >   		/* skip lists where there is at least one page allocated */
> > -		if (msl->memseg_arr.count > 0)
> > +		if (msl->memseg_arr.count > 0) {
> > +			if (internal_config.legacy_mem) {
> > +				struct rte_fbarray *arr = &msl->memseg_arr;
> > +				int idx = rte_fbarray_find_next_free(arr, 0);
> > +
> > +				while (idx >= 0) {
> > +					void *va = (void*)((char*)msl->base_va + idx * msl->page_sz);
> > +					munmap(va, msl->page_sz);
> > +					idx = rte_fbarray_find_next_free(arr, idx + 1);
> > +				}
> 
> I am not entirely convinced this change is safe. Technically, this space is
> marked as free, so correctly written code should not attempt to access it;
> however, it is still potentially dangerous to have a memory area that is
> supposed to be allocated (according to the data structures' parameters)
> but isn't.
> 
> If you are deallocating the VA space, ideally you should also resize the
> memseg list (that is, change its length), because the leftover memory area
> is no longer valid. However, this then creates a mismatch between
> (va_start + len) and (va_start + page_sz * memseg_arr.len), which may
> break things further.

Yes, you're right: we also need to resize the memseg list length. I will update the patch if it is still needed.
> 
> May I ask what the purpose of this change is? I understand the part
> about unused VA space sitting there, but what is the consequence of that?
> This isn't the 32-bit codepath, and on 64-bit there's plenty of address
> space to go around. This memory doesn't take up any system resources anyway,
> because it is read-only anonymous memory and is therefore backed by the
> zero page instead of real pages. So, what's wrong with just leaving it there?

The VA space growth does cause an issue: when a DPDK application crashes, the core dump file becomes too large.

Thanks.

> 
> I don't see any advantage to this change, and I see plenty of disadvantages,
> so for now I'm inclined to NACK this particular patch.
> 
> _However_, I should note that if you feel this is a very important feature
> to have and would still like to implement it, my advice would be to look at
> how the 32-bit code works and model the 64-bit implementation after it,
> because the 32-bit codepath does exactly what you propose and doesn't leave
> unused address space.
> 
> > +			}
> >   			continue;
> > +		}
> >   		/* this is an unused list, deallocate it */
> >   		mem_sz = msl->len;
> >   		munmap(msl->base_va, mem_sz);
> >
> 
> 
> --
> Thanks,
> Anatoly


* Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for legacy mem
  2019-03-12  1:47   ` Lilijun (Jerry, Cloud Networking)
@ 2019-03-12 11:02     ` Burakov, Anatoly
  0 siblings, 0 replies; 4+ messages in thread
From: Burakov, Anatoly @ 2019-03-12 11:02 UTC (permalink / raw)
  To: Lilijun (Jerry, Cloud Networking), dev; +Cc: jerry.zhang, ian.stokes

On 12-Mar-19 1:47 AM, Lilijun (Jerry, Cloud Networking) wrote:
> Hi Anatoly,
> 
>> -----Original Message-----
>> From: Burakov, Anatoly [mailto:anatoly.burakov@intel.com]
>> Sent: Friday, March 08, 2019 5:38 PM
>> To: Lilijun (Jerry, Cloud Networking) <jerry.lilijun@huawei.com>;
>> dev@dpdk.org
>> Cc: jerry.zhang@intel.com; ian.stokes@intel.com
>> Subject: Re: [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for
>> legacy mem
>>
>> On 08-Mar-19 5:38 AM, Lilijun wrote:
>>> Compared with DPDK 16.11, the VA space of a DPDK application process has
>>> grown to more than 30 GB.
>>> This patch unmaps the unneeded VA space in rte_memseg_list when legacy memory mode is used.
>>>
>>> Signed-off-by: Lilijun <jerry.lilijun@huawei.com>
>>> ---
>>>    lib/librte_eal/linuxapp/eal/eal_memory.c | 13 ++++++++++++-
>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> index 32feb41..56abdd2 100644
>>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>>> @@ -1626,8 +1626,19 @@ void numa_error(char *where)
>>>    		if (msl->base_va == NULL)
>>>    			continue;
>>>    		/* skip lists where there is at least one page allocated */
>>> -		if (msl->memseg_arr.count > 0)
>>> +		if (msl->memseg_arr.count > 0) {
>>> +			if (internal_config.legacy_mem) {
>>> +				struct rte_fbarray *arr = &msl->memseg_arr;
>>> +				int idx = rte_fbarray_find_next_free(arr, 0);
>>> +
>>> +				while (idx >= 0) {
>>> +					void *va = (void*)((char*)msl->base_va + idx * msl->page_sz);
>>> +					munmap(va, msl->page_sz);
>>> +					idx = rte_fbarray_find_next_free(arr, idx + 1);
>>> +				}
>>
>> I am not entirely convinced this change is safe. Technically, this space is
>> marked as free, so correctly written code should not attempt to access it;
>> however, it is still potentially dangerous to have a memory area that is
>> supposed to be allocated (according to the data structures' parameters)
>> but isn't.
>>
>> If you are deallocating the VA space, ideally you should also resize the
>> memseg list (that is, change its length), because the leftover memory area
>> is no longer valid. However, this then creates a mismatch between
>> (va_start + len) and (va_start + page_sz * memseg_arr.len), which may
>> break things further.
> 
> Yes, you're right: we also need to resize the memseg list length. I will update the patch if it is still needed.

Resizing the memseg list is not the best course of action, because fbarray
itself doesn't support resizing, so you'll end up with a mismatch
between the length of the memory and the length of the fbarray backing the
memseg list.

See my implementation suggestion below.

>>
>> May I ask what the purpose of this change is? I understand the part
>> about unused VA space sitting there, but what is the consequence of that?
>> This isn't the 32-bit codepath, and on 64-bit there's plenty of address
>> space to go around. This memory doesn't take up any system resources anyway,
>> because it is read-only anonymous memory and is therefore backed by the
>> zero page instead of real pages. So, what's wrong with just leaving it there?
> 
> The VA space growth does cause an issue: when a DPDK application crashes, the core dump file becomes too large.
> 
> Thanks.

You must have different default core dump settings than I do, because I
haven't seen Linux attempt to dump the entire address space before (I
have seen FreeBSD do that, mind you...).

> 
>>
>> I don't see any advantage to this change, and I see plenty of disadvantages,
>> so for now I'm inclined to NACK this particular patch.
>>
>> _However_, I should note that if you feel this is a very important feature
>> to have and would still like to implement it, my advice would be to look at
>> how the 32-bit code works and model the 64-bit implementation after it,
>> because the 32-bit codepath does exactly what you propose and doesn't leave
>> unused address space.

The above is the way to go as far as implementing this particular 
feature goes: this has to be done at memseg list allocation time, not 
post-factum, when memseg lists are already allocated.

>>
>>> +			}
>>>    			continue;
>>> +		}
>>>    		/* this is an unused list, deallocate it */
>>>    		mem_sz = msl->len;
>>>    		munmap(msl->base_va, mem_sz);
>>>
>>
>>
>> --
>> Thanks,
>> Anatoly


-- 
Thanks,
Anatoly



Thread overview: 4+ messages
2019-03-08  5:38 [dpdk-dev] [PATCH] eal: unmap unneed dpdk VA spaces for legacy mem Lilijun
2019-03-08  9:37 ` Burakov, Anatoly
2019-03-12  1:47   ` Lilijun (Jerry, Cloud Networking)
2019-03-12 11:02     ` Burakov, Anatoly
