DPDK patches and discussions
* [dpdk-dev] Big spike in DPDK VSZ
@ 2020-01-30  7:48 siddarth rai
  2020-01-30  8:51 ` David Marchand
  2020-02-02  9:22 ` David Marchand
  0 siblings, 2 replies; 16+ messages in thread
From: siddarth rai @ 2020-01-30  7:48 UTC (permalink / raw)
  To: dev

Hi,

I have been using DPDK 19.08 and I notice the process VSZ is huge.

I tried running testpmd. It takes 64G of VSZ and, if I use the
'--in-memory' option, it takes up to 188G.

Is there any way to disable allocation of such a huge VSZ in DPDK?
This is resulting in huge core files, and I suspect that the problem will
compound on multi-NUMA machines.

Please help

Regards,
Siddarth

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30  7:48 [dpdk-dev] Big spike in DPDK VSZ siddarth rai
@ 2020-01-30  8:51 ` David Marchand
  2020-01-30 10:47   ` siddarth rai
  2020-02-04 10:23   ` Burakov, Anatoly
  2020-02-02  9:22 ` David Marchand
  1 sibling, 2 replies; 16+ messages in thread
From: David Marchand @ 2020-01-30  8:51 UTC (permalink / raw)
  To: siddarth rai, Burakov, Anatoly; +Cc: dev

On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
> I have been using DPDK 19.08 and I notice the process VSZ is huge.
>
> I tried running the test PMD. It takes 64G VSZ and if I use the
> '--in-memory' option it takes up to 188G.
>
> Is there anyway to disable allocation of such huge VSZ in DPDK ?

*Disclaimer* I don't know the arcana of the mem subsystem.

I suppose this is due to the memory allocator in dpdk reserving
unused virtual space (for memory hotplug + multiprocess).

If this is the case, maybe we could do something to improve the
situation for applications that don't care about multiprocess,
for example inform dpdk that the application won't use multiprocess
and skip those reservations.

Or another idea would be to limit those reservations to what is passed
via --socket-limit.

Anatoly?



--
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30  8:51 ` David Marchand
@ 2020-01-30 10:47   ` siddarth rai
  2020-01-30 13:15     ` Meunier, Julien (Nokia - FR/Paris-Saclay)
  2020-03-10 15:26     ` David Marchand
  2020-02-04 10:23   ` Burakov, Anatoly
  1 sibling, 2 replies; 16+ messages in thread
From: siddarth rai @ 2020-01-30 10:47 UTC (permalink / raw)
  To: David Marchand; +Cc: Burakov, Anatoly, dev

Hi,

I did some further experiments and found out that version 18.02.2 doesn't
have the problem, but the 18.05.1 release does.

I would really appreciate some help: is there a patch in the DPDK code to
get past this issue?
This is becoming a huge practical issue for me, as on a multi-NUMA setup
the VSZ goes above 400G and I can't get core files to debug crashes in my
app.

Regards,
Siddarth

On Thu, Jan 30, 2020 at 2:21 PM David Marchand <david.marchand@redhat.com>
wrote:

> On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
> > I have been using DPDK 19.08 and I notice the process VSZ is huge.
> >
> > I tried running the test PMD. It takes 64G VSZ and if I use the
> > '--in-memory' option it takes up to 188G.
> >
> > Is there anyway to disable allocation of such huge VSZ in DPDK ?
>
> *Disclaimer* I don't know the arcanes of the mem subsystem.
>
> I suppose this is due to the memory allocator in dpdk that reserves
> unused virtual space (for memory hotplug + multiprocess).
>
> If this is the case, maybe we could do something to enhance the
> situation for applications that won't care about multiprocess.
> Like inform dpdk that the application won't use multiprocess and skip
> those reservations.
>
> Or another idea would be to limit those reservations to what is passed
> via --socket-limit.
>
> Anatoly?
>
>
>
> --
> David Marchand
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30 10:47   ` siddarth rai
@ 2020-01-30 13:15     ` Meunier, Julien (Nokia - FR/Paris-Saclay)
  2020-01-31 12:14       ` siddarth rai
  2020-03-10 15:26     ` David Marchand
  1 sibling, 1 reply; 16+ messages in thread
From: Meunier, Julien (Nokia - FR/Paris-Saclay) @ 2020-01-30 13:15 UTC (permalink / raw)
  To: siddarth rai, David Marchand; +Cc: Burakov, Anatoly, dev

Hi,

I have also noticed this behavior since DPDK 18.05. As David said, it is related to the virtual address space management in DPDK.
Please check the commit 66cc45e293ed ("mem: replace memseg with memseg lists") which introduces this new memory management.

If you use mlockall in your application, all virtual address space is locked, and if you dump PageTables in /proc/meminfo, you will see a huge memory usage on the kernel side.
I am not an expert on memory management, especially in the kernel, but what I observed is that mlockall also locks unused virtual memory space.

For testpmd, you can pass the --no-mlockall flag on the testpmd command line.
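
For example, an invocation along these lines (the binary path and the EAL
core/memory arguments here are only placeholders for whatever your build and
setup use):

    ./testpmd -l 0-1 -n 4 --in-memory -- -i --no-mlockall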

For your application, you can use the flag MCL_ONFAULT (kernel >= 4.4).
man mlockall::
    
       Mark all current (with MCL_CURRENT) or future (with MCL_FUTURE)
       mappings to lock pages when they are faulted in. When used with
       MCL_CURRENT, all present pages are locked, but mlockall() will not
       fault in non-present pages. When used with MCL_FUTURE, all future
       mappings will be marked to lock pages when they are faulted in, but
       they will not be populated by the lock when the mapping is created.
       MCL_ONFAULT must be used with either MCL_CURRENT or MCL_FUTURE or
       both.

These options will not reduce the VSZ, but at least they will not allocate unused memory.
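
As an illustration, a minimal sketch of requesting lock-on-fault from an
application (this assumes a kernel and C library recent enough to expose
MCL_ONFAULT; error handling is reduced to a perror call):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            /* Lock current and future mappings, but only as pages are
             * actually faulted in, so DPDK's large reserved-but-unused
             * virtual ranges are not populated by the lock itself. */
            if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT) != 0)
                    perror("mlockall");
            return 0;
    }
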
Otherwise, you need to customize your DPDK .config in order to configure RTE_MAX_MEM_MB and related parameters for your specific application.

---
Julien Meunier

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of siddarth rai
> Sent: Thursday, January 30, 2020 11:48 AM
> To: David Marchand <david.marchand@redhat.com>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; dev <dev@dpdk.org>
> Subject: Re: [dpdk-dev] Big spike in DPDK VSZ
> 
> Hi,
> 
> I did some further experiments and found out that version 18.02.2 doesn't have
> the problem, but the 18.05.1 release has it.
> 
> Would really appreciate if someone can help, if there is a patch to get over this
> issue in the DPDK code ?
> This is becoming a huge practical issue for me as on multi NUMA setup, the VSZ
> goes above 400G and I can't get core files to debug crashes in my app.
> 
> Regards,
> Siddarth
> 
> 
> Regards,
> Siddarth
> 
> On Thu, Jan 30, 2020 at 2:21 PM David Marchand
> <david.marchand@redhat.com>
> wrote:
> 
> > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
> > > I have been using DPDK 19.08 and I notice the process VSZ is huge.
> > >
> > > I tried running the test PMD. It takes 64G VSZ and if I use the
> > > '--in-memory' option it takes up to 188G.
> > >
> > > Is there anyway to disable allocation of such huge VSZ in DPDK ?
> >
> > *Disclaimer* I don't know the arcanes of the mem subsystem.
> >
> > I suppose this is due to the memory allocator in dpdk that reserves
> > unused virtual space (for memory hotplug + multiprocess).
> >
> > If this is the case, maybe we could do something to enhance the
> > situation for applications that won't care about multiprocess.
> > Like inform dpdk that the application won't use multiprocess and skip
> > those reservations.
> >
> > Or another idea would be to limit those reservations to what is passed
> > via --socket-limit.
> >
> > Anatoly?
> >
> >
> >
> > --
> > David Marchand
> >
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30 13:15     ` Meunier, Julien (Nokia - FR/Paris-Saclay)
@ 2020-01-31 12:14       ` siddarth rai
  0 siblings, 0 replies; 16+ messages in thread
From: siddarth rai @ 2020-01-31 12:14 UTC (permalink / raw)
  To: Meunier, Julien (Nokia - FR/Paris-Saclay)
  Cc: David Marchand, Burakov, Anatoly, dev

Hi,

I have created a ticket for the same -
https://bugs.dpdk.org/show_bug.cgi?id=386

Regards,
Siddarth

On Thu, Jan 30, 2020 at 6:45 PM Meunier, Julien (Nokia - FR/Paris-Saclay) <
julien.meunier@nokia.com> wrote:

> Hi,
>
> I noticed also this same behavior since DPDK 18.05. As David said, it is
> related to the virtual space management in DPDK.
> Please check the commit 66cc45e293ed ("mem: replace memseg with memseg
> lists") which introduces this new memory management.
>
> If you use mlockall in your application, all virtual space are locked, and
> if you dump PageTable in /proc/meminfo, you will see a huge memory usage on
> Kernel side.
> I am not an expert of the memory management topic, especially in the
> kernel, but what I observed is that mlockall locks also unused virtual
> memory space.
>
> For testpmd, you can use in the testpmd command line the flag
> --no-mlockall.
>
> For your application, you can use the flag MCL_ONFAULT (kernel >  4.4).
> man mlockall::
>
>        Mark all current (with MCL_CURRENT) or future (with MCL_FUTURE)
>        mappings to lock pages when they are faulted in. When used with
>        MCL_CURRENT, all present pages are locked, but mlockall() will not
>        fault in non-present pages. When used with MCL_FUTURE, all future
>        mappings will be marked to lock pages when they are faulted in, but
>        they will not be populated by the lock when the mapping is created.
>        MCL_ONFAULT must be used with either MCL_CURRENT or MCL_FUTURE or
>        both.
>
> These options will not reduce the VSZ, but at least, will not allocate
> unused memory.
> Otherwise, you need to customize your DPDK .config in order to configure
> RTE_MAX_MEM_MB are related parameters for your specific application.
>
> ---
> Julien Meunier
>
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of siddarth rai
> > Sent: Thursday, January 30, 2020 11:48 AM
> > To: David Marchand <david.marchand@redhat.com>
> > Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; dev <dev@dpdk.org>
> > Subject: Re: [dpdk-dev] Big spike in DPDK VSZ
> >
> > Hi,
> >
> > I did some further experiments and found out that version 18.02.2
> doesn't have
> > the problem, but the 18.05.1 release has it.
> >
> > Would really appreciate if someone can help, if there is a patch to get
> over this
> > issue in the DPDK code ?
> > This is becoming a huge practical issue for me as on multi NUMA setup,
> the VSZ
> > goes above 400G and I can't get core files to debug crashes in my app.
> >
> > Regards,
> > Siddarth
> >
> >
> > Regards,
> > Siddarth
> >
> > On Thu, Jan 30, 2020 at 2:21 PM David Marchand
> > <david.marchand@redhat.com>
> > wrote:
> >
> > > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
> > > > I have been using DPDK 19.08 and I notice the process VSZ is huge.
> > > >
> > > > I tried running the test PMD. It takes 64G VSZ and if I use the
> > > > '--in-memory' option it takes up to 188G.
> > > >
> > > > Is there anyway to disable allocation of such huge VSZ in DPDK ?
> > >
> > > *Disclaimer* I don't know the arcanes of the mem subsystem.
> > >
> > > I suppose this is due to the memory allocator in dpdk that reserves
> > > unused virtual space (for memory hotplug + multiprocess).
> > >
> > > If this is the case, maybe we could do something to enhance the
> > > situation for applications that won't care about multiprocess.
> > > Like inform dpdk that the application won't use multiprocess and skip
> > > those reservations.
> > >
> > > Or another idea would be to limit those reservations to what is passed
> > > via --socket-limit.
> > >
> > > Anatoly?
> > >
> > >
> > >
> > > --
> > > David Marchand
> > >
> > >
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30  7:48 [dpdk-dev] Big spike in DPDK VSZ siddarth rai
  2020-01-30  8:51 ` David Marchand
@ 2020-02-02  9:22 ` David Marchand
  2020-02-04 10:20   ` Burakov, Anatoly
  1 sibling, 1 reply; 16+ messages in thread
From: David Marchand @ 2020-02-02  9:22 UTC (permalink / raw)
  To: siddarth rai, Burakov, Anatoly; +Cc: dev, julien.meunier

On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
>
> Hi,
>
> I have been using DPDK 19.08 and I notice the process VSZ is huge.
>
> I tried running the test PMD. It takes 64G VSZ and if I use the
> '--in-memory' option it takes up to 188G.
>
> Is there anyway to disable allocation of such huge VSZ in DPDK ?
> This is resulting in huge core files and I suspect that the problem will
> compound on multi-NUMA machines.

For this particular issue, it might be interesting to look at madvise stuff:

       MADV_DONTDUMP (since Linux 3.4)
              Exclude from a core dump those pages in the range specified
              by addr and length.  This is useful in applications that
              have large areas of memory that are known not to be useful
              in a core dump.
              The effect of MADV_DONTDUMP takes precedence over the bit
              mask that is set via the /proc/PID/coredump_filter file
              (see core(5)).

(FreeBSD seems to have a MADV_NOCORE flag too).
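
For illustration, a minimal sketch of what applying this flag could look like
(the helper name and the idea of targeting the reserved-but-unused VA ranges
are assumptions for the example, not existing DPDK code; it assumes a C
library that exposes MADV_DONTDUMP):

    #define _DEFAULT_SOURCE   /* for MADV_DONTDUMP on glibc */
    #include <stdio.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /* Hypothetical helper: mark a region so the kernel skips it when
     * writing a core dump. */
    static void exclude_from_core(void *addr, size_t len)
    {
            if (madvise(addr, len, MADV_DONTDUMP) != 0)
                    perror("madvise(MADV_DONTDUMP)");
    }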


-- 
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-02  9:22 ` David Marchand
@ 2020-02-04 10:20   ` Burakov, Anatoly
  0 siblings, 0 replies; 16+ messages in thread
From: Burakov, Anatoly @ 2020-02-04 10:20 UTC (permalink / raw)
  To: David Marchand, siddarth rai; +Cc: dev, julien.meunier

On 02-Feb-20 9:22 AM, David Marchand wrote:
> On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
>>
>> Hi,
>>
>> I have been using DPDK 19.08 and I notice the process VSZ is huge.
>>
>> I tried running the test PMD. It takes 64G VSZ and if I use the
>> '--in-memory' option it takes up to 188G.
>>
>> Is there anyway to disable allocation of such huge VSZ in DPDK ?
>> This is resulting in huge core files and I suspect that the problem will
>> compound on multi-NUMA machines.
> 
> For this particular issue, it might be interesting to look at madvise stuff:
> 
>         MADV_DONTDUMP (since Linux 3.4)
>                Exclude from a core dump those pages in the range
> specified by addr and length.  This is useful in applications that
> have large areas of memory that are known not to be useful in a core
> dump.
>                The effect of MADV_DONTDUMP takes precedence over the
> bit mask that is set via the /proc/PID/coredump_filter file (see
> core(5)).
> 
> (FreeBSD seems to have a MADV_NOCORE flag too).
> 
> 

This is perhaps the best option, as this problem isn't going away any 
time soon. We've had this problem on FreeBSD already, which is why 
--no-mlockall exists in testpmd and is the default on FreeBSD, and which is 
why we advise all our validation teams to disable core dumping on 
FreeBSD when running DPDK. If a similar (but perhaps less invasive) 
solution exists for Linux, that's great.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30  8:51 ` David Marchand
  2020-01-30 10:47   ` siddarth rai
@ 2020-02-04 10:23   ` Burakov, Anatoly
  2020-02-04 10:55     ` siddarth rai
  2020-02-11  8:11     ` David Marchand
  1 sibling, 2 replies; 16+ messages in thread
From: Burakov, Anatoly @ 2020-02-04 10:23 UTC (permalink / raw)
  To: David Marchand, siddarth rai; +Cc: dev

On 30-Jan-20 8:51 AM, David Marchand wrote:
> On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
>> I have been using DPDK 19.08 and I notice the process VSZ is huge.
>>
>> I tried running the test PMD. It takes 64G VSZ and if I use the
>> '--in-memory' option it takes up to 188G.
>>
>> Is there anyway to disable allocation of such huge VSZ in DPDK ?
> 
> *Disclaimer* I don't know the arcanes of the mem subsystem.
> 
> I suppose this is due to the memory allocator in dpdk that reserves
> unused virtual space (for memory hotplug + multiprocess).

Yes, that's correct. In order to guarantee memory reservation succeeding 
at all times, we need to reserve all possible memory in advance. 
Otherwise we may end up in a situation where the primary process has 
allocated a page, but the secondary can't map it because the address 
space is already occupied by something else.

> 
> If this is the case, maybe we could do something to enhance the
> situation for applications that won't care about multiprocess.
> Like inform dpdk that the application won't use multiprocess and skip
> those reservations.

You're welcome to try this, but I assure you, avoiding these 
reservations is a lot of work, because you'd be adding yet another 
path to an already overly complex allocator :)

> 
> Or another idea would be to limit those reservations to what is passed
> via --socket-limit.
> 
> Anatoly?

I have a patchset in the works that does this and was planning to submit 
it to 19.08, but things got in the way and it's still sitting there 
collecting bit rot. This may be reason enough to resurrect it and finish 
it up :)

> 
> 
> 
> --
> David Marchand
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-04 10:23   ` Burakov, Anatoly
@ 2020-02-04 10:55     ` siddarth rai
  2020-02-04 11:13       ` Burakov, Anatoly
  2020-02-11  8:11     ` David Marchand
  1 sibling, 1 reply; 16+ messages in thread
From: siddarth rai @ 2020-02-04 10:55 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: David Marchand, dev

Hi Anatoly,

I don't need a secondary process.

I tried out Julien's suggestion and set the 'RTE_MAX_MEM_MB' parameter to
8192 (the original value was over 500K). This works as a cap.
The virtual size dropped down to less than 8G, so this seems to be working
for me.
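
(For reference, with the legacy make-based build this amounts to a one-line
change in config/common_base or the generated build .config; the option
names below are from the 19.08-era make configuration and may differ in
other DPDK versions:)

    CONFIG_RTE_MAX_MEM_MB=8192
    # related per-list/per-type bounds, if further tuning is needed:
    # CONFIG_RTE_MAX_MEM_MB_PER_LIST, CONFIG_RTE_MAX_MEMSEG_PER_LIST,
    # CONFIG_RTE_MAX_MEM_MB_PER_TYPE, CONFIG_RTE_MAX_MEMSEG_PER_TYPE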

I have a few queries/concerns though.
Is it safe to reduce RTE_MAX_MEM_MB to such a low value? Can I reduce
it further? What will be the impact of doing so? Will it limit the
maximum size of the mbuf pool which I create?

Regards,
Siddarth

On Tue, Feb 4, 2020 at 3:53 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 30-Jan-20 8:51 AM, David Marchand wrote:
> > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
> >> I have been using DPDK 19.08 and I notice the process VSZ is huge.
> >>
> >> I tried running the test PMD. It takes 64G VSZ and if I use the
> >> '--in-memory' option it takes up to 188G.
> >>
> >> Is there anyway to disable allocation of such huge VSZ in DPDK ?
> >
> > *Disclaimer* I don't know the arcanes of the mem subsystem.
> >
> > I suppose this is due to the memory allocator in dpdk that reserves
> > unused virtual space (for memory hotplug + multiprocess).
>
> Yes, that's correct. In order to guarantee memory reservation succeeding
> at all times, we need to reserve all possible memory in advance.
> Otherwise we may end up in a situation where primary process has
> allocated a page, but the secondary can't map it because the address
> space is already occupied by something else.
>
> >
> > If this is the case, maybe we could do something to enhance the
> > situation for applications that won't care about multiprocess.
> > Like inform dpdk that the application won't use multiprocess and skip
> > those reservations.
>
> You're welcome to try this, but i assure you, avoiding these
> reservations is a lot of work, because you'd be adding a yet another
> path to an already overly complex allocator :)
>
> >
> > Or another idea would be to limit those reservations to what is passed
> > via --socket-limit.
> >
> > Anatoly?
>
> I have a patchset in the works that does this and was planning to submit
> it to 19.08, but things got in the way and it's still sitting there
> collecting bit rot. This may be reason enough to resurrect it and finish
> it up :)
>
> >
> >
> >
> > --
> > David Marchand
> >
>
>
> --
> Thanks,
> Anatoly
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-04 10:55     ` siddarth rai
@ 2020-02-04 11:13       ` Burakov, Anatoly
  2020-02-04 11:57         ` siddarth rai
  0 siblings, 1 reply; 16+ messages in thread
From: Burakov, Anatoly @ 2020-02-04 11:13 UTC (permalink / raw)
  To: siddarth rai; +Cc: David Marchand, dev

On 04-Feb-20 10:55 AM, siddarth rai wrote:
> Hi Anatoly,
> 
> I don't need a secondary process.

I understand that you don't; however, that doesn't negate the fact that 
the code path expects that you do.

> 
> I tried out Julien's suggestion and set the param 'RTE_MAX_MEM_MB' value 
> to 8192 (the original value was over 500K). This works as a cap.
> The virtual size dropped down to less than 8G. So this seems to be 
> working for me.
> 
> I have a few queries/concerns though.
> Is it safe to reduce the RTE_MAX_MEM_MB to such a low value ? Can I 
> reduce it further ? What will be the impact of doing so ? Will it limit 
> the maximum size of mbuf pool which I create ?

It depends on your use case. The maximum size of a mempool is limited as 
it is; the better question is where to place that limit. In my experience, 
testpmd mempools are typically around 400MB per socket, so an 8G upper 
limit should not interfere with testpmd very much. However, depending on 
what else is there and what kind of allocations you may do, it may have 
other effects.

Currently, the size of each internal per-NUMA node, per-page size page 
table is dictated by three constraints: maximum amount of memory per 
page table (so that we don't attempt to reserve thousands of 1G pages), 
maximum number of pages per page table (so that we aren't left with a 
few hundred megabytes' worth of 2M pages), and total maximum amount of 
memory (which places an upper limit on the sum of all page tables' 
memory amounts).

You have lowered the latter to 8G which means that, depending on your 
system configuration, you will have at most 2G to 4G per page table. It 
is not possible to limit it further (for example, skip reservation on 
certain nodes or certain page sizes). Whether it will have an effect on 
your actual workload will depend on your use case.

> 
> Regards,
> Siddarth
> 
> On Tue, Feb 4, 2020 at 3:53 PM Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 30-Jan-20 8:51 AM, David Marchand wrote:
>      > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com
>     <mailto:siddsr@gmail.com>> wrote:
>      >> I have been using DPDK 19.08 and I notice the process VSZ is huge.
>      >>
>      >> I tried running the test PMD. It takes 64G VSZ and if I use the
>      >> '--in-memory' option it takes up to 188G.
>      >>
>      >> Is there anyway to disable allocation of such huge VSZ in DPDK ?
>      >
>      > *Disclaimer* I don't know the arcanes of the mem subsystem.
>      >
>      > I suppose this is due to the memory allocator in dpdk that reserves
>      > unused virtual space (for memory hotplug + multiprocess).
> 
>     Yes, that's correct. In order to guarantee memory reservation
>     succeeding
>     at all times, we need to reserve all possible memory in advance.
>     Otherwise we may end up in a situation where primary process has
>     allocated a page, but the secondary can't map it because the address
>     space is already occupied by something else.
> 
>      >
>      > If this is the case, maybe we could do something to enhance the
>      > situation for applications that won't care about multiprocess.
>      > Like inform dpdk that the application won't use multiprocess and skip
>      > those reservations.
> 
>     You're welcome to try this, but i assure you, avoiding these
>     reservations is a lot of work, because you'd be adding a yet another
>     path to an already overly complex allocator :)
> 
>      >
>      > Or another idea would be to limit those reservations to what is
>     passed
>      > via --socket-limit.
>      >
>      > Anatoly?
> 
>     I have a patchset in the works that does this and was planning to
>     submit
>     it to 19.08, but things got in the way and it's still sitting there
>     collecting bit rot. This may be reason enough to resurrect it and
>     finish
>     it up :)
> 
>      >
>      >
>      >
>      > --
>      > David Marchand
>      >
> 
> 
>     -- 
>     Thanks,
>     Anatoly
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-04 11:13       ` Burakov, Anatoly
@ 2020-02-04 11:57         ` siddarth rai
  2020-02-04 12:07           ` siddarth rai
  0 siblings, 1 reply; 16+ messages in thread
From: siddarth rai @ 2020-02-04 11:57 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: David Marchand, dev

Hi,

Thanks for the clarification

Regards,
Siddarth

On Tue, Feb 4, 2020 at 4:43 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:

> On 04-Feb-20 10:55 AM, siddarth rai wrote:
> > Hi Anatoly,
> >
> > I don't need a secondary process.
>
> I understand that you don't, however that doesn't negate the fact that
> the codepath expects that you do.
>
> >
> > I tried out Julien's suggestion and set the param 'RTE_MAX_MEM_MB' value
> > to 8192 (the original value was over 500K). This works as a cap.
> > The virtual size dropped down to less than 8G. So this seems to be
> > working for me.
> >
> > I have a few queries/concerns though.
> > Is it safe to reduce the RTE_MAX_MEM_MB to such a low value ? Can I
> > reduce it further ? What will be the impact of doing so ? Will it limit
> > the maximum size of mbuf pool which I create ?
>
> It depends on your use case. The maximum size of mempool is limited as
> is, the better question is where to place that limit. In my experience,
> testpmd mempools are typically around 400MB per socket, so an 8G upper
> limit should not interfere with testpmd very much. However, depending on
> what else is there and what kind of allocations you may do, it may have
> other effects.
>
> Currently, the size of each internal per-NUMA node, per-page size page
> table is dictated by three constraints: maximum amount of memory per
> page table (so that we don't attempt to reserve thousands of 1G pages),
> maximum number of pages per page table (so that we aren't left with a
> few hundred megabytes' worth of 2M pages), and total maximum amount of
> memory (which places an upper limit on the sum of all page tables'
> memory amounts).
>
> You have lowered the latter to 8G which means that, depending on your
> system configuration, you will have at most 2G to 4G per page table. It
> is not possible to limit it further (for example, skip reservation on
> certain nodes or certain page sizes). Whether it will have an effect on
> your actual workload will depend on your use case.
>
> >
> > Regards,
> > Siddarth
> >
> > On Tue, Feb 4, 2020 at 3:53 PM Burakov, Anatoly
> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> >
> >     On 30-Jan-20 8:51 AM, David Marchand wrote:
> >      > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com
> >     <mailto:siddsr@gmail.com>> wrote:
> >      >> I have been using DPDK 19.08 and I notice the process VSZ is
> huge.
> >      >>
> >      >> I tried running the test PMD. It takes 64G VSZ and if I use the
> >      >> '--in-memory' option it takes up to 188G.
> >      >>
> >      >> Is there anyway to disable allocation of such huge VSZ in DPDK ?
> >      >
> >      > *Disclaimer* I don't know the arcanes of the mem subsystem.
> >      >
> >      > I suppose this is due to the memory allocator in dpdk that
> reserves
> >      > unused virtual space (for memory hotplug + multiprocess).
> >
> >     Yes, that's correct. In order to guarantee memory reservation
> >     succeeding
> >     at all times, we need to reserve all possible memory in advance.
> >     Otherwise we may end up in a situation where primary process has
> >     allocated a page, but the secondary can't map it because the address
> >     space is already occupied by something else.
> >
> >      >
> >      > If this is the case, maybe we could do something to enhance the
> >      > situation for applications that won't care about multiprocess.
> >      > Like inform dpdk that the application won't use multiprocess and
> skip
> >      > those reservations.
> >
> >     You're welcome to try this, but i assure you, avoiding these
> >     reservations is a lot of work, because you'd be adding a yet another
> >     path to an already overly complex allocator :)
> >
> >      >
> >      > Or another idea would be to limit those reservations to what is
> >     passed
> >      > via --socket-limit.
> >      >
> >      > Anatoly?
> >
> >     I have a patchset in the works that does this and was planning to
> >     submit
> >     it to 19.08, but things got in the way and it's still sitting there
> >     collecting bit rot. This may be reason enough to resurrect it and
> >     finish
> >     it up :)
> >
> >      >
> >      >
> >      >
> >      > --
> >      > David Marchand
> >      >
> >
> >
> >     --
> >     Thanks,
> >     Anatoly
> >
>
>
> --
> Thanks,
> Anatoly
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-04 11:57         ` siddarth rai
@ 2020-02-04 12:07           ` siddarth rai
  2020-02-04 16:18             ` Burakov, Anatoly
  0 siblings, 1 reply; 16+ messages in thread
From: siddarth rai @ 2020-02-04 12:07 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: David Marchand, dev

Hi Anatoly,

You mentioned that the maximum size of a mempool is limited.
Can you tell me what the limit is and where it is specified?

Regards,
Siddarth

On Tue, Feb 4, 2020 at 5:27 PM siddarth rai <siddsr@gmail.com> wrote:

> Hi,
>
> Thanks for the clarification
>
> Regards,
> Siddarth
>
> On Tue, Feb 4, 2020 at 4:43 PM Burakov, Anatoly <anatoly.burakov@intel.com>
> wrote:
>
>> On 04-Feb-20 10:55 AM, siddarth rai wrote:
>> > Hi Anatoly,
>> >
>> > I don't need a secondary process.
>>
>> I understand that you don't, however that doesn't negate the fact that
>> the codepath expects that you do.
>>
>> >
>> > I tried out Julien's suggestion and set the param 'RTE_MAX_MEM_MB'
>> value
>> > to 8192 (the original value was over 500K). This works as a cap.
>> > The virtual size dropped down to less than 8G. So this seems to be
>> > working for me.
>> >
>> > I have a few queries/concerns though.
>> > Is it safe to reduce the RTE_MAX_MEM_MB to such a low value ? Can I
>> > reduce it further ? What will be the impact of doing so ? Will it limit
>> > the maximum size of mbuf pool which I create ?
>>
>> It depends on your use case. The maximum size of mempool is limited as
>> is, the better question is where to place that limit. In my experience,
>> testpmd mempools are typically around 400MB per socket, so an 8G upper
>> limit should not interfere with testpmd very much. However, depending on
>> what else is there and what kind of allocations you may do, it may have
>> other effects.
>>
>> Currently, the size of each internal per-NUMA node, per-page size page
>> table is dictated by three constraints: maximum amount of memory per
>> page table (so that we don't attempt to reserve thousands of 1G pages),
>> maximum number of pages per page table (so that we aren't left with a
>> few hundred megabytes' worth of 2M pages), and total maximum amount of
>> memory (which places an upper limit on the sum of all page tables'
>> memory amounts).
>>
>> You have lowered the latter to 8G which means that, depending on your
>> system configuration, you will have at most 2G to 4G per page table. It
>> is not possible to limit it further (for example, skip reservation on
>> certain nodes or certain page sizes). Whether it will have an effect on
>> your actual workload will depend on your use case.
>>
>> >
>> > Regards,
>> > Siddarth
>> >
>> > On Tue, Feb 4, 2020 at 3:53 PM Burakov, Anatoly
>> > <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>> >
>> >     On 30-Jan-20 8:51 AM, David Marchand wrote:
>> >      > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com
>> >     <mailto:siddsr@gmail.com>> wrote:
>> >      >> I have been using DPDK 19.08 and I notice the process VSZ is
>> huge.
>> >      >>
>> >      >> I tried running the test PMD. It takes 64G VSZ and if I use the
>> >      >> '--in-memory' option it takes up to 188G.
>> >      >>
>> >      >> Is there anyway to disable allocation of such huge VSZ in DPDK ?
>> >      >
>> >      > *Disclaimer* I don't know the arcanes of the mem subsystem.
>> >      >
>> >      > I suppose this is due to the memory allocator in dpdk that
>> reserves
>> >      > unused virtual space (for memory hotplug + multiprocess).
>> >
>> >     Yes, that's correct. In order to guarantee memory reservation
>> >     succeeding
>> >     at all times, we need to reserve all possible memory in advance.
>> >     Otherwise we may end up in a situation where primary process has
>> >     allocated a page, but the secondary can't map it because the address
>> >     space is already occupied by something else.
>> >
>> >      >
>> >      > If this is the case, maybe we could do something to enhance the
>> >      > situation for applications that won't care about multiprocess.
>> >      > Like inform dpdk that the application won't use multiprocess and
>> skip
>> >      > those reservations.
>> >
>> >     You're welcome to try this, but i assure you, avoiding these
>> >     reservations is a lot of work, because you'd be adding a yet another
>> >     path to an already overly complex allocator :)
>> >
>> >      >
>> >      > Or another idea would be to limit those reservations to what is
>> >     passed
>> >      > via --socket-limit.
>> >      >
>> >      > Anatoly?
>> >
>> >     I have a patchset in the works that does this and was planning to
>> >     submit
>> >     it to 19.08, but things got in the way and it's still sitting there
>> >     collecting bit rot. This may be reason enough to resurrect it and
>> >     finish
>> >     it up :)
>> >
>> >      >
>> >      >
>> >      >
>> >      > --
>> >      > David Marchand
>> >      >
>> >
>> >
>> >     --
>> >     Thanks,
>> >     Anatoly
>> >
>>
>>
>> --
>> Thanks,
>> Anatoly
>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-04 12:07           ` siddarth rai
@ 2020-02-04 16:18             ` Burakov, Anatoly
  0 siblings, 0 replies; 16+ messages in thread
From: Burakov, Anatoly @ 2020-02-04 16:18 UTC (permalink / raw)
  To: siddarth rai; +Cc: David Marchand, dev

On 04-Feb-20 12:07 PM, siddarth rai wrote:
> Hi Anatoly,
> 
> You mentioned that the maximum size of mempool is limited.
> Can you tell what is the limit and where is it specified?
> 
> Regards,
> Siddarth
> 

The mempool size itself isn't limited.

However, due to the nature of how the memory subsystem works, there is 
always an upper limit to how much memory you can reserve (because once 
you run out of pre-reserved address space, you will effectively run out 
of memory).

Given that the current hardcoded upper limit for reserved address space 
is 128GB, this is effectively the limit you are referring to - you won't 
be able to create a mempool larger than 128GB because there is not 
enough reserved address space in which to put a bigger mempool.

The same applies to any other memory allocation, with a caveat that 
while a mempool can be non-contiguous in terms of VA memory (i.e. 
consist of several discontiguous VA areas), a single unit of memory 
allocation (i.e. a call to rte_malloc or rte_memzone_create) can only be 
as big as the largest contiguous chunk of address space.

Given that the current hardcoded limit is 16GB for 2M pages, and 32GB 
for 1GB pages, each allocation can be at most 16GB or 32GB long, 
depending on the page size from which you are allocating.

So, on a system with 1G and 2M pages, a mempool can only be as big as 
128GB, and each individual chunk of memory in that mempool can only be 
as big as 16GB for 2M pages, or 32GB for 1G pages.
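
As an illustration of that distinction, a sketch (the sizes, names and
element counts are made up for the example; it assumes rte_eal_init() has
already run, and error handling is omitted):

    #include <stddef.h>
    #include <rte_memzone.h>
    #include <rte_mempool.h>

    static void example_allocations(void)
    {
            /* A single contiguous reservation is bounded by the largest
             * chunk of pre-reserved address space, so a very large
             * memzone request may fail even if total memory is there... */
            const struct rte_memzone *mz = rte_memzone_reserve("big_zone",
                            (size_t)64 << 30, SOCKET_ID_ANY, 0);

            /* ...whereas a mempool of the same total size can still be
             * created, because it may be assembled from several
             * discontiguous VA chunks (subject to the overall cap). */
            struct rte_mempool *mp = rte_mempool_create("big_pool",
                            1 << 25,   /* number of elements */
                            2048,      /* element size in bytes */
                            0, 0, NULL, NULL, NULL, NULL,
                            SOCKET_ID_ANY, 0);

            (void)mz;
            (void)mp;
    }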

These are big numbers, so in practice no one hits these limitations. 
Again, this is just the price you have to pay for supporting dynamic 
memory allocation in secondary processes. There is simply no other way 
to guarantee that all shared memory will reside in the same address 
space in all processes.

Notably, device hotplug doesn't provide such a guarantee, which is why 
device hotplug (or even initialization) can fail in a secondary process. 
However, in device hotplug I suppose this is acceptable (or the 
community thinks it is). In the memory subsystem, I chose to be 
conservative and to always guarantee correctness, at a cost of placing 
an upper limit on memory allocations.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-04 10:23   ` Burakov, Anatoly
  2020-02-04 10:55     ` siddarth rai
@ 2020-02-11  8:11     ` David Marchand
  2020-02-11 10:28       ` Burakov, Anatoly
  1 sibling, 1 reply; 16+ messages in thread
From: David Marchand @ 2020-02-11  8:11 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: siddarth rai, dev

Hello Anatoly,

On Tue, Feb 4, 2020 at 11:23 AM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 30-Jan-20 8:51 AM, David Marchand wrote:
> > On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
> >> I have been using DPDK 19.08 and I notice the process VSZ is huge.
> >>
> >> I tried running the test PMD. It takes 64G VSZ and if I use the
> >> '--in-memory' option it takes up to 188G.
> >>
> >> Is there anyway to disable allocation of such huge VSZ in DPDK ?
> >
> > *Disclaimer* I don't know the arcanes of the mem subsystem.
> >
> > I suppose this is due to the memory allocator in dpdk that reserves
> > unused virtual space (for memory hotplug + multiprocess).
>
> Yes, that's correct. In order to guarantee memory reservation succeeding
> at all times, we need to reserve all possible memory in advance.
> Otherwise we may end up in a situation where primary process has
> allocated a page, but the secondary can't map it because the address
> space is already occupied by something else.
>
> >
> > If this is the case, maybe we could do something to enhance the
> > situation for applications that won't care about multiprocess.
> > Like inform dpdk that the application won't use multiprocess and skip
> > those reservations.
>
> You're welcome to try this, but i assure you, avoiding these
> reservations is a lot of work, because you'd be adding a yet another
> path to an already overly complex allocator :)

I went and looked at the EAL options we have.
I understand that the --in-memory (essentially the --no-shconf
functionality) option should be enough.
I can see that Siddarth uses it.

But the problem is that the memory allocator still reserves big VA
areas for memory hotplug.


>
> >
> > Or another idea would be to limit those reservations to what is passed
> > via --socket-limit.
> >
> > Anatoly?
>
> I have a patchset in the works that does this and was planning to submit
> it to 19.08, but things got in the way and it's still sitting there
> collecting bit rot. This may be reason enough to resurrect it and finish
> it up :)

Does this patchset contain more than just taking --socket-limit into account?


--
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-02-11  8:11     ` David Marchand
@ 2020-02-11 10:28       ` Burakov, Anatoly
  0 siblings, 0 replies; 16+ messages in thread
From: Burakov, Anatoly @ 2020-02-11 10:28 UTC (permalink / raw)
  To: David Marchand; +Cc: siddarth rai, dev

On 11-Feb-20 8:11 AM, David Marchand wrote:
> Hello Anatoly,
> 
> On Tue, Feb 4, 2020 at 11:23 AM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>>
>> On 30-Jan-20 8:51 AM, David Marchand wrote:
>>> On Thu, Jan 30, 2020 at 8:48 AM siddarth rai <siddsr@gmail.com> wrote:
>>>> I have been using DPDK 19.08 and I notice the process VSZ is huge.
>>>>
>>>> I tried running the test PMD. It takes 64G VSZ and if I use the
>>>> '--in-memory' option it takes up to 188G.
>>>>
>>>> Is there anyway to disable allocation of such huge VSZ in DPDK ?
>>>
>>> *Disclaimer* I don't know the arcanes of the mem subsystem.
>>>
>>> I suppose this is due to the memory allocator in dpdk that reserves
>>> unused virtual space (for memory hotplug + multiprocess).
>>
>> Yes, that's correct. In order to guarantee memory reservation succeeding
>> at all times, we need to reserve all possible memory in advance.
>> Otherwise we may end up in a situation where primary process has
>> allocated a page, but the secondary can't map it because the address
>> space is already occupied by something else.
>>
>>>
>>> If this is the case, maybe we could do something to enhance the
>>> situation for applications that won't care about multiprocess.
>>> Like inform dpdk that the application won't use multiprocess and skip
>>> those reservations.
>>
>> You're welcome to try this, but i assure you, avoiding these
>> reservations is a lot of work, because you'd be adding a yet another
>> path to an already overly complex allocator :)
> 
> Went and looked at the EAL options we have.
> I understand that the --in-memory (essentially the --no-shconf
> functionality) option should be enough.
> I can see that Siddarth uses it.
> 
> But the problem is that the memory allocator still reserves big va
> areas for memory hotplug.

Yes, because even though we don't *share* this memory, we still grow it 
dynamically. We could have *not* done it, but that would be adding 
another code path - currently, we're piggybacking off existing 
infrastructure.

> 
> 
>>
>>>
>>> Or another idea would be to limit those reservations to what is passed
>>> via --socket-limit.
>>>
>>> Anatoly?
>>
>> I have a patchset in the works that does this and was planning to submit
>> it to 19.08, but things got in the way and it's still sitting there
>> collecting bit rot. This may be reason enough to resurrect it and finish
>> it up :)
> 
> Does this patchset contain more than just taking --socket-limit into account?

No, it actually doesn't rely on any of the existing options. It has to 
do with tweaking the config options related to memseg list sizings 
(essentially like adjusting config options at compile time, only without 
hardcoding them).

> 
> 
> --
> David Marchand
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [dpdk-dev] Big spike in DPDK VSZ
  2020-01-30 10:47   ` siddarth rai
  2020-01-30 13:15     ` Meunier, Julien (Nokia - FR/Paris-Saclay)
@ 2020-03-10 15:26     ` David Marchand
  1 sibling, 0 replies; 16+ messages in thread
From: David Marchand @ 2020-03-10 15:26 UTC (permalink / raw)
  To: siddarth rai; +Cc: Burakov, Anatoly, dev

On Thu, Jan 30, 2020 at 11:48 AM siddarth rai <siddsr@gmail.com> wrote:
> I did some further experiments and found out that version 18.02.2 doesn't have the problem, but the 18.05.1 release has it.
>
> Would really appreciate if someone can help, if there is a patch to get over this issue in the DPDK code ?
> This is becoming a huge practical issue for me as on multi NUMA setup, the VSZ goes above 400G and I can't get core files to debug crashes in my app.

Wrt coredumps, could you try this patch?
http://patchwork.dpdk.org/patch/66469/


-- 
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-03-10 15:26 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-30  7:48 [dpdk-dev] Big spike in DPDK VSZ siddarth rai
2020-01-30  8:51 ` David Marchand
2020-01-30 10:47   ` siddarth rai
2020-01-30 13:15     ` Meunier, Julien (Nokia - FR/Paris-Saclay)
2020-01-31 12:14       ` siddarth rai
2020-03-10 15:26     ` David Marchand
2020-02-04 10:23   ` Burakov, Anatoly
2020-02-04 10:55     ` siddarth rai
2020-02-04 11:13       ` Burakov, Anatoly
2020-02-04 11:57         ` siddarth rai
2020-02-04 12:07           ` siddarth rai
2020-02-04 16:18             ` Burakov, Anatoly
2020-02-11  8:11     ` David Marchand
2020-02-11 10:28       ` Burakov, Anatoly
2020-02-02  9:22 ` David Marchand
2020-02-04 10:20   ` Burakov, Anatoly
