DPDK patches and discussions
* [dpdk-dev] rte_mempool_create fails with ENOMEM
@ 2014-12-18 13:25 Newman Poborsky
  2014-12-18 14:21 ` Alex Markuze
  2014-12-18 17:42 ` Ananyev, Konstantin
  0 siblings, 2 replies; 9+ messages in thread
From: Newman Poborsky @ 2014-12-18 13:25 UTC (permalink / raw)
  To: dev

Hi,

Could someone please explain why mempool creation sometimes fails with
ENOMEM?

I run my test app several times without any problems and then I start
getting an ENOMEM error when creating the mempool that is used for
packets. I have tried deleting everything from /mnt/huge, increasing the
number of huge pages, and remounting /mnt/huge, but nothing helps.

There is more than enough memory on the server. I tried to debug the
rte_mempool_create() call, and it seems that after the server is
restarted the free memory segments are bigger than 2 MB, but after
running the test app several times, all free memory segments are only
2 MB each, and since I am requesting 8 MB for my packet mempool, the
call fails. I'm not entirely sure this conclusion is correct.

Does anybody have any idea what to check, and how running my test app
several times affects hugepages?

This doesn't make any sense to me, because after the test app exits, its
resources should be freed, right?

This has been driving me crazy for days now. I tried reading a bit more
theory about hugepages, but didn't find anything that could help me.
Maybe it's something else entirely and completely trivial, but I can't
figure it out, so any help is appreciated.

Thank you!

BR,
Newman P.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-18 13:25 [dpdk-dev] rte_mempool_create fails with ENOMEM Newman Poborsky
@ 2014-12-18 14:21 ` Alex Markuze
  2014-12-18 17:42 ` Ananyev, Konstantin
  1 sibling, 0 replies; 9+ messages in thread
From: Alex Markuze @ 2014-12-18 14:21 UTC (permalink / raw)
  To: Newman Poborsky; +Cc: dev

I've also seen a similar issue when trying to run a DPDK app that
allocates huge pools (~0.5 GB) after a memory-heavy operation on the
machine.

I've come to the same conclusion as you did: internal fragmentation is
causing the pool creation failures.
It seems that rte_mempool_xmem_create()/rte_memzone_reserve_aligned()
attempt to create physically contiguous pools. That may offer a slight
performance gain(?) but can cause unpredictable allocation failures,
which is a big risk for DC deployments where hundreds or even thousands
of machines may be deployed with a DPDK app and fail inexplicably.

I didn't really get the chance to dig into the memory management
internals of DPDK, so feel free to correct me where I'm off.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-18 13:25 [dpdk-dev] rte_mempool_create fails with ENOMEM Newman Poborsky
  2014-12-18 14:21 ` Alex Markuze
@ 2014-12-18 17:42 ` Ananyev, Konstantin
  2014-12-18 20:03   ` Ananyev, Konstantin
  1 sibling, 1 reply; 9+ messages in thread
From: Ananyev, Konstantin @ 2014-12-18 17:42 UTC (permalink / raw)
  To: Newman Poborsky, dev

Hi 

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Newman Poborsky
> Sent: Thursday, December 18, 2014 1:26 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] rte_mempool_create fails with ENOMEM
> 
> Hi,
> 
> Could someone please explain why mempool creation sometimes fails with
> ENOMEM?
> 
> I run my test app several times without any problems and then I start
> getting an ENOMEM error when creating the mempool that is used for
> packets. I have tried deleting everything from /mnt/huge, increasing
> the number of huge pages, and remounting /mnt/huge, but nothing helps.
> 
> There is more than enough memory on the server. I tried to debug the
> rte_mempool_create() call, and it seems that after the server is
> restarted the free memory segments are bigger than 2 MB, but after
> running the test app several times, all free memory segments are only
> 2 MB each, and since I am requesting 8 MB for my packet mempool, the
> call fails. I'm not entirely sure this conclusion is correct.

Yes, rte_mempool_create() uses rte_memzone_reserve() to allocate a
single physically contiguous chunk of memory.
If no such chunk exists, it will fail.
Why physically contiguous?
The main reason: to make things easier for us, since in that case we
don't have to worry about an mbuf crossing a page boundary.
So you can overcome the problem like this:
Allocate the maximum amount of memory you would need to hold all mbufs
in the worst case (all pages physically disjoint) using rte_malloc().
Figure out its physical mappings.
Call rte_mempool_xmem_create().
You can look at app/test-pmd/mempool_anon.c as a reference.
It uses the same approach to create a mempool over 4 KB pages.

We will probably add a similar function to the mempool API
(create_scatter_mempool or something), or just add a new flag
(USE_SCATTER_MEM) to rte_mempool_create().
Though right now it is not there.

Another quick alternative: use 1 GB pages.

Konstantin

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-18 17:42 ` Ananyev, Konstantin
@ 2014-12-18 20:03   ` Ananyev, Konstantin
  2014-12-19 20:13     ` Newman Poborsky
  0 siblings, 1 reply; 9+ messages in thread
From: Ananyev, Konstantin @ 2014-12-18 20:03 UTC (permalink / raw)
  To: Ananyev, Konstantin, Newman Poborsky, dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, December 18, 2014 5:43 PM
> To: Newman Poborsky; dev@dpdk.org
> Subject: Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
> 
> Hi
> 
> Yes, rte_mempool_create() uses rte_memzone_reserve() to allocate a
> single physically contiguous chunk of memory.
> If no such chunk exists, it will fail.
> Why physically contiguous?
> The main reason: to make things easier for us, since in that case we
> don't have to worry about an mbuf crossing a page boundary.
> So you can overcome the problem like this:
> Allocate the maximum amount of memory you would need to hold all mbufs
> in the worst case (all pages physically disjoint) using rte_malloc().

Actually, my mistake: rte_malloc() wouldn't help you here.
You probably need to allocate some external (not EAL-managed) memory in
that case, maybe mmap() with MAP_HUGETLB, or something similar.

> Figure out its physical mappings.
> Call rte_mempool_xmem_create().
> You can look at app/test-pmd/mempool_anon.c as a reference.
> It uses the same approach to create a mempool over 4 KB pages.
> 
> We will probably add a similar function to the mempool API
> (create_scatter_mempool or something), or just add a new flag
> (USE_SCATTER_MEM) to rte_mempool_create().
> Though right now it is not there.
> 
> Another quick alternative: use 1 GB pages.
> 
> Konstantin
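
To make the corrected recipe above concrete, here is a minimal, untested
sketch. It assumes the system's default hugepage size is 2 MB and the
standard Linux pagemap layout (PFN in bits 0-54 of each 64-bit entry);
the pool parameters, names, and constructors are illustrative only:

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Translate a virtual address to a physical one via /proc/self/pagemap:
 * each 64-bit entry holds the PFN of one 4 KB page in bits 0-54. */
static phys_addr_t virt2phys(const void *va)
{
        uint64_t entry = 0;
        int fd = open("/proc/self/pagemap", O_RDONLY);

        pread(fd, &entry, sizeof(entry), ((uintptr_t)va / 4096) * sizeof(entry));
        close(fd);
        return (entry & ((1ULL << 55) - 1)) * 4096 + ((uintptr_t)va & 4095);
}

/* Back a mempool with externally mmap()ed hugepages instead of one
 * physically contiguous memzone. */
static struct rte_mempool *
anon_hugepage_pktmbuf_pool(const char *name, unsigned elt_num,
                           unsigned elt_size, uint32_t pg_num)
{
        void *va = mmap(NULL, (size_t)pg_num * RTE_PGSIZE_2M,
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        phys_addr_t pa[pg_num];
        uint32_t i;

        if (va == MAP_FAILED)
                return NULL;
        /* one page-table entry per 2 MB page */
        for (i = 0; i < pg_num; i++)
                pa[i] = virt2phys((char *)va + (size_t)i * RTE_PGSIZE_2M);

        return rte_mempool_xmem_create(name, elt_num, elt_size,
                        32, sizeof(struct rte_pktmbuf_pool_private),
                        rte_pktmbuf_pool_init, NULL, rte_pktmbuf_init, NULL,
                        SOCKET_ID_ANY, 0, va, pa, pg_num,
                        rte_bsf32(RTE_PGSIZE_2M));
}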

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-18 20:03   ` Ananyev, Konstantin
@ 2014-12-19 20:13     ` Newman Poborsky
  2014-12-20  1:34       ` Stephen Hemminger
  0 siblings, 1 reply; 9+ messages in thread
From: Newman Poborsky @ 2014-12-19 20:13 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Thu, Dec 18, 2014 at 9:03 PM, Ananyev, Konstantin <
konstantin.ananyev@intel.com> wrote:

>
> Actually, my mistake: rte_malloc() wouldn't help you here.
> You probably need to allocate some external (not EAL-managed) memory in
> that case, maybe mmap() with MAP_HUGETLB, or something similar.
>
> [...]
>
> Another quick alternative: use 1 GB pages.
>
> Konstantin
>

Ok, thanks for the explanation. I understand that this is probably an OS
question more than a DPDK one, but is there a way to again allocate
contiguous memory for the n-th run of my test app?  It seems that the
hugepages get divided into individual 2 MB hugepages. Shouldn't the OS's
memory management group those hugepages back into one contiguous chunk
once my app/process is done?   Again, I know very little about Linux
memory management and hugepages, so forgive me if this is a stupid
question.  Is rebooting the OS the only way to deal with this problem?
Or should I just try to use 1 GB hugepages?

p.s. Konstantin, sorry for the double reply, I accidentally forgot to
include the dev list in my first reply  :)

Newman

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-19 20:13     ` Newman Poborsky
@ 2014-12-20  1:34       ` Stephen Hemminger
  2014-12-22 10:48         ` Newman Poborsky
  0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2014-12-20  1:34 UTC (permalink / raw)
  To: Newman Poborsky; +Cc: dev

You can reserve hugepages on the kernel cmdline (GRUB).
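
For example (an illustrative, distro-dependent line), reserving pages at
boot via /etc/default/grub:

    GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=8 hugepagesz=2M hugepages=1024"

then regenerate the GRUB config and reboot. Hugepages reserved at boot
are set aside before kernel memory can fragment, which is also what
makes 1 GB pages practical.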

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-20  1:34       ` Stephen Hemminger
@ 2014-12-22 10:48         ` Newman Poborsky
  2015-01-08  8:19           ` Newman Poborsky
  0 siblings, 1 reply; 9+ messages in thread
From: Newman Poborsky @ 2014-12-22 10:48 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Sat, Dec 20, 2014 at 2:34 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> You can reserve hugepages on the kernel cmdline (GRUB).

Great, thanks, I'll try that!

Newman

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2014-12-22 10:48         ` Newman Poborsky
@ 2015-01-08  8:19           ` Newman Poborsky
  2015-01-10 19:26             ` Liran Zvibel
  0 siblings, 1 reply; 9+ messages in thread
From: Newman Poborsky @ 2015-01-08  8:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

I finally found the time to try this, and I noticed that on a server
with 1 NUMA node it works, but if the server has 2 NUMA nodes then, by
the default memory policy, the reserved hugepages are divided between
the nodes, and again the DPDK test app fails for the reason already
mentioned. I found out that a 'solution' for this is to deallocate the
hugepages on node1 (after boot) and leave them only on node0:
echo 0 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

Could someone please explain what changes when there are hugepages on
both nodes? Does this cause some memory fragmentation so that there
aren't enough contiguous segments? If so, how?

Thanks!

Newman

On Mon, Dec 22, 2014 at 11:48 AM, Newman Poborsky <newman555p@gmail.com> wrote:
> On Sat, Dec 20, 2014 at 2:34 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> You can reserve hugepages on the kernel cmdline (GRUB).
>
> Great, thanks, I'll try that!
>
> Newman

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
  2015-01-08  8:19           ` Newman Poborsky
@ 2015-01-10 19:26             ` Liran Zvibel
  0 siblings, 0 replies; 9+ messages in thread
From: Liran Zvibel @ 2015-01-10 19:26 UTC (permalink / raw)
  To: Newman Poborsky, dev

Hi Newman,

There are two options, either one of your pools is very large, and
just does not fit in half of the memory,
so if the physical memory must be split it just can never work, or
what you’re seeing is localized to your
environment, and just when allocating from both NUMAs the huge pages
just happen to be to scattered
for your pools to be allocated.

In any case, we also have to deal with large pools that don’t always
fit into consecutive huge pages as
allocated by the kernel. I have created a small patch to DPDK itself,
then some more code that can live
as part of the dpdk application that does the scattered allocation.

I’m going to send both parts here (the change to the DPDK and the user
part). I don’t know what are the
rules that allow pushing to the repository, so I won’t try to do so.

First — the DPDK patch, that just makes sure that the huge pates are
mapped in a continuous virtual memory,
and then the memory segments are allocated continuously in virtual
memory: I’m attaching full mbox content to make it easier
for you to use if you’d like. I created it against 1.7.1, since that
is the version we’re  using. If you’d like, I can also create it
against 1.8.0

====================================================

From 10ebc74eda2c3fe9e5a34815e0f7ee1f44d99aa3 Mon Sep 17 00:00:00 2001
From: Liran Zvibel <liran@weka.io>
Date: Sat, 10 Jan 2015 12:46:54 +0200
Subject: [PATCH] Add an option to allocate huge pages at contiguous virtual
 addresses
To: dev@dpdk.org

Add a configuration option, CONFIG_RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR,
that advises the memory segment allocation code to map as many
hugepages as possible into one contiguous virtual address range.

This way, a mempool may be created out of dispersed memzones allocated
from these new contiguous memory segments.
---
 lib/librte_eal/linuxapp/eal/eal_memory.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index f2454f4..b8d68b0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -329,6 +329,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,

 #ifndef RTE_EAL_SINGLE_FILE_SEGMENTS
                else if (vma_len == 0) {
+#ifndef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
                        unsigned j, num_pages;

                        /* reserve a virtual area for next contiguous
@@ -340,6 +341,14 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
                                        break;
                        }
                        num_pages = j - i;
+#else /* hugepages will be allocated at contiguous virtual addresses */
+                       unsigned num_pages;
+                       /* We will reserve a virtual area large enough to fit ALL
+                        * physical blocks.
+                        * This way we can have bigger mempools even if there is
+                        * no contiguous physical region.
+                        */
+                       num_pages = hpi->num_pages[0] - i;
+#endif
                        vma_len = num_pages * hugepage_sz;

                        /* get the biggest virtual memory area up to
@@ -1268,6 +1277,16 @@ rte_eal_hugepage_init(void)
                        new_memseg = 1;

                if (new_memseg) {
+#ifdef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
+                       if (0 <= j) {
+                               RTE_LOG(DEBUG, EAL, "Closing memory segment #%d(%p) vaddr is %p phys is 0x%lx size is 0x%lx "
+                                       "which is #%ld pages; next vaddr will be at 0x%lx\n",
+                                       j, &mcfg->memseg[j],
+                                       mcfg->memseg[j].addr, mcfg->memseg[j].phys_addr, mcfg->memseg[j].len,
+                                       mcfg->memseg[j].len / mcfg->memseg[j].hugepage_sz,
+                                       mcfg->memseg[j].addr_64 + mcfg->memseg[j].len);
+                       }
+#endif
                        j += 1;
                        if (j == RTE_MAX_MEMSEG)
                                break;
--
1.9.3 (Apple Git-50)

================================================================

Then there is the DPDK-application library part, which implements the

struct rte_mempool *scattered_mempool_create(uint32_t elt_size, uint32_t elt_num, int32_t socket_id,
                                             rte_mempool_ctor_t *mp_init, void *mp_init_arg,
                                             rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg)

interface. If you would like, I can easily break the different
functions into their right places in the rte_memseg and rte_mempool
DPDK modules and have it included as another interface of the DPDK
library (as suggested by Konstantin earlier in the thread).
=====================================================
static inline int is_memseg_valid(struct rte_memseg *free_memseg, size_t requested_page_size,
                                  int socket_id)
{
        if (free_memseg->len == 0) {
                return 0;
        }

        if (socket_id != SOCKET_ID_ANY &&
            free_memseg->socket_id != SOCKET_ID_ANY &&
            free_memseg->socket_id != socket_id) {
                RTE_LOG(DEBUG, USER1, "memseg does not qualify for socket_id, requested %d got %d",
                        socket_id, free_memseg->socket_id);
                return 0;
        }

        if (free_memseg->len < requested_page_size) {
                RTE_LOG(DEBUG, USER1, "memseg too small. len %lu < requested_page_size %lu",
                        free_memseg->len, requested_page_size);
                return 0;
        }

        if (free_memseg->hugepage_sz != requested_page_size) {
                RTE_LOG(DEBUG, USER1, "memseg hugepage size != requested page size %lu != %lu",
                        free_memseg->hugepage_sz,
                        requested_page_size);
                return 0;
        }

        return 1;
}

static int try_allocating_memseg_range(struct rte_memseg *free_memseg, int start,
                                       int requested_page_size, size_t len, int socket_id)
{
        int i;
        for (i = start; i < RTE_MAX_MEMSEG; i++) {
                if (free_memseg[i].addr == NULL) {
                        return -1;
                }

                if (!is_memseg_valid(free_memseg + i, requested_page_size, socket_id)) {
                        return -1;
                }

                if ((start != i) &&
                    ((char *)free_memseg[i].addr != (char *)free_memseg[i-1].addr + free_memseg[i-1].len)) {
                        RTE_LOG(DEBUG, USER1, "Looking for cont memseg range. "
                                "[%d].vaddr %p != [%d].vaddr %p + [i-1].len %lu == %p",
                                i, free_memseg[i].addr, i-1, free_memseg[i-1].addr,
                                free_memseg[i-1].len,
                                (char *)(free_memseg[i-1].addr) + free_memseg[i-1].len);
                        return -1;
                }

                if ((free_memseg[i].len < len) && ((free_memseg[i].len % requested_page_size) != 0)) {
                        RTE_LOG(DEBUG, USER1, "#%d memseg length not a multiple of page size, or last."
                                " len %lu len %% requested_pg_size %lu, requested_pg_sz %d",
                                i, free_memseg[i].len, free_memseg[i].len % requested_page_size,
                                requested_page_size);
                        return -1;
                }

                if (len <= free_memseg[i].len) {
                        RTE_LOG(DEBUG, USER1, "Successfully finished looking for memsegs. remaining req. "
                                "len %lu seg_len %lu, start %d i %d",
                                len, free_memseg[i].len, start, i);
                        return i - start + 1;
                }

                if (i == start) {
                        // We may not start at the beginning; have to move to the next
                        // page-size alignment...
                        char *aligned_vaddr = RTE_PTR_ALIGN_CEIL(free_memseg[i].addr, requested_page_size);
                        size_t diff = (size_t)(aligned_vaddr - (char *)free_memseg[i].addr);
                        if ((free_memseg[i].len - diff) % requested_page_size != 0) {
                                RTE_LOG(ERR, USER1, "BUG! First segment is not page aligned! vaddr %p aligned "
                                        "vaddr %p diff %lu len %lu, len - diff %lu, "
                                        "(len%%diff)/%d == %lu",
                                        free_memseg[i].addr, aligned_vaddr, diff, free_memseg[i].len,
                                        free_memseg[i].len - diff,
                                        requested_page_size,
                                        (free_memseg[i].len - diff) % requested_page_size);
                                return -1;
                        } else if (0 == free_memseg[i].len - diff) {
                                RTE_LOG(DEBUG, USER1, "After alignment, first memseg is empty!");
                                return -1;
                        }

                        RTE_LOG(DEBUG, USER1, "First memseg gives (after alignment) len %lu out of potential %lu",
                                (free_memseg[i].len - diff), free_memseg[i].len);
                        len -= (free_memseg[i].len - diff);
                        continue; /* first segment already accounted for above */
                }
                len -= free_memseg[i].len;
        }

        return -1;
}


/**
 * Will reserve several memzones at contiguous virtual addresses, of large total size.
 * All memzones but the last will use full pages; only the last memzone may take
 * less than a full hugepage.
 *
 * It will go through all the free memory segments; once it finds a memsegment with
 * full hugepages, it will check whether it can start allocating from that memory
 * segment onwards.
 */
static const struct rte_memzone *
memzone_reserve_multiple_cont_mz(const char *basename, size_t *zones_len, size_t len, int socket_id,
                                 unsigned flags, unsigned align)
{
        struct rte_mem_config *mcfg;
        const struct rte_memzone *ret = NULL;
        size_t requested_page_size;
        int i;
        struct rte_memseg *free_memseg = NULL;
        int first_memseg = -1;
        int memseg_count = -1;

        mcfg = rte_eal_get_configuration()->mem_config;
        free_memseg = mcfg->free_memseg;

        RTE_LOG(DEBUG, USER1, "mcfg is at %p free_memseg at %p memseg at %p",
                mcfg, mcfg->free_memseg, mcfg->memseg);

        for (i = 0; i < 10 && (free_memseg[i].addr != NULL); i++) {
                RTE_LOG(DEBUG, USER1, "free_memseg[%d] : vaddr 0x%p phys_addr 0x%p len %lu pages: %lu [0x%lu]",
                        i,
                        free_memseg[i].addr,
                        (void *)free_memseg[i].phys_addr, free_memseg[i].len,
                        free_memseg[i].len / free_memseg[i].hugepage_sz,
                        free_memseg[i].hugepage_sz);
        }

        for (i = 0; i < 10 && (mcfg->memseg[i].addr != NULL); i++) {
                RTE_LOG(DEBUG, USER1, "memseg[%d] : vaddr 0x%p phys_addr 0x%p len %lu pages: %lu [0x%lu]",
                        i,
                        mcfg->memseg[i].addr,
                        (void *)mcfg->memseg[i].phys_addr, mcfg->memseg[i].len,
                        mcfg->memseg[i].len / mcfg->memseg[i].hugepage_sz,
                        mcfg->memseg[i].hugepage_sz);
        }

        *zones_len = 0;

        if (mcfg->memzone_idx >= RTE_MAX_MEMZONE) {
                RTE_LOG(DEBUG, USER1, "No more room for new memzones");
                return NULL;
        }

        if ((flags & (RTE_MEMZONE_2MB | RTE_MEMZONE_1GB)) == 0) {
                RTE_LOG(DEBUG, USER1, "Must request either 2MB or 1GB pages");
                return NULL;
        }

        if ((flags & RTE_MEMZONE_1GB) && (flags & RTE_MEMZONE_2MB)) {
                RTE_LOG(DEBUG, USER1, "Cannot request both 1GB and 2MB pages");
                return NULL;
        }

        if (flags & RTE_MEMZONE_2MB) {
                requested_page_size = RTE_PGSIZE_2M;
        } else {
                requested_page_size = RTE_PGSIZE_1G;
        }

        if (len < requested_page_size) {
                RTE_LOG(DEBUG, USER1, "Requested length %lu is smaller than requested page size %lu",
                        len, requested_page_size);
                return NULL;
        }

        ret = rte_memzone_reserve_aligned(basename, len, socket_id, flags, align);
        if (ret != NULL) {
                RTE_LOG(DEBUG, USER1, "Normal rte_memzone_reserve_aligned worked!");
                *zones_len = 1;
                return ret;
        }

        RTE_LOG(DEBUG, USER1, "rte_memzone_reserve_aligned failed. Will have to allocate on our own");
        rte_rwlock_write_lock(&mcfg->mlock);

        for (i = 0; i < RTE_MAX_MEMSEG; i++) {
                if (free_memseg[i].addr == NULL) {
                        break;
                }

                if (!is_memseg_valid(free_memseg + i, requested_page_size, socket_id)) {
                        continue;
                }

                memseg_count = try_allocating_memseg_range(free_memseg, i, requested_page_size, len,
                                                           socket_id);
                if (0 < memseg_count) {
                        RTE_LOG(DEBUG, USER1, "Was able to find memsegments for zone! "
                                "first segment: %d segment_count %d len %lu",
                                i, memseg_count, len);
                        first_memseg = i;

                        // Fix the first memseg -- make sure it's page aligned!
                        char *aligned_vaddr = RTE_PTR_ALIGN_CEIL(free_memseg[i].addr,
                                                                 requested_page_size);
                        size_t diff = (size_t)(aligned_vaddr - (char *)free_memseg[i].addr);
                        RTE_LOG(DEBUG, USER1, "Decreasing first segment by %lu", diff);
                        free_memseg[i].addr = aligned_vaddr;
                        free_memseg[i].phys_addr += diff;
                        free_memseg[i].len -= diff;
                        if ((free_memseg[i].phys_addr % requested_page_size != 0)) {
                                RTE_LOG(ERR, USER1, "After aligning first free memseg, "
                                        "physical address NOT page aligned! %p",
                                        (void *)free_memseg[i].phys_addr);
                                abort();
                        }

                        break;
                }
        }

        if (first_memseg < 0) {
                RTE_LOG(DEBUG, USER1, "Could not find memsegs to allocate enough memory");
                goto out;
        }

        // now perform the actual allocation.
        if (mcfg->memzone_idx + memseg_count >= RTE_MAX_MEMZONE) {
                RTE_LOG(DEBUG, USER1, "There are not enough memzones to allocate. "
                        "memzone_idx %d memseg_count %d max %s=%d",
                        mcfg->memzone_idx, memseg_count,
                        RTE_STR(RTE_MAX_MEMZONE), RTE_MAX_MEMZONE);
                goto out;
        }

        ret = &mcfg->memzone[mcfg->memzone_idx];
        *zones_len = memseg_count;
        for (i = first_memseg; i < first_memseg + memseg_count; i++) {
                size_t allocated_length;
                if (free_memseg[i].len <= len) {
                        allocated_length = free_memseg[i].len;
                } else {
                        allocated_length = len;
                }

                struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
                snprintf(mz->name, sizeof(mz->name), "%s%d", basename, i - first_memseg);
                mz->phys_addr   = free_memseg[i].phys_addr;
                mz->addr        = free_memseg[i].addr;
                mz->len         = allocated_length;
                mz->hugepage_sz = free_memseg[i].hugepage_sz;
                mz->socket_id   = free_memseg[i].socket_id;
                mz->flags       = 0;
                mz->memseg_id   = i;

                free_memseg[i].len -= allocated_length;
                free_memseg[i].phys_addr += allocated_length;
                free_memseg[i].addr_64 += allocated_length;
                len -= allocated_length;
        }

        if (len != 0) {
                RTE_LOG(DEBUG, USER1, "After registering all the memzones, len is too small! Len is %lu", len);
                ret = NULL;
                goto out;
        }
out:
        rte_rwlock_write_unlock(&mcfg->mlock);
        return ret;
}


static inline void build_physical_pages(phys_addr_t *phys_pages, int num_phys_pages, size_t sz,
                                        const struct rte_memzone *mz, int num_zones)
{
        size_t accounted_for_size = 0;
        int curr_page = 0;
        int i;
        unsigned j;

        RTE_LOG(DEBUG, USER1, "Phys pages are at %p 2M is %d mz pagesize is %lu trailing zeros: %d",
                phys_pages, RTE_PGSIZE_2M, mz->hugepage_sz,
                __builtin_ctz(mz->hugepage_sz));

        for (i = 0; i < num_zones; i++) {
                size_t mz_remaining_len = mz[i].len;
                for (j = 0; (j <= mz[i].len / RTE_PGSIZE_2M) && (0 < mz_remaining_len); j++) {
                        phys_pages[curr_page++] = mz[i].phys_addr + j * RTE_PGSIZE_2M;

                        size_t added_len = RTE_MIN((size_t)RTE_PGSIZE_2M, mz_remaining_len);
                        accounted_for_size += added_len;
                        mz_remaining_len -= added_len;

                        if (sz <= accounted_for_size) {
                                RTE_LOG(DEBUG, USER1, "Filled in %d pages of the physical pages array", curr_page);
                                return;
                        }
                        if (num_phys_pages < curr_page) {
                                RTE_LOG(ERR, USER1, "When building physical pages array, "
                                        "used pages (%d) is more than allocated pages %d. "
                                        "accounted size %lu size %lu",
                                        curr_page, num_phys_pages, accounted_for_size, sz);
                                abort();
                        }
                }
        }

        if (accounted_for_size < sz) {
                RTE_LOG(ERR, USER1, "Finished going over %d memory zones, and still accounted size is %lu "
                        "and requested size is %lu",
                        num_zones, accounted_for_size, sz);
                abort();
        }
}

struct rte_mempool *scattered_mempool_create(uint32_t elt_size, uint32_t elt_num, int32_t socket_id,
                                             rte_mempool_ctor_t *mp_init, void *mp_init_arg,
                                             rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg)
{
        struct rte_mempool *mp;
        const struct rte_memzone *mz;
        size_t num_zones;
        struct rte_mempool_objsz obj_sz;
        uint32_t flags, total_size;
        size_t sz;

        flags = (MEMPOOL_F_NO_SPREAD | MEMPOOL_F_SC_GET | MEMPOOL_F_SP_PUT);

        total_size = rte_mempool_calc_obj_size(elt_size, flags, &obj_sz);

        sz = elt_num * total_size;
        /* We now have to account for the "gaps" at the end of each page. The worst
         * case is that we get all distinct pages, so we add the gap for each
         * possible page. */
        int pages_num = (sz + RTE_PGSIZE_2M - 1) / RTE_PGSIZE_2M;
        int page_gap = RTE_PGSIZE_2M % elt_size;
        sz += pages_num * page_gap;

        RTE_LOG(DEBUG, USER1, "Will have to allocate %d 2M pages for the page table.", pages_num);

        if ((mz = memzone_reserve_multiple_cont_mz("data_obj", &num_zones, sz, socket_id,
                                                   RTE_MEMZONE_2MB, RTE_PGSIZE_2M)) == NULL) {
                RTE_LOG(WARNING, USER1, "memzone reserve multi mz returned NULL for socket id %d, will try ANY",
                        socket_id);
                if ((mz = memzone_reserve_multiple_cont_mz("data_obj", &num_zones, sz, SOCKET_ID_ANY,
                                                           RTE_MEMZONE_2MB, RTE_PGSIZE_2M)) == NULL) {
                        RTE_LOG(ERR, USER1, "memzone reserve multi mz returned NULL even for any socket");
                        return NULL;
                } else {
                        RTE_LOG(DEBUG, USER1, "memzone reserve multi mz returned %p with %lu zones for SOCKET_ID_ANY",
                                mz, num_zones);
                }
        } else {
                RTE_LOG(DEBUG, USER1, "memzone reserve multi mz returned %p with %lu zones for size %lu socket %d",
                        mz, num_zones, sz, socket_id);
        }

        // Now we will "break" the zones into smaller (2M) pages
        phys_addr_t *phys_pages = malloc(sizeof(phys_addr_t) * pages_num);
        if (phys_pages == NULL) {
                RTE_LOG(DEBUG, USER1, "phys_pages is null. aborting");
                abort();
        }

        build_physical_pages(phys_pages, pages_num, sz, mz, num_zones);
        RTE_LOG(DEBUG, USER1, "Beginning of vaddr is %p beginning of physical addr is 0x%lx",
                mz->addr, mz->phys_addr);
        mp = rte_mempool_xmem_create("data_pool", elt_num, elt_size,
                                     257, sizeof(struct rte_pktmbuf_pool_private),
                                     mp_init, mp_init_arg, obj_init, obj_init_arg,
                                     socket_id, flags, (char *)mz[0].addr,
                                     phys_pages, pages_num, rte_bsf32(RTE_PGSIZE_2M));

        RTE_LOG(DEBUG, USER1, "rte_mempool_xmem_create returned %p", mp);
        return mp;
}

=================================================================
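
For completeness, a hypothetical usage sketch of the above; the element
size, count, and pktmbuf constructors are examples, not part of the
patch:

        struct rte_mempool *mp;

        mp = scattered_mempool_create(
                2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM, /* elt_size */
                8192,                        /* elt_num */
                rte_socket_id(),             /* preferred socket, falls back to ANY */
                rte_pktmbuf_pool_init, NULL, /* pool ctor + arg */
                rte_pktmbuf_init, NULL);     /* per-object ctor + arg */
        if (mp == NULL)
                rte_exit(EXIT_FAILURE, "scattered_mempool_create failed\n");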

Please let me know if you have any questions/comments about this code.

Best Regards,

Liran.

^ permalink raw reply	[flat|nested] 9+ messages in thread

Thread overview: 9+ messages
2014-12-18 13:25 [dpdk-dev] rte_mempool_create fails with ENOMEM Newman Poborsky
2014-12-18 14:21 ` Alex Markuze
2014-12-18 17:42 ` Ananyev, Konstantin
2014-12-18 20:03   ` Ananyev, Konstantin
2014-12-19 20:13     ` Newman Poborsky
2014-12-20  1:34       ` Stephen Hemminger
2014-12-22 10:48         ` Newman Poborsky
2015-01-08  8:19           ` Newman Poborsky
2015-01-10 19:26             ` Liran Zvibel
