* Re: [dpdk-dev] memory allocation requirements
From: Wiles, Keith @ 2016-04-13 17:00 UTC
To: Thomas Monjalon, Gonzalez Monroy, Sergio; +Cc: dev
>After looking at the patches for container support, it appears that
>some changes are needed in the memory management:
>http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>
>I think it is time to collect the needs and expectations for
>the DPDK memory allocator. The goal is to satisfy every need while
>cleaning up the API.
>Here is a first try to start the discussion.
>
>The memory allocator has 2 classes of API in DPDK.
>First the user/application allows or requires DPDK to take over some
>memory resources of the system. The characteristics can be:
> - numa node
> - page size
> - swappable or not
> - contiguous (cannot be guaranteed) or not
> - physical address (as root only)
>Then the drivers or other libraries use the memory through
> - rte_malloc
> - rte_memzone
> - rte_mempool
Do not forget about rte_pktmbuf_pool_create(), which relies on rte_mempool and rte_memzone for high performance. We need to make sure we do not break this area.
We need to draw a good diagram showing the relationships between memory allocation and the APIs; at least I need something like that to help understand the bigger picture. Then we can start looking at how to modify memory allocation.
What I want is to reduce the complexity of the primary APIs, using fewer arguments and handling the most common cases. The more complex memory configurations should not be hidden in the API, IMO, but made explicit through APIs that configure the memory allocation, as in the case of rte_mempool_create().
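As a rough illustration of that direction, here is a minimal sketch of an
attribute-based allocation call; the rte_mem_req struct and rte_malloc_attr()
names are hypothetical, not existing DPDK API:

#include <stddef.h>
#include <stdbool.h>

/* Hypothetical request descriptor: the common case takes the defaults,
 * complex configurations fill in the optional fields explicitly. */
struct rte_mem_req {
    int socket_id;     /* numa node, or SOCKET_ID_ANY */
    size_t page_size;  /* 0 = any page size */
    bool phys_contig;  /* require physically contiguous memory */
    bool swappable;    /* allow the kernel to swap the pages */
};

/* Hypothetical allocator entry point: one argument struct instead of
 * a long positional parameter list; NULL means "all defaults". */
void *rte_malloc_attr(const char *name, size_t size, unsigned int align,
                      const struct rte_mem_req *req);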
>I think we can integrate the characteristics of the requested memory
>in rte_malloc. Then rte_memzone would be only a named rte_malloc.
>The rte_mempool would still focus on collections of objects with a cache.
>
>If a rework happens, maybe the build options CONFIG_RTE_LIBRTE_IVSHMEM
>and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
These need to be removed, or at least moved to a runtime configuration.
>The Xen support should also be better integrated.
The Xen support and the external memory manager support are coming in 16.07, which I hope will help by adding a better structure around how external memory is managed.
>
>Currently, the first class of API is directly implemented as command line
>parameters. Please let's think of C functions first.
I would also like to start thinking in terms of a config file read on startup instead of command-line options. This would help create the C functions you are talking about, IMO.
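For example (purely illustrative; the rte_eal_mem_cfg struct and
rte_eal_mem_init() do not exist today), the current EAL memory options could
become plain C calls that a config-file parser would drive:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime memory configuration, replacing -m/--socket-mem. */
struct rte_eal_mem_cfg {
    uint32_t socket_mem_mb[2];  /* per-numa-node reservation, in MB */
    size_t page_size;           /* preferred hugepage size, 0 = any */
};

int rte_eal_mem_init(const struct rte_eal_mem_cfg *cfg);  /* hypothetical */

static int configure_memory(void)
{
    /* Values that would come from the startup config file. */
    struct rte_eal_mem_cfg cfg = {
        .socket_mem_mb = { 1024, 1024 },   /* numa nodes 0 and 1 */
        .page_size     = 2 * 1024 * 1024,  /* prefer 2 MB hugepages */
    };
    return rte_eal_mem_init(&cfg);  /* called before rte_eal_init() */
}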
>The EAL parameters should simply wrap some API functions and let the
>applications tune the memory initialization with a well documented API.
>
>I have probably forgotten some needs, e.g. for the secondary processes.
>Please comment.
This will be a big change and will affect many applications, but I hope it can be kept to a minimum.
>
Regards,
Keith
* Re: [dpdk-dev] memory allocation requirements
From: Sergio Gonzalez Monroy @ 2016-04-14 8:48 UTC
To: Thomas Monjalon; +Cc: dev
On 13/04/2016 17:03, Thomas Monjalon wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
+1
> I think it is time to collect the needs and expectations for
> the DPDK memory allocator. The goal is to satisfy every need while
> cleaning up the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
> - numa node
> - page size
> - swappable or not
> - contiguous (cannot be guaranteed) or not
> - physical address (as root only)
I think this ties up with the different command-line options related to
memory.
We have 3 choices:
1) no option: allocate all free hugepages in the system.
   Read free hugepages from sysfs (possible race conditions if there are
   multiple mount points for the same page size). We also need to account
   for a limit on the hugetlbfs mount; plus, if we have a cgroup, it looks
   like we have no other way than handling the SIGBUS signal to deal with
   the fact that we may succeed in allocating the hugepages even though
   they are not pre-faulted (this happens with the MAP_POPULATE option
   too). See the sketch below.
2) -m: allocate the given amount of memory regardless of the numa node.
3) --socket-mem: allocate memory per numa node.
At the moment we are not able to specify how much memory of a given page
size we want to allocate.
So should we provide contiguous memory as an option that changes the
default behavior?
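Regarding the SIGBUS handling mentioned in (1), here is a sketch of the
usual detection pattern: touch every page under a temporary SIGBUS handler
so that overcommitted hugepages are caught at allocation time instead of at
first use (plain POSIX calls; error handling trimmed):

#include <stddef.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jmpenv;

static void sigbus_handler(int sig)
{
    siglongjmp(jmpenv, 1);
}

/* Returns 0 if every page is really backed, -1 if a write raised SIGBUS. */
static int prefault_hugepages(void *addr, size_t len, size_t pgsz)
{
    struct sigaction sa, old;
    int ret = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigbus_handler;
    sigaction(SIGBUS, &sa, &old);

    if (sigsetjmp(jmpenv, 1)) {
        ret = -1;  /* a write faulted: the page was not actually available */
    } else {
        for (size_t off = 0; off < len; off += pgsz)
            *(volatile char *)((char *)addr + off) = 0;
    }

    sigaction(SIGBUS, &old, NULL);
    return ret;
}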
> Then the drivers or other libraries use the memory through
> - rte_malloc
> - rte_memzone
> - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool would still focus on collections of objects with a cache.
So the other bit we need to remember is the memory for the hardware queues.
There is already an API in ethdev, rte_eth_dma_zone_reserve(), which I think
would make sense to move to EAL so the memory allocator can guarantee contig
memory transparently for the cases where we may have memory of different
hugepage sizes.
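A sketch of what that EAL-level API could look like, modeled on the existing
ethdev call; rte_eal_dma_zone_reserve() is a hypothetical name:

#include <rte_memzone.h>

/* Hypothetical EAL counterpart of rte_eth_dma_zone_reserve(): guarantees
 * physically contiguous memory, internally picking whichever hugepage
 * size can satisfy the request on the given socket. */
const struct rte_memzone *
rte_eal_dma_zone_reserve(const char *name, size_t size,
                         unsigned int align, int socket_id);

/* usage, inside a driver's queue setup function: */
const struct rte_memzone *mz =
    rte_eal_dma_zone_reserve("rx_ring_p0_q0", 16 * 4096, 4096, 0);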
> If a rework happens, maybe the build options CONFIG_RTE_LIBRTE_IVSHMEM
> and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
> The Xen support should also be better integrated.
CONFIG_RTE_LIBRTE_IVSHMEM should probably be a runtime option and
CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS could likely be removed once we have a
single mmap file for hugepages.
> Currently, the first class of API is directly implemented as command line
> parameters. Please let's think of C functions first.
> The EAL parameters should simply wrap some API functions and let the
> applications tune the memory initialization with a well documented API.
>
> I have probably forgotten some needs, e.g. for the secondary processes.
> Please comment.
Regards,
Sergio
* Re: [dpdk-dev] memory allocation requirements
From: Olivier MATZ @ 2016-04-14 14:46 UTC
To: Thomas Monjalon, sergio.gonzalez.monroy; +Cc: dev
Hi,
On 04/13/2016 06:03 PM, Thomas Monjalon wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>
> I think it is time to collect the needs and expectations for
> the DPDK memory allocator. The goal is to satisfy every need while
> cleaning up the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
> - numa node
> - page size
> - swappable or not
> - contiguous (cannot be guaranteed) or not
> - physical address (as root only)
> Then the drivers or other libraries use the memory through
> - rte_malloc
> - rte_memzone
> - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool would still focus on collections of objects with a cache.
Just to mention that some evolutions [1] are planned for mempool in
16.07, allowing a mempool to be populated with several chunks of memory
while still ensuring that the objects are physically contiguous. It
completely removes the need to allocate a big virtually contiguous
memory zone (and also physically contiguous when not using
rte_mempool_create_xmem(), which is probably the case in most
applications).
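For reference, the population flow in that series looks roughly like this
(API names as proposed in [1]; details may change before 16.07):

#include <stdlib.h>
#include <rte_mempool.h>
#include <rte_debug.h>

/* inside the application's init code: */

/* Create the pool shell without reserving any object memory yet. */
struct rte_mempool *mp = rte_mempool_create_empty("pool", 8192, 2048,
                                                  256, 0, SOCKET_ID_ANY, 0);
if (mp == NULL)
    rte_exit(EXIT_FAILURE, "cannot create empty mempool\n");

/* Populate from as many chunks as needed; objects never cross a chunk
 * boundary, so each object stays physically contiguous even when the
 * chunks themselves are scattered. */
if (rte_mempool_populate_default(mp) < 0)
    rte_exit(EXIT_FAILURE, "cannot populate mempool\n");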
Knowing this, the code that remaps the hugepages to get the largest
possible physically contiguous zone probably becomes useless after
the mempool series. Changing it to only one mmap(file) in hugetlbfs
per NUMA socket would clearly simplify this part of EAL.
For other allocations that must be physically contiguous (ex: zones
shared with the hardware), a page-sized granularity may be enough.
Regards,
Olivier
[1] http://dpdk.org/ml/archives/dev/2016-April/037464.html
* Re: [dpdk-dev] memory allocation requirements
From: Sergio Gonzalez Monroy @ 2016-04-14 15:39 UTC
To: Olivier MATZ, Thomas Monjalon; +Cc: dev
On 14/04/2016 15:46, Olivier MATZ wrote:
> Hi,
>
> On 04/13/2016 06:03 PM, Thomas Monjalon wrote:
>> After looking at the patches for container support, it appears that
>> some changes are needed in the memory management:
>> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>>
>>
>> I think it is time to collect the needs and expectations for
>> the DPDK memory allocator. The goal is to satisfy every need while
>> cleaning up the API.
>> Here is a first try to start the discussion.
>>
>> The memory allocator has 2 classes of API in DPDK.
>> First the user/application allows or requires DPDK to take over some
>> memory resources of the system. The characteristics can be:
>> - numa node
>> - page size
>> - swappable or not
>> - contiguous (cannot be guaranteed) or not
>> - physical address (as root only)
>> Then the drivers or other libraries use the memory through
>> - rte_malloc
>> - rte_memzone
>> - rte_mempool
>> I think we can integrate the characteristics of the requested memory
>> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
>> The rte_mempool would still focus on collections of objects with a cache.
>
> Just to mention that some evolutions [1] are planned for mempool in
> 16.07, allowing a mempool to be populated with several chunks of memory
> while still ensuring that the objects are physically contiguous. It
> completely removes the need to allocate a big virtually contiguous
> memory zone (and also physically contiguous when not using
> rte_mempool_create_xmem(), which is probably the case in most
> applications).
>
> Knowing this, the code that remaps the hugepages to get the largest
> possible physically contiguous zone probably becomes useless after
> the mempool series. Changing it to only one mmap(file) in hugetlbfs
> per NUMA socket would clearly simplify this part of EAL.
>
Are you suggesting making those changes after the mempool series
has been applied, but keeping the current memzone/malloc behavior?
Regards,
Sergio
> For other allocations that must be physically contiguous (ex: zones
> shared with the hardware), a page-sized granularity may be enough.
>
> Regards,
> Olivier
>
> [1] http://dpdk.org/ml/archives/dev/2016-April/037464.html
* Re: [dpdk-dev] memory allocation requirements
From: Olivier Matz @ 2016-04-15 7:12 UTC
To: Sergio Gonzalez Monroy, Thomas Monjalon; +Cc: dev
Hi,
On 04/14/2016 05:39 PM, Sergio Gonzalez Monroy wrote:
>> Just to mention that some evolutions [1] are planned for mempool in
>> 16.07, allowing a mempool to be populated with several chunks of memory
>> while still ensuring that the objects are physically contiguous. It
>> completely removes the need to allocate a big virtually contiguous
>> memory zone (and also physically contiguous when not using
>> rte_mempool_create_xmem(), which is probably the case in most
>> applications).
>>
>> Knowing this, the code that remaps the hugepages to get the largest
>> possible physically contiguous zone probably becomes useless after
>> the mempool series. Changing it to only one mmap(file) in hugetlbfs
>> per NUMA socket would clearly simplify this part of EAL.
>>
>
> Are you suggesting making those changes after the mempool series
> has been applied, but keeping the current memzone/malloc behavior?
I wonder if the default property of memzone/malloc, which is to
allocate physically contiguous memory, shouldn't be dropped. It could
remain optional, knowing that allocating a physically contiguous zone
larger than a page cannot be guaranteed.
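In code terms, that could be an opt-in flag at reservation time;
RTE_MEMZONE_PHYS_CONTIG below is a hypothetical flag name, not an
existing one:

#include <rte_memzone.h>

/* inside init code: */

/* Default: only virtual contiguity guaranteed, can span hugepages. */
const struct rte_memzone *mz =
    rte_memzone_reserve("app_state", 1 << 20, SOCKET_ID_ANY, 0);

/* Opt-in: physically contiguous; may fail for sizes above one page.
 * RTE_MEMZONE_PHYS_CONTIG is hypothetical. */
const struct rte_memzone *dma_mz =
    rte_memzone_reserve("dma_area", 1 << 20, SOCKET_ID_ANY,
                        RTE_MEMZONE_PHYS_CONTIG);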
But yes, I'm in favor of doing these changes in eal_memory.c; it would
drop a lot of complex code (all the rtemap* stuff), and today I'm not seeing
any big issue with doing it... maybe we'll find one during the
discussion :)
Regards,
Olivier
* Re: [dpdk-dev] memory allocation requirements
From: Sergio Gonzalez Monroy @ 2016-04-15 8:47 UTC
To: Olivier Matz, Thomas Monjalon; +Cc: dev
On 15/04/2016 08:12, Olivier Matz wrote:
> Hi,
>
> On 04/14/2016 05:39 PM, Sergio Gonzalez Monroy wrote:
>>> Just to mention that some evolutions [1] are planned for mempool in
>>> 16.07, allowing a mempool to be populated with several chunks of memory
>>> while still ensuring that the objects are physically contiguous. It
>>> completely removes the need to allocate a big virtually contiguous
>>> memory zone (and also physically contiguous when not using
>>> rte_mempool_create_xmem(), which is probably the case in most
>>> applications).
>>>
>>> Knowing this, the code that remaps the hugepages to get the largest
>>> possible physically contiguous zone probably becomes useless after
>>> the mempool series. Changing it to only one mmap(file) in hugetlbfs
>>> per NUMA socket would clearly simplify this part of EAL.
>>>
>> Are you suggesting making those changes after the mempool series
>> has been applied, but keeping the current memzone/malloc behavior?
> I wonder if the default property of memzone/malloc, which is to
> allocate physically contiguous memory, shouldn't be dropped. It could
> remain optional, knowing that allocating a physically contiguous zone
> larger than a page cannot be guaranteed.
>
> But yes, I'm in favor of doing these changes in eal_memory.c; it would
> drop a lot of complex code (all the rtemap* stuff), and today I'm not seeing
> any big issue with doing it... maybe we'll find one during the
> discussion :)
I'm in favor of doing those changes, but then I think we need to support
allocating non-contiguous memory through memzone/malloc, or other libraries
such as librte_hash may not be able to get the memory they need, right?
Otherwise every library would need a rework like the mempool series to
deal with non-contiguous memory.
For contiguous memory, I would prefer a new API for DMA areas (something
similar to rte_eth_dma_zone_reserve() in ethdev) that would transparently
deal with the case where we have multiple hugepage sizes.
Sergio
> Regards,
> Olivier
* Re: [dpdk-dev] memory allocation requirements
From: Alejandro Lucero @ 2016-05-18 10:28 UTC
To: Thomas Monjalon, Burakov, Anatoly; +Cc: sergio.gonzalez.monroy, dev
On Wed, Apr 13, 2016 at 5:03 PM, Thomas Monjalon <thomas.monjalon@6wind.com>
wrote:
> After looking at the patches for container support, it appears that
> some changes are needed in the memory management:
> http://thread.gmane.org/gmane.comp.networking.dpdk.devel/32786/focus=32788
>
> I think it is time to collect the needs and expectations for
> the DPDK memory allocator. The goal is to satisfy every need while
> cleaning up the API.
> Here is a first try to start the discussion.
>
> The memory allocator has 2 classes of API in DPDK.
> First the user/application allows or requires DPDK to take over some
> memory resources of the system. The characteristics can be:
> - numa node
> - page size
> - swappable or not
> - contiguous (cannot be guaranteed) or not
> - physical address (as root only)
> Then the drivers or other libraries use the memory through
> - rte_malloc
> - rte_memzone
> - rte_mempool
> I think we can integrate the characteristics of the requested memory
> in rte_malloc. Then rte_memzone would be only a named rte_malloc.
> The rte_mempool would still focus on collections of objects with a cache.
>
> If a rework happens, maybe the build options CONFIG_RTE_LIBRTE_IVSHMEM
> and CONFIG_RTE_EAL_SINGLE_FILE_SEGMENTS can be removed.
> The Xen support should also be better integrated.
>
> Currently, the first class of API is directly implemented as command line
> parameters. Please let's think of C functions first.
> The EAL parameters should simply wrap some API functions and let the
> applications tune the memory initialization with a well documented API.
>
> I have probably forgotten some needs, e.g. for the secondary processes.
> Please comment.
>
Just to mention that the VFIO IOMMU mapping should be adjusted to cover
just those physically contiguous memsegs that rte_pktmbuf_pool_create()
will allocate, along with the hugepages backing the driver/device
descriptor rings. Mapping all the memsegs is not a performance issue,
but I think restricting the mapping is the right thing to do.
Maybe some memseg flag like "DMA_CAPABLE" or similar should be used for
IOMMU mapping.
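A sketch of how such a flag might be consumed when setting up VFIO; the
flags field on rte_memseg, the RTE_MEMSEG_DMA_CAPABLE bit, and the
vfio_dma_map() helper are all hypothetical:

#include <rte_memory.h>

#define RTE_MEMSEG_DMA_CAPABLE (1u << 0)  /* hypothetical capability bit */

static void vfio_map_dma_capable(void)
{
    const struct rte_memseg *ms = rte_eal_get_physmem_layout();
    int i;

    for (i = 0; i < RTE_MAX_MEMSEG; i++) {
        if (ms[i].addr == NULL)
            break;
        /* Map only segments explicitly marked as usable for DMA,
         * instead of programming every memseg into the IOMMU. */
        if (ms[i].flags & RTE_MEMSEG_DMA_CAPABLE)
            vfio_dma_map(ms[i].addr_64, ms[i].phys_addr, ms[i].len);
    }
}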
Another question is how to avoid using mbufs from non-"DMA_CAPABLE" segments
with a device. I'm thinking about a DPDK app using a virtio network driver
and a device-backed PMD at the same time, which could be a way of having the
best of both worlds (intra-host and inter-host VM communications).