DPDK patches and discussions
* [dpdk-dev] libhugetlbfs
@ 2015-07-22 10:40 Thomas Monjalon
  2015-07-23  7:34 ` Gonzalez Monroy, Sergio
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas Monjalon @ 2015-07-22 10:40 UTC (permalink / raw)
  To: sergio.gonzalez.monroy; +Cc: dev

Sergio,

As the maintainer of memory allocation, would you consider using
libhugetlbfs in DPDK for Linux?
It may simplify part of our memory allocator and avoid some potential
bugs that would already be fixed in the dedicated lib.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] libhugetlbfs
  2015-07-22 10:40 [dpdk-dev] libhugetlbfs Thomas Monjalon
@ 2015-07-23  7:34 ` Gonzalez Monroy, Sergio
  2015-07-23  8:12   ` Thomas Monjalon
  2015-07-23 12:08   ` Burakov, Anatoly
  0 siblings, 2 replies; 6+ messages in thread
From: Gonzalez Monroy, Sergio @ 2015-07-23  7:34 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 22/07/2015 11:40, Thomas Monjalon wrote:
> Sergio,
>
> As the maintainer of memory allocation, would you consider using
> libhugetlbfs in DPDK for Linux?
> It may simplify part of our memory allocator and avoid some potential
> bugs that would already be fixed in the dedicated lib.
I did have a look at it a couple of months ago and I thought there were
a few issues:
- get_hugepage_region/get_huge_pages only allocate default-size huge pages
   (you can set a different default huge page size with environment variables,
   but there is no support for multiple sizes), and we have no guarantee of
   physically contiguous pages.
- That leaves us with hugetlbfs_unlinked_fd/hugetlbfs_unlinked_fd_for_size.
   These APIs wouldn't simplify the current code much, just the allocation of
   the pages themselves (i.e. creating a file in a hugetlbfs mount); see the
   sketch after this list.
   Then there is the issue with multi-process: because they return a file
   descriptor while unlinking the file, we would need some sort of
   Inter-Process Communication to pass the descriptors to secondary processes.
- Not a big deal, but AFAIK it is not possible to have multiple mount points
   for the same hugepage size, and even if you do, hugetlbfs_find_path_for_size
   always returns the same path (i.e. the first one found in the list).
- We still need to parse /proc/self/pagemap to get the physical addresses of
   the mapped hugepages.
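
To make the comparison concrete, a rough sketch of what those APIs give us
(assuming libhugetlbfs' <hugetlbfs.h>; error handling trimmed, and not meant
as a drop-in for the EAL code):

#include <hugetlbfs.h>   /* libhugetlbfs; link with -lhugetlbfs */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long hpsz = gethugepagesize();  /* default huge page size only */

        if (hpsz <= 0)
                return 1;

        /* get_huge_pages(): default-size pages, no fd handed back to us */
        void *buf = get_huge_pages(hpsz, GHP_DEFAULT);

        /* hugetlbfs_unlinked_fd(): fd to an already-unlinked file in the
         * hugetlbfs mount; the fd is then the only handle to those pages */
        int fd = hugetlbfs_unlinked_fd();
        void *map = (fd < 0) ? MAP_FAILED :
                mmap(NULL, hpsz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        printf("buf=%p map=%p\n", buf, map);

        if (buf != NULL)
                free_huge_pages(buf);
        if (map != MAP_FAILED)
                munmap(map, hpsz);
        if (fd >= 0)
                close(fd);
        return 0;
}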

I guess that if we were to push for a new API such as hugetlbfs_fd_for_size,
we could use it for the hugepage allocation, but we would still have to parse
/proc/self/pagemap to get the physical addresses and then order those hugepages.
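
And for reference, the pagemap lookup itself is just reading one 64-bit entry
per (system-size) page; a minimal sketch of the idea, not the actual EAL code:

#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve the physical address backing a mapped virtual address by reading
 * its entry in /proc/self/pagemap (bit 63 = page present, bits 0-54 = PFN).
 * Returns 0 on failure; needs enough privilege to read pagemap. */
static uint64_t virt2phys(const void *virt)
{
        long pgsz = sysconf(_SC_PAGESIZE);
        off_t off = ((uintptr_t)virt / pgsz) * sizeof(uint64_t);
        uint64_t entry = 0;
        int fd = open("/proc/self/pagemap", O_RDONLY);

        if (fd < 0)
                return 0;
        if (pread(fd, &entry, sizeof(entry), off) != sizeof(entry) ||
            !(entry & (1ULL << 63))) {
                close(fd);
                return 0;
        }
        close(fd);
        return (entry & ((1ULL << 55) - 1)) * pgsz + (uintptr_t)virt % pgsz;
}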

Thoughts?

Sergio

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] libhugetlbfs
  2015-07-23  7:34 ` Gonzalez Monroy, Sergio
@ 2015-07-23  8:12   ` Thomas Monjalon
  2015-07-23  9:29     ` Gonzalez Monroy, Sergio
  2015-07-23 12:08   ` Burakov, Anatoly
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Monjalon @ 2015-07-23  8:12 UTC (permalink / raw)
  To: Gonzalez Monroy, Sergio; +Cc: dev

2015-07-23 08:34, Gonzalez Monroy, Sergio:
> On 22/07/2015 11:40, Thomas Monjalon wrote:
> > Sergio,
> >
> > As the maintainer of memory allocation, would you consider using
> > libhugetlbfs in DPDK for Linux?
> > It may simplify part of our memory allocator and avoid some potential
> > bugs that would already be fixed in the dedicated lib.
> I did have a look at it a couple of months ago and I thought there were
> a few issues:
> - get_hugepage_region/get_huge_pages only allocate default-size huge pages
>    (you can set a different default huge page size with environment variables,
>    but there is no support for multiple sizes), and we have no guarantee of
>    physically contiguous pages.

Speaking about that, we don't always need contiguous pages.
Maybe we should take it into account when reserving memory.
Flags such as DMA (locked physical pages that cannot be swapped out) and
CONTIGUOUS could be considered.
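
Purely as an illustration of what I mean (a strawman; none of these names
exist today):

#include <stddef.h>

/* Hypothetical reservation flags and API, for discussion only. */
#define RESERVE_F_DMA         (1u << 0)  /* locked pages, never swapped out */
#define RESERVE_F_CONTIGUOUS  (1u << 1)  /* physically contiguous region */

void *reserve_hugepage_memory(size_t len, size_t page_size, unsigned int flags);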

> - That leaves us with hugetlbfs_unlinked_fd/hugetlbfs_unlinked_fd_for_size.
>    These APIs wouldn't simplify the current code much, just the allocation
>    of the pages themselves (i.e. creating a file in a hugetlbfs mount).
>    Then there is the issue with multi-process: because they return a file
>    descriptor while unlinking the file, we would need some sort of
>    Inter-Process Communication to pass the descriptors to secondary processes.
> - Not a big deal, but AFAIK it is not possible to have multiple mount points
>    for the same hugepage size, and even if you do, hugetlbfs_find_path_for_size
>    always returns the same path (i.e. the first one found in the list).
> - We still need to parse /proc/self/pagemap to get the physical addresses of
>    the mapped hugepages.
>
> I guess that if we were to push for a new API such as hugetlbfs_fd_for_size,
> we could use it for the hugepage allocation, but we would still have to parse
> /proc/self/pagemap to get the physical addresses and then order those hugepages.
>
> Thoughts?

Why not extend the API and push our code to this lib?
It would allow us to share the maintenance.

The same move could be done to libpciaccess.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] libhugetlbfs
  2015-07-23  8:12   ` Thomas Monjalon
@ 2015-07-23  9:29     ` Gonzalez Monroy, Sergio
  2015-07-23 11:47       ` Thomas Monjalon
  0 siblings, 1 reply; 6+ messages in thread
From: Gonzalez Monroy, Sergio @ 2015-07-23  9:29 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On 23/07/2015 09:12, Thomas Monjalon wrote:
> 2015-07-23 08:34, Gonzalez Monroy, Sergio:
>> On 22/07/2015 11:40, Thomas Monjalon wrote:
>>> Sergio,
>>>
>>> As the maintainer of memory allocation, would you consider using
>>> libhugetlbfs in DPDK for Linux?
>>> It may simplify part of our memory allocator and avoid some potential
>>> bugs that would already be fixed in the dedicated lib.
>> I did have a look at it a couple of months ago and I thought there were
>> a few issues:
>> - get_hugepage_region/get_huge_pages only allocate default-size huge pages
>>    (you can set a different default huge page size with environment variables,
>>    but there is no support for multiple sizes), and we have no guarantee of
>>    physically contiguous pages.
> Speaking about that, we don't always need contiguous pages.
> Maybe we should take it into account when reserving memory.
> Flags such as DMA (locked physical pages that cannot be swapped out) and
> CONTIGUOUS could be considered.
Sure. I think I also mentioned this as possible future work in the 
Dynamic Memzones RFC.
>> - That leaves us with hugetlbfs_unlinked_fd/hugetlbfs_unlinked_fd_for_size.
>>    These APIs wouldn't simplify the current code much, just the allocation
>>    of the pages themselves (i.e. creating a file in a hugetlbfs mount).
>>    Then there is the issue with multi-process: because they return a file
>>    descriptor while unlinking the file, we would need some sort of
>>    Inter-Process Communication to pass the descriptors to secondary processes.
>> - Not a big deal, but AFAIK it is not possible to have multiple mount points
>>    for the same hugepage size, and even if you do, hugetlbfs_find_path_for_size
>>    always returns the same path (i.e. the first one found in the list).
>> - We still need to parse /proc/self/pagemap to get the physical addresses of
>>    the mapped hugepages.
>>
>> I guess that if we were to push for a new API such as hugetlbfs_fd_for_size,
>> we could use it for the hugepage allocation, but we would still have to parse
>> /proc/self/pagemap to get the physical addresses and then order those hugepages.
>>
>> Thoughts?
> Why not extend the API and push our code to this lib?
> It would allow us to share the maintenance.
>
> The same move could be done to libpciaccess.
I don't disagree with the idea of using libhugetlbfs; I just tried to
point out that it's not just a drop-in replacement.

Sergio

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] libhugetlbfs
  2015-07-23  9:29     ` Gonzalez Monroy, Sergio
@ 2015-07-23 11:47       ` Thomas Monjalon
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas Monjalon @ 2015-07-23 11:47 UTC (permalink / raw)
  To: Gonzalez Monroy, Sergio; +Cc: dev

2015-07-23 10:29, Gonzalez Monroy, Sergio:
> On 23/07/2015 09:12, Thomas Monjalon wrote:
> > 2015-07-23 08:34, Gonzalez Monroy, Sergio:
> >> On 22/07/2015 11:40, Thomas Monjalon wrote:
> >>> Sergio,
> >>>
> >>> As the maintainer of memory allocation, would you consider using
> >>> libhugetlbfs in DPDK for Linux?
> >>> It may simplify part of our memory allocator and avoid some potential
> >>> bugs that would already be fixed in the dedicated lib.
> >> I did have a look at it a couple of months ago and I thought there were
> >> a few issues:
> >> - get_hugepage_region/get_huge_pages only allocate default-size huge pages
> >>    (you can set a different default huge page size with environment variables,
> >>    but there is no support for multiple sizes), and we have no guarantee of
> >>    physically contiguous pages.
> > Speaking about that, we don't always need contiguous pages.
> > Maybe we should take it into account when reserving memory.
> > Flags such as DMA (locked physical pages that cannot be swapped out) and
> > CONTIGUOUS could be considered.
> Sure. I think I also mentioned this as possible future work in the 
> Dynamic Memzones RFC.
> >> - That leaves us with hugetlbfs_unlinked_fd/hugetlbfs_unlinked_fd_for_size.
> >>    These APIs wouldn't simplify the current code much, just the allocation
> >>    of the pages themselves (i.e. creating a file in a hugetlbfs mount).
> >>    Then there is the issue with multi-process: because they return a file
> >>    descriptor while unlinking the file, we would need some sort of
> >>    Inter-Process Communication to pass the descriptors to secondary processes.
> >> - Not a big deal, but AFAIK it is not possible to have multiple mount points
> >>    for the same hugepage size, and even if you do, hugetlbfs_find_path_for_size
> >>    always returns the same path (i.e. the first one found in the list).
> >> - We still need to parse /proc/self/pagemap to get the physical addresses of
> >>    the mapped hugepages.
> >>
> >> I guess that if we were to push for a new API such as hugetlbfs_fd_for_size,
> >> we could use it for the hugepage allocation, but we would still have to parse
> >> /proc/self/pagemap to get the physical addresses and then order those hugepages.
> >>
> >> Thoughts?
> > Why not extend the API and push our code to this lib?
> > It would allow us to share the maintenance.
> >
> > The same move could be done to libpciaccess.
> I don't disagree with the idea of using libhugetlbfs; I just tried to
> point out that it's not just a drop-in replacement.

Yes, thank you for the fine analysis, Sergio.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] libhugetlbfs
  2015-07-23  7:34 ` Gonzalez Monroy, Sergio
  2015-07-23  8:12   ` Thomas Monjalon
@ 2015-07-23 12:08   ` Burakov, Anatoly
  1 sibling, 0 replies; 6+ messages in thread
From: Burakov, Anatoly @ 2015-07-23 12:08 UTC (permalink / raw)
  To: Gonzalez Monroy, Sergio, Thomas Monjalon; +Cc: dev

Hi Sergio

>    Then there is the issue with multi-process: because they return a file
>    descriptor while unlinking the file, we would need some sort of
>    Inter-Process Communication to pass the descriptors to secondary processes.

That's kind of what VFIO does for multi-process, so you may want to look at that in case you decide to go down that route :)
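
For what it's worth, the descriptor passing itself is the usual SCM_RIGHTS
dance over an AF_UNIX socket; a rough sketch of the general shape (not the
VFIO code itself):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hand an open file descriptor to a peer process over a connected
 * AF_UNIX socket using an SCM_RIGHTS control message. */
static int send_fd(int sock, int fd)
{
        char dummy = 0;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } u;
        struct msghdr msg = {
                .msg_iov = &iov, .msg_iovlen = 1,
                .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

The secondary process then does the symmetric recvmsg() and picks the new fd
out of the control message.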

Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-07-23 12:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-22 10:40 [dpdk-dev] libhugetlbfs Thomas Monjalon
2015-07-23  7:34 ` Gonzalez Monroy, Sergio
2015-07-23  8:12   ` Thomas Monjalon
2015-07-23  9:29     ` Gonzalez Monroy, Sergio
2015-07-23 11:47       ` Thomas Monjalon
2015-07-23 12:08   ` Burakov, Anatoly
