DPDK usage discussions
From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
To: Antonio Di Bacco <a.dibacco.ks@gmail.com>
Cc: users@dpdk.org
Subject: Re: Failure while allocating 1GB hugepages
Date: Wed, 5 Jun 2024 01:50:09 +0300	[thread overview]
Message-ID: <20240605015009.7f660149@sovereign> (raw)
In-Reply-To: <CAO8pfFmiTQzS9VgUVhm3o_hDa2biz62QdGFKc5EWgjFd2TKxSA@mail.gmail.com>

2024-06-03 14:39 (UTC+0200), Antonio Di Bacco:
> Hi,
> I have the same behaviour with the code in this message.
> 
> The first rte_memzone_reserve_aligned() call requesting 1.5GB
> contiguous memory always fails, while the second one is always
> successful.

Hi,

I can't explain the "always" part, but the unstable behavior comes from
the unpredictable IOVAs (physical addresses) that DPDK gets from the kernel.
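
For reference, the call in question has roughly this shape (a minimal
sketch, not the original poster's code; the memzone name, size, and
flags are assumptions):

	#include <stdio.h>
	#include <rte_eal.h>
	#include <rte_memzone.h>

	int
	main(int argc, char **argv)
	{
		if (rte_eal_init(argc, argv) < 0)
			return 1;
		/* Ask for 1.5 GiB of IOVA-contiguous memory backed by
		 * 1 GiB hugepages; this fails whenever the two underlying
		 * pages get non-adjacent physical addresses. */
		const struct rte_memzone *mz = rte_memzone_reserve_aligned(
				"test_zone",		/* assumed name */
				(size_t)3 << 29,	/* 1.5 GiB */
				SOCKET_ID_ANY,
				RTE_MEMZONE_1GB | RTE_MEMZONE_IOVA_CONTIG,
				RTE_CACHE_LINE_SIZE);
		printf("first try: %s\n", mz != NULL ? "ok" : "failed");
		return 0;
	}
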
On the first try:

1. DPDK has no 1G hugepages mapped yet, so it needs two more 1G hugepages.
	alloc_pages_on_heap() -> eal_memalloc_alloc_seg_bulk()

2. DPDK asks the kernel for one 1G hugepage,
   the kernel maps it with IOVA = 0xFC0000000,
   and DPDK stores it in memseg_arr[0].
	eal_memalloc_alloc_seg_bulk() -> alloc_seg()

3. The same happens for another hugepage, and memseg_arr[1]->iova = 0xF80000000.

4. DPDK checks whether the pages are IOVA-contiguous
   (see the sketch after this list).
	alloc_pages_on_heap() -> eal_memalloc_is_contig() = false

5. Since this is a failure, DPDK frees the newly allocated pages.
	alloc_pages_on_heap() -> rollback_expand_heap()

On the second try:

6. Steps 1 and 2 repeat, but now memseg_arr[0]->iova = 0xF80000000.
7. Step 3 repeats, but now memseg_arr[1]->iova = 0xFC0000000.
8. The IOVAs are contiguous: success.
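
The check at step 4 boils down to something like this (a schematic
sketch of the idea, not the actual eal_memalloc_is_contig() code):

	#include <stdbool.h>
	#include <rte_memory.h>

	/* Schematic: segments are ordered by virtual address, so the
	 * range is IOVA-contiguous only if each segment's IOVA directly
	 * follows the previous one. */
	static bool
	is_iova_contig(const struct rte_memseg *ms, int n_segs)
	{
		int i;

		for (i = 1; i < n_segs; i++)
			if (ms[i].iova != ms[i - 1].iova + ms[i - 1].len)
				return false;
		return true;
	}

	/* First try:  iova 0xFC0000000 then 0xF80000000 -> false.
	 * Second try: iova 0xF80000000 then 0xFC0000000 -> true. */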

Just a wild guess as to why the second try is likely to succeed:
memseg_arr[1] with IOVA = 0xF80000000 is freed last at step 5,
so perhaps that is why the kernel is likely to reuse this page first at step 6.

I'm afraid the simplest way to reliably get 1.5G of PA-contiguous memory
is indeed to try several times.
The preferred way is to use the IOMMU and IOVA-as-VA mode if the hardware permits.
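
If retrying, a loop of roughly this shape is the pragmatic workaround
(a sketch; the helper name and retry count are assumptions, and success
is still not guaranteed):

	#include <stddef.h>
	#include <rte_memzone.h>

	/* Sketch: nothing needs freeing on failure (the allocator already
	 * rolled back); just try again, since each attempt gives the
	 * kernel another chance to hand the pages back in IOVA-ascending
	 * order. */
	static const struct rte_memzone *
	reserve_iova_contig(const char *name, size_t len, int tries)
	{
		const struct rte_memzone *mz = NULL;

		while (mz == NULL && tries-- > 0)
			mz = rte_memzone_reserve_aligned(name, len,
					SOCKET_ID_ANY,
					RTE_MEMZONE_1GB | RTE_MEMZONE_IOVA_CONTIG,
					RTE_CACHE_LINE_SIZE);
		return mz;
	}

With the IOMMU in use (e.g. the device bound to vfio-pci and EAL run
with --iova-mode=va), IOVA equals VA, so any VA-contiguous reservation
is automatically IOVA-contiguous and the problem disappears.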

> It seems in eal_memalloc_is_contig() the 'msl->memseg_arr' items are inverted:
> when there is the sequence FC0000000, F80000000 the allocation fails,
> while the segments sequence F80000000, FC0000000 is fine.
> From my understanding 'msl->memseg_arr' comes from
> 'rte_eal_get_configuration()->mem_config;' which is rte_config
> declared in eal_common_config.c

Not quite: the content of msl->memseg_arr is dynamic; see the walkthrough above.
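
To see that content at run time, one can walk the segments with the
public API instead of poking rte_config directly (a minimal sketch):

	#include <inttypes.h>
	#include <stdio.h>
	#include <rte_memory.h>

	/* rte_memseg_walk() callback: print VA, IOVA, and length of each
	 * allocated segment. Returning 0 continues the walk. */
	static int
	dump_seg(const struct rte_memseg_list *msl,
			const struct rte_memseg *ms, void *arg)
	{
		(void)msl;
		(void)arg;
		printf("va=%p iova=0x%" PRIx64 " len=%zu\n",
				ms->addr, (uint64_t)ms->iova, ms->len);
		return 0;
	}

	/* Call rte_memseg_walk(dump_seg, NULL) after each attempt. */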

P.S. One may say DPDK could do better here.
It does have N hugepages occupying a contiguous range of IOVA,
and it could make them VA-contiguous by remapping.
But that would be more work, it still wouldn't be 100% reliable,
and it would remain insecure and inflexible compared to using the IOMMU.

Thread overview: 7 messages
2024-05-10  9:33 Antonio Di Bacco
2024-05-10 15:07 ` Dmitry Kozlyuk
2024-05-22 10:22   ` Antonio Di Bacco
2024-05-30 10:28     ` Antonio Di Bacco
2024-05-30 15:00       ` Dmitry Kozlyuk
2024-06-03 12:39         ` Antonio Di Bacco
2024-06-04 22:50           ` Dmitry Kozlyuk [this message]
