From: Bao-Long Tran <tranbaolong@niometrics.com>
To: anatoly.burakov@intel.com, olivier.matz@6wind.com,
arybchenko@solarflare.com
Cc: dev@dpdk.org, users@dpdk.org, ricudis@niometrics.com
Subject: [dpdk-dev] Inconsistent behavior of mempool with regards to hugepage allocation
Date: Mon, 23 Dec 2019 19:09:29 +0800
Message-ID: <AEEF393A-B56D-4F06-B54F-5AF4022B1F2D@niometrics.com>
Hi,
I'm not sure if this is a bug, but I've seen an inconsistency in the behavior
of DPDK with regard to hugepage allocation for rte_mempool. Basically, for the
same mempool size, the number of hugepages allocated changes from run to run.
Here's how I reproduce it with DPDK 19.11, IOVA=pa (the default):
1. Reserve 16x1G hugepages on socket 0
2. Replace examples/skeleton/basicfwd.c with the code below, then build and run:
make && ./build/basicfwd
3. At the same time, watch the number of hugepages allocated:
"watch -n.1 ls /dev/hugepages"
4. Repeat step 2
If you can reproduce it, you should see that on some runs DPDK allocates 5
hugepages, and on others it allocates 6. When it allocates 6, if you watch the
output from step 3, you will see that DPDK first tries to allocate 5 hugepages,
then unmaps all 5, retries, and gets 6.
For our use case, it's important that DPDK allocate the same number of
hugepages on every run so we can get reproducible results.
Studying the code, I believe this is the behavior of
rte_mempool_populate_default(). If I understand correctly, when the first try
fails to get 5 IOVA-contiguous pages, it retries with the IOVA-contiguous
condition relaxed and eventually winds up with 6 hugepages.
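For reference, the fallback I'm describing looks roughly like the simplified
sketch below. This is not the actual rte_mempool code, just the pattern as I
understand it; reserve_pool_mem, mz_name and len are made-up names for
illustration.

#include <stddef.h>
#include <rte_memzone.h>

/* Simplified sketch (not the actual rte_mempool_populate_default() code):
 * try to reserve the pool's backing memory as a single IOVA-contiguous
 * memzone, and relax that requirement if the first attempt fails. */
static const struct rte_memzone *
reserve_pool_mem(const char *mz_name, size_t len)
{
	const struct rte_memzone *mz;

	/* First attempt: one IOVA-contiguous zone on socket 0, where the
	 * hugepages were reserved. */
	mz = rte_memzone_reserve(mz_name, len, 0, RTE_MEMZONE_IOVA_CONTIG);
	if (mz != NULL)
		return mz;

	/* Retry with the IOVA-contiguous condition relaxed; in my runs this
	 * second attempt is the one that ends up taking 6 hugepages. */
	return rte_memzone_reserve(mz_name, len, 0, 0);
}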
Questions:
1. Why does the API sometimes fail to get IOVA-contiguous memory when hugepage
memory is abundant?
2. Why does the 2nd retry need N+1 hugepages?
Some insight for Q1: from my experiments, it seems that the IOVA of the first
hugepage is not guaranteed to be at the start of the IOVA space (understandably).
That could explain the retry when the IOVA of the first hugepage is near the end
of the IOVA space. But I have also seen situations where the first hugepage is
near the beginning of the IOVA space and the first attempt still failed.
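In case it's useful, this is roughly how I inspect the per-segment IOVAs after
creating the pool. It's just a probe built on rte_memseg_walk(); print_seg is
an illustrative helper, not part of the reproduction below.

#include <stdio.h>
#include <inttypes.h>
#include <rte_memory.h>

/* Print the VA, IOVA and length of every memseg currently mapped. */
static int
print_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
		void *arg)
{
	(void)msl;
	(void)arg;
	printf("va=%p iova=0x%" PRIx64 " len=%zu\n",
			ms->addr, (uint64_t)ms->iova, ms->len);
	return 0;
}

/* Call rte_memseg_walk(print_seg, NULL); after rte_pktmbuf_pool_create()
 * in the reproduction code to dump the layout. */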
Here's the reproduction code:
#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_mbuf.h>

int
main(int argc, char *argv[])
{
	struct rte_mempool *mbuf_pool;
	unsigned mbuf_pool_size = 2097151; /* 2^21 - 1 mbufs */
	int ret = rte_eal_init(argc, argv);

	if (ret < 0)
		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");

	printf("Creating mbuf pool size=%u\n", mbuf_pool_size);
	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", mbuf_pool_size,
			256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, 0);
	printf("mbuf_pool %p\n", mbuf_pool); /* NULL means creation failed */

	return 0;
}
Best regards,
BL