Hello Bruce,
Thank you for your mail!
I have attached the complete log that we observed.
Upon your suggestion, we experimented with the --in-memory option and it
worked! Since this option enables both --no-shconf and --huge-unlink, we also
tried each option separately. The test ran fine with the --no-shconf option,
but it failed when only the --huge-unlink option was added.

Now that the --no-shconf option also works on its own, we would like to
understand whether it will impact multi-queue performance and whether there
are any other costs associated with using this flag.
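For reference, the invocation we are testing now looks roughly like this (same
cores and device as in the run quoted below, with only the flag under test
appended; PROX application arguments omitted):

    /opt/samplevnf/VNFs/DPPD-PROX/build/prox -c0x2800000 --main-lcore=23 -n4 \
        --allow 0000:86:04.6 --no-shconf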
Best Regards,
Avijit Pandey
Cloud SME | VoerEirAB
+919598570190
From:
Bruce Richardson <bruce.richardson@intel.com>
Date: Wednesday, 27 March 2024 at 20:25
To: Avijit Pandey <Avijit@voereir.com>
Cc: dev@dpdk.org <dev@dpdk.org>
Subject: Re: Error in rte_eal_init() when multiple PODs over single node of K8 cluster
On Wed, Mar 27, 2024 at 12:42:55PM +0000, Avijit Pandey wrote:
> Hello Devs,
>
>
> I hope this email finds you well.
>
> I am reaching out to seek assistance regarding an issue I am facing in
> DPDK within my Kubernetes cluster.
>
>
> I have deployed a Kubernetes cluster v1.26.0, and I am currently
> running network testing through DPPD-PROX (commit 02425932) using
> DPDK (v22.11.0). I have deployed 3 pairs of PODs (3 server pods and 3
> client pods) on a single K8 node. The server generates and sends
> traffic to the receiver pod.
>
>
> During the automated testing, I encounter an error: "Error in
> rte_eal_init()." This error occurs randomly, and I am unable to
> determine the root cause. However, this issue does not occur when I use
> a single pair of PODs (1 server pod and 1 client pod). The traffic is
> sent and received through the SR-IOV NICs.
>
>
<snip>
> With master core index 23, full core mask is 0x2800000
>
> EAL command line: /opt/samplevnf/VNFs/DPPD-PROX/build/prox
> -c0x2800000 --main-lcore=23 -n4 --allow 0000:86:04.6
>
> error Error in rte_eal_init()
>
>
Not sure what the problem is exactly without a better error message. Can you
provide the EAL output in the failure case, perhaps using the --log-level flag
to raise the log level if the error is not clear from the default output?
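For example, adding the following to the EAL arguments turns all components up
to debug level (8 is the numeric debug level):

    --log-level=8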
Also, when running multiple instances of DPDK on a single system, I'd
generally recommend passing the --in-memory flag to each instance to avoid
conflicts over hugepage files. (This will disable support for DPDK
multi-process operation, so don't use the flag if that is a feature you are
using.)
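For example, the EAL portion of the command line from your log would become
something like:

    /opt/samplevnf/VNFs/DPPD-PROX/build/prox -c0x2800000 --main-lcore=23 -n4 \
        --allow 0000:86:04.6 --in-memory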
/Bruce
PS: a couple of other comments on your command line that may be of interest,
since it's a little longer than it needs to be :-) (a combined, shortened
example follows the list)
- We'd generally recommend, for clarity, using the "-l" (core list) flag
rather than the "-c" core mask. In your case "-c 0x2800000" should be
equivalent to the more comprehensible "-l 23,25".
- DPDK always uses the lowest core number as the main lcore, so in the
example above --main-lcore=23 should be superfluous and can be omitted.
- For mempool creation, -n 4 is the default in DPDK if unspecified, so
again that flag can be dropped without impact, unless something specific
in the app depends on it in some other way.
- If you want to shorten your allow list a little, the "0000:" domain can be
dropped from the PCI address. So "--allow 0000:86:04.6" can become
"-a 86:04.6".