Hello Bruce,

Thank you for your mail!

I have attached the complete log that we observed.

Upon your suggestion, we experimented with the --in-memory option, and it worked. Since this option enables both --no-shconf and --huge-unlink, we also tried each of those options separately: the test ran fine with --no-shconf alone, but it failed when only --huge-unlink was added.
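Concretely, the combinations we tested were along these lines (prox here
stands for the full /opt/samplevnf/VNFs/DPPD-PROX/build/prox path, and the
rest of the command line is the same as in the attached log):

    # worked: --in-memory (enables both --no-shconf and --huge-unlink)
    prox -c0x2800000 --main-lcore=23 -n4 --in-memory --allow 0000:86:04.6

    # worked: --no-shconf on its own
    prox -c0x2800000 --main-lcore=23 -n4 --no-shconf --allow 0000:86:04.6

    # failed: --huge-unlink on its own
    prox -c0x2800000 --main-lcore=23 -n4 --huge-unlink --allow 0000:86:04.6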

Now that --no-shconf also works for us, we would like to understand whether it has any impact on multi-queue performance, and whether there are any other costs associated with using this flag.

Best Regards,

Avijit Pandey
Cloud SME | VoerEirAB
+919598570190

From: Bruce Richardson <bruce.richardson@intel.com>
Date: Wednesday, 27 March 2024 at 20:25
To: Avijit Pandey <Avijit@voereir.com>
Cc: dev@dpdk.org <dev@dpdk.org>
Subject: Re: Error in rte_eal_init() when multiple PODs over single node of K8 cluster

On Wed, Mar 27, 2024 at 12:42:55PM +0000, Avijit Pandey wrote:
>    Hello Devs,
>
>
>    I hope this email finds you well.
>
>    I am reaching out to seek assistance regarding an issue I am facing in
>    DPDK within my Kubernetes cluster.
>
>
>    I have deployed a Kubernetes cluster v1.26.0, and I am currently
>    running network testing through DPPD-PROX (commit/02425932) using
>    DPDK (v22.11.0). I have deployed 3 pairs of PODs (3 server pods and 3
>    client pods) on a single K8 node. The server generates and sends
>    traffic to the receiver pod.
>
>
>    During the automated testing, I encounter an error: "Error in
>    rte_eal_init()." This error occurs randomly, and I am unable to
>    determine the root cause. However, this issue does not occur when I use
>    a single pair of PODs (1 server pod and 1 client pod). The traffic is
>    sent and received through the SR-IOV NICs.
>
>

<snip>
>            With master core index 23, full core mask is 0x2800000
>
>            EAL command line: /opt/samplevnf/VNFs/DPPD-PROX/build/prox
>    -c0x2800000 --main-lcore=23 -n4 --allow 0000:86:04.6
>
>    error   Error in rte_eal_init()
>
>

Not sure what the problem is exactly without a better error message. Can
you provide the EAL output in the failure case, perhaps using the
--log-level flag to raise the log level if the error is not clear from
the default output?
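For example, something like the following (your original command line with
the global log level raised to debug) should make EAL print much more
detail during init:

    /opt/samplevnf/VNFs/DPPD-PROX/build/prox -c0x2800000 --main-lcore=23 \
        -n4 --log-level=8 --allow 0000:86:04.6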

Also, when running multiple instances of DPDK on a single system, I'd
generally recommend passing the --in-memory flag to each instance to avoid
conflicts over hugepage files. (This will disable support for DPDK
multi-process operation, so don't use the flag if that is a feature you
are using.)

/Bruce

PS: a couple of other comments on your command line that may be of
interest, since it's a little longer than it needs to be :-)
 - We'd generally recommend, for clarity, using the "-l" (core list) flag
   rather than "-c" (core mask). In your case "-c 0x2800000" should be
   equivalent to the more comprehensible "-l 23,25".
 - DPDK always uses the lowest core number as the main lcore, so in the
   example above --main-lcore=23 should be superfluous and can be omitted.
 - For mempool creation, -n 4 is the default in DPDK if unspecified, so
   again that flag can be dropped without impact, unless something specific
   in the app depends on it in some other way.
 - If you want to shorten your allow list a little, the "0000:" domain
   prefix can be dropped from the PCI address, so "--allow 0000:86:04.6"
   can become "-a 86:04.6".