Hey Dariusz,

Thanks for the help. Actually, the issue was very silly: our SriovNetwork resource had maxTxRate and minTxRate specified, which was capping the generation rate in the first place.

Removing that fixed the generation cap.
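
For anyone else hitting the same cap: the relevant fields are spec.maxTxRate and spec.minTxRate on the SriovNetwork CR. A rough sketch of finding and removing them (it assumes the default openshift-sriov-network-operator namespace and a resource named sriov-net, so adjust to your setup):

    # List any VF rate limits configured on SriovNetwork resources
    oc get sriovnetwork -n openshift-sriov-network-operator -o yaml | grep -iE 'maxtxrate|mintxrate'

    # Drop both fields so the VF is no longer rate-limited
    oc patch sriovnetwork sriov-net -n openshift-sriov-network-operator \
      --type=json -p='[{"op": "remove", "path": "/spec/maxTxRate"}, {"op": "remove", "path": "/spec/minTxRate"}]'

Once the operator reconciles, the per-VF max_tx_rate entry should disappear from ip link show on the PF.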

Regards

Tanmay 

From: Dariusz Sosnowski <dsosnowski@nvidia.com>
Date: Thursday, 25 April 2024 at 10:08 PM
To: Tanmay Pandey <tanmay@voereir.com>
Cc: users@dpdk.org <users@dpdk.org>
Subject: RE: Performance Bottleneck at NIC with Openshift

Hi,

Since, as you mentioned, similar HW with the same DPDK and PRoX versions was able to achieve much better performance,
I'd guess that the problem is related to how processes in the pods are scheduled by OpenShift. Specifically, I would:

- Check that the two pods are not scheduled on the same cores.
- Verify that the cores on which these pods run are isolated.

Anything that interrupts the threads responsible for generating traffic will hurt performance; the commands below are one way to check.
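
For example, something along these lines (a rough sketch; the pod names are placeholders, and the node-level checks must be run on the worker node itself):

    # CPUs each pod's container is allowed to run on - the two lists should not overlap
    oc exec <generator-pod> -- grep Cpus_allowed_list /proc/self/status
    oc exec <receiver-pod> -- grep Cpus_allowed_list /proc/self/status

    # On the worker node: which cores are isolated from the kernel scheduler
    cat /sys/devices/system/cpu/isolated
    cat /proc/cmdline    # look for isolcpus= / nohz_full= / rcu_nocbs=

If the two pods report overlapping CPU lists, or the DPDK lcores are not within the isolated set, that would be my first suspect.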

Best regards,
Dariusz Sosnowski

> From: Tanmay Pandey <tanmay@voereir.com>
> Sent: Friday, April 5, 2024 14:41
> To: users@dpdk.org
> Subject: Performance Bottleneck at NIC with Openshift
>
>
> Hi,
>
> I am using DPDK version 22.11 for performance evaluation, running PRoX on an OpenShift cluster where I have created two pods: I send traffic from one and receive on the other. I've found that I'm unable to utilize more than 6 Gbps of bandwidth on the server at the packet-generation level. With a 64-byte frame size I achieved a maximum of 6.99 Mpps.
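>
> For context on those numbers (my own back-of-the-envelope arithmetic, assuming a 100 Gbps link): on the wire, each 64-byte frame also carries 20 bytes of preamble and inter-frame gap, i.e. 84 bytes = 672 bits per packet, so:
>
>   6.99 Mpps x 672 bits = ~4.7 Gbps on the wire
>   100 Gbps / 672 bits  = ~148.8 Mpps at line rate
>
> In other words, the generator is running far below what the NIC itself should sustain.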
> I've attempted to address this issue by adhering to the recommendations outlined in the DPDK 22.11 NVIDIA Mellanox NIC performance report, available at https://fast.dpdk.org/doc/perf/DPDK_22_11_NVIDIA_Mellanox_NIC_performance_report.pdf. However, the problem persists.
> Additionally, I've investigated packet loss at the NIC interface level and found no anomalies. The bottleneck appears to be related to packet generation, but I'm uncertain about the underlying cause.
> I am very new to DPDK, so I don't really know how to debug this issue. I believe something is happening between the NIC layer and OpenShift.
> Additionally, I used the same hardware running kubeadm, with the same DPDK and PRoX versions and a similar setup, and was able to achieve much better performance (at least for the packet-generation part, where my current bottleneck occurs).
> Can someone point me in the right direction?
> I would be happy to provide any other required information.
> Below are the SUT details:
> NIC model: Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
> uname -r
> 5.14.0-284.54.1.rt14.339.el9_2.x86_64
>
> ethtool -i enp216s0f0np0
> driver: mlx5_core
> version: 5.14.0-284.54.1.rt14.339.el9_2.
> firmware-version: 22.35.2000 (MT_0000000359)
> expansion-rom-version:
> bus-info: 0000:d8:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
>
> ## CPU
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  104
>   On-line CPU(s) list:   0-103
> Vendor ID:               GenuineIntel
>   BIOS Vendor ID:        Intel
>   Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
>     BIOS Model name:     Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
> Operating System:
> cat /etc/os-release
> NAME="Red Hat Enterprise Linux CoreOS"
> ID="rhcos"
> ID_LIKE="rhel fedora"
> VERSION="415.92.202402201450-0"
> VERSION_ID="4.15"
> VARIANT="CoreOS"
> VARIANT_ID=coreos
> PLATFORM_ID="platform:el9"
> PRETTY_NAME="Red Hat Enterprise Linux CoreOS 415.92.202402201450-0 (Plow)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
> HOME_URL="https://www.redhat.com/"
> DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.15/"
> BUG_REPORT_URL="https://bugzilla.redhat.com/"
> REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
> REDHAT_BUGZILLA_PRODUCT_VERSION="4.15"
> REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
> REDHAT_SUPPORT_PRODUCT_VERSION="4.15"
> OPENSHIFT_VERSION="4.15"
> RHEL_VERSION="9.2"
> OSTREE_VERSION="415.92.202402201450-0"
> OCP Cluster
> oc version
> Client Version: 4.15.0-202402070507.p0.g48dcf59.assembly.stream-48dcf59
> Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3