DPDK usage discussions
* Performance Bottleneck at NIC with Openshift
@ 2024-04-05 12:40 Tanmay Pandey
  2024-04-25 16:38 ` Dariusz Sosnowski
  0 siblings, 1 reply; 4+ messages in thread
From: Tanmay Pandey @ 2024-04-05 12:40 UTC (permalink / raw)
  To: users

[-- Attachment #1: Type: text/plain, Size: 3377 bytes --]

Hi,


I am using DPDK 22.11 for performance evaluation, running PRoX on an OpenShift cluster where I have created two pods: one sends traffic and the other receives it. I have found that I am unable to utilize more than 6 GB of bandwidth on the server at the packet generation level. Testing with a 64-byte frame size, I achieved a maximum of 6.99 Mpps.
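
For reference, here is a rough back-of-the-envelope sketch relating the measured packet rate to on-wire bandwidth (a small Python snippet; it assumes the standard 20 bytes of per-frame Ethernet overhead for preamble, SFD and inter-frame gap, and the 25 GbE / 100 GbE figures are only illustrative since the negotiated port speed is not shown here):

ETH_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap

def wire_gbps(pps, frame_bytes):
    """On-wire bandwidth in Gbit/s for a given packet rate and frame size."""
    return pps * (frame_bytes + ETH_OVERHEAD_BYTES) * 8 / 1e9

def line_rate_mpps(link_gbps, frame_bytes):
    """Theoretical maximum packet rate (Mpps) for a given link speed and frame size."""
    return link_gbps * 1e9 / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8) / 1e6

if __name__ == "__main__":
    measured_pps = 6.99e6                     # reported peak with 64-byte frames
    print(f"6.99 Mpps at 64 B -> {wire_gbps(measured_pps, 64):.2f} Gbit/s on the wire")
    for speed in (25, 100):                   # illustrative port speeds only
        print(f"{speed} GbE line rate at 64 B -> {line_rate_mpps(speed, 64):.1f} Mpps")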

I’ve attempted to address this issue by adhering to the recommendations outlined in the DPDK 22.11 NVIDIA Mellanox NIC performance report available at https://fast.dpdk.org/doc/perf/DPDK_22_11_NVIDIA_Mellanox_NIC_performance_report.pdf . However, the problem persists.

Additionally, I’ve investigated packet loss at the NIC interface level and found no anomalies. The bottleneck appears to be related to packet generation, but I’m uncertain about the underlying cause.

I am very new to DPDK, so I don't really know how to debug this issue. I believe something is happening between the NIC layer and OpenShift.

Additionally, on the same hardware running kubeadm, with the same DPDK and PRoX versions and a similar setup, I was able to achieve much better performance (at least for the packet generation part, where my current bottleneck occurs).

Can someone point me in the right direction?

I would be happy to provide any other required information.

Below are the SUT details:

Nic Model: Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

uname -r
5.14.0-284.54.1.rt14.339.el9_2.x86_64

ethtool -i enp216s0f0np0
driver: mlx5_core
version: 5.14.0-284.54.1.rt14.339.el9_2.
firmware-version: 22.35.2000 (MT_0000000359)
expansion-rom-version:
bus-info: 0000:d8:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

## CPU
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  104
  On-line CPU(s) list:   0-103
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
    BIOS Model name:     Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz

Operating System:
cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION="415.92.202402201450-0"
VERSION_ID="4.15"
VARIANT="CoreOS"
VARIANT_ID=coreos
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 415.92.202402201450-0 (Plow)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.15/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.15"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.15"
OPENSHIFT_VERSION="4.15"
RHEL_VERSION="9.2"
OSTREE_VERSION="415.92.202402201450-0"

OCP Cluster
oc version
Client Version: 4.15.0-202402070507.p0.g48dcf59.assembly.stream-48dcf59
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3


[-- Attachment #2: Type: text/html, Size: 48411 bytes --]


* RE: Performance Bottleneck at NIC with Openshift
  2024-04-05 12:40 Performance Bottleneck at NIC with Openshift Tanmay Pandey
@ 2024-04-25 16:38 ` Dariusz Sosnowski
  2024-04-26 12:21   ` Tanmay Pandey
  0 siblings, 1 reply; 4+ messages in thread
From: Dariusz Sosnowski @ 2024-04-25 16:38 UTC (permalink / raw)
  To: Tanmay Pandey; +Cc: users

Hi,

Since, as you mentioned, similar hardware with the same DPDK and PRoX versions was able to achieve much better performance,
I'd guess that the problem is related to how processes in the pods are scheduled by OpenShift. Specifically, I would:

- Check that the two pods are not scheduled on the same cores.
- Verify that the cores on which these pods run are isolated.

Anything interrupting the threads responsible for generating traffic will hurt performance.
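
As a rough starting point, a minimal sketch along these lines (assuming a standard Linux /proc and /sys layout on the worker node, and that you pass the PIDs of the PRoX processes on the command line) can show whether the two processes overlap on cores and whether those cores are in the kernel's isolated set:

import sys

def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def allowed_cpus(pid):
    """Read a process's CPU affinity from /proc/<pid>/status (Cpus_allowed_list)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return set()

if __name__ == "__main__":
    pids = [int(p) for p in sys.argv[1:]]        # PIDs of the two PRoX processes
    with open("/sys/devices/system/cpu/isolated") as f:
        isolated = parse_cpu_list(f.read())      # empty set if nothing is isolated
    affinities = {pid: allowed_cpus(pid) for pid in pids}
    for pid, cpus in affinities.items():
        print(f"PID {pid}: cores {sorted(cpus)}, all isolated: {bool(cpus) and cpus <= isolated}")
    if len(pids) == 2:
        shared = affinities[pids[0]] & affinities[pids[1]]
        print(f"Cores shared by both processes: {sorted(shared)}")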

Best regards,
Dariusz Sosnowski

> From: Tanmay Pandey <tanmay@voereir.com> 
> Sent: Friday, April 5, 2024 14:41
> To: users@dpdk.org
> Subject: Performance Bottleneck at NIC with Openshift
> 
> External email: Use caution opening links or attachments 
> 
> Hi,
> 
> I am using DPDK 22.11 for performance evaluation, running PRoX on an OpenShift cluster where I have created two pods: one sends traffic and the other receives it. I have found that I am unable to utilize more than 6 GB of bandwidth on the server at the packet generation level. Testing with a 64-byte frame size, I achieved a maximum of 6.99 Mpps.
> I've attempted to address this issue by adhering to the recommendations outlined in the DPDK 22.11 NVIDIA Mellanox NIC performance report available at https://fast.dpdk.org/doc/perf/DPDK_22_11_NVIDIA_Mellanox_NIC_performance_report.pdf . However, the problem persists.
> Additionally, I've investigated packet loss at the NIC interface level and found no anomalies. The bottleneck appears to be related to packet generation, but I'm uncertain about the underlying cause.
> I am very new to DPDK, so I don't really know how to debug this issue. I believe something is happening between the NIC layer and OpenShift.
> Additionally, on the same hardware running kubeadm, with the same DPDK and PRoX versions and a similar setup, I was able to achieve much better performance (at least for the packet generation part, where my current bottleneck occurs).
> Can someone point me in the right direction?
> I would be happy to provide any other required information.
> Below are the SUT details:
> Nic Model: Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
> uname -r
> 5.14.0-284.54.1.rt14.339.el9_2.x86_64
> 
> ethtool -i enp216s0f0np0
> driver: mlx5_core
> version: 5.14.0-284.54.1.rt14.339.el9_2.
> firmware-version: 22.35.2000 (MT_0000000359)
> expansion-rom-version:
> bus-info: 0000:d8:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
> 
> ## CPU
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  104
>   On-line CPU(s) list:   0-103
> Vendor ID:               GenuineIntel
>   BIOS Vendor ID:        Intel
>   Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
>     BIOS Model name:     Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
> Operating System:
> cat /etc/os-release
> NAME="Red Hat Enterprise Linux CoreOS"
> ID="rhcos"
> ID_LIKE="rhel fedora"
> VERSION="415.92.202402201450-0"
> VERSION_ID="4.15"
> VARIANT="CoreOS"
> VARIANT_ID=coreos
> PLATFORM_ID="platform:el9"
> PRETTY_NAME="Red Hat Enterprise Linux CoreOS 415.92.202402201450-0 (Plow)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
> HOME_URL="https://www.redhat.com/"
> DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.15/"
> BUG_REPORT_URL="https://bugzilla.redhat.com/"
> REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
> REDHAT_BUGZILLA_PRODUCT_VERSION="4.15"
> REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
> REDHAT_SUPPORT_PRODUCT_VERSION="4.15"
> OPENSHIFT_VERSION="4.15"
> RHEL_VERSION="9.2"
> OSTREE_VERSION="415.92.202402201450-0"
> OCP Cluster
> oc version
> Client Version: 4.15.0-202402070507.p0.g48dcf59.assembly.stream-48dcf59
> Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3


* Re: Performance Bottleneck at NIC with Openshift
  2024-04-25 16:38 ` Dariusz Sosnowski
@ 2024-04-26 12:21   ` Tanmay Pandey
  2024-04-29  8:59     ` Dariusz Sosnowski
  0 siblings, 1 reply; 4+ messages in thread
From: Tanmay Pandey @ 2024-04-26 12:21 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: users

[-- Attachment #1: Type: text/plain, Size: 4617 bytes --]

Hey Dariusz,

Thanks for the help. Actually, the issue was very silly: our SriovNetwork resource had maxTxRate and minTxRate specified, which was capping generation in the first place.

Removing them fixed the generation cap.
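
In case it helps anyone hitting the same thing, a small sketch like the following (the manifest path is whatever you feed it; maxTxRate and minTxRate are the SriovNetwork spec fields we had set) can flag these VF rate limits before the resource is applied:

import sys
import yaml  # PyYAML

def find_rate_limits(manifest_path):
    """Return any maxTxRate/minTxRate values set in a SriovNetwork manifest."""
    with open(manifest_path) as f:
        doc = yaml.safe_load(f)          # assumes a single-document manifest
    spec = (doc or {}).get("spec", {})
    return {k: spec[k] for k in ("maxTxRate", "minTxRate") if k in spec}

if __name__ == "__main__":
    limits = find_rate_limits(sys.argv[1])   # e.g. path to the SriovNetwork YAML
    if limits:
        print(f"Warning: VF transmit rate limits set: {limits} "
              "(these will cap DPDK traffic generation on the VF)")
    else:
        print("No maxTxRate/minTxRate in spec; no VF rate cap from this resource")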

Regards
Tanmay


From: Dariusz Sosnowski <dsosnowski@nvidia.com>
Date: Thursday, 25 April 2024 at 10:08 PM
To: Tanmay Pandey <tanmay@voereir.com>
Cc: users@dpdk.org <users@dpdk.org>
Subject: RE: Performance Bottleneck at NIC with Openshift
Hi,

Since, as you mentioned, similar hardware with the same DPDK and PRoX versions was able to achieve much better performance,
I'd guess that the problem is related to how processes in the pods are scheduled by OpenShift. Specifically, I would:

- Check that the two pods are not scheduled on the same cores.
- Verify that the cores on which these pods run are isolated.

Anything interrupting the threads responsible for generating traffic will hurt performance.

Best regards,
Dariusz Sosnowski

> From: Tanmay Pandey <tanmay@voereir.com>
> Sent: Friday, April 5, 2024 14:41
> To: users@dpdk.org
> Subject: Performance Bottleneck at NIC with Openshift
>
> External email: Use caution opening links or attachments
>
> Hi,
>
> I am using DPDK 22.11 for performance evaluation, running PRoX on an OpenShift cluster where I have created two pods: one sends traffic and the other receives it. I have found that I am unable to utilize more than 6 GB of bandwidth on the server at the packet generation level. Testing with a 64-byte frame size, I achieved a maximum of 6.99 Mpps.
> I've attempted to address this issue by adhering to the recommendations outlined in the DPDK 22.11 NVIDIA Mellanox NIC performance report available at https://fast.dpdk.org/doc/perf/DPDK_22_11_NVIDIA_Mellanox_NIC_performance_report.pdf . However, the problem persists.
> Additionally, I've investigated packet loss at the NIC interface level and found no anomalies. The bottleneck appears to be related to packet generation, but I'm uncertain about the underlying cause.
> I am very new to DPDK, so I don't really know how to debug this issue. I believe something is happening between the NIC layer and OpenShift.
> Additionally, on the same hardware running kubeadm, with the same DPDK and PRoX versions and a similar setup, I was able to achieve much better performance (at least for the packet generation part, where my current bottleneck occurs).
> Can someone point me in the right direction?
> I would be happy to provide any other required information.
> Below are the SUT details:
> Nic Model: Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
> uname -r
> 5.14.0-284.54.1.rt14.339.el9_2.x86_64
>
> ethtool -i enp216s0f0np0
> driver: mlx5_core
> version: 5.14.0-284.54.1.rt14.339.el9_2.
> firmware-version: 22.35.2000 (MT_0000000359)
> expansion-rom-version:
> bus-info: 0000:d8:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: yes
>
> ## CPU
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  104
>   On-line CPU(s) list:   0-103
> Vendor ID:               GenuineIntel
>   BIOS Vendor ID:        Intel
>   Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
>     BIOS Model name:     Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
> Operating System:
> cat /etc/os-release
> NAME="Red Hat Enterprise Linux CoreOS"
> ID="rhcos"
> ID_LIKE="rhel fedora"
> VERSION="415.92.202402201450-0"
> VERSION_ID="4.15"
> VARIANT="CoreOS"
> VARIANT_ID=coreos
> PLATFORM_ID="platform:el9"
> PRETTY_NAME="Red Hat Enterprise Linux CoreOS 415.92.202402201450-0 (Plow)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
> HOME_URL="https://www.redhat.com/"
> DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.15/"
> BUG_REPORT_URL="https://bugzilla.redhat.com/"
> REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
> REDHAT_BUGZILLA_PRODUCT_VERSION="4.15"
> REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
> REDHAT_SUPPORT_PRODUCT_VERSION="4.15"
> OPENSHIFT_VERSION="4.15"
> RHEL_VERSION="9.2"
> OSTREE_VERSION="415.92.202402201450-0"
> OCP Cluster
> oc version
> Client Version: 4.15.0-202402070507.p0.g48dcf59.assembly.stream-48dcf59
> Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

[-- Attachment #2: Type: text/html, Size: 9827 bytes --]


* RE: Performance Bottleneck at NIC with Openshift
  2024-04-26 12:21   ` Tanmay Pandey
@ 2024-04-29  8:59     ` Dariusz Sosnowski
  0 siblings, 0 replies; 4+ messages in thread
From: Dariusz Sosnowski @ 2024-04-29  8:59 UTC (permalink / raw)
  To: Tanmay Pandey; +Cc: users

Hi,

Thanks for the info. Glad the issue was resolved.

Best regards,
Dariusz Sosnowski

> From: Tanmay Pandey <tanmay@voereir.com> 
> Sent: Friday, April 26, 2024 14:22
> To: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Cc: users@dpdk.org
> Subject: Re: Performance Bottleneck at NIC with Openshift
> 
> External email: Use caution opening links or attachments 
> 
> Hey Dariusz,
> 
> Thanks for the help. Actually, the issue was very silly: our SriovNetwork resource had maxTxRate and minTxRate specified, which was capping generation in the first place.
> 
> Removing them fixed the generation cap.
> 
> Regards
> Tanmay 
> 
> 
> From: Dariusz Sosnowski <dsosnowski@nvidia.com>
> Date: Thursday, 25 April 2024 at 10:08 PM
> To: Tanmay Pandey <tanmay@voereir.com>
> Cc: users@dpdk.org <users@dpdk.org>
> Subject: RE: Performance Bottleneck at NIC with Openshift
> Hi,
> 
> Since, as you mentioned, similar hardware with the same DPDK and PRoX versions was able to achieve much better performance,
> I'd guess that the problem is related to how processes in the pods are scheduled by OpenShift. Specifically, I would:
> 
> - Check that the two pods are not scheduled on the same cores.
> - Verify that the cores on which these pods run are isolated.
> 
> Anything interrupting the threads responsible for generating traffic will hurt performance.
> 
> Best regards,
> Dariusz Sosnowski
> 
> > From: Tanmay Pandey <tanmay@voereir.com>
> > Sent: Friday, April 5, 2024 14:41
> > To: users@dpdk.org
> > Subject: Performance Bottleneck at NIC with Openshift
> > 
> > External email: Use caution opening links or attachments 
> > 
> > Hi,
> > 
> > I am using DPDK 22.11 for performance evaluation, running PRoX on an OpenShift cluster where I have created two pods: one sends traffic and the other receives it. I have found that I am unable to utilize more than 6 GB of bandwidth on the server at the packet generation level. Testing with a 64-byte frame size, I achieved a maximum of 6.99 Mpps.
> > I've attempted to address this issue by adhering to the recommendations outlined in the DPDK 22.11 NVIDIA Mellanox NIC performance report available at https://fast.dpdk.org/doc/perf/DPDK_22_11_NVIDIA_Mellanox_NIC_performance_report.pdf . However, the problem persists.
> > Additionally, I've investigated packet loss at the NIC interface level and found no anomalies. The bottleneck appears to be related to packet generation, but I'm uncertain about the underlying cause.
> > I am very new to DPDK, so I don't really know how to debug this issue. I believe something is happening between the NIC layer and OpenShift.
> > Additionally, on the same hardware running kubeadm, with the same DPDK and PRoX versions and a similar setup, I was able to achieve much better performance (at least for the packet generation part, where my current bottleneck occurs).
> > Can someone point me in the right direction?
> > I would be happy to provide any other required information.
> > Below are the SUT details:
> > Nic Model: Ethernet controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
> > uname -r
> > 5.14.0-284.54.1.rt14.339.el9_2.x86_64
> > 
> > ethtool -i enp216s0f0np0
> > driver: mlx5_core
> > version: 5.14.0-284.54.1.rt14.339.el9_2.
> > firmware-version: 22.35.2000 (MT_0000000359)
> > expansion-rom-version:
> > bus-info: 0000:d8:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: no
> > supports-register-dump: no
> > supports-priv-flags: yes
> > 
> > ## CPU
> > Architecture:            x86_64
> >   CPU op-mode(s):        32-bit, 64-bit
> >   Address sizes:         46 bits physical, 48 bits virtual
> >   Byte Order:            Little Endian
> > CPU(s):                  104
> >   On-line CPU(s) list:   0-103
> > Vendor ID:               GenuineIntel
> >   BIOS Vendor ID:        Intel
> >   Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
> >     BIOS Model name:     Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
> > Operating System:
> > cat /etc/os-release
> > NAME="Red Hat Enterprise Linux CoreOS"
> > ID="rhcos"
> > ID_LIKE="rhel fedora"
> > VERSION="415.92.202402201450-0"
> > VERSION_ID="4.15"
> > VARIANT="CoreOS"
> > VARIANT_ID=coreos
> > PLATFORM_ID="platform:el9"
> > PRETTY_NAME="Red Hat Enterprise Linux CoreOS 415.92.202402201450-0 (Plow)"
> > ANSI_COLOR="0;31"
> > CPE_NAME="cpe:/o:redhat:enterprise_linux:9::coreos"
> > HOME_URL="https://www.redhat.com/"
> > DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.15/"
> > BUG_REPORT_URL="https://bugzilla.redhat.com/"
> > REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
> > REDHAT_BUGZILLA_PRODUCT_VERSION="4.15"
> > REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
> > REDHAT_SUPPORT_PRODUCT_VERSION="4.15"
> > OPENSHIFT_VERSION="4.15"
> > RHEL_VERSION="9.2"
> > OSTREE_VERSION="415.92.202402201450-0"
> > OCP Cluster
> > oc version
> > Client Version: 4.15.0-202402070507.p0.g48dcf59.assembly.stream-48dcf59
> > Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3


end of thread, other threads:[~2024-04-29  8:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-05 12:40 Performance Bottleneck at NIC with Openshift Tanmay Pandey
2024-04-25 16:38 ` Dariusz Sosnowski
2024-04-26 12:21   ` Tanmay Pandey
2024-04-29  8:59     ` Dariusz Sosnowski
