Hi Dariusz,
Thanks a lot for looking into this.
I am attaching the information you requested to this email. I reproduced the issue described below on another
machine, which has two Nvidia cards and newer ConnectX-6 firmware.
The card I used for testing and reproducing is a ConnectX-6 with PCI addresses 0000:3b:00.0 and 0000:3b:00.1.
I ran the commands mentioned below in this email, and PF1 traffic from this card to the Linux kernel was cut off.
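For reference, here is a minimal sketch of those same commands with this machine's PCI addresses substituted. The `run` and `repro` helpers and the `RUN`, `PF0` and `PF1` variables are illustrative additions, not part of devlink; by default the script only prints the commands.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the reproduction commands for both PFs.
# PF0/PF1 default to the PCI addresses of the test card named above.
PF0="${PF0:-0000:3b:00.0}"
PF1="${PF1:-0000:3b:00.1}"

run() {
    # Echo the command; execute it only when RUN=1 is set (hypothetical switch).
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

repro() {
    # Step 1: put both PFs into switchdev mode.
    for pf in "$PF0" "$PF1"; do
        run sudo devlink dev eswitch set "pci/$pf" mode switchdev
    done
    # Step 2: enable multiport eswitch on both PFs at runtime.
    for pf in "$PF0" "$PF1"; do
        run sudo devlink dev param set "pci/$pf" name esw_multiport value true cmode runtime
    done
}

repro
```

Running it with `RUN=1` applies the settings; no testpmd or other DPDK application needs to be started.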
--------
pci/0000:3b:00.0:
  driver mlx5_core
  versions:
      fixed:
        fw.psid MT_0000000359
      running:
        fw.version 22.41.1000
        fw 22.41.1000
      stored:
        fw.version 22.41.1000
        fw 22.41.1000
auxiliary/mlx5_core.eth.0:
  driver mlx5_core.eth
pci/0000:3b:00.1:
  driver mlx5_core
  versions:
      fixed:
        fw.psid MT_0000000359
      running:
        fw.version 22.41.1000
        fw 22.41.1000
      stored:
        fw.version 22.41.1000
        fw 22.41.1000
Linux Kernel Version: 6.6.12
--------
We didn’t configure any LAG but we enabled this firmware setting "LAG_RESOURCE_ALLOCATION"
as it is needed for multiport eswitch per documentation here:
https://doc.dpdk.org/guides/nics/mlx5.html#id1
Linux logs and sysfs / devlink outputs are attached as a text file.
Thanks & Regards,
Guvenc Gulce
-----Original Message-----
From: Dariusz Sosnowski
Sent: Wednesday, 19 June 2024 20:13
To: Guelce, Guevenc ; users@dpdk.org
Subject: RE: Enabling multiport eswitch (mlx5) breaks PF1 bifurcation immediately
Hi,
> From: Guelce, Guevenc
> Sent: Friday, June 14, 2024 11:18
> To: users@dpdk.org
> Cc: Dariusz Sosnowski
> Subject: Enabling multiport eswitch (mlx5) breaks PF1 bifurcation immediately
>
> Hi all, Hi Dariusz,
>
>
> Thanks a lot for your help so far. We really appreciate it.
> I just want to touch base with this question which was asked by my colleague Tao a while back.
>
> Our question is actually quite simple. Issuing the commands listed
> below on a ConnectX-6 Dx card breaks the bifurcated nature of the mlx5
> driver in the Linux kernel for PF1 (no traffic is forwarded to the Linux
> kernel anymore on PF1). You don’t need to start testpmd or any DPDK
> application; just issuing the commands below already breaks PF1 in the
> Linux kernel.
>
> sudo devlink dev eswitch set pci/0000:8a:00.0 mode switchdev
> sudo devlink dev eswitch set pci/0000:8a:00.1 mode switchdev
> sudo devlink dev param set pci/0000:8a:00.0 name esw_multiport value true cmode runtime
> sudo devlink dev param set pci/0000:8a:00.1 name esw_multiport value true cmode runtime
>
>
> ---------
> pci/0000:8a:00.0:
>   driver mlx5_core
>   versions:
>       fixed:
>         fw.psid MT_0000000359
>       running:
>         fw.version 22.39.2048
>         fw 22.39.2048
> Linux kernel version: 6.6.16
> DPDK: 23.11 (But not really needed to reproduce the issue)
> ---------
>
>
> This makes the multiport eswitch feature unusable for us. Could you please advise whether we are missing something here?
> We are really keen to use this feature.
Could you please send us the following info? It would help with debugging the issue.
- Besides the Multiport E-Switch configuration, do you configure any additional bonding?
- Output of commands:
  - sudo devlink dev param show
  - for f in /sys/kernel/debug/mlx5/0000:8a:00.0/lag/*; do echo $f; cat $f; done
  - for f in /sys/kernel/debug/mlx5/0000:8a:00.1/lag/*; do echo $f; cat $f; done
- Output of dmesg, ideally all logs since boot.
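
If it is easier, the outputs above can be gathered into a single file with a small script. This is only a sketch: the helper names, the `PFS` and `DEBUGFS` variables and the output file name are assumptions for illustration, and it has to run as root so that debugfs and dmesg are readable.

```shell
#!/bin/sh
# Collect the requested outputs into one text file (run as root).
# PFS and DEBUGFS are illustrative defaults; adjust them to your system.
PFS="${PFS:-0000:8a:00.0 0000:8a:00.1}"
DEBUGFS="${DEBUGFS:-/sys/kernel/debug/mlx5}"

dump_lag() {
    # Print every file under <debugfs>/<pci address>/lag, preceded by its path.
    for f in "$DEBUGFS/$1"/lag/*; do
        [ -f "$f" ] || continue
        echo "$f"
        cat "$f"
    done
}

collect() {
    echo "== devlink dev param show =="
    devlink dev param show
    for pf in $PFS; do
        echo "== lag debugfs: $pf =="
        dump_lag "$pf"
    done
    echo "== dmesg =="
    dmesg
}

# Only collect when an output file name is given as the first argument.
if [ "$#" -gt 0 ]; then
    collect > "$1" 2>&1
fi
```

For example, saved as collect.sh (a hypothetical name): `sudo sh collect.sh mlx5-debug.txt`. Note that dmesg only shows what is still in the kernel ring buffer, so running this soon after boot gives the most complete log.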
>
> Thanks & Regards
>
>
> Guvenc Gulce
>
Best regards,
Dariusz Sosnowski