Hi Alex,
PF MTU is 9000 and VF MTU is 2000 (I also tried 1500 and got the same crash).
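For reference, a minimal sketch of how the two MTU values can be read back and compared on Linux, assuming the standard sysfs attribute /sys/class/net/<ifname>/mtu is available for both netdevs. The interface names in the example are placeholders, not the ones used in this setup.

```python
from pathlib import Path


def read_mtu(ifname, sysfs_root="/sys/class/net"):
    """Return the MTU of a network interface as reported by sysfs."""
    return int((Path(sysfs_root) / ifname / "mtu").read_text().strip())


def mtu_mismatch(pf_mtu, vf_mtu):
    """Return True when the VF MTU differs from the PF MTU."""
    return pf_mtu != vf_mtu


if __name__ == "__main__":
    # Hypothetical interface names; replace with the real PF/VF netdevs.
    pf_name, vf_name = "ens1f0", "ens1f0v0"
    try:
        pf_mtu, vf_mtu = read_mtu(pf_name), read_mtu(vf_name)
        print(f"PF={pf_mtu} VF={vf_mtu} mismatch={mtu_mismatch(pf_mtu, vf_mtu)}")
    except OSError as exc:
        print(f"could not read MTU: {exc}")
```

With the values quoted above (9000 and 2000), mtu_mismatch returns True.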
Here is the test topology.
Traffic pattern:
downlink: gtpu packet length 1236, throughput: 3.65Gbps
uplink: gtpu packet length 302, throughput 0.47Gbps
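As a rough cross-check of these figures, the implied packet rates can be estimated as below. This is only a sketch: it assumes the quoted lengths are in bytes and that the throughput is measured over those same bytes, with no Ethernet preamble or inter-frame gap counted.

```python
def packets_per_second(throughput_gbps, packet_len_bytes):
    """Estimate packet rate from throughput (Gbit/s) and packet length (bytes)."""
    return throughput_gbps * 1e9 / (packet_len_bytes * 8)


# Values taken from the traffic pattern described above.
downlink_pps = packets_per_second(3.65, 1236)
uplink_pps = packets_per_second(0.47, 302)

print(f"downlink: {downlink_pps / 1e3:.0f} kpps")
print(f"uplink:   {uplink_pps / 1e3:.0f} kpps")
```

Under those assumptions this works out to roughly 369 kpps downlink and 195 kpps uplink.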
The crash is seen in the l2hicu Test2 container.
The commit (547b239a21) you mentioned in the other mail chain is not included in my DPDK version.
Br, Xiaoping
From: Alexander Kozyrev <akozyrev@nvidia.com>
Sent: July 12, 2023 4:48
To: Xiaoping Yan (NSB) <xiaoping.yan@nokia-sbell.com>
Cc: Matan Azrad <matan@nvidia.com>; users@dpdk.org
Subject: [External] RE: dpdk mlx5 driver crash in rxq_cq_decompress_v
CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See http://nok.it/nsb for additional information.
Hi Xiaoping,
I cannot reproduce the issue locally. All the fixes for CQE recovery are already part of 22.11.2.
Would you mind sharing more information about your setup, test case, and traffic characteristics? Do you have a VF/PF MTU mismatch?
Regards,
Alex
From: Xiaoping Yan (NSB) <xiaoping.yan@nokia-sbell.com>
Sent: Tuesday, July 4, 2023 10:57 PM
To: Alexander Kozyrev <akozyrev@nvidia.com>
Subject: RE: dpdk mlx5 driver crash in rxq_cq_decompress_v
Hi Alex,
Here is the CQE dump.
Br, Xiaoping
From: Alexander Kozyrev <akozyrev@nvidia.com>
Sent: July 5, 2023 9:45
To: Matan Azrad <matan@nvidia.com>; Xiaoping Yan (NSB) <xiaoping.yan@nokia-sbell.com>;
users@dpdk.org; Dekel Peled <dekelp@nvidia.com>
Subject: [External] RE: dpdk mlx5 driver crash in rxq_cq_decompress_v
Hi Xiaoping,
Could you please forward the error CQE dump to me?
Would you mind elaborating on your traffic pattern and test case scenario?
The following commit is supposed to ignore the MTU mismatch error between VF and PF:
547b239a21 net/mlx5: ignore non-critical syndromes for Rx queue
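One way to check whether this change is present in a given DPDK tree is to search the history by commit subject rather than by hash, since a backport to a stable branch is a new commit with a different hash. The sketch below assumes a local git checkout of the DPDK version in question; the helper names are made up for illustration.

```python
import subprocess


def git_log_grep_cmd(subject, rev="HEAD"):
    """Build a git command listing commits reachable from rev whose message contains subject."""
    return ["git", "log", "--oneline", "--fixed-strings", f"--grep={subject}", rev]


def find_commits(repo_dir, subject, rev="HEAD"):
    """Run the search inside repo_dir and return the matching one-line log entries."""
    result = subprocess.run(
        git_log_grep_cmd(subject, rev),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling find_commits with the path to the checkout and the subject "net/mlx5: ignore non-critical syndromes for Rx queue" returns an empty list when no commit with that subject is reachable from HEAD.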
Regards,
Alex
From: Matan Azrad <matan@nvidia.com>
Sent: Sunday, July 2, 2023 11:35 PM
To: Xiaoping Yan (NSB) <xiaoping.yan@nokia-sbell.com>;
users@dpdk.org; Dekel Peled <dekelp@nvidia.com>; Alexander Kozyrev <akozyrev@nvidia.com>
Subject: Re: dpdk mlx5 driver crash in rxq_cq_decompress_v
+ @Alexander Kozyrev to suggest.
From: Xiaoping Yan (NSB) <xiaoping.yan@nokia-sbell.com>
Sent: Monday, July 3, 2023 4:18:22 AM
To: users@dpdk.org <users@dpdk.org>; Matan Azrad <matan@nvidia.com>;
dekelp@nvidia.com <dekelp@nvidia.com>
Subject: RE: dpdk mlx5 driver crash in rxq_cq_decompress_v
External email: Use caution opening links or attachments
Hi,
@'dekelp@nvidia.com' @'Matan Azrad' Can you kindly suggest?
Thank you.
Br, Xiaoping
From: Xiaoping Yan (NSB)
Sent: June 27, 2023 12:11
To: users@dpdk.org; 'Matan Azrad' <matan@nvidia.com>; 'dekelp@nvidia.com' <dekelp@nvidia.com>
Subject: dpdk mlx5 driver crash in rxq_cq_decompress_v
Hi,
DPDK version in use: 21.11.2
The mlx5 driver crashes in rxq_cq_decompress_v during a traffic test after several minutes.
Stack trace:
(gdb) bt
#0  0x00007ffff58612bc in _mm_storeu_si128 (__B=..., __P=<optimized out>)
    at /usr/lib/gcc/x86_64-redhat-linux/12/include/emmintrin.h:739
#1  rxq_cq_decompress_v (rxq=rxq@entry=0x2abe5592f40, cq=cq@entry=0x2abe54fdb00, elts=elts@entry=0x2abe5594638)
    at ../dpdk-21.11/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:142
#2  0x00007ffff5862c84 in rxq_burst_v (no_cq=<synthetic pointer>, err=0x7fffffffb848, pkts_n=4, pkts=<optimized out>, rxq=0x2abe5592f40)
    at ../dpdk-21.11/drivers/net/mlx5/mlx5_rxtx_vec.c:349
#3  mlx5_rx_burst_vec (dpdk_rxq=0x2abe5592f40, pkts=0x7fffffffbf80, pkts_n=32)
    at ../dpdk-21.11/drivers/net/mlx5/mlx5_rxtx_vec.c:393
#4  0x00005555556a0f41 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fffffffbf80, queue_id=0, port_id=1)
    at /usr/include/rte_ethdev.h:5721
…
Attached are the error log (“Unexpected CQE error syndrome…”) and the dump file.
I found a similar bug here:
https://bugs.dpdk.org/show_bug.cgi?id=334
But the fix (88c0733535d6, extend Rx completion with error handling) should already be included, as I’m using 21.11.2.
Also, the commit below (a fix to 88c0733535d6) is already included in my DPDK version.
commit 60b254e3923d007bcadbb8d410f95ad89a2f13fa
Author: Matan Azrad <matan@nvidia.com>
Date: Thu Aug 11 19:51:55 2022 +0300
net/mlx5: fix Rx queue recovery mechanism
Any suggestions?
Thank you.
Br, Xiaoping