DPDK usage discussions
* [dpdk-users] Mellanox Unexpected CQE error syndrome
@ 2020-10-29  8:18 Krauz, Pavel
       [not found] ` <DM5PR12MB2406C92A5955F6BF200F3DBCCD100@DM5PR12MB2406.namprd12.prod.outlook.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Krauz, Pavel @ 2020-10-29  8:18 UTC (permalink / raw)
  To: users

Hello,
I am having a problem with the HPE Ethernet 100Gb 2-port 841QSFP28 Adapter, which is a Mellanox adapter for 100G networks.

The DPDK driver generates a lot of error report files, such as dpdk_mlx5_port_0_rxq_0_2459159054, and loses traffic (IMHO because it has to reset the card).

The first lines of the error report files are as follows:

Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 10040 rq_ci = 494774062 cq_ci = 3586794130
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 27509 rq_ci = 808774458 cq_ci = 1527072213
Unexpected CQE error syndrome 0x0e CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 2413356687
Unexpected CQE error syndrome 0xd4 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 1527072220
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 60345 rq_ci = 242051992 cq_ci = 1769091515
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 1138 rq_ci = 619349053 cq_ci = 3152294540
Unexpected CQE error syndrome 0xa0 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 897769578
Unexpected CQE error syndrome 0xf1 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 1769091529
Unexpected CQE error syndrome 0x75 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 3152294549
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 64529 rq_ci = 763919355 cq_ci = 2532978162
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 5267 rq_ci = 678728828 cq_ci = 3092052802
Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977 wqe_counter = 46035 rq_ci = 3556062128 cq_ci = 2413356673
Unexpected CQE error syndrome 0x73 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 2532978172
Unexpected CQE error syndrome 0x40 CQN = 1030 RQN = 12582977 wqe_counter = 0 rq_ci = 32768 cq_ci = 3092052808
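
For triaging report files like these, the first line can be split into its fields and the syndromes counted. The following is only a minimal sketch and not part of the mlx5 driver: the names cqe_err_line, parse_cqe_err_line and tally_syndromes are hypothetical, and the line format is assumed purely from the samples above.

```c
#include <stddef.h>
#include <stdio.h>

/* Fields found on the first line of an error report file (layout assumed
 * from the sample lines in this thread, not from driver sources). */
struct cqe_err_line {
    unsigned int syndrome;    /* hex value printed after "syndrome" */
    unsigned int cqn;         /* completion queue number */
    unsigned int rqn;         /* receive queue number */
    unsigned int wqe_counter;
    unsigned int rq_ci;       /* receive queue consumer index */
    unsigned int cq_ci;       /* completion queue consumer index */
};

/* Parse one report line. Returns 0 on success, -1 if the line does not
 * match the expected layout. Whitespace in the format matches any amount
 * of whitespace in the input, so "rq_ci= 1" and "rq_ci = 1" both parse. */
int parse_cqe_err_line(const char *line, struct cqe_err_line *out)
{
    int n = sscanf(line,
                   "Unexpected CQE error syndrome %x CQN = %u RQN = %u "
                   "wqe_counter = %u rq_ci = %u cq_ci = %u",
                   &out->syndrome, &out->cqn, &out->rqn,
                   &out->wqe_counter, &out->rq_ci, &out->cq_ci);
    return n == 6 ? 0 : -1;
}

/* Count occurrences of each syndrome value (low 8 bits) over n lines.
 * The caller must zero hist[] first. Returns the number of lines parsed. */
size_t tally_syndromes(const char *const *lines, size_t n,
                       unsigned int hist[256])
{
    size_t parsed = 0;

    for (size_t i = 0; i < n; i++) {
        struct cqe_err_line e;

        if (parse_cqe_err_line(lines[i], &e) == 0) {
            hist[e.syndrome & 0xffu]++;
            parsed++;
        }
    }
    return parsed;
}
```

Feeding the first line of each report file to tally_syndromes() gives a quick view of whether one syndrome dominates or the values are scattered, as they appear to be in the list above.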

I have tried the latest HP firmware for the card, and both enabling and disabling CQE compression in the mlx5 DPDK driver using rxq_cqe_comp_en=0/1, but saw no improvement.

Does anybody know what the problem might be and how to mitigate it?

Thanks
Pavel Krauz


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dpdk-users] Mellanox Unexpected CQE error syndrome
       [not found]           ` <DM5PR12MB240692A7078BDDB69AD8227CCDFA0@DM5PR12MB2406.namprd12.prod.outlook.com>
@ 2020-12-11 10:01             ` Slava Ovsiienko
  2020-12-14 13:47               ` Krauz, Pavel
  0 siblings, 1 reply; 3+ messages in thread
From: Slava Ovsiienko @ 2020-12-11 10:01 UTC (permalink / raw)
  To: users, Pavel.Krauz

>-----Original Message-----
>From: users <users-bounces@dpdk.org> On Behalf Of Krauz, Pavel
>Sent: Thursday, October 29, 2020 10:18 AM
>To: users@dpdk.org
>Subject: [dpdk-users] Mellanox Unexpected CQE error syndrome
>
>Hello,
>I am having problem with HPE Ethernet 100Gb 2-port 841QSFP28
>Adapter which is a Mellanox adapter for 100G network.
>
>The DPDK driver reports and generates lot of error files like
>dpdk_mlx5_port_0_rxq_0_2459159054 and loses traffic (because IMHO
>it must reset the card):
>
>the first line of the error report files is as follows:
>
>Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977
>wqe_counter = 10040 rq_ci = 494774062 cq_ci = 3586794130
>Unexpected CQE error syndrome 0x22 CQN = 1030 RQN = 12582977
>wqe_counter = 27509 rq_ci = 808774458 cq_ci = 1527072213
>Unexpected CQE error syndrome 0x0e CQN = 1030 RQN = 12582977
>wqe_counter = 0 rq_ci = 32768 cq_ci = 2413356687
>
..snip..
>
>I have tried latest card HP firmware and enable/disable CQE
>compression in the mlx5 DPDK driver using rxq_cqe_comp_en=0/1,
>but no improvement.
>
>Does anybody know what can be the problem and how to mitigate it?
>
>Thanks
>Pavel Krauz

Hi, Pavel.

Sorry, I missed this mail on the users@ mailing list.
It would be good to open an issue in the DPDK Bugzilla
and have a dedicated thread to handle it.

What DPDK version do you use? The syndromes in this report
differ from one another; at first glance it looks as if the MTU
or max packet length is not configured correctly. The NIC receives
a packet whose length exceeds what the queue was configured for,
and the packet data overwrites the descriptors, resulting in these syndromes.

With best regards,
Slava

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dpdk-users] Mellanox Unexpected CQE error syndrome
  2020-12-11 10:01             ` Slava Ovsiienko
@ 2020-12-14 13:47               ` Krauz, Pavel
  0 siblings, 0 replies; 3+ messages in thread
From: Krauz, Pavel @ 2020-12-14 13:47 UTC (permalink / raw)
  To: Slava Ovsiienko, users

Hi Slava,

Your suggestion made me realize that we were not setting the jumbo frame flag correctly. After setting DEV_RX_OFFLOAD_JUMBO_FRAME, the problem disappears.

However, when we set
port_conf.rxmode.max_rx_pkt_len = RTE_ETHER_MAX_LEN;

and do not enable the jumbo frame offload, we see the Mellanox debug files again.
(The mbuf packet pool is configured with data_size = RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM.)

It seems that if there are jumbo frames in the monitored network but they are not enabled in DPDK, the card goes into an error state.
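
The relationship between these settings can be sketched with plain arithmetic. This is a hedged illustration only, not the mlx5 PMD's logic and not a fix for the error state: rx_plan and plan_rx are hypothetical names, and the constants are local copies of the usual DPDK 19.11 defaults (RTE_ETHER_MAX_LEN = 1518, RTE_MBUF_DEFAULT_DATAROOM = 2048, RTE_PKTMBUF_HEADROOM = 128), defined here so that the sketch builds without DPDK headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of common DPDK 19.11 defaults (assumed, see lead-in). */
#define ETHER_MAX_LEN         1518u /* 1500 payload + 14 header + 4 CRC */
#define MBUF_DEFAULT_DATAROOM 2048u
#define PKTMBUF_HEADROOM       128u

struct rx_plan {
    bool need_jumbo;       /* max_rx_pkt_len exceeds a standard frame */
    bool need_scatter;     /* largest frame does not fit in one mbuf */
    uint32_t segs_per_pkt; /* mbufs needed for the largest frame, 0 if invalid */
};

/* Given the configured maximum Rx frame length and the mbuf data_size
 * (headroom included, as passed to the pool), work out whether the jumbo
 * frame offload and scattered Rx would be required. */
struct rx_plan plan_rx(uint32_t max_rx_pkt_len, uint32_t mbuf_data_size)
{
    struct rx_plan p = { false, false, 0 };
    uint32_t room;

    /* Reject inputs that leave no usable data room in the mbuf. */
    if (max_rx_pkt_len == 0 || mbuf_data_size <= PKTMBUF_HEADROOM)
        return p;

    room = mbuf_data_size - PKTMBUF_HEADROOM;
    p.need_jumbo = max_rx_pkt_len > ETHER_MAX_LEN;
    p.segs_per_pkt = (max_rx_pkt_len + room - 1) / room; /* ceiling division */
    p.need_scatter = p.segs_per_pkt > 1;
    return p;
}
```

With the pool described above (data_size = 2048 + 128), a 1518-byte limit needs neither jumbo nor scatter, while a 9000-byte limit would need the jumbo flag and several mbufs per frame. The sketch says nothing about what the NIC does with oversized frames that arrive anyway, which is the case reported here.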
 
> It would be nice to create an issue in DPDK Bugzilla and have the dedicated thread to handle.

Yes, I will do so for the case where jumbo frames are not enabled in DPDK and the packet length is set low.

> What DPDK version do you use?

We are using DPDK 19.11.4.

b.r.
Pavel



