Can you please try the latest LTS, 20.11.3? We have some related fixes and we think the issue is already solved.

Regards,
Asaf Penso

From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: Thursday, October 14, 2021 12:33 PM
To: Asaf Penso; users@dpdk.org
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

I'm using 20.11:

commit b1d36cf828771e28eb0130b59dcf606c2a0bc94d (HEAD, tag: v20.11)
Author: Thomas Monjalon
Date:   Fri Nov 27 19:48:48 2020 +0100

    version: 20.11.0

Best regards
Yan Xiaoping

From: Asaf Penso
Sent: October 14, 2021 14:56
To: Yan, Xiaoping (NSB - CN/Hangzhou); users@dpdk.org
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Are you using the latest stable 20.11.3? If not, can you try it?

Regards,
Asaf Penso

From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: Thursday, September 30, 2021 11:05 AM
To: Asaf Penso; users@dpdk.org
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

In the log below, we can clearly see that packets are dropped between the counters rx_unicast_packets and rx_good_packets, but there is no error/miss counter telling why or where the packets are dropped.
Is this a known bug/limitation of Mellanox cards? Any suggestion?

Counters in the test center (traffic generator):
Tx count: 617496152
Rx count: 617475672
Drop: 20480

testpmd started with:
dpdk-testpmd -l "2,3" --legacy-mem --socket-mem "5000,0" -a 0000:03:07.0 -- -i --nb-cores=1 --portmask=0x1 --rxd=512 --txd=512

testpmd> port stop 0
testpmd> vlan set filter on 0
testpmd> rx_vlan add 767 0
testpmd> port start 0
testpmd> set fwd 5tswap
testpmd> start
testpmd> show fwd stats all

  ---------------------- Forward statistics for port 0  ----------------------
  RX-packets: 617475727    RX-dropped: 0    RX-total: 617475727
  TX-packets: 617475727    TX-dropped: 0    TX-total: 617475727
  ----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all ports +++++++++++++++
  RX-packets: 617475727    RX-dropped: 0    RX-total: 617475727
  TX-packets: 617475727    TX-dropped: 0    TX-total: 617475727
  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

testpmd> show port xstats 0
###### NIC extended statistics for port 0
rx_good_packets: 617475731
tx_good_packets: 617475730
rx_good_bytes: 45693207378
tx_good_bytes: 45693207036
rx_missed_errors: 0
rx_errors: 0
tx_errors: 0
rx_mbuf_allocation_errors: 0
rx_q0_packets: 617475731
rx_q0_bytes: 45693207378
rx_q0_errors: 0
tx_q0_packets: 617475730
tx_q0_bytes: 45693207036
rx_wqe_errors: 0
rx_unicast_packets: 617496152
rx_unicast_bytes: 45694715248
tx_unicast_packets: 617475730
tx_unicast_bytes: 45693207036
rx_multicast_packets: 3
rx_multicast_bytes: 342
tx_multicast_packets: 0
tx_multicast_bytes: 0
rx_broadcast_packets: 56
rx_broadcast_bytes: 7308
tx_broadcast_packets: 0
tx_broadcast_bytes: 0
tx_phy_packets: 0
rx_phy_packets: 0
rx_phy_crc_errors: 0
tx_phy_bytes: 0
rx_phy_bytes: 0
rx_phy_in_range_len_errors: 0
rx_phy_symbol_errors: 0
rx_phy_discard_packets: 0
tx_phy_discard_packets: 0
tx_phy_errors: 0
rx_out_of_buffer: 0
tx_pp_missed_interrupt_errors: 0
tx_pp_rearm_queue_errors: 0
tx_pp_clock_queue_errors: 0
tx_pp_timestamp_past_errors: 0
tx_pp_timestamp_future_errors: 0
tx_pp_jitter: 0
tx_pp_wander: 0
tx_pp_sync_lost: 0

Best regards
Yan Xiaoping
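For reference, the same gap can also be read programmatically through the xstats API. The following is only a minimal sketch (port 0 and an already-initialised EAL are assumed, and the helper name is made up): it fetches the extended statistics with rte_eth_xstats_get_names()/rte_eth_xstats_get() and prints the difference between rx_unicast_packets and rx_good_packets.

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rte_ethdev.h>

/* Sketch only: assumes port_id is the mlx5 VF and the EAL is already set up.
 * Reads all extended statistics and prints the gap between
 * rx_unicast_packets and rx_good_packets, i.e. the unaccounted drop. */
static void
print_unicast_good_gap(uint16_t port_id)
{
	int n = rte_eth_xstats_get_names(port_id, NULL, 0);
	if (n <= 0)
		return;

	struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
	struct rte_eth_xstat *xstats = calloc(n, sizeof(*xstats));
	uint64_t rx_unicast = 0, rx_good = 0;

	if (names != NULL && xstats != NULL &&
	    rte_eth_xstats_get_names(port_id, names, n) == n &&
	    rte_eth_xstats_get(port_id, xstats, n) == n) {
		for (int i = 0; i < n; i++) {
			/* xstats[i].id indexes into the names array */
			const char *name = names[xstats[i].id].name;

			if (strcmp(name, "rx_unicast_packets") == 0)
				rx_unicast = xstats[i].value;
			else if (strcmp(name, "rx_good_packets") == 0)
				rx_good = xstats[i].value;
		}
		printf("rx_unicast_packets=%" PRIu64 " rx_good_packets=%" PRIu64
		       " unaccounted=%" PRIu64 "\n",
		       rx_unicast, rx_good, rx_unicast - rx_good);
	}
	free(names);
	free(xstats);
}

Note also that in the dump above rx_out_of_buffer, which on mlx5 normally reflects packets dropped because no Rx buffers were posted in time, stays at 0, so that counter does not explain the gap either.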
From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: September 29, 2021 16:26
To: 'Asaf Penso'
Cc: 'Slava Ovsiienko'; 'Matan Azrad'; 'Raslan Darawsheh'; Xu, Meng-Maggie (NSB - CN/Hangzhou)
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

We also replaced the NIC (originally it was a CX-4, now it is a CX-5), but the result is the same.
Do you know why packets are dropped between rx_port_unicast_packets and rx_good_packets while there is no error/miss counter?

Also, do you know the mlx5_xxx kernel threads? They have CPU affinity to all CPU cores, including the core used by fastpath/testpmd. Would that affect performance?

[cranuser1@hztt24f-rm17-ocp-sno-1 ~]$ taskset -cp 74548
pid 74548's current affinity list: 0-27
[cranuser1@hztt24f-rm17-ocp-sno-1 ~]$ ps -emo pid,tid,psr,comm | grep mlx5
903 - - mlx5_health0000
904 - - mlx5_page_alloc
907 - - mlx5_cmd_0000:0
916 - - mlx5_events
917 - - mlx5_esw_wq
918 - - mlx5_fw_tracer
919 - - mlx5_hv_vhca
921 - - mlx5_fc
924 - - mlx5_health0000
925 - - mlx5_page_alloc
927 - - mlx5_cmd_0000:0
935 - - mlx5_events
936 - - mlx5_esw_wq
937 - - mlx5_fw_tracer
938 - - mlx5_hv_vhca
939 - - mlx5_fc
941 - - mlx5_health0000
942 - - mlx5_page_alloc

Best regards
Yan Xiaoping

From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: September 29, 2021 15:03
To: 'Asaf Penso'
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh; Xu, Meng-Maggie (NSB - CN/Hangzhou)
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

It is 20.11 (we upgraded to 20.11 recently).

Best regards
Yan Xiaoping

From: Asaf Penso
Sent: September 29, 2021 14:47
To: Yan, Xiaoping (NSB - CN/Hangzhou)
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh; Xu, Meng-Maggie (NSB - CN/Hangzhou)
Subject: Re: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

What DPDK version are you using? 19.11 doesn't support 5tswap mode in testpmd.

Regards,
Asaf Penso

________________________________
From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: Monday, September 27, 2021 5:55:21 AM
To: Asaf Penso
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh; Xu, Meng-Maggie (NSB - CN/Hangzhou)
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

I also tried testpmd with the following command and configuration:

dpdk-testpmd -l "4,5" --legacy-mem --socket-mem "5000,0" -a 0000:03:02.0 -- -i --nb-cores=1 --portmask=0x1 --rxd=512 --txd=512

testpmd> port stop 0
testpmd> vlan set filter on 0
testpmd> rx_vlan add 767 0
testpmd> port start 0
testpmd> set fwd 5tswap
testpmd> start

It only reaches 1.4 Mpps. At 1.5 Mpps, it starts to drop packets occasionally.

Best regards
Yan Xiaoping

From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: September 26, 2021 13:19
To: 'Asaf Penso'
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh; Xu, Meng-Maggie (NSB - CN/Hangzhou)
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

I was using 6WIND fastpath instead of testpmd.

>> Do you configure any flow?
I think not, but is there any command to check?

>> Do you work in isolate mode?
Do you mean the CPU? The DPDK application (6WIND fastpath) runs inside a container and uses CPU cores from an exclusive pool.
On the other hand, the CPU isolation is done by the host infrastructure and is a bit complicated, so I'm not sure whether there really is no other task running on this core.

BTW, we recently switched the host infrastructure to the Red Hat OpenShift container platform, and the same problem is there...
We can get 1.6 Mpps with an Intel 810 NIC, but we can only get 1 Mpps with mlx.
I also raised a ticket to Mellanox support: https://support.mellanox.com/s/case/5001T00001ZC0jzQAD
There is a log about CPU affinity in it, and some mlx5_xxx threads seem strange to me...
Can you please also check the ticket?

Best regards
Yan Xiaoping
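On the isolation question: one way to cross-check from inside the application is to print the affinity mask of each forwarding lcore and compare it with the exclusive core list given to the container. The sketch below is illustrative only (the function name is made up; it could, for example, be launched on each worker via rte_eal_mp_remote_launch()) and does not come from the thread.

#define _GNU_SOURCE
#include <sched.h>
#include <pthread.h>
#include <stdio.h>
#include <rte_common.h>
#include <rte_lcore.h>

/* Sketch only: print which CPUs the calling forwarding lcore may run on,
 * so it can be compared with the isolated/exclusive core list of the host. */
static int
print_lcore_affinity(void *arg __rte_unused)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
		return -1;

	printf("lcore %u may run on cpus:", rte_lcore_id());
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			printf(" %d", cpu);
	printf("\n");
	return 0;
}

Whether the mlx5_xxx kernel threads actually preempt the PMD core is a separate question that has to be checked on the host side (e.g. with the taskset/ps output shown above).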
From: Asaf Penso
Sent: September 26, 2021 12:57
To: Yan, Xiaoping (NSB - CN/Hangzhou)
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

Could you please share the testpmd command line you are using?
Do you configure any flow?
Do you work in isolate mode?

Regards,
Asaf Penso

From: Yan, Xiaoping (NSB - CN/Hangzhou)
Sent: Monday, July 26, 2021 7:52 AM
To: Asaf Penso; users@dpdk.org
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hi,

The DPDK version in use is 19.11. I have not tried the latest upstream version.

It seems performance is affected by IPv6 neighbor advertisement packets coming to this interface:

05:20:04.025290 IP6 fe80::6cf1:9fff:fe4e:8a01 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::6cf1:9fff:fe4e:8a01, length 32
        0x0000:  3333 0000 0001 6ef1 9f4e 8a01 86dd 6008
        0x0010:  fe44 0020 3aff fe80 0000 0000 0000 6cf1
        0x0020:  9fff fe4e 8a01 ff02 0000 0000 0000 0000
        0x0030:  0000 0000 0001 8800 96d9 2000 0000 fe80
        0x0040:  0000 0000 0000 6cf1 9fff fe4e 8a01 0201
        0x0050:  6ef1 9f4e 8a01

Somehow, about 100 such packets per second arrive at the interface, and packet loss happens.
When we change the default VLAN in the switch so that no such packets come to the interface (the mlx5 VF under test), there is no packet loss anymore.
In both cases, all packets arrive in rx_vport_unicast_packets. In the packet-loss case, we see fewer packets in rx_good_packets (rx_vport_unicast_packets = rx_good_packets + lost packets).

If the DPDK application is too slow to receive all packets from the VF, is there any counter to indicate this?
Any suggestion? Thank you.

Best regards
Yan Xiaoping
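As an aside on the neighbor-advertisement traffic: the same effect as the switch-side VLAN change could in principle be achieved on the VF itself with an rte_flow rule that drops inbound ICMPv6 neighbor advertisements before they reach the Rx queues. The sketch below is illustrative only; the port id and function name are made up, and whether the mlx5 PMD/firmware in the DPDK version used here accepts ICMPv6 type matching would need to be verified first.

#include <rte_ethdev.h>
#include <rte_flow.h>

/* Sketch only: drop inbound ICMPv6 neighbour advertisements (type 136)
 * on the given port. Pattern/action support must be validated against
 * the PMD before relying on this. */
static struct rte_flow *
drop_ipv6_neigh_adv(uint16_t port_id, struct rte_flow_error *error)
{
	struct rte_flow_attr attr = { .ingress = 1 };
	struct rte_flow_item_icmp6 icmp6_spec = { .type = 136 };
	struct rte_flow_item_icmp6 icmp6_mask = { .type = 0xff };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV6 },
		{ .type = RTE_FLOW_ITEM_TYPE_ICMP6,
		  .spec = &icmp6_spec, .mask = &icmp6_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_DROP },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	if (rte_flow_validate(port_id, &attr, pattern, actions, error) != 0)
		return NULL;
	return rte_flow_create(port_id, &attr, pattern, actions, error);
}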
-----Original Message-----
From: Asaf Penso
Sent: July 13, 2021 20:36
To: Yan, Xiaoping (NSB - CN/Hangzhou); users@dpdk.org
Cc: Slava Ovsiienko; Matan Azrad; Raslan Darawsheh
Subject: RE: mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets

Hello Yan,

Can you please mention which DPDK version you use and whether you see this issue also with the latest upstream version?

Regards,
Asaf Penso

>-----Original Message-----
>From: users On Behalf Of Yan, Xiaoping (NSB - CN/Hangzhou)
>Sent: Monday, July 5, 2021 1:08 PM
>To: users@dpdk.org
>Subject: [dpdk-users] mlx5 VF packet lost between rx_port_unicast_packets and rx_good_packets
>
>Hi,
>
>When doing a traffic loopback test on an mlx5 VF, we found some packet loss (not all packets are received back).
>
>From the xstats counters, I found that all packets have been received in rx_port_unicast_packets, but rx_good_packets has a lower count, and rx_port_unicast_packets - rx_good_packets = lost packets, i.e. packets are lost between rx_port_unicast_packets and rx_good_packets.
>But I cannot find any other counter indicating where exactly those packets are lost.
>
>Any idea?
>
>Attached are the counter logs (bf is before the test, af is after the test; fp-cli dpdk-port-stats is the command used to get xstats, and ethtool -S _f1 (the VF used) is also printed).
>The test equipment reports that it sends 2911176 packets, receives 2909474, and drops 1702. The xstats (after - before) show rx_port_unicast_packets 2911177 and rx_good_packets 2909475, so the drop (2911177 - rx_good_packets) is 1702.
>
>BTW, I also noticed the discussion "packet loss between phy and good counter"
>http://mails.dpdk.org/archives/users/2018-July/003271.html
>but my case seems to be different, as the packets are also received in rx_port_unicast_packets, and I checked the counters from the PF (ethtool -S ens1f0 in the attached log): rx_discards_phy is not increasing.
>
>Thank you.
>
>Best regards
>Yan Xiaoping