DPDK usage discussions
* [dpdk-users] mlx5: packets lost between good+discard and phy counters
@ 2021-04-03  0:03 Gerry Wan
  2021-04-11  1:31 ` Gerry Wan
  0 siblings, 1 reply; 5+ messages in thread
From: Gerry Wan @ 2021-04-03  0:03 UTC (permalink / raw)
  To: users

I have a simple forwarding experiment using a mlx5 NIC directly connected
to a generator. I am noticing that at high enough throughput,
rx_good_packets + rx_phy_discard_packets may not equal rx_phy_packets.
Where are these packets being dropped?
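
(For reference, the counters below are read with the extended stats API,
roughly along the lines of this minimal sketch; dump_xstats() is just an
illustrative helper and error handling is omitted:)

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Illustrative helper: dump all extended stats of a port by name
 * (error handling omitted). */
static void dump_xstats(uint16_t port)
{
        int n = rte_eth_xstats_get_names(port, NULL, 0);   /* count only */
        struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));
        struct rte_eth_xstat *values = calloc(n, sizeof(*values));

        rte_eth_xstats_get_names(port, names, n);
        rte_eth_xstats_get(port, values, n);
        for (int i = 0; i < n; i++)
                printf("    \"%s\": %" PRIu64 ",\n",
                       names[i].name, values[i].value);
        free(names);
        free(values);
}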

Below is an example xstats dump where I receive at almost the limit of what my
application can handle with no loss. It shows rx_phy_discard_packets is 0,
but the number of packets actually received by the CPU is less than
rx_phy_packets. rx_out_of_buffer and the other error counters are also 0.

I have disabled Ethernet flow control via rte_eth_dev_flow_ctrl_set with
mode = RTE_FC_NONE, if that matters.
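
(Roughly what that call looks like in my setup; a minimal sketch where
disable_flow_ctrl() is just an illustrative helper and return values are
not checked:)

#include <string.h>
#include <rte_ethdev.h>

/* Illustrative helper: turn off Ethernet flow control on a port. */
static void disable_flow_ctrl(uint16_t port)
{
        struct rte_eth_fc_conf fc_conf;

        memset(&fc_conf, 0, sizeof(fc_conf));
        rte_eth_dev_flow_ctrl_get(port, &fc_conf); /* keep other settings */
        fc_conf.mode = RTE_FC_NONE;                /* no RX/TX pause frames */
        rte_eth_dev_flow_ctrl_set(port, &fc_conf);
}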

{
    "rx_good_packets": 319992439,
    "tx_good_packets": 0,
    "rx_good_bytes": 19199546340,
    "tx_good_bytes": 0,
    "rx_missed_errors": 0,
    "rx_errors": 0,
    "tx_errors": 0,
    "rx_mbuf_allocation_errors": 0,
    "rx_q0_packets": 319992439,
    "rx_q0_bytes": 19199546340,
    "rx_q0_errors": 0,
    "rx_wqe_errors": 0,
    "rx_unicast_packets": 319999892,
    "rx_unicast_bytes": 19199993520,
    "tx_unicast_packets": 0,
    "tx_unicast_bytes": 0,
    "rx_multicast_packets": 0,
    "rx_multicast_bytes": 0,
    "tx_multicast_packets": 0,
    "tx_multicast_bytes": 0,
    "rx_broadcast_packets": 0,
    "rx_broadcast_bytes": 0,
    "tx_broadcast_packets": 0,
    "tx_broadcast_bytes": 0,
    "tx_phy_packets": 0,
    "rx_phy_packets": 319999892,
    "rx_phy_crc_errors": 0,
    "tx_phy_bytes": 0,
    "rx_phy_bytes": 20479993088,
    "rx_phy_in_range_len_errors": 0,
    "rx_phy_symbol_errors": 0,
    "rx_phy_discard_packets": 0,
    "tx_phy_discard_packets": 0,
    "tx_phy_errors": 0,
    "rx_out_of_buffer": 0,
    "tx_pp_missed_interrupt_errors": 0,
    "tx_pp_rearm_queue_errors": 0,
    "tx_pp_clock_queue_errors": 0,
    "tx_pp_timestamp_past_errors": 0,
    "tx_pp_timestamp_future_errors": 0,
    "tx_pp_jitter": 0,
    "tx_pp_wander": 0,
    "tx_pp_sync_lost": 0,
}


* Re: [dpdk-users] mlx5: packets lost between good+discard and phy counters
  2021-04-03  0:03 [dpdk-users] mlx5: packets lost between good+discard and phy counters Gerry Wan
@ 2021-04-11  1:31 ` Gerry Wan
  2021-04-13 13:39   ` Tom Barbette
  0 siblings, 1 reply; 5+ messages in thread
From: Gerry Wan @ 2021-04-11  1:31 UTC (permalink / raw)
  To: users

After further investigation, I think this may be a bug introduced in DPDK
v20.11, where these "lost" packets should be counted as "rx_out_of_buffer"
and "rx_missed_errors". On v20.08 both of these counters increment, but on
v20.11 and v21.02 these counters always remain 0.

Any workarounds for this? This is an important statistic for my use case.
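
(In the meantime I can approximate the missing count by diffing the phy and
good counters, along the lines of the sketch below; xstat_value() and
approx_missed() are just illustrative helpers, error handling omitted:)

#include <stdint.h>
#include <rte_ethdev.h>

/* Illustrative helper: read a single xstat by name (returns 0 if absent). */
static uint64_t xstat_value(uint16_t port, const char *name)
{
        uint64_t id, value = 0;

        if (rte_eth_xstats_get_id_by_name(port, name, &id) == 0)
                rte_eth_xstats_get_by_id(port, &id, &value, 1);
        return value;
}

/* Approximate the device-level drops while imissed stays at 0. */
static uint64_t approx_missed(uint16_t port)
{
        return xstat_value(port, "rx_phy_packets")
             - xstat_value(port, "rx_phy_discard_packets")
             - xstat_value(port, "rx_good_packets");
}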

On Fri, Apr 2, 2021 at 5:03 PM Gerry Wan <gerryw@stanford.edu> wrote:

> I have a simple forwarding experiment using a mlx5 NIC directly connected
> to a generator. I am noticing that at high enough throughput,
> rx_good_packets + rx_phy_discard_packets may not equal rx_phy_packets.
> Where are these packets being dropped?
>
> Below is an example xstats where I receive at almost the limit of what my
> application can handle with no loss. It shows rx_phy_discard_packets is 0
> but the number actually received by the CPU is less than rx_phy_packets.
> rx_out_of_buffer and other errors are also 0.
>
> I have disabled Ethernet flow control via rte_eth_dev_flow_ctrl_set with
> mode = RTE_FC_NONE, if that matters.
>
> [xstats snipped]
>
>


* Re: [dpdk-users] mlx5: packets lost between good+discard and phy counters
  2021-04-11  1:31 ` Gerry Wan
@ 2021-04-13 13:39   ` Tom Barbette
  2021-04-14 11:15     ` Asaf Penso
  0 siblings, 1 reply; 5+ messages in thread
From: Tom Barbette @ 2021-04-13 13:39 UTC (permalink / raw)
  To: Gerry Wan, users; +Cc: Matan Azrad, Shahaf Shuler, Viacheslav Ovsiienko

CC-ing maintainers.

I did observe that too. rx_out_of_buffer has always been 0 for a few months
now (I did not personally try to revert versions as Gerry did, but I assume
it was indeed caused by a DPDK update, as Gerry verified).


Tom

On 11-04-21 at 03:31, Gerry Wan wrote:
> After further investigation, I think this may be a bug introduced in DPDK
> v20.11, where these "lost" packets should be counted as "rx_out_of_buffer"
> and "rx_missed_errors". On v20.08 both of these counters increment, but 
on
> v20.11 and v21.02 these counters always remain 0.
>
> Any workarounds for this? This is an important statistic for my use case.
>
> On Fri, Apr 2, 2021 at 5:03 PM Gerry Wan <gerryw@stanford.edu> wrote:
>
>> I have a simple forwarding experiment using a mlx5 NIC directly connected
>> to a generator. I am noticing that at high enough throughput,
>> rx_good_packets + rx_phy_discard_packets may not equal rx_phy_packets.
>> Where are these packets being dropped?
>>
>> Below is an example xstats where I receive at almost the limit of what my
>> application can handle with no loss. It shows rx_phy_discard_packets is 0
>> but the number actually received by the CPU is less than rx_phy_packets.
>> rx_out_of_buffer and other errors are also 0.
>>
>> I have disabled Ethernet flow control via rte_eth_dev_flow_ctrl_set with
>> mode = RTE_FC_NONE, if that matters.
>>
>> [xstats snipped]
>>
>>



* Re: [dpdk-users] mlx5: packets lost between good+discard and phy counters
  2021-04-13 13:39   ` Tom Barbette
@ 2021-04-14 11:15     ` Asaf Penso
  2021-04-14 19:13       ` Gerry Wan
  0 siblings, 1 reply; 5+ messages in thread
From: Asaf Penso @ 2021-04-14 11:15 UTC (permalink / raw)
  To: Tom Barbette, Gerry Wan, users
  Cc: Matan Azrad, Shahaf Shuler, Slava Ovsiienko

Hello Gerry and Tom,

We are aware of this issue and have already provided a fix for 21.05 and CCed stable.
Please check this series from Matan Azrad and let me know the results in your cases:

[PATCH 0/4] net/mlx5: fix imissed statistic
The imissed port statistic counts packets that were dropped by the device Rx queues.

In mlx5, the imissed counter is the sum of 2 counters:
	- packets dropped by the SW queue handling counted by SW.
	- packets dropped by the HW queues due to "out of buffer" events
	  detected when no SW buffer is available for the incoming
	  packets.

There is a HW counter object that should be created per device, and all the Rx queues should be assigned to this counter at configuration time.

This part was missed when the Rx queues were created by DevX, which left the "out of buffer" counter at zero forever in this case.

Add 2 options to assign the DevX Rx queues to a queue counter:
	- Create a queue counter per device by DevX and assign all the
	  queues to it.
	- Query the kernel counter and assign all the queues to it.

Use the first option by default and, if it fails, fall back to the second option.

Matan Azrad (4):
  common/mlx5/linux: add glue function to query WQ
  common/mlx5: add DevX command to query WQ
  common/mlx5: add DevX commands for queue counters
  net/mlx5: fix imissed statistics


Regards,
Asaf Penso

>-----Original Message-----
>From: users <users-bounces@dpdk.org> On Behalf Of Tom Barbette
>Sent: Tuesday, April 13, 2021 4:40 PM
>To: Gerry Wan <gerryw@stanford.edu>; users@dpdk.org
>Cc: Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
>Slava Ovsiienko <viacheslavo@nvidia.com>
>Subject: Re: [dpdk-users] mlx5: packets lost between good+discard and phy
>counters
>
>CC-ing maintainers.
>
>I did observe that too. rx_out_of_buffer has always been 0 for a few months
>now (I did not personally try to revert versions as Gerry did, but I assume
>it was indeed caused by a DPDK update, as Gerry verified).
>
>
>Tom
>
>On 11-04-21 at 03:31, Gerry Wan wrote:
>> After further investigation, I think this may be a bug introduced in
>> DPDK v20.11, where these "lost" packets should be counted as "rx_out_of_buffer"
>> and "rx_missed_errors". On v20.08 both of these counters increment, but on
>> v20.11 and v21.02 these counters always remain 0.
>>
>> Any workarounds for this? This is an important statistic for my use case.
>>
>> On Fri, Apr 2, 2021 at 5:03 PM Gerry Wan <gerryw@stanford.edu> wrote:
>>
>>> I have a simple forwarding experiment using a mlx5 NIC directly
>>> connected to a generator. I am noticing that at high enough
>>> throughput, rx_good_packets + rx_phy_discard_packets may not equal rx_phy_packets.
>>> Where are these packets being dropped?
>>>
>>> Below is an example xstats where I receive at almost the limit of what my
>>> application can handle with no loss. It shows rx_phy_discard_packets
>>> is 0 but the number actually received by the CPU is less than rx_phy_packets.
>>> rx_out_of_buffer and other errors are also 0.
>>>
>>> I have disabled Ethernet flow control via rte_eth_dev_flow_ctrl_set
>>> with mode = RTE_FC_NONE, if that matters.
>>>
>>> [xstats snipped]
>>>
>>>



* Re: [dpdk-users] mlx5: packets lost between good+discard and phy counters
  2021-04-14 11:15     ` Asaf Penso
@ 2021-04-14 19:13       ` Gerry Wan
  0 siblings, 0 replies; 5+ messages in thread
From: Gerry Wan @ 2021-04-14 19:13 UTC (permalink / raw)
  To: Asaf Penso
  Cc: Tom Barbette, users, Matan Azrad, Shahaf Shuler, Slava Ovsiienko

I applied the patch to 21.02 and it looks like it works. Thanks for the fix!

Follow-up question: what is the difference between
        - packets dropped by the SW queue handling counted by SW.
        - packets dropped by the HW queues due to "out of buffer" events
          detected when no SW buffer is available for the incoming
          packets.

I've interpreted a nonzero imissed/rx_out_of_buffer statistic as meaning the
software polling loop is too slow to handle incoming packets, so the Rx
queues fill up and the device has to drop further packets. Is this still a
correct interpretation?
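
(For context, I read the merged counter through the basic stats API, roughly
like this sketch; print_imissed() is just an illustrative helper:)

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Illustrative helper: print the device-level RX drop counter (imissed). */
static void print_imissed(uint16_t port)
{
        struct rte_eth_stats stats;

        if (rte_eth_stats_get(port, &stats) == 0)
                printf("imissed: %" PRIu64 "\n", stats.imissed);
}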

On Wed, Apr 14, 2021 at 4:15 AM Asaf Penso <asafp@nvidia.com> wrote:

> Hello Gerry and Tom,
>
> We are aware of this issue and already provided a fix to 21.05 and CCed
> stable.
> Please check this series from Matan Azrad, and let me know the result of
> your cases:
>
> [PATCH 0/4] net/mlx5: fix imissed statistic
> The imissed port statistic counts packets that were dropped by the device
> Rx queues.
>
> In mlx5, the imissed counter summarizes 2 counters:
>         - packets dropped by the SW queue handling counted by SW.
>         - packets dropped by the HW queues due to "out of buffer" events
>           detected when no SW buffer is available for the incoming
>           packets.
>
> There is a HW counter object that should be created per device, and all the
> Rx queues should be assigned to this counter at configuration time.
>
> This part was missed when the Rx queues were created by DevX, which left
> the "out of buffer" counter at zero forever in this case.
>
> Add 2 options to assign the DevX Rx queues to queue counter:
>         - Create queue counter per device by DevX and assign all the
>           queues to it.
>         - Query the kernel counter and assign all the queues to it.
>
> Use the first option by default and, if it fails, fall back to the second
> option.
>
> Matan Azrad (4):
>   common/mlx5/linux: add glue function to query WQ
>   common/mlx5: add DevX command to query WQ
>   common/mlx5: add DevX commands for queue counters
>   net/mlx5: fix imissed statistics
>
>
> Regards,
> Asaf Penso
>
> >-----Original Message-----
> >From: users <users-bounces@dpdk.org> On Behalf Of Tom Barbette
> >Sent: Tuesday, April 13, 2021 4:40 PM
> >To: Gerry Wan <gerryw@stanford.edu>; users@dpdk.org
> >Cc: Matan Azrad <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>;
> >Slava Ovsiienko <viacheslavo@nvidia.com>
> >Subject: Re: [dpdk-users] mlx5: packets lost between good+discard and phy
> >counters
> >
> >CC-ing maintainers.
> >
> >I did observe that too. rx_out_of_buffer has always been 0 for a few months
> >now (I did not personally try to revert versions as Gerry did, but I assume
> >it was indeed caused by a DPDK update, as Gerry verified).
> >
> >
> >Tom
> >
> >On 11-04-21 at 03:31, Gerry Wan wrote:
> >> After further investigation, I think this may be a bug introduced in
> >> DPDK v20.11, where these "lost" packets should be counted as "rx_out_of_buffer"
> >> and "rx_missed_errors". On v20.08 both of these counters increment, but on
> >> v20.11 and v21.02 these counters always remain 0.
> >>
> >> Any workarounds for this? This is an important statistic for my use case.
> >>
> >> On Fri, Apr 2, 2021 at 5:03 PM Gerry Wan <gerryw@stanford.edu> wrote:
> >>
> >>> I have a simple forwarding experiment using a mlx5 NIC directly
> >>> connected to a generator. I am noticing that at high enough
> >>> throughput, rx_good_packets + rx_phy_discard_packets may not equal rx_phy_packets.
> >>> Where are these packets being dropped?
> >>>
> >>> Below is an example xstats where I receive at almost the limit of what my
> >>> application can handle with no loss. It shows rx_phy_discard_packets
> >>> is 0 but the number actually received by the CPU is less than rx_phy_packets.
> >>> rx_out_of_buffer and other errors are also 0.
> >>>
> >>> I have disabled Ethernet flow control via rte_eth_dev_flow_ctrl_set
> >>> with mode = RTE_FC_NONE, if that matters.
> >>>
> >>> [xstats snipped]
> >>>
> >>>
>
>


end of thread

Thread overview: 5+ messages
2021-04-03  0:03 [dpdk-users] mlx5: packets lost between good+discard and phy counters Gerry Wan
2021-04-11  1:31 ` Gerry Wan
2021-04-13 13:39   ` Tom Barbette
2021-04-14 11:15     ` Asaf Penso
2021-04-14 19:13       ` Gerry Wan
