* [dpdk-dev] ixgbe: account more Rx errors Issue
@ 2015-09-04 9:38 Andriy Berestovskyy
2015-09-04 12:44 ` Tahhan, Maryam
0 siblings, 1 reply; 9+ messages in thread
From: Andriy Berestovskyy @ 2015-09-04 9:38 UTC (permalink / raw)
To: Maryam Tahhan, dev
Hi,
After updating to DPDK 2.1, I noticed an issue with the ixgbe stats.
Since commit f6bf669b9900 "ixgbe: account more Rx errors", the XEC
hardware counter (l3_l4_xsum_error) is added to ierrors. The issue is
that UDP packets with a zero checksum are counted in XEC, and now in
ierrors too.
I've tried disabling hw_ip_checksum in rxmode, but it didn't help.
I'm not sure we should add XEC to ierrors, because packets counted in
XEC are not actually dropped by the NIC. So in my case the ierrors counter
is now greater than the actual number of packets received by the NIC,
which makes no sense.
What's your opinion?
Regards,
Andriy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-04 9:38 [dpdk-dev] ixgbe: account more Rx errors Issue Andriy Berestovskyy
@ 2015-09-04 12:44 ` Tahhan, Maryam
2015-09-04 16:58 ` Andriy Berestovskyy
0 siblings, 1 reply; 9+ messages in thread
From: Tahhan, Maryam @ 2015-09-04 12:44 UTC (permalink / raw)
To: Andriy Berestovskyy, dev, Olivier MATZ
> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
> Sent: Friday, September 4, 2015 10:38 AM
> To: Tahhan, Maryam; dev@dpdk.org
> Subject: ixgbe: account more Rx errors Issue
>
> Hi,
> Updating to DPDK 2.1 I noticed an issue with the ixgbe stats.
>
> In commit f6bf669b9900 "ixgbe: account more Rx errors" we add XEC
> hardware counter (l3_l4_xsum_error) to the ierrors now. The issue is the
> UDP packets with zero check sum are counted in XEC and now in ierrors too.
>
> I've tried to disable hw_ip_checksum in rxmode, but it didn't help.
>
> I'm not sure we should add XEC to ierrors, because packets counted in XEC
> are not dropped by the NIC actually. So in my case ierrors counter is now
> greater than actual number of packets received by the NIC, which makes no
> sense.
>
> What's your opinion?
Hi Andriy
Thanks for flagging this. I'm aware of this phenomenon; unfortunately it means we are hitting 2 hw registers on the NIC.
XEC counts the number of receive IPv4, TCP, UDP or SCTP XSUM errors.
The general CRC errors register counts the number of receive packets with CRC errors. In order for a packet to be counted in this register, it must be 64 bytes or greater (from <Destination Address> through <CRC>, inclusively) in length. This register counts all packets received, regardless of L2 filtering and receive enablement.
So our options are:
1. Add only one of these into the error stats.
2. Introduce some cooking of the stats in this scenario, so that we add only one or the other, depending on whether they are equal or one is higher than the other.
3. Add them all, which means you can have more errors than the number of received packets, but TBH this is going to be the case if your packets have multiple errors anyway.
I'm happy to go with any of 1, 2 or 3, but would like some more feedback from the community on this front.
Regards
Maryam
> Regards,
> Andriy
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-04 12:44 ` Tahhan, Maryam
@ 2015-09-04 16:58 ` Andriy Berestovskyy
2015-09-06 17:15 ` Tahhan, Maryam
0 siblings, 1 reply; 9+ messages in thread
From: Andriy Berestovskyy @ 2015-09-04 16:58 UTC (permalink / raw)
To: Tahhan, Maryam; +Cc: dev
Hi Maryam,
Please see below.
> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
Please note that the UDP checksum is optional for IPv4, but UDP packets with a zero checksum hit XEC.
> And general crc errors counts Counts the number of receive packets with CRC errors.
Let me explain with an example.
DPDK 2.0 behavior:
host A sends 10M IPv4 UDP packets (no checksum) to host B
host B stats: 9M ipackets + 1M ierrors (missed) = 10M
DPDK 2.1 behavior:
host A sends 10M IPv4 UDP packets (no checksum) to host B
host B stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
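The arithmetic in this example can be reproduced with a small model. This is only a sketch under stated assumptions: the struct and function names are hypothetical, it is not the ixgbe driver code, and the two ierrors formulas simply mirror what the example describes (missed only, versus missed plus XEC).

```c
#include <stdint.h>

/* Hypothetical model of the counters in the example; not ixgbe driver code. */
struct rx_counters {
    uint64_t received; /* packets delivered to the application (ipackets) */
    uint64_t missed;   /* packets dropped because the Rx queue was full */
    uint64_t xec;      /* packets flagged by the XEC checksum-error register */
};

/* DPDK 2.0 behavior as described in the example: only missed packets. */
uint64_t ierrors_v20(const struct rx_counters *c)
{
    return c->missed;
}

/* DPDK 2.1 behavior as described in the example: XEC is added on top. */
uint64_t ierrors_v21(const struct rx_counters *c)
{
    return c->missed + c->xec;
}

/* Sum an application might compute to estimate packets seen on the wire. */
uint64_t apparent_total(const struct rx_counters *c, uint64_t ierrors)
{
    return c->received + ierrors;
}
```

With received = 9M, missed = 1M and xec = 10M, the first formula gives an apparent total of 10M and the second gives 20M, because every packet flagged by XEC is counted in addition to its delivered or missed count.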
> So our options are we can:
> 1. Add only one of these into the error stats.
> 2. We can introduce some cooking of stats in this scenario, so only add either or if they are equal or one is higher than the other.
> 3. Add them all which means you can have more errors than the number of received packets, but TBH this is going to be the case if your packets have multiple errors anyway.
4. ierrors should reflect NIC drops only.
XEC does not count drops, so IMO it should be removed from ierrors.
Please note that we still can access the XEC using rte_eth_xstats_get()
Regards,
Andriy
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-04 16:58 ` Andriy Berestovskyy
@ 2015-09-06 17:15 ` Tahhan, Maryam
2015-09-07 8:30 ` Olivier MATZ
0 siblings, 1 reply; 9+ messages in thread
From: Tahhan, Maryam @ 2015-09-06 17:15 UTC (permalink / raw)
To: Andriy Berestovskyy; +Cc: dev
> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
> Sent: Friday, September 4, 2015 5:59 PM
> To: Tahhan, Maryam
> Cc: dev@dpdk.org; Olivier MATZ
> Subject: Re: ixgbe: account more Rx errors Issue
>
> Hi Maryam,
> Please see below.
>
> > XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
>
> Please note than UDP checksum is optional for IPv4, but UDP packets with
> zero checksum hit XEC.
>
I understand, but this is what the hardware register is picking up, and what I included previously are the register definitions from the datasheet.
> > And general crc errors counts Counts the number of receive packets with
> CRC errors.
>
> Let me explain you with an example.
>
> DPDK 2.0 behavior:
> host A sends 10M IPv4 UDP packets (no checksum) to host B host B stats: 9M
> ipackets + 1M ierrors (missed) = 10M
>
> DPDK 2.1 behavior:
> host A sends 10M IPv4 UDP packets (no checksum) to host B host B stats: 9M
> ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
Because it's hitting the 2 error registers. If you had packets with multiple errors that are added up as part of ierrors, you would still get more than 10M errors, which is why I asked for feedback on the 3 suggestions below. What I'm saying is that the number of errors will exceed the number of received packets whenever you hit multiple error registers on the NIC.
>
> > So our options are we can:
> > 1. Add only one of these into the error stats.
> > 2. We can introduce some cooking of stats in this scenario, so only add
> either or if they are equal or one is higher than the other.
> > 3. Add them all which means you can have more errors than the number of
> received packets, but TBH this is going to be the case if your packets have
> multiple errors anyway.
>
> 4. ierrors should reflect NIC drops only.
I may have misinterpreted this, but in rte_ethdev.h ierrors is defined as the "Total number of erroneous received packets".
Maybe we need a clear definition or a separate drop counter, as I see uint64_t q_errors defined as: "Total number of queue packets received that are dropped".
> XEC does not count drops, so IMO it should be removed from ierrors.
While it's picking up the 0 checksum as an error (which it shouldn't necessarily be doing), removing it could mean missing other valid L3/L4 checksum errors... Let me experiment some more with L3/L4 checksum errors and crcerrs to see if we can cook the stats around this register in particular. I would hate to remove it and miss genuine errors.
>
> Please note that we still can access the XEC using rte_eth_xstats_get()
>
>
> Regards,
> Andriy
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-06 17:15 ` Tahhan, Maryam
@ 2015-09-07 8:30 ` Olivier MATZ
2015-09-07 11:44 ` Tahhan, Maryam
0 siblings, 1 reply; 9+ messages in thread
From: Olivier MATZ @ 2015-09-07 8:30 UTC (permalink / raw)
To: Tahhan, Maryam, Andriy Berestovskyy; +Cc: dev
Hi,
On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
>> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
>> Sent: Friday, September 4, 2015 5:59 PM
>> To: Tahhan, Maryam
>> Cc: dev@dpdk.org; Olivier MATZ
>> Subject: Re: ixgbe: account more Rx errors Issue
>>
>> Hi Maryam,
>> Please see below.
>>
>>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
>>
>> Please note than UDP checksum is optional for IPv4, but UDP packets with
>> zero checksum hit XEC.
>>
>
> I understand, but this is what the hardware register is picking up and what I included previously is the definitions of the registers from the datasheet.
>
>>> And general crc errors counts Counts the number of receive packets with
>> CRC errors.
>>
>> Let me explain you with an example.
>>
>> DPDK 2.0 behavior:
>> host A sends 10M IPv4 UDP packets (no checksum) to host B host B stats: 9M
>> ipackets + 1M ierrors (missed) = 10M
>>
>> DPDK 2.1 behavior:
>> host A sends 10M IPv4 UDP packets (no checksum) to host B host B stats: 9M
>> ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
>
> Because it's hitting the 2 error registers. If you had packets with multiple errors that are added up as part of ierrors you'll still be getting more than 10M errors which is why I asked for feedback on the 3 suggestions below. What I'm saying is the number of errors being > the number of received packets will be seen if you hit multiple error registers on the NIC.
>
>>
>>> So our options are we can:
>>> 1. Add only one of these into the error stats.
>>> 2. We can introduce some cooking of stats in this scenario, so only add
>> either or if they are equal or one is higher than the other.
>>> 3. Add them all which means you can have more errors than the number of
>> received packets, but TBH this is going to be the case if your packets have
>> multiple errors anyway.
>>
>> 4. ierrors should reflect NIC drops only.
>
> I may have misinterpreted this, but ierrors in rte_ethdev.h ierrors is defined as the Total number of erroneous received packets.
> Maybe we need a clear definition or a separate drop counter as I see uint64_t q_errors defined as: Total number of queue packets received that are dropped.
>
>> XEC does not count drops, so IMO it should be removed from ierrors.
>
> While it's picking up the 0 checksum as an error (which it shouldn't necessarily be doing), removing it could mean missing other valid L3/L4 checksum errors... Let me experiment some more with L3/L4 checksum errors and crcerrs to see if we can cook the stats around this register in particular. I would hate to remove it and miss genuine errors
For me, the definition that looks the most straightforward is:
ipackets = packets successfully received by hardware
imissed = packets dropped by hardware because the software does
not poll fast enough (= queue full)
ierrors = packets dropped by hardware (malformed packets, ...)
These 3 stats never count the same packet twice.
If we want more statistics, they could go in xstats. For instance,
a counter for invalid checksum. The definition of these stats would
be pmd-specific.
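These definitions can be sketched as a classification in which every packet lands in exactly one basic counter, while checksum information is tracked separately as an extended stat. The names below are hypothetical and chosen for illustration; this is not the rte_ethdev API.

```c
#include <stdint.h>

/* Hypothetical fate of one packet arriving at the NIC. */
enum rx_outcome { RX_DELIVERED, RX_MISSED, RX_DROPPED_MALFORMED };

struct basic_stats {
    uint64_t ipackets; /* successfully received by hardware */
    uint64_t imissed;  /* dropped because the queue was full */
    uint64_t ierrors;  /* dropped by hardware (malformed, ...) */
};

/* Assumed pmd-specific extended counter, kept apart from the basic stats. */
struct extended_stats {
    uint64_t bad_checksum;
};

/* Each packet increments exactly one basic counter. */
void account_packet(struct basic_stats *b, struct extended_stats *x,
                    enum rx_outcome outcome, int bad_checksum)
{
    switch (outcome) {
    case RX_DELIVERED:         b->ipackets++; break;
    case RX_MISSED:            b->imissed++;  break;
    case RX_DROPPED_MALFORMED: b->ierrors++;  break;
    }
    /* Extra detail goes to the extended stat and never changes the sum. */
    if (bad_checksum)
        x->bad_checksum++;
}

/* Because no packet is counted twice, the sum equals the packets seen. */
uint64_t packets_seen(const struct basic_stats *b)
{
    return b->ipackets + b->imissed + b->ierrors;
}
```

Under this scheme a UDP packet with a zero checksum that the NIC flags but still delivers increments ipackets and bad_checksum, and leaves ierrors untouched.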
I agree we should clarify and have a consensus on the definitions
before going further.
Regards,
Olivier
>
>>
>> Please note that we still can access the XEC using rte_eth_xstats_get()
>>
>>
>> Regards,
>> Andriy
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-07 8:30 ` Olivier MATZ
@ 2015-09-07 11:44 ` Tahhan, Maryam
2015-09-09 17:43 ` Kyle Larose
0 siblings, 1 reply; 9+ messages in thread
From: Tahhan, Maryam @ 2015-09-07 11:44 UTC (permalink / raw)
To: Olivier MATZ, Andriy Berestovskyy; +Cc: dev
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, September 7, 2015 9:30 AM
> To: Tahhan, Maryam; Andriy Berestovskyy
> Cc: dev@dpdk.org
> Subject: Re: ixgbe: account more Rx errors Issue
>
> Hi,
>
> On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
> >> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
> >> Sent: Friday, September 4, 2015 5:59 PM
> >> To: Tahhan, Maryam
> >> Cc: dev@dpdk.org; Olivier MATZ
> >> Subject: Re: ixgbe: account more Rx errors Issue
> >>
> >> Hi Maryam,
> >> Please see below.
> >>
> >>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
> >>
> >> Please note than UDP checksum is optional for IPv4, but UDP packets
> >> with zero checksum hit XEC.
> >>
> >
> > I understand, but this is what the hardware register is picking up and what I
> included previously is the definitions of the registers from the datasheet.
> >
> >>> And general crc errors counts Counts the number of receive packets
> >>> with
> >> CRC errors.
> >>
> >> Let me explain you with an example.
> >>
> >> DPDK 2.0 behavior:
> >> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> >> stats: 9M ipackets + 1M ierrors (missed) = 10M
> >>
> >> DPDK 2.1 behavior:
> >> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> >> stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
> >
> > Because it's hitting the 2 error registers. If you had packets with multiple
> errors that are added up as part of ierrors you'll still be getting more than
> 10M errors which is why I asked for feedback on the 3 suggestions below.
> What I'm saying is the number of errors being > the number of received
> packets will be seen if you hit multiple error registers on the NIC.
> >
> >>
> >>> So our options are we can:
> >>> 1. Add only one of these into the error stats.
> >>> 2. We can introduce some cooking of stats in this scenario, so only
> >>> add
> >> either or if they are equal or one is higher than the other.
> >>> 3. Add them all which means you can have more errors than the number
> >>> of
> >> received packets, but TBH this is going to be the case if your
> >> packets have multiple errors anyway.
> >>
> >> 4. ierrors should reflect NIC drops only.
> >
> > I may have misinterpreted this, but ierrors in rte_ethdev.h ierrors is defined
> as the Total number of erroneous received packets.
> > Maybe we need a clear definition or a separate drop counter as I see
> uint64_t q_errors defined as: Total number of queue packets received that
> are dropped.
> >
> >> XEC does not count drops, so IMO it should be removed from ierrors.
> >
> > While it's picking up the 0 checksum as an error (which it shouldn't
> > necessarily be doing), removing it could mean missing other valid
> > L3/L4 checksum errors... Let me experiment some more with L3/L4
> > checksum errors and crcerrs to see if we can cook the stats around
> > this register in particular. I would hate to remove it and miss
> > genuine errors
>
> For me, the definition that looks the most straightforward is:
>
> ipackets = packets successfully received by hardware imissed = packets
> dropped by hardware because the software does
> not poll fast enough (= queue full)
> ierrors = packets dropped by hardware (malformed packets, ...)
>
> These 3 stats never count twice the same packet.
>
> If we want more statistics, they could go in xstats. For instance, a counter for
> invalid checksum. The definition of these stats would be pmd-specific.
>
> I agree we should clarify and have a consensus on the definitions before going
> further.
>
>
> Regards,
> Olivier
Hi Olivier
I think it's important to distinguish between errors and drops and provide a statistics API that exposes both. This way people have access to as much information as possible when things do go wrong and nothing is missed in terms of errors.
My suggestion for the high level registers would be:
ipackets = Total number of packets successfully received by hardware
imissed = Total number of packets dropped by hardware because the software does not poll fast enough (= queue full)
idrops = Total number of packets dropped by hardware (malformed packets, ...), where the # of drops can ONLY be <= the packets received (without overlap between registers).
ierrors = Total number of erroneous received packets, where the # of errors can be >= the packets received (without overlap between registers), because there may be multiple errors associated with a packet.
This way people can see how many packets were dropped and why at a high level as well as through the extended stats API rather than using one API or the other. What do you think?
Best Regards
Maryam
>
>
>
> >
> >>
> >> Please note that we still can access the XEC using
> >> rte_eth_xstats_get()
> >>
> >>
> >> Regards,
> >> Andriy
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-07 11:44 ` Tahhan, Maryam
@ 2015-09-09 17:43 ` Kyle Larose
2015-09-14 9:50 ` Tahhan, Maryam
0 siblings, 1 reply; 9+ messages in thread
From: Kyle Larose @ 2015-09-09 17:43 UTC (permalink / raw)
To: Tahhan, Maryam; +Cc: dev
On Mon, Sep 7, 2015 at 7:44 AM, Tahhan, Maryam <maryam.tahhan@intel.com>
wrote:
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Monday, September 7, 2015 9:30 AM
> > To: Tahhan, Maryam; Andriy Berestovskyy
> > Cc: dev@dpdk.org
> > Subject: Re: ixgbe: account more Rx errors Issue
> >
> > Hi,
> >
> > On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
> > >> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
> > >> Sent: Friday, September 4, 2015 5:59 PM
> > >> To: Tahhan, Maryam
> > >> Cc: dev@dpdk.org; Olivier MATZ
> > >> Subject: Re: ixgbe: account more Rx errors Issue
> > >>
> > >> Hi Maryam,
> > >> Please see below.
> > >>
> > >>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
> > >>
> > >> Please note than UDP checksum is optional for IPv4, but UDP packets
> > >> with zero checksum hit XEC.
> > >>
> > >
> > > I understand, but this is what the hardware register is picking up and
> what I
> > included previously is the definitions of the registers from the
> datasheet.
> > >
> > >>> And general crc errors counts Counts the number of receive packets
> > >>> with
> > >> CRC errors.
> > >>
> > >> Let me explain you with an example.
> > >>
> > >> DPDK 2.0 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> > >> stats: 9M ipackets + 1M ierrors (missed) = 10M
> > >>
> > >> DPDK 2.1 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> > >> stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
> > >
> > > Because it's hitting the 2 error registers. If you had packets with
> multiple
> > errors that are added up as part of ierrors you'll still be getting more
> than
> > 10M errors which is why I asked for feedback on the 3 suggestions below.
> > What I'm saying is the number of errors being > the number of received
> > packets will be seen if you hit multiple error registers on the NIC.
> > >
> > >>
> > >>> So our options are we can:
> > >>> 1. Add only one of these into the error stats.
> > >>> 2. We can introduce some cooking of stats in this scenario, so only
> > >>> add
> > >> either or if they are equal or one is higher than the other.
> > >>> 3. Add them all which means you can have more errors than the number
> > >>> of
> > >> received packets, but TBH this is going to be the case if your
> > >> packets have multiple errors anyway.
> > >>
> > >> 4. ierrors should reflect NIC drops only.
> > >
> > > I may have misinterpreted this, but ierrors in rte_ethdev.h ierrors is
> defined
> > as the Total number of erroneous received packets.
> > > Maybe we need a clear definition or a separate drop counter as I see
> > uint64_t q_errors defined as: Total number of queue packets received that
> > are dropped.
> > >
> > >> XEC does not count drops, so IMO it should be removed from ierrors.
> > >
> > > While it's picking up the 0 checksum as an error (which it shouldn't
> > > necessarily be doing), removing it could mean missing other valid
> > > L3/L4 checksum errors... Let me experiment some more with L3/L4
> > > checksum errors and crcerrs to see if we can cook the stats around
> > > this register in particular. I would hate to remove it and miss
> > > genuine errors
> >
> > For me, the definition that looks the most straightforward is:
> >
> > ipackets = packets successfully received by hardware imissed = packets
> > dropped by hardware because the software does
> > not poll fast enough (= queue full)
> > ierrors = packets dropped by hardware (malformed packets, ...)
> >
> > These 3 stats never count twice the same packet.
> >
> > If we want more statistics, they could go in xstats. For instance, a
> counter for
> > invalid checksum. The definition of these stats would be pmd-specific.
> >
> > I agree we should clarify and have a consensus on the definitions before
> going
> > further.
> >
> >
> > Regards,
> > Olivier
>
> Hi Olivier
> I think it's important to distinguish between errors and drops and provide
> a statistics API that exposes both. This way people have access to as much
> information as possible when things do go wrong and nothing is missed in
> terms of errors.
>
> My suggestion for the high level registers would be:
> ipackets = Total number of packets successfully received by hardware
> imissed = Total number of packets dropped by hardware because the
> software does not poll fast enough (= queue full)
> idrops = Total number of packets dropped by hardware (malformed packets,
> ...) Where the # of drops can ONLY be <= the packets received (without
> overlap between registers).
> ierrors = Total number of erroneous received packets. Where the # of
> errors can be >= the packets received (without overlap between registers),
> this is because there may be multiple errors associated with a packet.
>
> This way people can see how many packets were dropped and why at a high
> level as well as through the extended stats API rather than using one API
> or the other. What do you think?
>
> Best Regards
> Maryam
> >
> >
> >
> > >
> > >>
> > >> Please note that we still can access the XEC using
> > >> rte_eth_xstats_get()
> > >>
> > >>
> > >> Regards,
> > >> Andriy
>
Hi Maryam,
If we look at the IF-MIB (from http://www.ietf.org/rfc/rfc2233.txt), we can
see that its definition of ifInErrors aligns more closely with Olivier's.
There it says (the >>> <<< markers are mine):
ifInErrors OBJECT-TYPE
SYNTAX Counter32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"For packet-oriented interfaces, >>> the number of inbound
packets that contained errors preventing them from
being deliverable to a higher-layer protocol <<<. For
character-oriented or fixed-length interfaces, the
number of inbound transmission units that contained
errors preventing them from being deliverable to a
higher-layer protocol.
Discontinuities in the value of this counter can occur
at re-initialization of the management system, and at
other times as indicated by the value of
ifCounterDiscontinuityTime."
::= { ifEntry 14 }
They count it as the number of packets, not the number of errors. So, if a
packet contains two errors, it is only counted once.
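The difference between counting packets and counting errors can be sketched with per-packet error flags. This is an illustrative model only: the flag names are hypothetical, and it simplifies by treating every flagged packet as undeliverable, which the MIB text requires but real hardware does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet error flags. */
#define ERR_CRC      (1u << 0)
#define ERR_LENGTH   (1u << 1)
#define ERR_CHECKSUM (1u << 2)

/* ifInErrors-style count: each packet with any error counts once. */
uint64_t count_errored_packets(const uint32_t *flags, size_t n)
{
    uint64_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (flags[i] != 0)
            count++;
    return count;
}

/* Per-error count: what summing independent error registers yields. */
uint64_t count_error_events(const uint32_t *flags, size_t n)
{
    uint64_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (uint32_t f = flags[i]; f != 0; f &= f - 1) /* clear lowest set bit */
            count++;
    return count;
}
```

For four packets where one is clean, one has a CRC error, one has both a CRC and a length error, and one has a checksum error, the first function returns 3 and the second returns 4.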
I'm not sure what the intention of the ierrors stat is. Do we intend to use
it to feed into MIBs/standards such as the above? Or do we intend to make
it something different? If the former, I think we should conform to the
meaning suggested by rfc2233.
Thanks,
Kyle
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-09 17:43 ` Kyle Larose
@ 2015-09-14 9:50 ` Tahhan, Maryam
2015-10-22 8:21 ` Martin Weiser
0 siblings, 1 reply; 9+ messages in thread
From: Tahhan, Maryam @ 2015-09-14 9:50 UTC (permalink / raw)
To: 'Kyle Larose'; +Cc: dev
> From: Kyle Larose [mailto:eomereadig@gmail.com]
> Sent: Wednesday, September 9, 2015 6:43 PM
> To: Tahhan, Maryam
> Cc: Olivier MATZ; Andriy Berestovskyy; dev@dpdk.org
> Subject: Re: [dpdk-dev] ixgbe: account more Rx errors Issue
>
>
> On Mon, Sep 7, 2015 at 7:44 AM, Tahhan, Maryam <maryam.tahhan@intel.com> wrote:
> > From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> > Sent: Monday, September 7, 2015 9:30 AM
> > To: Tahhan, Maryam; Andriy Berestovskyy
> > Cc: dev@dpdk.org
> > Subject: Re: ixgbe: account more Rx errors Issue
> >
> > Hi,
> >
> > On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
> > >> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
> > >> Sent: Friday, September 4, 2015 5:59 PM
> > >> To: Tahhan, Maryam
> > >> Cc: dev@dpdk.org; Olivier MATZ
> > >> Subject: Re: ixgbe: account more Rx errors Issue
> > >>
> > >> Hi Maryam,
> > >> Please see below.
> > >>
> > >>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
> > >>
> > >> Please note than UDP checksum is optional for IPv4, but UDP packets
> > >> with zero checksum hit XEC.
> > >>
> > >
> > > I understand, but this is what the hardware register is picking up and what I
> > included previously is the definitions of the registers from the datasheet.
> > >
> > >>> And general crc errors counts Counts the number of receive packets
> > >>> with
> > >> CRC errors.
> > >>
> > >> Let me explain you with an example.
> > >>
> > >> DPDK 2.0 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> > >> stats: 9M ipackets + 1M ierrors (missed) = 10M
> > >>
> > >> DPDK 2.1 behavior:
> > >> host A sends 10M IPv4 UDP packets (no checksum) to host B host B
> > >> stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
> > >
> > > Because it's hitting the 2 error registers. If you had packets with multiple
> > errors that are added up as part of ierrors you'll still be getting more than
> > 10M errors which is why I asked for feedback on the 3 suggestions below.
> > What I'm saying is the number of errors being > the number of received
> > packets will be seen if you hit multiple error registers on the NIC.
> > >
> > >>
> > >>> So our options are we can:
> > >>> 1. Add only one of these into the error stats.
> > >>> 2. We can introduce some cooking of stats in this scenario, so only
> > >>> add
> > >> either or if they are equal or one is higher than the other.
> > >>> 3. Add them all which means you can have more errors than the number
> > >>> of
> > >> received packets, but TBH this is going to be the case if your
> > >> packets have multiple errors anyway.
> > >>
> > >> 4. ierrors should reflect NIC drops only.
> > >
> > > I may have misinterpreted this, but ierrors in rte_ethdev.h ierrors is defined
> > as the Total number of erroneous received packets.
> > > Maybe we need a clear definition or a separate drop counter as I see
> > uint64_t q_errors defined as: Total number of queue packets received that
> > are dropped.
> > >
> > >> XEC does not count drops, so IMO it should be removed from ierrors.
> > >
> > > While it's picking up the 0 checksum as an error (which it shouldn't
> > > necessarily be doing), removing it could mean missing other valid
> > > L3/L4 checksum errors... Let me experiment some more with L3/L4
> > > checksum errors and crcerrs to see if we can cook the stats around
> > > this register in particular. I would hate to remove it and miss
> > > genuine errors
> >
> > For me, the definition that looks the most straightforward is:
> >
>> ipackets = packets successfully received by hardware imissed = packets
> > dropped by hardware because the software does
> > not poll fast enough (= queue full)
> > ierrors = packets dropped by hardware (malformed packets, ...)
> >
> > These 3 stats never count twice the same packet.
> >
> > If we want more statistics, they could go in xstats. For instance, a counter for
> > invalid checksum. The definition of these stats would be pmd-specific.
> >
> > I agree we should clarify and have a consensus on the definitions before going
> > further.
> >
> >
> > Regards,
> > Olivier
> Hi Olivier
> I think it's important to distinguish between errors and drops and provide a statistics API that exposes both. This way people have access to as much information as possible when things do go wrong and nothing is missed in terms of errors.
>
> My suggestion for the high level registers would be:
> ipackets = Total number of packets successfully received by hardware
> imissed = Total number of packets dropped by hardware because the software does not poll fast enough (= queue full)
> idrops = Total number of packets dropped by hardware (malformed packets, ...) Where the # of drops can ONLY be <= the packets received (without overlap between registers).
> ierrors = Total number of erroneous received packets. Where the # of errors can be >= the packets received (without overlap between registers), this is because there may be multiple errors associated with a packet.
>
> This way people can see how many packets were dropped and why at a high level as well as through the extended stats API rather than using one API or the other. What do you think?
>
> Best Regards
> Maryam
> >
> >
> >
> >
> > >>
> > >> Please note that we still can access the XEC using
> > >> rte_eth_xstats_get()
> > >>
> > >>
> > >> Regards,
> > >> Andriy
>
> Hi Maryam,
>
> If we look to the if-mib (from http://www.ietf.org/rfc/rfc2233.txt), we can see that their definition of in errors aligns more closely with Olivier's.
>
> There they say (>>> <<< mine):
>
> ifInErrors OBJECT-TYPE
> SYNTAX Counter32
> MAX-ACCESS read-only
> STATUS current
> DESCRIPTION
> "For packet-oriented interfaces, >>> the number of inbound
> packets that contained errors preventing them from
> being deliverable to a higher-layer protocol <<<. For
> character-oriented or fixed-length interfaces, the
> number of inbound transmission units that contained
> errors preventing them from being deliverable to a
> higher-layer protocol.
>
> Discontinuities in the value of this counter can occur
> at re-initialization of the management system, and at
> other times as indicated by the value of
> ifCounterDiscontinuityTime."
> ::= { ifEntry 14 }
>
> They count it as the number of packets, not the number of errors. So, if a packet contains two errors, it is only counted once.
>
> I'm not sure what the intention of the ierrors stat is. Do we intend to use it to feed into MIBs/standards such as the above? Or do we intend to make it something different? If the former, I think we should conform to the meaning suggested by rfc2233.
> Thanks,
>
> Kyle
Hi Kyle
OK, I can now see that we were approaching error stats from different levels: I was considering things at a packet level rather than an interface level. I'm quite happy with the definitions Olivier provided for the interface level, and agree that this is better from a backwards-compatibility perspective with existing drivers. I still see a need, though, for exposing errors at a packet level, so I would propose the following:
ipackets = Total number of packets successfully received by hardware
imissed = Total number of packets dropped by hardware because the software does not poll fast enough (= queue full)
ierrors = Total number of packets dropped by hardware (malformed packets, ...), where the # of drops can ONLY be <= the packets received (without overlap between registers).
Rx_pkt_errors = Total number of erroneous received packets, where the # of errors can be >= the packets received (without overlap between registers), because there may be multiple errors associated with a packet.
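One way to read this proposal is sketched below: the three interface-level counters stay mutually exclusive, and a fourth counter accumulates error events whether or not the packet was dropped. The struct, the function, and the lowercase field name rx_pkt_errors are assumptions made for illustration, not an existing DPDK structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of the four proposed counters. */
struct proposed_stats {
    uint64_t ipackets;      /* successfully received by hardware */
    uint64_t imissed;       /* dropped because the queue was full */
    uint64_t ierrors;       /* dropped by hardware (malformed, ...) */
    uint64_t rx_pkt_errors; /* error events, whether dropped or not */
};

/*
 * Account for one packet. error_events is the number of error registers
 * the packet hit; hw_dropped and queue_full describe its fate.
 */
void account(struct proposed_stats *s, unsigned error_events,
             bool hw_dropped, bool queue_full)
{
    s->rx_pkt_errors += error_events;

    if (hw_dropped)
        s->ierrors++;
    else if (queue_full)
        s->imissed++;
    else
        s->ipackets++;
}
```

Replaying the earlier zero-checksum scenario at small scale (9 packets delivered, 1 missed, each flagged once by XEC) gives ipackets = 9, imissed = 1, ierrors = 0 and rx_pkt_errors = 10, so the first three counters still sum to the 10 packets sent.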
The reason I think this is important is fault management of DPDK interfaces from a higher-level fault management entity. ATM I'm developing a collectd plugin for DPDK statistics with the fault management use case in mind. For that, it would be a great advantage to expose error statistics through the generic statistics API as well as through the extended stats API, to ensure that no error is missed.
In addition, if I look at the various interface datasheets, I see a distinction being made between error and drop registers, in that they have both. Finally, if we look at ifconfig, it makes a distinction at a high level between drops and errors:
$ ifconfig -a
eth0      Link encap:Ethernet  HWaddr
          inet addr:  Bcast:  Mask:
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:113307576 errors:0 dropped:0 overruns:0 frame:0
          TX packets:125554856 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:56712860715 (54085.5 Mb)  TX bytes:78332692918 (74703.8 Mb)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:114677128 errors:0 dropped:0 overruns:0 frame:0
          TX packets:114677128 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:87742098789 (83677.3 Mb)  TX bytes:87742098789 (83677.3 Mb)
For me, what it really comes down to is making the interface as intuitive as possible for a higher level entity to monitor a DPDK interface without relying solely on the extended NIC interface. If we change the definition of ierrors to include dropped packets only, without also exposing an erroneous packets counter, the generic stats will not include errors that don't result in drops, and as such we could miss errors on the NIC.
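The monitoring case described above could look roughly like the following C sketch. It assumes a drop-only ierrors plus a separate Rx_pkt_errors counter, both sampled periodically; struct rx_sample and errors_without_drops() are hypothetical names, not an existing DPDK or collectd API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot a monitoring agent might take each interval. */
struct rx_sample {
    uint64_t ierrors;       /* packets dropped by hardware */
    uint64_t rx_pkt_errors; /* error events, dropped or not */
};

/* True when errors were recorded in the interval but nothing was dropped,
 * which is the case a drop-only ierrors would hide on its own. */
static bool errors_without_drops(const struct rx_sample *prev,
                                 const struct rx_sample *cur)
{
    assert(prev != NULL && cur != NULL);
    uint64_t new_drops  = cur->ierrors - prev->ierrors;
    uint64_t new_errors = cur->rx_pkt_errors - prev->rx_pkt_errors;
    return new_errors > 0 && new_drops == 0;
}
```

A fault management entity could raise a warning whenever this check is true, since those errors would otherwise only be visible through the extended stats.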
Is the proposed solution amenable to all parties? I'm happy to provide more details about the DPDK collectd plugin and the development effort there if anyone is interested.
All the best
Maryam
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] ixgbe: account more Rx errors Issue
2015-09-14 9:50 ` Tahhan, Maryam
@ 2015-10-22 8:21 ` Martin Weiser
0 siblings, 0 replies; 9+ messages in thread
From: Martin Weiser @ 2015-10-22 8:21 UTC (permalink / raw)
To: Tahhan, Maryam; +Cc: dev
On 14.09.15 11:50, Tahhan, Maryam wrote:
>> From: Kyle Larose [mailto:eomereadig@gmail.com]
>> Sent: Wednesday, September 9, 2015 6:43 PM
>> To: Tahhan, Maryam
>> Cc: Olivier MATZ; Andriy Berestovskyy; dev@dpdk.org
>> Subject: Re: [dpdk-dev] ixgbe: account more Rx errors Issue
>>
>>
>> On Mon, Sep 7, 2015 at 7:44 AM, Tahhan, Maryam <maryam.tahhan@intel.com> wrote:
>>> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
>>> Sent: Monday, September 7, 2015 9:30 AM
>>> To: Tahhan, Maryam; Andriy Berestovskyy
>>> Cc: dev@dpdk.org
>>> Subject: Re: ixgbe: account more Rx errors Issue
>>>
>>> Hi,
>>>
>>> On 09/06/2015 07:15 PM, Tahhan, Maryam wrote:
>>>>> From: Andriy Berestovskyy [mailto:aber@semihalf.com]
>>>>> Sent: Friday, September 4, 2015 5:59 PM
>>>>> To: Tahhan, Maryam
>>>>> Cc: dev@dpdk.org; Olivier MATZ
>>>>> Subject: Re: ixgbe: account more Rx errors Issue
>>>>>
>>>>> Hi Maryam,
>>>>> Please see below.
>>>>>
>>>>>> XEC counts the Number of receive IPv4, TCP, UDP or SCTP XSUM errors
>>>>> Please note that UDP checksum is optional for IPv4, but UDP packets
>>>>> with zero checksum hit XEC.
>>>>>
>>>> I understand, but this is what the hardware register is picking up and what I included previously is the definitions of the registers from the datasheet.
>>>>>> And general crc errors counts the number of receive packets with CRC errors.
>>>>>
>>>>> Let me explain with an example.
>>>>>
>>>>> DPDK 2.0 behavior:
>>>>> host A sends 10M IPv4 UDP packets (no checksum) to host B
>>>>> host B stats: 9M ipackets + 1M ierrors (missed) = 10M
>>>>>
>>>>> DPDK 2.1 behavior:
>>>>> host A sends 10M IPv4 UDP packets (no checksum) to host B
>>>>> host B stats: 9M ipackets + 11M in ierrors (1M missed + 10M XEC) = 20M?
>>>> Because it's hitting the 2 error registers. If you had packets with multiple errors that are added up as part of ierrors you'll still be getting more than 10M errors which is why I asked for feedback on the 3 suggestions below. What I'm saying is the number of errors being > the number of received packets will be seen if you hit multiple error registers on the NIC.
>>>>>> So our options are we can:
>>>>>> 1. Add only one of these into the error stats.
>>>>>> 2. We can introduce some cooking of stats in this scenario, so only add either or if they are equal or one is higher than the other.
>>>>>> 3. Add them all which means you can have more errors than the number of received packets, but TBH this is going to be the case if your packets have multiple errors anyway.
>>>>>
>>>>> 4. ierrors should reflect NIC drops only.
>>>> I may have misinterpreted this, but ierrors in rte_ethdev.h is defined as the Total number of erroneous received packets.
>>>> Maybe we need a clear definition or a separate drop counter as I see uint64_t q_errors defined as: Total number of queue packets received that are dropped.
>>>>> XEC does not count drops, so IMO it should be removed from ierrors.
>>>> While it's picking up the 0 checksum as an error (which it shouldn't
>>>> necessarily be doing), removing it could mean missing other valid
>>>> L3/L4 checksum errors... Let me experiment some more with L3/L4
>>>> checksum errors and crcerrs to see if we can cook the stats around
>>>> this register in particular. I would hate to remove it and miss
>>>> genuine errors
>>> For me, the definition that looks the most straightforward is:
>>>
>>> ipackets = packets successfully received by hardware
>>> imissed = packets dropped by hardware because the software does
>>>           not poll fast enough (= queue full)
>>> ierrors = packets dropped by hardware (malformed packets, ...)
>>>
>>> These 3 stats never count twice the same packet.
>>>
>>> If we want more statistics, they could go in xstats. For instance, a counter for
>>> invalid checksum. The definition of these stats would be pmd-specific.
>>>
>>> I agree we should clarify and have a consensus on the definitions before going
>>> further.
>>>
>>>
>>> Regards,
>>> Olivier
>> Hi Olivier
>> I think it's important to distinguish between errors and drops and provide a statistics API that exposes both. This way people have access to as much information as possible when things do go wrong and nothing is missed in terms of errors.
>>
>> My suggestion for the high level registers would be:
>> ipackets = Total number of packets successfully received by hardware
>> imissed = Total number of packets dropped by hardware because the software does not poll fast enough (= queue full)
>> idrops = Total number of packets dropped by hardware (malformed packets, ...) Where the # of drops can ONLY be <= the packets received (without overlap between registers).
>> ierrors = Total number of erroneous received packets. Where the # of errors can be >= the packets received (without overlap between registers), this is because there may be multiple errors associated with a packet.
>>
>> This way people can see how many packets were dropped and why at a high level as well as through the extended stats API rather than using one API or the other. What do you think?
>>
>> Best Regards
>> Maryam
>>>
>>>
>>>
>>>>> Please note that we still can access the XEC using
>>>>> rte_eth_xstats_get()
>>>>>
>>>>>
>>>>> Regards,
>>>>> Andriy
>> Hi Maryam,
>>
>> If we look to the if-mib (from http://www.ietf.org/rfc/rfc2233.txt), we can see that their definition of in errors aligns more closely with Olivier's.
>>
>> There they say (>>> <<< mine):
>>
>> ifInErrors OBJECT-TYPE
>> SYNTAX Counter32
>> MAX-ACCESS read-only
>> STATUS current
>> DESCRIPTION
>> "For packet-oriented interfaces, >>> the number of inbound
>> packets that contained errors preventing them from
>> being deliverable to a higher-layer protocol <<<. For
>> character-oriented or fixed-length interfaces, the
>> number of inbound transmission units that contained
>> errors preventing them from being deliverable to a
>> higher-layer protocol.
>>
>> Discontinuities in the value of this counter can occur
>> at re-initialization of the management system, and at
>> other times as indicated by the value of
>> ifCounterDiscontinuityTime."
>> ::= { ifEntry 14 }
>>
>> They count it as the number of packets, not the number of errors. So, if a packet contains two errors, it is only counted once.
>>
>> I'm not sure what the intention of the ierrors stat is. Do we intend to use it to feed into MIBs/standards such as the above? Or do we intend to make it something different? If the former, I think we should conform to the meaning suggested by rfc2233.
>> Thanks,
>>
>> Kyle
> Hi Kyle
>
> Ok, I can now see that we were approaching error stats from different levels: I was considering things more from a packet level than an interface level. I'm quite happy with the definitions Olivier provided for the interface level and agree that they are better from a backwards compatibility perspective with existing drivers. I still see a need, though, for exposing errors at a packet level, so I would propose the following:
>
>
> ipackets = Total number of packets successfully received by hardware
> imissed = Total number of packets dropped by hardware because the software does not poll fast enough (= queue full)
> ierrors = Total number of packets dropped by hardware (malformed packets, ...) Where the # of drops can ONLY be <= the packets received (without overlap between registers).
> Rx_pkt_errors = Total number of erroneous received packets. Where the # of errors can be >= the packets received (without overlap between registers), this is because there may be multiple errors associated with a packet.
>
> The reason why I think this is important is fault management of DPDK interfaces by a higher level fault management entity. At the moment I'm developing a collectd plugin for DPDK statistics with the fault management use-case in mind. For that, it would be a great advantage to expose error statistics through the generic statistics API as well as through the extended stats API, to ensure that no error is missed.
>
> In addition to this, if I look at the various interface datasheets I see a distinction being made between error and drop registers, in that they have both. Finally, if we look at ifconfig, it makes a distinction at a high level between drops and errors.
>
> $ifconfig -a
> eth0 Link encap:Ethernet HWaddr
> inet addr: Bcast: Mask:
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:113307576 errors:0 dropped:0 overruns:0 frame:0
> TX packets:125554856 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:56712860715 (54085.5 Mb) TX bytes:78332692918 (74703.8 Mb)
>
> lo Link encap:Local Loopback
> inet addr:127.0.0.1 Mask:255.0.0.0
> UP LOOPBACK RUNNING MTU:16436 Metric:1
> RX packets:114677128 errors:0 dropped:0 overruns:0 frame:0
> TX packets:114677128 errors:0 dropped:0 overruns:0 carrier:0
> collisions:0 txqueuelen:0
> RX bytes:87742098789 (83677.3 Mb) TX bytes:87742098789 (83677.3 Mb)
>
>
> For me, what it really comes down to is making the interface as intuitive as possible for a higher level entity to monitor a DPDK interface without relying solely on the extended NIC interface. If we change the definition of ierrors to include dropped packets only, without also exposing an erroneous packets counter, the generic stats will not include errors that don't result in drops, and as such we could miss errors on the NIC.
>
> Is the proposed solution amenable to all parties? I'm happy to provide more details about the DPDK collectd plugin and the development effort there if anyone is interested.
>
> All the best
> Maryam
>
>
Hi Maryam,
I would like to strongly vote for your proposal. I believe that these
'high level' stats provided through the generic stats API should be
presented in a way that they can be easily summed up to get the number
of packets actually seen by the NIC and distinguish between the packets
that could be processed and the ones that were lost. This is especially
true if the stats for all other NIC types are reported in this way.
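As a rough illustration of the summing this would allow, a sketch in C could look like the following. It assumes the three generic counters never count the same packet twice, as in Olivier's definitions; struct rx_totals and rx_summarize() are hypothetical names, not part of the ethdev API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary derived from the three generic Rx counters. */
struct rx_totals {
    uint64_t seen; /* every packet that reached the NIC */
    uint64_t lost; /* packets the application never got */
};

/* Valid only if ipackets, imissed and ierrors are disjoint, i.e. no
 * packet is counted in more than one of them. */
static struct rx_totals rx_summarize(uint64_t ipackets, uint64_t imissed,
                                     uint64_t ierrors)
{
    struct rx_totals t;

    assert(imissed <= UINT64_MAX - ierrors);  /* guard against wrap */
    t.lost = imissed + ierrors;
    assert(ipackets <= UINT64_MAX - t.lost);
    t.seen = ipackets + t.lost;
    return t;
}
```

With the figures from Andriy's example (9M ipackets, 1M imissed) and no hardware drops, this gives 10M packets seen and 1M lost, which matches what host A actually sent.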
Best regards,
Martin
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-10-22 8:21 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-04 9:38 [dpdk-dev] ixgbe: account more Rx errors Issue Andriy Berestovskyy
2015-09-04 12:44 ` Tahhan, Maryam
2015-09-04 16:58 ` Andriy Berestovskyy
2015-09-06 17:15 ` Tahhan, Maryam
2015-09-07 8:30 ` Olivier MATZ
2015-09-07 11:44 ` Tahhan, Maryam
2015-09-09 17:43 ` Kyle Larose
2015-09-14 9:50 ` Tahhan, Maryam
2015-10-22 8:21 ` Martin Weiser