DPDK patches and discussions
* [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs()
       [not found] <MWHPR04MB059226F0055B856110C55933CFC80@MWHPR04MB0592.namprd04.prod.outlook.com>
@ 2019-07-23 15:16 ` Bly, Mike
  2019-07-23 16:03   ` Ananyev, Konstantin
  0 siblings, 1 reply; 6+ messages in thread
From: Bly, Mike @ 2019-07-23 15:16 UTC (permalink / raw)
  To: 'dev@dpdk.org'

Hello,

We are chasing an interesting NIC transmit issue where after some period of time with normal operation the NIC enters a state where it refuses to transmit frames from our DPDK application via rte_eth_tx_burst(). All indications are the port is up and otherwise operational and is still receiving traffic. It simply refuses to transmit anymore.

Our application is running DPDK 17.05.1. In digging through the email archives, this appears to be related to the following posts, as we see the same nb_free = 0 and IXGBE_ADVTXD_STAT_DD not set problems they describe:
http://mails.dpdk.org/archives/dev/2017-August/073240.html
http://inbox.dpdk.org/dev/b704af91-dcc6-4481-a54c-3e174b744d17.h.liu@alibaba-inc.com/
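For readers unfamiliar with the symptom, here is a minimal model of why a missing DD (Descriptor Done) bit leaves nb_free stuck at 0. It is a sketch, not the driver code: struct model_txq, struct model_txd and model_tx_free_bufs() are hypothetical stand-ins for the ixgbe TX queue state and ixgbe_tx_free_bufs(), and the ring bookkeeping is simplified.

```c
#include <stdint.h>

/* Descriptor Done bit in the write-back status word (the value of
 * IXGBE_ADVTXD_STAT_DD in the ixgbe base code). */
#define TXD_STAT_DD 0x00000001u

/* Hypothetical, stripped-down stand-ins for the driver's ring state. */
struct model_txd { uint32_t status; };

struct model_txq {
    struct model_txd *ring;  /* descriptor ring */
    uint16_t nb_desc;        /* ring size */
    uint16_t rs_thresh;      /* descriptors reclaimed per cleanup pass */
    uint16_t next_dd;        /* descriptor whose DD bit is polled next */
    uint16_t nb_free;        /* descriptors available to the application */
};

/* Reclaim one batch if hardware reported completion. Returns the number of
 * descriptors freed; 0 means the DD bit was not set. */
static uint16_t model_tx_free_bufs(struct model_txq *q)
{
    if (!(q->ring[q->next_dd].status & TXD_STAT_DD))
        return 0;  /* no write-back from hardware: nothing to reclaim */

    q->ring[q->next_dd].status = 0;  /* model only: re-arm for the next lap */
    q->nb_free = (uint16_t)(q->nb_free + q->rs_thresh);
    q->next_dd = (uint16_t)(q->next_dd + q->rs_thresh);
    if (q->next_dd >= q->nb_desc)
        q->next_dd = (uint16_t)(q->rs_thresh - 1);
    return q->rs_thresh;
}
```

If the hardware never writes the DD bit back, every cleanup pass returns 0, nb_free never grows, and the transmit path eventually has no descriptors left to hand out.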

Having not seen any resolution on the above DPDK posts and after a number of other investigative steps, we incorporated the rte_ethtool lib to provide the ability to dump the NIC register set via rte_ethtool_get_regs() in the hopes that perhaps there would be something there in a status register to point us in the right direction. The question now is what is the best way to check the register contents dumped to the binary output file this API creates, for the x552 NIC? Does anyone know if a decoder script exists?
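As a starting point for inspecting the dump, the sketch below prints the file contents as indexed 32-bit words. It assumes the binary file is a flat array of 32-bit register values in host byte order, which has not been confirmed in this thread; print_reg_words() is a hypothetical helper, and mapping a word index back to a register name would still need the register tables in the ixgbe PMD sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Print a raw register dump as "index: value" lines, one 32-bit word each.
 * Returns the number of whole words printed; trailing bytes are ignored. */
static size_t print_reg_words(const unsigned char *buf, size_t len, FILE *out)
{
    size_t nwords = len / sizeof(uint32_t);

    for (size_t i = 0; i < nwords; i++) {
        uint32_t v;

        /* memcpy avoids any alignment assumption about buf */
        memcpy(&v, buf + i * sizeof(v), sizeof(v));
        fprintf(out, "%6zu: 0x%08x\n", i, (unsigned int)v);
    }
    return nwords;
}
```

Reading the dump file into a buffer with fread() and passing it to print_reg_words() gives a text listing that can be diffed between a healthy port and a wedged one.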

Other ideas to pursue?

Regards,
Mike

* Re: [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs()
  2019-07-23 15:16 ` [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs() Bly, Mike
@ 2019-07-23 16:03   ` Ananyev, Konstantin
  2019-07-23 17:08     ` Bly, Mike
  0 siblings, 1 reply; 6+ messages in thread
From: Ananyev, Konstantin @ 2019-07-23 16:03 UTC (permalink / raw)
  To: Bly, Mike, 'dev@dpdk.org'




> 
> Hello,
> 
> We are chasing an interesting NIC transmit issue where after some period of time with normal operation the NIC enters a state where it
> refuses to transmit frames from our DPDK application via rte_eth_tx_burst(). All indications are the port is up and otherwise operational and
> is still receiving traffic. It simply refuses to transmit anymore.
> 
> Our application is running DPDK 17.05.1. In digging through the email archives, this appears to be related to the following posts, as we see
> the same nb_free = 0 and IXGBE_ADVTXD_STAT_DD not set problems they describe:
> http://mails.dpdk.org/archives/dev/2017-August/073240.html
> http://inbox.dpdk.org/dev/b704af91-dcc6-4481-a54c-3e174b744d17.h.liu@alibaba-inc.com/
> 
> Having not seen any resolution on the above DPDK posts and after a number of other investigative steps, we incorporated the rte_ethtool
> lib to provide the ability to dump the NIC register set via rte_ethtool_get_regs() in the hopes that perhaps there would be something there
> in a status register to point us in the right direction. The question now is what is the best way to check the register contents dumped to the
> binary output file this API creates, for the x552 NIC? Does anyone know if a decoder script exists?
> 
> Other ideas to pursue?

It is hard to tell without more information, but this sometimes happens
when the application tries to transmit a malformed packet.
It might be worth calling rte_eth_tx_prepare() inside your app.
It does some sanity checks to prevent such situations, especially when RTE_LIBRTE_ETHDEV_DEBUG is on.
Konstantin 
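The call order suggested above can be sketched as follows. To keep the example buildable without DPDK headers, the two calls are passed in as function pointers with the same parameter shape as rte_eth_tx_prepare() and rte_eth_tx_burst(); tx_fn, struct tx_result, prepare_then_send() and the two fake_* backends are hypothetical names used only for illustration. In a real application the pointers would be thin wrappers around the DPDK calls.

```c
#include <stdint.h>

/* Same parameter shape as rte_eth_tx_prepare() and rte_eth_tx_burst(); the
 * mbuf type is left opaque so the sketch builds without DPDK headers. */
typedef uint16_t (*tx_fn)(uint16_t port, uint16_t queue, void **pkts, uint16_t n);

struct tx_result {
    uint16_t nb_prep;  /* packets that passed the prepare step */
    uint16_t nb_sent;  /* packets accepted by the burst call */
};

/* Validate first, then hand only the validated prefix to the burst call.
 * If nb_prep < n, pkts[nb_prep] is the first packet that was rejected. */
static struct tx_result prepare_then_send(tx_fn prepare, tx_fn burst,
                                          uint16_t port, uint16_t queue,
                                          void **pkts, uint16_t n)
{
    struct tx_result r;

    r.nb_prep = prepare(port, queue, pkts, n);
    r.nb_sent = burst(port, queue, pkts, r.nb_prep);
    return r;
}

/* Hypothetical backend: pretends the third packet is malformed. */
static uint16_t fake_prepare(uint16_t port, uint16_t queue, void **pkts, uint16_t n)
{
    (void)port; (void)queue; (void)pkts;
    return n > 2 ? 2 : n;
}

/* Hypothetical backend: accepts everything it is given. */
static uint16_t fake_burst(uint16_t port, uint16_t queue, void **pkts, uint16_t n)
{
    (void)port; (void)queue; (void)pkts;
    return n;
}
```

The caller still owns the packets from index nb_sent onward, including the rejected one, and has to free or otherwise handle them.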





* Re: [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs()
  2019-07-23 16:03   ` Ananyev, Konstantin
@ 2019-07-23 17:08     ` Bly, Mike
  2019-07-23 23:09       ` Bly, Mike
  0 siblings, 1 reply; 6+ messages in thread
From: Bly, Mike @ 2019-07-23 17:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org'

Konstantin,

Thank you for the prompt reply on this posting. In looking at the single use-case in test-pmd's csumonly.c, it would seem prepare + retry_enabled may have some shortcomings as currently coded when nb_prep < nb_rx. Has anyone looked at this? I happened to notice this when looking for a reference for how it is expected to be used. It would seem nb_rx should be replaced with nb_prep in the retry code. I think the rest of the code should "just work" from there. Thoughts?
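A sketch of the retry loop with the proposed change, bounded by nb_prep instead of nb_rx, is shown here. It is not the test-pmd code: burst_fn, burst_with_retry() and fake_burst_two() are hypothetical names, the burst call is passed as a function pointer so the example builds without DPDK, and the delay between retries is left out.

```c
#include <stdint.h>

/* Same parameter shape as rte_eth_tx_burst(), with an opaque packet type. */
typedef uint16_t (*burst_fn)(uint16_t port, uint16_t queue, void **pkts, uint16_t n);

/* Send pkts[0..nb_prep) with up to max_retry extra attempts. The bound is
 * nb_prep, not the received count, so packets rejected by the prepare step
 * are never handed to the NIC by a retry. */
static uint16_t burst_with_retry(burst_fn burst, uint16_t port, uint16_t queue,
                                 void **pkts, uint16_t nb_prep,
                                 unsigned int max_retry)
{
    uint16_t nb_tx = burst(port, queue, pkts, nb_prep);
    unsigned int retry = 0;

    while (nb_tx < nb_prep && retry++ < max_retry)
        nb_tx = (uint16_t)(nb_tx + burst(port, queue, &pkts[nb_tx],
                                         (uint16_t)(nb_prep - nb_tx)));
    return nb_tx;
}

/* Hypothetical backend that accepts at most two packets per call. */
static uint16_t fake_burst_two(uint16_t port, uint16_t queue, void **pkts, uint16_t n)
{
    (void)port; (void)queue; (void)pkts;
    return n < 2 ? n : 2;
}
```

With the bound left at nb_rx, a retry would start at the first rejected packet and push it to the hardware anyway, which defeats the purpose of the prepare step.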

Regards,
Mike

* Re: [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs()
  2019-07-23 17:08     ` Bly, Mike
@ 2019-07-23 23:09       ` Bly, Mike
  2019-07-24  7:52         ` Ananyev, Konstantin
  0 siblings, 1 reply; 6+ messages in thread
From: Bly, Mike @ 2019-07-23 23:09 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org'

Konstantin,

The recommended use of rte_eth_tx_prepare() had no effect, which makes sense after looking at it. We are using "large" mbufs to support jumbo frames, so nb_segs is always 1. Additionally, we are not currently using any HW offload capabilities. As such, rte_eth_tx_prepare() always returns the full number of frames passed in.

Taking this a step further, I have reproduced the problem using a simple c-unit test that builds bursts of frames. Each burst contains a max-burst of frames (32 in our application), interleaving frames of a legal length (124 bytes) with intentionally runt frames (20-bytes++). These are dumb-simple L2 frames, i.e. NOT IP frames. The NIC is set up to pad and append, so it should just do that as needed without issue. The test repeats this burst sequence 100 times, so it attempts to transmit 3200 frames. From run to run, anywhere from 750 to 3000 frames get transmitted. Thereafter, the NIC no longer accepts frames for transmit. Using GDB, we have confirmed that the same DD status problem is present and prevents ixgbe_tx_free_bufs() from actually freeing any resources.
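The burst layout described above could be generated along these lines. This is a guess at the pattern, not the actual test: it assumes legal and runt frames strictly alternate and reads "20-bytes++" as a runt length that starts at 20 and grows by one byte per runt. fill_burst_lengths() and the macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BURST_SIZE 32   /* max burst, from the message */
#define NB_BURSTS  100  /* repetitions, from the message */
#define LEGAL_LEN  124  /* legal frame length in bytes, from the message */
#define RUNT_BASE  20   /* smallest runt length in bytes, from the message */

/* Fill lens[] with one burst worth of frame lengths: even slots get the
 * legal length, odd slots get a runt that grows by one byte each time
 * (assumed pattern). Returns the number of frames written. */
static size_t fill_burst_lengths(uint16_t lens[BURST_SIZE])
{
    uint16_t runt = RUNT_BASE;

    for (size_t i = 0; i < BURST_SIZE; i++)
        lens[i] = (uint16_t)((i % 2 == 0) ? LEGAL_LEN : runt++);
    return BURST_SIZE;
}
```

Repeating the burst NB_BURSTS times gives 32 * 100 = 3200 frames, matching the total in the message.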

Is there a minimum runt size officially supported by DPDK and/or Intel on the x550 NIC family? We could certainly do a simple frame-length check and discard accordingly. However, we have seen 3rd party applications send us runts, e.g. 40-byte ARP requests, over vhost-user and tap interfaces, so we are a bit hesitant to blindly enforce this at 60 bytes (min ETH minus CRC).
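One alternative to discarding runts is to pad them in software before transmit. The sketch below works on a plain byte buffer so it builds without DPDK; pad_runt() and MIN_ETH_NO_CRC are hypothetical names, and with mbufs the data length would have to be grown as well (for example with rte_pktmbuf_append()) before zeroing the new bytes.

```c
#include <stddef.h>
#include <string.h>

#define MIN_ETH_NO_CRC 60  /* 64-byte Ethernet minimum minus the 4-byte CRC */

/* Zero-pad the frame in buf up to MIN_ETH_NO_CRC bytes. Returns the new
 * frame length, or 0 if buf (capacity cap) cannot hold the padded frame. */
static size_t pad_runt(unsigned char *buf, size_t len, size_t cap)
{
    if (len >= MIN_ETH_NO_CRC)
        return len;               /* already long enough: leave untouched */
    if (cap < MIN_ETH_NO_CRC)
        return 0;                 /* no room to pad */
    memset(buf + len, 0, MIN_ETH_NO_CRC - len);
    return MIN_ETH_NO_CRC;
}
```

A 40-byte ARP request from a tap or vhost-user interface would leave this function as a 60-byte frame with 20 trailing zero bytes, rather than being dropped.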

-Mike

* Re: [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs()
  2019-07-23 23:09       ` Bly, Mike
@ 2019-07-24  7:52         ` Ananyev, Konstantin
  2019-07-25 16:28           ` Bly, Mike
  0 siblings, 1 reply; 6+ messages in thread
From: Ananyev, Konstantin @ 2019-07-24  7:52 UTC (permalink / raw)
  To: Bly, Mike, 'dev@dpdk.org'; +Cc: Zhang, Qi Z, Lu, Wenzhuo

Hi Mike,

> Konstantin,
> 
> The recommended use of rte_eth_tx_prepare() had no effect, which after looking at it, makes sense. We are using "large" mbufs to support
> Jumbo frames, so nb-seg will always == 1. Additionally, we are not currently leveraging any HW offload capabilities. As such,
> rte_eth_tx_prepare() always returns "num-frames".
> 
> Taking this a step further, I have reproduced the problem using a simple c-unit test that builds bursts of frames, where each burst contains a
> max-burst of frames (32 in our application), where the interleaved frames have either a legal frame length (124-bytes) or intentionally a runt
> frame length (20-bytes++). These are dumb-simple L2 frames, i.e. NOT ip-frames. The NIC is set up to pad and append, so it should just do
> that without issue as needed. The test repeats this burst sequence 100 times, resulting in 3200 frames attempting to be transmitted. Run
> to run, I am seeing anywhere from 750 to 3000 frames get transmitted. Thereafter, the NIC will no longer accept frames for transmit. Using
> GDB, we have confirmed the same DD status problem is present and preventing ixgbe_tx_free_bufs() from doing any actual freeing of
> resources.
> 
> Is there a minimum runt size officially supported by DPDK and/or Intel on the x550 NIC family? We could certainly do a simple frame-length
> check and discard accordingly. However, we have seen 3rd party applications send us runts, e.g. 40-byte ARP requests, over vhost-user and
> tap interfaces, so we are a bit hesitant to blindly enforce this at 60 bytes (min ETH minus CRC).

AFAIK, sending frames smaller than 64B shouldn't cause a problem.
At least I have never hit such a limitation.
Qi, Wenzhuo - did you ever see such an issue?
My suggestion would be to open a new Bugzilla ticket for investigation and
attach a pcap file (or a scapy script to generate it) so the problem can be reproduced with test-pmd.
Thanks
Konstantin  
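For the pcap suggested above, a capture file can also be produced directly from C instead of scapy. The sketch below writes the classic pcap layout (24-byte file header, 16-byte record headers, Ethernet link type) for a caller-supplied list of frame lengths. write_pcap(), the struct names, the MAC addresses and the EtherType are all made up for illustration; the frame payload is zero-filled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Classic pcap file header (24 bytes) and per-record header (16 bytes),
 * written in the byte order of the machine running this code. */
struct pcap_file_hdr {
    uint32_t magic;      /* 0xa1b2c3d4: microsecond timestamps */
    uint16_t ver_major;  /* 2 */
    uint16_t ver_minor;  /* 4 */
    int32_t  thiszone;
    uint32_t sigfigs;
    uint32_t snaplen;
    uint32_t linktype;   /* 1 = Ethernet */
};

struct pcap_rec_hdr { uint32_t ts_sec, ts_usec, incl_len, orig_len; };

/* Made-up L2 header: broadcast destination, locally administered source,
 * EtherType 0x88B5 (reserved for local experimental use). */
static const unsigned char eth_hdr[14] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0x02, 0x00, 0x00, 0x00, 0x00, 0x01,
    0x88, 0xb5
};

/* Write one record per entry in lens[]; returns the number of records
 * written. Stops early on a write error or a length that does not fit. */
static size_t write_pcap(FILE *fp, const uint16_t *lens, size_t n)
{
    const struct pcap_file_hdr fh = { 0xa1b2c3d4u, 2, 4, 0, 0, 65535, 1 };
    unsigned char frame[2048] = {0};
    size_t written = 0;

    memcpy(frame, eth_hdr, sizeof(eth_hdr));
    if (fwrite(&fh, sizeof(fh), 1, fp) != 1)
        return 0;
    for (size_t i = 0; i < n && lens[i] <= sizeof(frame); i++) {
        const struct pcap_rec_hdr rh = {
            (uint32_t)(i / 1000000), (uint32_t)(i % 1000000), lens[i], lens[i]
        };

        if (fwrite(&rh, sizeof(rh), 1, fp) != 1 ||
            fwrite(frame, lens[i], 1, fp) != 1)
            break;
        written++;
    }
    return written;
}
```

Passing an array that alternates 124-byte and runt lengths would yield a file that reproduces the traffic mix described earlier in the thread.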


* Re: [dpdk-dev] x552 transmit issue and rte_ethtool - rte_ethtool_get_regs()
  2019-07-24  7:52         ` Ananyev, Konstantin
@ 2019-07-25 16:28           ` Bly, Mike
  0 siblings, 0 replies; 6+ messages in thread
From: Bly, Mike @ 2019-07-25 16:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, 'dev@dpdk.org'; +Cc: Zhang, Qi Z, Lu, Wenzhuo

Konstantin,

After digging a bit further, we discovered our custom test setup was inadvertently running two threads calling the same transmit sequence, thus wedging the NIC. While this reproduced the DD not ready symptoms we are chasing, it is not a valid reproduction of what our application is capable of doing. We will continue looking and update when we have more to share.

-Mike
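Since rte_eth_tx_burst() is not safe to call from two threads on the same port and queue without external locking, a cheap debug check can catch this kind of accident early. The sketch below is a hypothetical aid, not part of DPDK: struct txq_guard and its three functions are made-up names, built on a C11 atomic_flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One guard per TX queue; hypothetical debug aid, not part of DPDK. */
struct txq_guard { atomic_flag busy; };

static void txq_guard_init(struct txq_guard *g)
{
    atomic_flag_clear(&g->busy);
}

/* Returns true if the caller is the only thread inside the TX path.
 * A false return means another thread is already transmitting on this
 * queue; the caller must then NOT call txq_guard_leave(). */
static bool txq_guard_enter(struct txq_guard *g)
{
    return !atomic_flag_test_and_set_explicit(&g->busy, memory_order_acquire);
}

static void txq_guard_leave(struct txq_guard *g)
{
    atomic_flag_clear_explicit(&g->busy, memory_order_release);
}
```

Wrapping the transmit call in txq_guard_enter() and txq_guard_leave(), and logging or aborting when enter returns false, would flag a second thread on the same queue at the moment it happens instead of after the ring has wedged.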
