DPDK patches and discussions
* [dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response
From: Min Tang @ 2020-02-27 16:16 UTC
  To: dev, stephen

Hi Stephen:

I saw the following error messages when using DPDK 18.11.2 in Azure:

hn_nvs_execute(): unexpected NVS resp 0x6b, expect 0x85
hn_dev_configure(): subchannel configuration failed

It was hard to reproduce and occurred only with multiple queues
enabled. hn_nvs_execute() expects the response type to match the
request type. In the failing case it expected NVS_TYPE_SUBCH_REQ
(133, or 0x85) but received NVS_TYPE_RNDIS (107, or 0x6b). Evidently
an NVS_TYPE_RNDIS message had been sent before the NVS_TYPE_SUBCH_REQ
message. Looking at the code, I found that the NVS_TYPE_RNDIS message
requests a completion response, but nothing ever receives that
response. The fix would be to receive and discard the unexpected
response message(s).

I applied the following patch, and it fixed the problem.

--- a/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:08:29.755530969 -0500
+++ b/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:07:21.567371798 -0500
@@ -92,7 +92,7 @@
  if (hdr->type != type) {
  PMD_DRV_LOG(ERR, "unexpected NVS resp %#x, expect %#x",
     hdr->type, type);
- goto retry;
+ return -EINVAL;
  }

  if (len < resplen) {
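
A minimal sketch of the receive-and-discard idea described above, not the actual driver code. The fake_chan type and the fake_recv() helper are hypothetical stand-ins for the VMBus channel, and max_discard is an assumed safety bound; only the two message type values come from the report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Message type values quoted in the report. */
#define NVS_TYPE_RNDIS     107u  /* 0x6b */
#define NVS_TYPE_SUBCH_REQ 133u  /* 0x85 */

/* Hypothetical stand-in for the channel: a queue of pending message types. */
struct fake_chan {
	const uint32_t *types;
	size_t count;
	size_t next;
};

/* Hypothetical receive helper: 0 and *type on success, -EAGAIN if empty. */
static int fake_recv(struct fake_chan *ch, uint32_t *type)
{
	if (ch->next >= ch->count)
		return -EAGAIN;
	*type = ch->types[ch->next++];
	return 0;
}

/*
 * Wait for a response of the expected type, discarding anything else.
 * Returns the number of discarded messages, or a negative errno.
 * max_discard bounds the loop so a stream of unrelated messages
 * cannot stall the caller forever.
 */
static int wait_for_response(struct fake_chan *ch, uint32_t expect,
			     unsigned int max_discard)
{
	unsigned int discarded = 0;
	uint32_t type;
	int ret;

	assert(ch != NULL);
	for (;;) {
		ret = fake_recv(ch, &type);
		if (ret < 0)
			return ret;
		if (type == expect)
			return (int)discarded;
		if (++discarded > max_discard)
			return -EINVAL;
	}
}
```

With a queue holding two NVS_TYPE_RNDIS messages followed by one NVS_TYPE_SUBCH_REQ, wait_for_response() skips the first two and returns once the expected type arrives.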

* Re: [dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response
From: Stephen Hemminger @ 2020-02-27 17:47 UTC
  To: Min Tang; +Cc: dev

On Thu, 27 Feb 2020 11:16:01 -0500
Min Tang <tommytang@gmail.com> wrote:

> Hi Stephen:
> 
> I saw the following error messages when using DPDK 18.11.2 in Azure:
> 
> hn_nvs_execute(): unexpected NVS resp 0x6b, expect 0x85
> hn_dev_configure(): subchannel configuration failed
> 
> It was hard to reproduce and occurred only with multiple queues
> enabled. hn_nvs_execute() expects the response type to match the
> request type. In the failing case it expected NVS_TYPE_SUBCH_REQ
> (133, or 0x85) but received NVS_TYPE_RNDIS (107, or 0x6b). Evidently
> an NVS_TYPE_RNDIS message had been sent before the NVS_TYPE_SUBCH_REQ
> message. Looking at the code, I found that the NVS_TYPE_RNDIS message
> requests a completion response, but nothing ever receives that
> response. The fix would be to receive and discard the unexpected
> response message(s).
> 
> I applied the following patch, and it fixed the problem.
> 
> --- a/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:08:29.755530969 -0500
> +++ b/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:07:21.567371798 -0500
> @@ -92,7 +92,7 @@
>   if (hdr->type != type) {
>   PMD_DRV_LOG(ERR, "unexpected NVS resp %#x, expect %#x",
>      hdr->type, type);
> - goto retry;
> + return -EINVAL;
>   }
> 
>   if (len < resplen) {

Thanks for the analysis. Not sure if this is the right fix.
Looks like the control channel needs additional locking.
Having two outstanding requests at once is not going to work well.
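
A minimal sketch of what serializing the control channel could look like, assuming a plain pthread mutex as a stand-in for whatever lock the driver would actually use. The ctrl_chan structure and ctrl_execute() are hypothetical names, and the "send request, wait for reply" step is reduced to a placeholder.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control-channel state; not the driver's real structure. */
struct ctrl_chan {
	pthread_mutex_t lock;
	unsigned int outstanding; /* requests sent but not yet answered */
};

static void ctrl_chan_init(struct ctrl_chan *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->outstanding = 0;
}

/*
 * Execute one control request. The lock is held across both the send
 * and the wait, so a second caller cannot put another request on the
 * channel until the first has its response.
 */
static uint32_t ctrl_execute(struct ctrl_chan *c, uint32_t req_type)
{
	uint32_t resp_type;

	pthread_mutex_lock(&c->lock);
	c->outstanding++;
	/* Invariant provided by the lock: never two requests in flight. */
	assert(c->outstanding == 1);

	/* Placeholder for "send request, wait for reply": echo the type. */
	resp_type = req_type;

	c->outstanding--;
	pthread_mutex_unlock(&c->lock);
	return resp_type;
}
```

The point of the design is that the critical section covers the whole request/response exchange, not just the send.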

* Re: [dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response
From: Min Tang @ 2020-02-27 18:24 UTC
  To: Stephen Hemminger; +Cc: dev

That quick fix was only meant to verify my guess. I agree that it needs a
more comprehensive fix.

Yes, the race condition is another issue here. In addition, I think the
function that sends the NVS_TYPE_RNDIS message needs to drain the response
message.
I looked at the netvsc driver in the Linux kernel: it receives all VMBus
messages asynchronously in another thread. That is probably something we
could consider for the DPDK driver.

* Re: [dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response
From: Stephen Hemminger @ 2020-03-01 17:54 UTC
  To: Min Tang; +Cc: dev

On Thu, 27 Feb 2020 11:16:01 -0500
Min Tang <tommytang@gmail.com> wrote:

> Hi Stephen:
> 
> I saw the following error messages when using DPDK 18.11.2 in Azure:
> 
> hn_nvs_execute(): unexpected NVS resp 0x6b, expect 0x85
> hn_dev_configure(): subchannel configuration failed
> 
> It was hard to reproduce and occurred only with multiple queues
> enabled. hn_nvs_execute() expects the response type to match the
> request type. In the failing case it expected NVS_TYPE_SUBCH_REQ
> (133, or 0x85) but received NVS_TYPE_RNDIS (107, or 0x6b). Evidently
> an NVS_TYPE_RNDIS message had been sent before the NVS_TYPE_SUBCH_REQ
> message. Looking at the code, I found that the NVS_TYPE_RNDIS message
> requests a completion response, but nothing ever receives that
> response. The fix would be to receive and discard the unexpected
> response message(s).
> 
> I applied the following patch, and it fixed the problem.
> 
> --- a/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:08:29.755530969 -0500
> +++ b/drivers/net/netvsc/hn_nvs.c 2020-02-27 11:07:21.567371798 -0500
> @@ -92,7 +92,7 @@
>   if (hdr->type != type) {
>   PMD_DRV_LOG(ERR, "unexpected NVS resp %#x, expect %#x",
>      hdr->type, type);
> - goto retry;
> + return -EINVAL;
>   }
> 
>   if (len < resplen) {


The situation is that NVS_TYPE_RNDIS is a receive packet that arrives
while the subchannel is being set up. This doesn't happen for the first
channel because control operations at that level complete before any
packets arrive.

This needs more research before settling on a good fix. There are three
options: the response processing in nvs_execute could use the same
receive processing function as normal data, which means adding logic
to wait for a condition; the incoming packets could be dropped there;
or the device could be stopped before configuring subchannels.

Dropping is probably the easiest to implement.
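
A rough sketch of the first option above (sharing one receive function between data and control, with the control path waiting on a condition), for comparison with simply dropping. Everything here is hypothetical: rx_state, rx_process_one() and ctrl_wait() are illustrative names, and the ring is simulated by an array of message types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NVS_TYPE_RNDIS     107u  /* 0x6b */
#define NVS_TYPE_SUBCH_REQ 133u  /* 0x85 */

/* Hypothetical receive state shared by the data and control paths. */
struct rx_state {
	const uint32_t *pending;  /* simulated ring contents */
	size_t count;
	size_t next;
	uint32_t ctrl_wanted;     /* response type the control path awaits */
	int ctrl_done;            /* the condition the control path polls */
	unsigned int data_events; /* NVS_TYPE_RNDIS messages handled normally */
};

/* Common receive function: handle one message; 0 means the ring is empty. */
static int rx_process_one(struct rx_state *st)
{
	uint32_t type;

	if (st->next >= st->count)
		return 0;
	type = st->pending[st->next++];
	if (type == NVS_TYPE_RNDIS)
		st->data_events++;      /* normal data-path handling */
	else if (type == st->ctrl_wanted)
		st->ctrl_done = 1;      /* satisfy the waiting control path */
	return 1;
}

/*
 * Control path: run the common receive function until the condition is
 * set. Returns 0 on success, -1 if the simulated ring drains first (a
 * real driver would keep polling with a timeout instead).
 */
static int ctrl_wait(struct rx_state *st, uint32_t type)
{
	assert(st != NULL);
	st->ctrl_wanted = type;
	st->ctrl_done = 0;
	while (!st->ctrl_done) {
		if (!rx_process_one(st))
			return -1;
	}
	return 0;
}
```

Unlike the drop approach, data messages seen while waiting are still processed, at the cost of the extra condition logic.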

* Re: [dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response
From: Min Tang @ 2020-03-02 15:40 UTC
  To: Stephen Hemminger; +Cc: dev

Hi Stephen:

If there is no intention to process the response to NVS_TYPE_RNDIS,
would it be better not to set VMBUS_CHANPKT_FLAG_RC in the flags, so
that no response message is received at all?

Best Regards,
Min Tang

* Re: [dpdk-dev] net/netvsc: subchannel configuration failed due to unexpected NVS response
From: Stephen Hemminger @ 2020-03-02 16:07 UTC
  To: Min Tang; +Cc: dev

On Mon, 2 Mar 2020 10:40:17 -0500
Min Tang <tommytang@gmail.com> wrote:

> Hi Stephen:
> 
> If there is no intention to process the response to NVS_TYPE_RNDIS,
> would it be better not to set VMBUS_CHANPKT_FLAG_RC in the flags, so
> that no response message is received at all?
> 
> Best Regards,
> Min Tang

The way transmit works is that NVS_TYPE_RNDIS is sent, and the
response indicates that the transmit has completed.
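
One way to read this, sketched below under stated assumptions: the completion is what tells the sender that the host is finished with a transmit buffer, so a send without the completion flag never gets its buffer back. The tx_slot structure and both functions are hypothetical, the host is simulated, and the flag values are assumed rather than copied from the driver headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define VMBUS_CHANPKT_FLAG_NONE 0x0000u
#define VMBUS_CHANPKT_FLAG_RC   0x0001u /* request a completion */

/* Hypothetical transmit buffer slot. */
struct tx_slot {
	int in_use;               /* host may still be reading the buffer */
	int completion_requested; /* set when the send carried FLAG_RC */
};

/* Send from this slot; -1 if the previous send has not completed yet. */
static int tx_send(struct tx_slot *s, uint16_t flags)
{
	assert(s != NULL);
	if (s->in_use)
		return -1;
	s->in_use = 1;
	s->completion_requested = (flags & VMBUS_CHANPKT_FLAG_RC) != 0;
	return 0;
}

/*
 * Simulated host side: deliver a completion only if one was requested.
 * Returns 1 if a completion was delivered and the slot was released.
 */
static int host_finish(struct tx_slot *s)
{
	assert(s != NULL);
	if (!s->in_use || !s->completion_requested)
		return 0;
	s->in_use = 0; /* the completion handler releases the buffer */
	s->completion_requested = 0;
	return 1;
}
```

In this model a slot sent with VMBUS_CHANPKT_FLAG_NONE stays marked in use, because nothing ever signals that the host is done with it.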