From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mgamal@redhat.com>
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id C48442BFA
 for <dev@dpdk.org>; Sat,  8 Dec 2018 09:10:27 +0100 (CET)
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com
 [10.5.11.22])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 96C9C307CDE5;
 Sat,  8 Dec 2018 08:10:26 +0000 (UTC)
Received: from ovpn-112-10.ams2.redhat.com (unknown [10.36.112.10])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 26F39103BAAD;
 Sat,  8 Dec 2018 08:10:20 +0000 (UTC)
Message-ID: <1544256619.5629.8.camel@redhat.com>
From: Mohammed Gamal <mgamal@redhat.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: dev@dpdk.org, maxime coquelin <maxime.coquelin@redhat.com>, Yuhui Jiang
 <yujiang@redhat.com>, Wei Shi <wshi@redhat.com>
Date: Sat, 08 Dec 2018 10:10:19 +0200
In-Reply-To: <20181207111841.29450b51@xeon-e3>
References: <1543575881.5400.33.camel@redhat.com>
 <20181130102756.41332fc2@xeon-e3>
 <1879110132.59852748.1543604812639.JavaMail.zimbra@redhat.com>
 <20181204084858.03ecdf98@shemminger-XPS-13-9360>
 <1543942571.5400.38.camel@redhat.com> <20181205143238.5b4b1ae7@xeon-e3>
 <1544181343.5629.1.camel@redhat.com> <20181207111841.29450b51@xeon-e3>
Organization: Red Hat
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.49]); Sat, 08 Dec 2018 08:10:26 +0000 (UTC)
Subject: Re: [dpdk-dev] Problems running netvsc multiq
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
Reply-To: mgamal@redhat.com
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Sat, 08 Dec 2018 08:10:28 -0000

On Fri, 2018-12-07 at 11:18 -0800, Stephen Hemminger wrote:
> On Fri, 07 Dec 2018 13:15:43 +0200
> Mohammed Gamal <mgamal@redhat.com> wrote:
> 
> > On Wed, 2018-12-05 at 14:32 -0800, Stephen Hemminger wrote:
> > > The problem is a regression in the 4.20 kernel. Bisecting now.
> > 
> > I was bisecting the kernel, and the change that seems to introduce
> > this regression is this one:
> > 
> > commit ae6935ed7d424ffa74d634da00767e7b03c98fd3
> > Author: Stephen Hemminger <stephen@networkplumber.org>
> > Date:   Fri Sep 14 09:10:17 2018 -0700
> > 
> >     vmbus: split ring buffer allocation from open
> >     
> >     The UIO driver needs the ring buffer to be persistent (reused)
> >     across open/close. Split the allocation and setup of ring buffer
> >     out of vmbus_open. For normal usage vmbus_open/vmbus_close there
> >     are no changes; only impacts uio_hv_generic which needs to keep
> >     ring buffer memory and reuse when application restarts.
> >     
> >     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> 
> Patch posted: 
> 
> From stephen@networkplumber.org Fri Dec  7 10:58:47 2018
> From: Stephen Hemminger <stephen@networkplumber.org>
> Subject: [PATCH] vmbus: fix subchannel removal
> 
> The changes to split ring allocation from open/close broke
> the cleanup of subchannels. This resulted in problems using
> uio on network devices because the subchannel was left behind
> when the network device was unbound.
> 
> The cause was in the disconnect logic which used list splice
> to move the subchannel list into a local variable. This won't
> work because the subchannel list is needed later during the
> processing of the rescind messages (relid2channel).
> 
> The fix is to just leave the subchannel list in place
> which is what the original code did. The list is cleaned
> up later when the host rescind is processed.
> 
> Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---
>  drivers/hv/channel.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index fe00b12e4417..bea4c9850247 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -701,20 +701,12 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
>  int vmbus_disconnect_ring(struct vmbus_channel *channel)
>  {
>  	struct vmbus_channel *cur_channel, *tmp;
> -	unsigned long flags;
> -	LIST_HEAD(list);
>  	int ret;
>  
>  	if (channel->primary_channel != NULL)
>  		return -EINVAL;
>  
> -	/* Snapshot the list of subchannels */
> -	spin_lock_irqsave(&channel->lock, flags);
> -	list_splice_init(&channel->sc_list, &list);
> -	channel->num_sc = 0;
> -	spin_unlock_irqrestore(&channel->lock, flags);
> -
> -	list_for_each_entry_safe(cur_channel, tmp, &list, sc_list) {
> +	list_for_each_entry_safe(cur_channel, tmp, &channel->sc_list, sc_list) {
>  		if (cur_channel->rescind)
>  			wait_for_completion(&cur_channel->rescind_event);
> 

Hi Stephen,
This does indeed work for the first run. On any subsequent run, I get this:

testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
hn_dev_configure():  >>
hn_rndis_link_status(): link status 0x40020006
hn_subchan_configure(): open 1 subchannels
vmbus_uio_get_subchan(): ring mmap not found (yet) for: 19
vmbus_uio_get_subchan(): ring mmap not found (yet) for: 19
[...]
vmbus_uio_get_subchan(): ring mmap not found (yet) for: 19
vmbus_uio_get_subchan(): ring mmap not found (yet) for: 19
^C
Signal 2 received, preparing to exit...
LATENCY_STATS: failed to remove Rx callback for pid=0, qid=0
LATENCY_STATS: failed to remove Rx callback for pid=0, qid=1
LATENCY_STATS: failed to remove Tx callback for pid=0, qid=0
LATENCY_STATS: failed to remove Tx callback for pid=0, qid=1

Shutting down port 0...
Stopping ports...
Done
Closing ports...
Port 0 is now not stopped
Done
Bye...

Do you see that on your end as well?