DPDK patches and discussions
 help / color / mirror / Atom feed
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: Kyle Larose <klarose@sandvine.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"huawei.xie@intel.com" <huawei.xie@intel.com>,
	"jianfeng.tan@intel.com" <jianfeng.tan@intel.com>
Subject: Re: [dpdk-dev] virtio kills qemu VM after stopping/starting ports
Date: Mon, 5 Sep 2016 12:10:19 +0800	[thread overview]
Message-ID: <20160905041019.GI30752@yliu-dev.sh.intel.com> (raw)
In-Reply-To: <D76BBBCF97F57144BB5FCF08007244A731220D9B@wtl-exchp-2.sandvine.com>

On Thu, Sep 01, 2016 at 08:53:31PM +0000, Kyle Larose wrote:
> Hello everyone,

Hi,

Firstly, thanks for the report and detailed analysis!

> 
> In my own testing, I recently stumbled across an issue where I could get qemu to exit when sending traffic to my application. To do this, I simply needed to do the following:
> 
> 1) Start my virtio interfaces
> 2) Send some traffic into/out of the interfaces
> 3) Stop the interfaces
> 4) Start the interfaces
> 5) Send some more traffic
> 
> At this point, I would lose connectivity to my VM.  Further investigation revealed qemu exiting with the following log:
> 
> 	2016-09-01T15:45:32.119059Z qemu-kvm: Guest moved used index from 5 to 1
> 
> I found the following bug report against qemu, reported by a user of DPDK: https://bugs.launchpad.net/qemu/+bug/1558175
> 
> That thread seems to have stalled out, so I think we probably should deal with the problem within DPDK itself. Either way, later in the bug report chain, we see a link to this patch to DPDK: http://dpdk.org/browse/dpdk/commit/?id=9a0615af774648. The submitter of the bug report claims that this patch fixes the problem. Perhaps it does. However, it introduces a new problem: If I remove the patch, I cannot reproduce the problem. So, that leads me to believe that it has caused a regression.

Yes, it is a regression from that point of view.

> To summarize the patch’s changes, it basically changes the virtio_dev_stop function to flag the device as stopped, and stops the device when closing/uninitializing it. However, there is a seemingly unintended side-effect. 
> 
> In virtio_dev_start, we have the following block of code:
> 
> 	/* On restart after stop do not touch queues */
> 	if (hw->started)
> 		return 0;
> 
> 	/* Do final configuration before rx/tx engine starts */
> 	virtio_dev_rxtx_start(dev);
> 
> ….
> 
> Prior to the patch, if an interface were stopped then started, without restarting the application, the queues would be left as-is, because hw->started would be set to 1. Now, calling stop sets hw->started to 0, which means the next call to start will “touch the queues”. This is the unintended side-effect that causes the problem.
> 
> I made a change locally to break the state of the device into two: started and opened. The devices starts out neither started nor opened. If the device is accepting packets, it is started. If the device has set up its queues, it is opened. Stopping the device does not close the device. This allows me to change the check above to:
> 
>  	if (hw->opened) {
> 		hw->started=1
> 		return 0;
> 	}

It would work in your case, but it makes thing complex.

So, I talked with Jianfeng and revisited the original issue he meant to
fix: failure (maybe crash) on stop, re-configure queue number and restart.

Yes, that case is broken, but the fix wasn't right, neither: we can't
simply re-alloc and re-setup queue on start, because vhost is only aware
of the first setup.  You could check following link for more information,
including the right fix (you need follow the discussion to find that).

In summary, I will revert commit 9a0615af774 (and carry it to the stable
branch as well). Later, I will fix the virtio multiple queue issue.

	--yliu

  parent reply	other threads:[~2016-09-05  3:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-01 20:53 Kyle Larose
2016-09-02  6:55 ` Tan, Jianfeng
2016-09-02 12:35   ` Kyle Larose
2016-09-02 12:53     ` Tan, Jianfeng
2016-09-05  3:20       ` Tan, Jianfeng
2016-09-05  4:10 ` Yuanhan Liu [this message]
2016-09-06 12:25   ` Kyle Larose

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160905041019.GI30752@yliu-dev.sh.intel.com \
    --to=yuanhan.liu@linux.intel.com \
    --cc=dev@dpdk.org \
    --cc=huawei.xie@intel.com \
    --cc=jianfeng.tan@intel.com \
    --cc=klarose@sandvine.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).