On March 21, 2024, at 17:57, Haoqian He <haoqian.he@smartx.com> wrote:

We should clean up the vq resubmit info in set_inflight_fd,
before set_vring_kick checks whether any inflight I/O is
waiting for resubmission.

Otherwise, when the VM reboots immediately after
reconnecting to the vhost target (before the inflight I/O
has been resubmitted), the vhost backend still uses the
old resubmit info that was set at reconnection.

Signed-off-by: Haoqian He <haoqian.he@smartx.com>
---
lib/vhost/vhost_user.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 414192500e..7c54afc5fb 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1871,6 +1871,7 @@ vhost_user_set_inflight_fd(struct virtio_net **pdev,
 		if (!vq)
 			continue;
 
+		cleanup_vq_inflight(dev, vq);
 		if (vq_is_packed(dev)) {
 			vq->inflight_packed = addr;
 			vq->inflight_packed->desc_num = queue_size;
--
2.41.0


Ping.

Hi, Maxime.

This patch fixes a potential error when the VM reboots after vhost live recovery:
the resubmit info is not cleaned up, which can cause the VM to hang.

Suppose the inflight I/O that should be resubmitted after the latest vhost
reconnection has not been submitted yet, so GET_VRING_BASE does not wait for it.
At this point the resubmit info has already been set, and the VM is restarted immediately.

Currently, we do not clean up the resubmit info before the VM restarts, so when the
VM restarts, SET_VRING_KICK resubmits these inflight I/Os (if the resubmit info is not
NULL, set_vring_kick returns without updating it).

This is a bug: stale inflight I/O must not be resubmitted after the VM restarts.
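
To illustrate the sequence described above, here is a minimal toy model in C. It is
only a sketch and not the real vhost code: struct toy_vq and every toy_* function are
hypothetical stand-ins, and the early return in toy_set_vring_kick() only mirrors the
behaviour described in this mail (existing resubmit info is kept as-is). The
cleanup_first flag plays the role of the one-line fix in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-virtqueue resubmit state. */
struct toy_vq {
	int *resubmit;     /* entries waiting for resubmission, or NULL */
	int resubmit_num;  /* number of such entries */
};

/* Assumed effect of the cleanup: drop any previously built resubmit info. */
static void toy_cleanup_inflight(struct toy_vq *vq)
{
	free(vq->resubmit);
	vq->resubmit = NULL;
	vq->resubmit_num = 0;
}

/* Models SET_INFLIGHT_FD; cleanup_first toggles the proposed fix. */
static void toy_set_inflight_fd(struct toy_vq *vq, bool cleanup_first)
{
	if (cleanup_first)
		toy_cleanup_inflight(vq);
	/* Mapping of the new inflight region is omitted in this sketch. */
}

/*
 * Models SET_VRING_KICK. Returns how many I/Os would be resubmitted.
 * inflight_in_region is the number of entries marked inflight in the
 * shared region at the time of the call.
 */
static int toy_set_vring_kick(struct toy_vq *vq, int inflight_in_region)
{
	if (vq->resubmit != NULL)
		return vq->resubmit_num; /* existing (possibly stale) info reused */

	if (inflight_in_region > 0) {
		vq->resubmit = calloc((size_t)inflight_in_region, sizeof(int));
		if (vq->resubmit == NULL)
			return -1;
		vq->resubmit_num = inflight_in_region;
	}
	return vq->resubmit_num;
}

/* Reconnect with 2 inflight I/Os, then reboot before they are resubmitted. */
int toy_reboot_after_reconnect(bool cleanup_first)
{
	struct toy_vq vq = { NULL, 0 };
	int n;

	/* Reconnection: resubmit info is built for 2 inflight I/Os. */
	n = toy_set_vring_kick(&vq, 2);
	assert(n == 2);

	/* VM reboot: a fresh inflight region with nothing inflight. */
	toy_set_inflight_fd(&vq, cleanup_first);
	n = toy_set_vring_kick(&vq, 0);

	toy_cleanup_inflight(&vq);
	return n;
}
```

In this model, toy_reboot_after_reconnect(false) reports the two stale I/Os as
resubmitted after the reboot, while toy_reboot_after_reconnect(true) reports none.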

Thanks,
Haoqian