* [dpdk-dev] [PATCH] vhost: call destroy_device always in vhost_destroy_device_notify @ 2021-09-03 7:56 Li Feng 2021-09-03 8:02 ` [dpdk-dev] [PATCH v2] " Li Feng 0 siblings, 1 reply; 9+ messages in thread From: Li Feng @ 2021-09-03 7:56 UTC (permalink / raw) To: Maxime Coquelin, Chenbo Xia; +Cc: dev, Li Feng Vhost-user client must send the mem table, kick fd, call fd on all vas, then the device will be VIRTIO_DEV_RUNNING. If the vhost-user communication is initialized partly, e.g. - When initializing the vhost-user, try to restart the vhost-user backend; - Seabios only initialized the vhost-scsi req vq. The device is not with flags VIRTIO_DEV_RUNNING.. Root Cause: The vhost session has been created, and added the scsi/blk requestq poller into reactor, but when destroying the device, the requestq is not unregistered. Reproduce the crash on spdk vhost-user backend: 1. Create a VM; 2. Mount a ISO to a VM, start the VM, don't install the OS; 3. Restart the spdk_tgt; Another discusstion is in seabiso: https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ Signed-off-by: Li Feng <fengli@smartx.com> --- lib/vhost/vhost.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c index 355ff37651..191ba82c41 100644 --- a/lib/vhost/vhost.c +++ b/lib/vhost/vhost.c @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) if (vdpa_dev) vdpa_dev->ops->dev_close(dev->vid); dev->flags &= ~VIRTIO_DEV_RUNNING; - dev->notify_ops->destroy_device(dev->vid); } + dev->notify_ops->destroy_device(dev->vid); } /* -- 2.31.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-09-03 7:56 [dpdk-dev] [PATCH] vhost: call destroy_device always in vhost_destroy_device_notify Li Feng @ 2021-09-03 8:02 ` Li Feng 2021-10-14 12:00 ` Maxime Coquelin 0 siblings, 1 reply; 9+ messages in thread From: Li Feng @ 2021-09-03 8:02 UTC (permalink / raw) To: Maxime Coquelin, Chenbo Xia; +Cc: dev, Li Feng Vhost-user client must send the mem table, kick fd, call fd on all virtqueues, then the device will be VIRTIO_DEV_RUNNING. If the vhost-user communication is initialized partly, e.g. - When initializing the vhost-user, try to restart the vhost-user backend; - Seabios only initialized the vhost-scsi req vq. The device is not with flags VIRTIO_DEV_RUNNING.. Root Cause: The vhost session has been created, and added the scsi/blk requestq poller into reactor, but when destroying the device, the requestq is not unregistered. Reproduce the crash on spdk vhost-user backend: 1. Create a VM; 2. Mount a ISO to a VM, start the VM, don't install the OS; 3. Restart the spdk_tgt; Another discusstion is in seabiso: https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ Signed-off-by: Li Feng <fengli@smartx.com> --- v2: Fix the commit msg typo: vas -> virtqueues. -- lib/vhost/vhost.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c index 355ff37651..191ba82c41 100644 --- a/lib/vhost/vhost.c +++ b/lib/vhost/vhost.c @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) if (vdpa_dev) vdpa_dev->ops->dev_close(dev->vid); dev->flags &= ~VIRTIO_DEV_RUNNING; - dev->notify_ops->destroy_device(dev->vid); } + dev->notify_ops->destroy_device(dev->vid); } /* -- 2.31.1 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-09-03 8:02 ` [dpdk-dev] [PATCH v2] " Li Feng @ 2021-10-14 12:00 ` Maxime Coquelin 2021-10-14 12:12 ` Li Feng 0 siblings, 1 reply; 9+ messages in thread From: Maxime Coquelin @ 2021-10-14 12:00 UTC (permalink / raw) To: Li Feng, Chenbo Xia; +Cc: dev Hi Li, The commit message is not compliant with the contributors guidelines: https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line On 9/3/21 10:02, Li Feng wrote: > Vhost-user client must send the mem table, kick fd, call fd on all > virtqueues, then the device will be VIRTIO_DEV_RUNNING. > > If the vhost-user communication is initialized partly, e.g. > - When initializing the vhost-user, try to restart the vhost-user > backend; > - Seabios only initialized the vhost-scsi req vq. > The device is not with flags VIRTIO_DEV_RUNNING.. > > Root Cause: > The vhost session has been created, and added the scsi/blk requestq > poller into reactor, but when destroying the device, the requestq is not > unregistered. > > Reproduce the crash on spdk vhost-user backend: > 1. Create a VM; > 2. Mount a ISO to a VM, start the VM, don't install the OS; > 3. Restart the spdk_tgt; > > Another discusstion is in seabiso: > https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ This is a fix, so you need to add the Fixes tag and cc stable. > Signed-off-by: Li Feng <fengli@smartx.com> > --- > v2: > Fix the commit msg typo: vas -> virtqueues. > -- > lib/vhost/vhost.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > index 355ff37651..191ba82c41 100644 > --- a/lib/vhost/vhost.c > +++ b/lib/vhost/vhost.c > @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) > if (vdpa_dev) > vdpa_dev->ops->dev_close(dev->vid); > dev->flags &= ~VIRTIO_DEV_RUNNING; > - dev->notify_ops->destroy_device(dev->vid); > } > + dev->notify_ops->destroy_device(dev->vid); .destroy_device() is the counter-part of .new_device(). VIRTIO_DEV_RUNNING is set only when .new_device() has been called with success, and cleared when .destroy_device() is called. So I disagree with the fix, we want to keep the correlation between VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise could lead to regressions with other applications than yours. What is not clear from the commit message or the discussion you link is where does it crash exactly. Is it in SPDK, in DPDK? Regards, Maxime > } > > /* > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-10-14 12:00 ` Maxime Coquelin @ 2021-10-14 12:12 ` Li Feng 2021-10-14 12:28 ` Maxime Coquelin 0 siblings, 1 reply; 9+ messages in thread From: Li Feng @ 2021-10-14 12:12 UTC (permalink / raw) To: Maxime Coquelin; +Cc: Chenbo Xia, dev On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin <maxime.coquelin@redhat.com> wrote: > > Hi Li, > > The commit message is not compliant with the contributors guidelines: > https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line OK, I got it. > > On 9/3/21 10:02, Li Feng wrote: > > Vhost-user client must send the mem table, kick fd, call fd on all > > virtqueues, then the device will be VIRTIO_DEV_RUNNING. > > > > If the vhost-user communication is initialized partly, e.g. > > - When initializing the vhost-user, try to restart the vhost-user > > backend; > > - Seabios only initialized the vhost-scsi req vq. > > The device is not with flags VIRTIO_DEV_RUNNING.. > > > > Root Cause: > > The vhost session has been created, and added the scsi/blk requestq > > poller into reactor, but when destroying the device, the requestq is not > > unregistered. > > > > Reproduce the crash on spdk vhost-user backend: > > 1. Create a VM; > > 2. Mount a ISO to a VM, start the VM, don't install the OS; > > 3. Restart the spdk_tgt; > > > > Another discusstion is in seabiso: > > https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ > > This is a fix, so you need to add the Fixes tag and cc stable. Acked. > > > Signed-off-by: Li Feng <fengli@smartx.com> > > --- > > v2: > > Fix the commit msg typo: vas -> virtqueues. > > -- > > lib/vhost/vhost.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > > index 355ff37651..191ba82c41 100644 > > --- a/lib/vhost/vhost.c > > +++ b/lib/vhost/vhost.c > > @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) > > if (vdpa_dev) > > vdpa_dev->ops->dev_close(dev->vid); > > dev->flags &= ~VIRTIO_DEV_RUNNING; > > - dev->notify_ops->destroy_device(dev->vid); > > } > > + dev->notify_ops->destroy_device(dev->vid); > > .destroy_device() is the counter-part of .new_device(). > VIRTIO_DEV_RUNNING is set only when .new_device() has been called with > success, and cleared when .destroy_device() is called. > > So I disagree with the fix, we want to keep the correlation between > VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise > could lead to regressions with other applications than yours. > > What is not clear from the commit message or the discussion you link is > where does it crash exactly. Is it in SPDK, in DPDK? The crash is in SPDK, the poller is still running in the reactor, however, the device is freed. I really don't have a good method to handle this partly initialized virt queues. This is another patch I prepared to fix this issue: From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 00:00:00 2001 From: Li Feng <fengli@smartx.com> Date: Wed, 1 Sep 2021 16:51:44 +0800 Subject: [PATCH] vhost: fix vhost session crash If any vq is inited well, treat the dev is RUNNING status. Root Cause: The session has been created, and added the requestq poller into reactor, but when destroying the device, the requestq is not unregistered. The seabios only initialized the req vq(idx = 2), ignore the controlq and eventq vq. Reproduce: 1. Create a VM; 2. Mount a ISO to a VM, start the VM, don't install the OS; 3. Restart the zbs-chunkd; Signed-off-by: Li Feng <fengli@smartx.com> Change-Id: I21292e58b0b08237b5d105359095ec6a31907752 --- lib/librte_vhost/vhost_user.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index f211ec8..a80e9f4 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev) "kickfd: %d callfd: %d enabled: %d\n", dev->ifname, vq, i, vq->desc, vq->avail, vq->used, vq->kickfd, vq->callfd, vq->enabled); - if (!vq_is_ready(dev, vq)) - return 0; + if (vq_is_ready(dev, vq)) + break; } + if (i == nr_vring) + return 0; /* If supported, ensure the frontend is really done with config */ if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_STATUS)) -- 2.1.0 Do you have any better ideas about this issue? Thanks. > > Regards, > Maxime > > > } > > > > /* > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-10-14 12:12 ` Li Feng @ 2021-10-14 12:28 ` Maxime Coquelin 2021-10-14 12:49 ` Li Feng 0 siblings, 1 reply; 9+ messages in thread From: Maxime Coquelin @ 2021-10-14 12:28 UTC (permalink / raw) To: Li Feng; +Cc: Chenbo Xia, dev On 10/14/21 14:12, Li Feng wrote: > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin > <maxime.coquelin@redhat.com> wrote: >> >> Hi Li, >> >> The commit message is not compliant with the contributors guidelines: >> https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line > OK, I got it. >> >> On 9/3/21 10:02, Li Feng wrote: >>> Vhost-user client must send the mem table, kick fd, call fd on all >>> virtqueues, then the device will be VIRTIO_DEV_RUNNING. >>> >>> If the vhost-user communication is initialized partly, e.g. >>> - When initializing the vhost-user, try to restart the vhost-user >>> backend; >>> - Seabios only initialized the vhost-scsi req vq. >>> The device is not with flags VIRTIO_DEV_RUNNING.. >>> >>> Root Cause: >>> The vhost session has been created, and added the scsi/blk requestq >>> poller into reactor, but when destroying the device, the requestq is not >>> unregistered. >>> >>> Reproduce the crash on spdk vhost-user backend: >>> 1. Create a VM; >>> 2. Mount a ISO to a VM, start the VM, don't install the OS; >>> 3. Restart the spdk_tgt; >>> >>> Another discusstion is in seabiso: >>> https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ >> >> This is a fix, so you need to add the Fixes tag and cc stable. > Acked. > >> >>> Signed-off-by: Li Feng <fengli@smartx.com> >>> --- >>> v2: >>> Fix the commit msg typo: vas -> virtqueues. >>> -- >>> lib/vhost/vhost.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c >>> index 355ff37651..191ba82c41 100644 >>> --- a/lib/vhost/vhost.c >>> +++ b/lib/vhost/vhost.c >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) >>> if (vdpa_dev) >>> vdpa_dev->ops->dev_close(dev->vid); >>> dev->flags &= ~VIRTIO_DEV_RUNNING; >>> - dev->notify_ops->destroy_device(dev->vid); >>> } >>> + dev->notify_ops->destroy_device(dev->vid); >> >> .destroy_device() is the counter-part of .new_device(). >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called with >> success, and cleared when .destroy_device() is called. >> >> So I disagree with the fix, we want to keep the correlation between >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise >> could lead to regressions with other applications than yours. >> >> What is not clear from the commit message or the discussion you link is >> where does it crash exactly. Is it in SPDK, in DPDK? > > The crash is in SPDK, the poller is still running in the reactor, > however, the device is freed. > > I really don't have a good method to handle this partly initialized virt queues. > This is another patch I prepared to fix this issue: > > From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 00:00:00 2001 > From: Li Feng <fengli@smartx.com> > Date: Wed, 1 Sep 2021 16:51:44 +0800 > Subject: [PATCH] vhost: fix vhost session crash > > If any vq is inited well, treat the dev is RUNNING status. > > Root Cause: > The session has been created, and added the requestq poller into > reactor, but when destroying the device, the requestq is not > unregistered. > The seabios only initialized the req vq(idx = 2), ignore the controlq > and eventq vq. > > Reproduce: > 1. Create a VM; > 2. Mount a ISO to a VM, start the VM, don't install the OS; > 3. Restart the zbs-chunkd; > > Signed-off-by: Li Feng <fengli@smartx.com> > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752 > --- > lib/librte_vhost/vhost_user.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > index f211ec8..a80e9f4 100644 > --- a/lib/librte_vhost/vhost_user.c > +++ b/lib/librte_vhost/vhost_user.c > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev) > "kickfd: %d callfd: %d enabled: %d\n", > dev->ifname, vq, i, vq->desc, vq->avail, > vq->used, vq->kickfd, vq->callfd, vq->enabled); > - if (!vq_is_ready(dev, vq)) > - return 0; > + if (vq_is_ready(dev, vq)) > + break; > } > + if (i == nr_vring) > + return 0; > > /* If supported, ensure the frontend is really done with config */ > if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_STATUS)) > Above patch will also cause regression, as networking backends work with queue pairs. So your issue is that SPDK is processing vrings while DPDK considers the device as not running. Instead of working around that issue, maybe what you should do is to introduce a new API and mechanism to help DPDK determine whether it should consider the device ready based on the backend type. IIUC, in your Vhost-scsi case, it should be as soon as VQ 2 is ready? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-10-14 12:28 ` Maxime Coquelin @ 2021-10-14 12:49 ` Li Feng 2021-11-05 3:18 ` Li Feng 0 siblings, 1 reply; 9+ messages in thread From: Li Feng @ 2021-10-14 12:49 UTC (permalink / raw) To: Maxime Coquelin; +Cc: Chenbo Xia, dev Thanks, Feng Li On Thu, Oct 14, 2021 at 8:28 PM Maxime Coquelin <maxime.coquelin@redhat.com> wrote: > > > > On 10/14/21 14:12, Li Feng wrote: > > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin > > <maxime.coquelin@redhat.com> wrote: > >> > >> Hi Li, > >> > >> The commit message is not compliant with the contributors guidelines: > >> https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line > > OK, I got it. > >> > >> On 9/3/21 10:02, Li Feng wrote: > >>> Vhost-user client must send the mem table, kick fd, call fd on all > >>> virtqueues, then the device will be VIRTIO_DEV_RUNNING. > >>> > >>> If the vhost-user communication is initialized partly, e.g. > >>> - When initializing the vhost-user, try to restart the vhost-user > >>> backend; > >>> - Seabios only initialized the vhost-scsi req vq. > >>> The device is not with flags VIRTIO_DEV_RUNNING.. > >>> > >>> Root Cause: > >>> The vhost session has been created, and added the scsi/blk requestq > >>> poller into reactor, but when destroying the device, the requestq is not > >>> unregistered. > >>> > >>> Reproduce the crash on spdk vhost-user backend: > >>> 1. Create a VM; > >>> 2. Mount a ISO to a VM, start the VM, don't install the OS; > >>> 3. Restart the spdk_tgt; > >>> > >>> Another discusstion is in seabiso: > >>> https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ > >> > >> This is a fix, so you need to add the Fixes tag and cc stable. > > Acked. > > > >> > >>> Signed-off-by: Li Feng <fengli@smartx.com> > >>> --- > >>> v2: > >>> Fix the commit msg typo: vas -> virtqueues. > >>> -- > >>> lib/vhost/vhost.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > >>> index 355ff37651..191ba82c41 100644 > >>> --- a/lib/vhost/vhost.c > >>> +++ b/lib/vhost/vhost.c > >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) > >>> if (vdpa_dev) > >>> vdpa_dev->ops->dev_close(dev->vid); > >>> dev->flags &= ~VIRTIO_DEV_RUNNING; > >>> - dev->notify_ops->destroy_device(dev->vid); > >>> } > >>> + dev->notify_ops->destroy_device(dev->vid); > >> > >> .destroy_device() is the counter-part of .new_device(). > >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called with > >> success, and cleared when .destroy_device() is called. > >> > >> So I disagree with the fix, we want to keep the correlation between > >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise > >> could lead to regressions with other applications than yours. > >> > >> What is not clear from the commit message or the discussion you link is > >> where does it crash exactly. Is it in SPDK, in DPDK? > > > > The crash is in SPDK, the poller is still running in the reactor, > > however, the device is freed. > > > > I really don't have a good method to handle this partly initialized virt queues. > > This is another patch I prepared to fix this issue: > > > > From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 00:00:00 2001 > > From: Li Feng <fengli@smartx.com> > > Date: Wed, 1 Sep 2021 16:51:44 +0800 > > Subject: [PATCH] vhost: fix vhost session crash > > > > If any vq is inited well, treat the dev is RUNNING status. > > > > Root Cause: > > The session has been created, and added the requestq poller into > > reactor, but when destroying the device, the requestq is not > > unregistered. > > The seabios only initialized the req vq(idx = 2), ignore the controlq > > and eventq vq. > > > > Reproduce: > > 1. Create a VM; > > 2. Mount a ISO to a VM, start the VM, don't install the OS; > > 3. Restart the zbs-chunkd; > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752 > > --- > > lib/librte_vhost/vhost_user.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > > index f211ec8..a80e9f4 100644 > > --- a/lib/librte_vhost/vhost_user.c > > +++ b/lib/librte_vhost/vhost_user.c > > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev) > > "kickfd: %d callfd: %d enabled: %d\n", > > dev->ifname, vq, i, vq->desc, vq->avail, > > vq->used, vq->kickfd, vq->callfd, vq->enabled); > > - if (!vq_is_ready(dev, vq)) > > - return 0; > > + if (vq_is_ready(dev, vq)) > > + break; > > } > > + if (i == nr_vring) > > + return 0; > > > > /* If supported, ensure the frontend is really done with config */ > > if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_STATUS)) > > > > Above patch will also cause regression, as networking backends work > with queue pairs. > > So your issue is that SPDK is processing vrings while DPDK considers the > device as not running. Instead of working around that issue, maybe what > you should do is to introduce a new API and mechanism to help DPDK > determine whether it should consider the device ready based on the > backend type. IIUC, in your Vhost-scsi case, it should be as soon as VQ > 2 is ready? Yes, VQ 2 is ready, the vhost-iscsi device should be ready. I had this idea months ago, if it's acceptable, I will try to do it. Thanks. > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-10-14 12:49 ` Li Feng @ 2021-11-05 3:18 ` Li Feng 2021-11-05 4:21 ` Liu, Changpeng 0 siblings, 1 reply; 9+ messages in thread From: Li Feng @ 2021-11-05 3:18 UTC (permalink / raw) To: Liu, Changpeng; +Cc: Chenbo Xia, dev, Maxime Coquelin Add Liu Changpeng. On Thu, Oct 14, 2021 at 8:49 PM Li Feng <fengli@smartx.com> wrote: > > Thanks, > Feng Li > > On Thu, Oct 14, 2021 at 8:28 PM Maxime Coquelin > <maxime.coquelin@redhat.com> wrote: > > > > > > > > On 10/14/21 14:12, Li Feng wrote: > > > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin > > > <maxime.coquelin@redhat.com> wrote: > > >> > > >> Hi Li, > > >> > > >> The commit message is not compliant with the contributors guidelines: > > >> https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-subject-line > > > OK, I got it. > > >> > > >> On 9/3/21 10:02, Li Feng wrote: > > >>> Vhost-user client must send the mem table, kick fd, call fd on all > > >>> virtqueues, then the device will be VIRTIO_DEV_RUNNING. > > >>> > > >>> If the vhost-user communication is initialized partly, e.g. > > >>> - When initializing the vhost-user, try to restart the vhost-user > > >>> backend; > > >>> - Seabios only initialized the vhost-scsi req vq. > > >>> The device is not with flags VIRTIO_DEV_RUNNING.. > > >>> > > >>> Root Cause: > > >>> The vhost session has been created, and added the scsi/blk requestq > > >>> poller into reactor, but when destroying the device, the requestq is not > > >>> unregistered. > > >>> > > >>> Reproduce the crash on spdk vhost-user backend: > > >>> 1. Create a VM; > > >>> 2. Mount a ISO to a VM, start the VM, don't install the OS; > > >>> 3. Restart the spdk_tgt; > > >>> > > >>> Another discusstion is in seabiso: > > >>> https://patchew.org/Seabios/20210831122339.2591585-1-fengli@smartx.com/ > > >> > > >> This is a fix, so you need to add the Fixes tag and cc stable. > > > Acked. > > > > > >> > > >>> Signed-off-by: Li Feng <fengli@smartx.com> > > >>> --- > > >>> v2: > > >>> Fix the commit msg typo: vas -> virtqueues. > > >>> -- > > >>> lib/vhost/vhost.c | 2 +- > > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > > >>> > > >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > > >>> index 355ff37651..191ba82c41 100644 > > >>> --- a/lib/vhost/vhost.c > > >>> +++ b/lib/vhost/vhost.c > > >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net *dev) > > >>> if (vdpa_dev) > > >>> vdpa_dev->ops->dev_close(dev->vid); > > >>> dev->flags &= ~VIRTIO_DEV_RUNNING; > > >>> - dev->notify_ops->destroy_device(dev->vid); > > >>> } > > >>> + dev->notify_ops->destroy_device(dev->vid); > > >> > > >> .destroy_device() is the counter-part of .new_device(). > > >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called with > > >> success, and cleared when .destroy_device() is called. > > >> > > >> So I disagree with the fix, we want to keep the correlation between > > >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing otherwise > > >> could lead to regressions with other applications than yours. > > >> > > >> What is not clear from the commit message or the discussion you link is > > >> where does it crash exactly. Is it in SPDK, in DPDK? > > > > > > The crash is in SPDK, the poller is still running in the reactor, > > > however, the device is freed. > > > > > > I really don't have a good method to handle this partly initialized virt queues. > > > This is another patch I prepared to fix this issue: > > > > > > From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 00:00:00 2001 > > > From: Li Feng <fengli@smartx.com> > > > Date: Wed, 1 Sep 2021 16:51:44 +0800 > > > Subject: [PATCH] vhost: fix vhost session crash > > > > > > If any vq is inited well, treat the dev is RUNNING status. > > > > > > Root Cause: > > > The session has been created, and added the requestq poller into > > > reactor, but when destroying the device, the requestq is not > > > unregistered. > > > The seabios only initialized the req vq(idx = 2), ignore the controlq > > > and eventq vq. > > > > > > Reproduce: > > > 1. Create a VM; > > > 2. Mount a ISO to a VM, start the VM, don't install the OS; > > > 3. Restart the zbs-chunkd; > > > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752 > > > --- > > > lib/librte_vhost/vhost_user.c | 6 ++++-- > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > > > index f211ec8..a80e9f4 100644 > > > --- a/lib/librte_vhost/vhost_user.c > > > +++ b/lib/librte_vhost/vhost_user.c > > > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev) > > > "kickfd: %d callfd: %d enabled: %d\n", > > > dev->ifname, vq, i, vq->desc, vq->avail, > > > vq->used, vq->kickfd, vq->callfd, vq->enabled); > > > - if (!vq_is_ready(dev, vq)) > > > - return 0; > > > + if (vq_is_ready(dev, vq)) > > > + break; > > > } > > > + if (i == nr_vring) > > > + return 0; > > > > > > /* If supported, ensure the frontend is really done with config */ > > > if (dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_STATUS)) > > > > > > > Above patch will also cause regression, as networking backends work > > with queue pairs. > > > > So your issue is that SPDK is processing vrings while DPDK considers the > > device as not running. Instead of working around that issue, maybe what > > you should do is to introduce a new API and mechanism to help DPDK > > determine whether it should consider the device ready based on the > > backend type. IIUC, in your Vhost-scsi case, it should be as soon as VQ > > 2 is ready? > > Yes, VQ 2 is ready, the vhost-iscsi device should be ready. > I had this idea months ago, if it's acceptable, I will try to do it. > > Thanks. > > > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-11-05 3:18 ` Li Feng @ 2021-11-05 4:21 ` Liu, Changpeng 2021-11-05 4:38 ` Li Feng 0 siblings, 1 reply; 9+ messages in thread From: Liu, Changpeng @ 2021-11-05 4:21 UTC (permalink / raw) To: Li Feng; +Cc: Xia, Chenbo, dev, Maxime Coquelin DPDK already provides pre_msg_handle and post_msg_handle to let vhost-backend to process the differences for different virtio devices. And it's not good idea to start all the virtio queues in SeaBIOS, we tried that long time ago. I suggest you to submit a SPDK issue so that we can take a look too. > -----Original Message----- > From: Li Feng <fengli@smartx.com> > Sent: Friday, November 5, 2021 11:18 AM > To: Liu, Changpeng <changpeng.liu@intel.com> > Cc: Xia, Chenbo <chenbo.xia@intel.com>; dev <dev@dpdk.org>; Maxime > Coquelin <maxime.coquelin@redhat.com> > Subject: Re: [PATCH v2] vhost: call destroy_device always in > vhost_destroy_device_notify > > Add Liu Changpeng. > > > On Thu, Oct 14, 2021 at 8:49 PM Li Feng <fengli@smartx.com> wrote: > > > > Thanks, > > Feng Li > > > > On Thu, Oct 14, 2021 at 8:28 PM Maxime Coquelin > > <maxime.coquelin@redhat.com> wrote: > > > > > > > > > > > > On 10/14/21 14:12, Li Feng wrote: > > > > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin > > > > <maxime.coquelin@redhat.com> wrote: > > > >> > > > >> Hi Li, > > > >> > > > >> The commit message is not compliant with the contributors guidelines: > > > >> https://doc.dpdk.org/guides/contributing/patches.html#commit- > messages-subject-line > > > > OK, I got it. > > > >> > > > >> On 9/3/21 10:02, Li Feng wrote: > > > >>> Vhost-user client must send the mem table, kick fd, call fd on all > > > >>> virtqueues, then the device will be VIRTIO_DEV_RUNNING. > > > >>> > > > >>> If the vhost-user communication is initialized partly, e.g. > > > >>> - When initializing the vhost-user, try to restart the vhost-user > > > >>> backend; > > > >>> - Seabios only initialized the vhost-scsi req vq. > > > >>> The device is not with flags VIRTIO_DEV_RUNNING.. > > > >>> > > > >>> Root Cause: > > > >>> The vhost session has been created, and added the scsi/blk requestq > > > >>> poller into reactor, but when destroying the device, the requestq is not > > > >>> unregistered. > > > >>> > > > >>> Reproduce the crash on spdk vhost-user backend: > > > >>> 1. Create a VM; > > > >>> 2. Mount a ISO to a VM, start the VM, don't install the OS; > > > >>> 3. Restart the spdk_tgt; > > > >>> > > > >>> Another discusstion is in seabiso: > > > >>> https://patchew.org/Seabios/20210831122339.2591585-1- > fengli@smartx.com/ > > > >> > > > >> This is a fix, so you need to add the Fixes tag and cc stable. > > > > Acked. > > > > > > > >> > > > >>> Signed-off-by: Li Feng <fengli@smartx.com> > > > >>> --- > > > >>> v2: > > > >>> Fix the commit msg typo: vas -> virtqueues. > > > >>> -- > > > >>> lib/vhost/vhost.c | 2 +- > > > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > > > >>> > > > >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > > > >>> index 355ff37651..191ba82c41 100644 > > > >>> --- a/lib/vhost/vhost.c > > > >>> +++ b/lib/vhost/vhost.c > > > >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net > *dev) > > > >>> if (vdpa_dev) > > > >>> vdpa_dev->ops->dev_close(dev->vid); > > > >>> dev->flags &= ~VIRTIO_DEV_RUNNING; > > > >>> - dev->notify_ops->destroy_device(dev->vid); > > > >>> } > > > >>> + dev->notify_ops->destroy_device(dev->vid); > > > >> > > > >> .destroy_device() is the counter-part of .new_device(). > > > >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called > with > > > >> success, and cleared when .destroy_device() is called. > > > >> > > > >> So I disagree with the fix, we want to keep the correlation between > > > >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing > otherwise > > > >> could lead to regressions with other applications than yours. > > > >> > > > >> What is not clear from the commit message or the discussion you link is > > > >> where does it crash exactly. Is it in SPDK, in DPDK? > > > > > > > > The crash is in SPDK, the poller is still running in the reactor, > > > > however, the device is freed. > > > > > > > > I really don't have a good method to handle this partly initialized virt queues. > > > > This is another patch I prepared to fix this issue: > > > > > > > > From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 > 00:00:00 2001 > > > > From: Li Feng <fengli@smartx.com> > > > > Date: Wed, 1 Sep 2021 16:51:44 +0800 > > > > Subject: [PATCH] vhost: fix vhost session crash > > > > > > > > If any vq is inited well, treat the dev is RUNNING status. > > > > > > > > Root Cause: > > > > The session has been created, and added the requestq poller into > > > > reactor, but when destroying the device, the requestq is not > > > > unregistered. > > > > The seabios only initialized the req vq(idx = 2), ignore the controlq > > > > and eventq vq. > > > > > > > > Reproduce: > > > > 1. Create a VM; > > > > 2. Mount a ISO to a VM, start the VM, don't install the OS; > > > > 3. Restart the zbs-chunkd; > > > > > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > > > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752 > > > > --- > > > > lib/librte_vhost/vhost_user.c | 6 ++++-- > > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > > > > index f211ec8..a80e9f4 100644 > > > > --- a/lib/librte_vhost/vhost_user.c > > > > +++ b/lib/librte_vhost/vhost_user.c > > > > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev) > > > > "kickfd: %d callfd: %d enabled: %d\n", > > > > dev->ifname, vq, i, vq->desc, vq->avail, > > > > vq->used, vq->kickfd, vq->callfd, vq->enabled); > > > > - if (!vq_is_ready(dev, vq)) > > > > - return 0; > > > > + if (vq_is_ready(dev, vq)) > > > > + break; > > > > } > > > > + if (i == nr_vring) > > > > + return 0; > > > > > > > > /* If supported, ensure the frontend is really done with config */ > > > > if (dev->protocol_features & (1ULL << > VHOST_USER_PROTOCOL_F_STATUS)) > > > > > > > > > > Above patch will also cause regression, as networking backends work > > > with queue pairs. > > > > > > So your issue is that SPDK is processing vrings while DPDK considers the > > > device as not running. Instead of working around that issue, maybe what > > > you should do is to introduce a new API and mechanism to help DPDK > > > determine whether it should consider the device ready based on the > > > backend type. IIUC, in your Vhost-scsi case, it should be as soon as VQ > > > 2 is ready? > > > > Yes, VQ 2 is ready, the vhost-iscsi device should be ready. > > I had this idea months ago, if it's acceptable, I will try to do it. > > > > Thanks. > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [dpdk-dev] [PATCH v2] vhost: call destroy_device always in vhost_destroy_device_notify 2021-11-05 4:21 ` Liu, Changpeng @ 2021-11-05 4:38 ` Li Feng 0 siblings, 0 replies; 9+ messages in thread From: Li Feng @ 2021-11-05 4:38 UTC (permalink / raw) To: Liu, Changpeng; +Cc: Xia, Chenbo, dev, Maxime Coquelin On Fri, Nov 5, 2021 at 12:21 PM Liu, Changpeng <changpeng.liu@intel.com> wrote: > > DPDK already provides pre_msg_handle and post_msg_handle to let vhost-backend to process > the differences for different virtio devices. And it's not good idea to start all the virtio queues in SeaBIOS, > we tried that long time ago. I suggest you to submit a SPDK issue so that we can take a look too. > Thanks, I think just changing spdk may not solve this problem. OK, I have submit a issue here: https://github.com/spdk/spdk/issues/2228 > > -----Original Message----- > > From: Li Feng <fengli@smartx.com> > > Sent: Friday, November 5, 2021 11:18 AM > > To: Liu, Changpeng <changpeng.liu@intel.com> > > Cc: Xia, Chenbo <chenbo.xia@intel.com>; dev <dev@dpdk.org>; Maxime > > Coquelin <maxime.coquelin@redhat.com> > > Subject: Re: [PATCH v2] vhost: call destroy_device always in > > vhost_destroy_device_notify > > > > Add Liu Changpeng. > > > > > > On Thu, Oct 14, 2021 at 8:49 PM Li Feng <fengli@smartx.com> wrote: > > > > > > Thanks, > > > Feng Li > > > > > > On Thu, Oct 14, 2021 at 8:28 PM Maxime Coquelin > > > <maxime.coquelin@redhat.com> wrote: > > > > > > > > > > > > > > > > On 10/14/21 14:12, Li Feng wrote: > > > > > On Thu, Oct 14, 2021 at 8:01 PM Maxime Coquelin > > > > > <maxime.coquelin@redhat.com> wrote: > > > > >> > > > > >> Hi Li, > > > > >> > > > > >> The commit message is not compliant with the contributors guidelines: > > > > >> https://doc.dpdk.org/guides/contributing/patches.html#commit- > > messages-subject-line > > > > > OK, I got it. > > > > >> > > > > >> On 9/3/21 10:02, Li Feng wrote: > > > > >>> Vhost-user client must send the mem table, kick fd, call fd on all > > > > >>> virtqueues, then the device will be VIRTIO_DEV_RUNNING. > > > > >>> > > > > >>> If the vhost-user communication is initialized partly, e.g. > > > > >>> - When initializing the vhost-user, try to restart the vhost-user > > > > >>> backend; > > > > >>> - Seabios only initialized the vhost-scsi req vq. > > > > >>> The device is not with flags VIRTIO_DEV_RUNNING.. > > > > >>> > > > > >>> Root Cause: > > > > >>> The vhost session has been created, and added the scsi/blk requestq > > > > >>> poller into reactor, but when destroying the device, the requestq is not > > > > >>> unregistered. > > > > >>> > > > > >>> Reproduce the crash on spdk vhost-user backend: > > > > >>> 1. Create a VM; > > > > >>> 2. Mount a ISO to a VM, start the VM, don't install the OS; > > > > >>> 3. Restart the spdk_tgt; > > > > >>> > > > > >>> Another discusstion is in seabiso: > > > > >>> https://patchew.org/Seabios/20210831122339.2591585-1- > > fengli@smartx.com/ > > > > >> > > > > >> This is a fix, so you need to add the Fixes tag and cc stable. > > > > > Acked. > > > > > > > > > >> > > > > >>> Signed-off-by: Li Feng <fengli@smartx.com> > > > > >>> --- > > > > >>> v2: > > > > >>> Fix the commit msg typo: vas -> virtqueues. > > > > >>> -- > > > > >>> lib/vhost/vhost.c | 2 +- > > > > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > > > > >>> > > > > >>> diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c > > > > >>> index 355ff37651..191ba82c41 100644 > > > > >>> --- a/lib/vhost/vhost.c > > > > >>> +++ b/lib/vhost/vhost.c > > > > >>> @@ -710,8 +710,8 @@ vhost_destroy_device_notify(struct virtio_net > > *dev) > > > > >>> if (vdpa_dev) > > > > >>> vdpa_dev->ops->dev_close(dev->vid); > > > > >>> dev->flags &= ~VIRTIO_DEV_RUNNING; > > > > >>> - dev->notify_ops->destroy_device(dev->vid); > > > > >>> } > > > > >>> + dev->notify_ops->destroy_device(dev->vid); > > > > >> > > > > >> .destroy_device() is the counter-part of .new_device(). > > > > >> VIRTIO_DEV_RUNNING is set only when .new_device() has been called > > with > > > > >> success, and cleared when .destroy_device() is called. > > > > >> > > > > >> So I disagree with the fix, we want to keep the correlation between > > > > >> VIRTIO_DEV_RUNNING and .new_device()/.destroy_device(). Doing > > otherwise > > > > >> could lead to regressions with other applications than yours. > > > > >> > > > > >> What is not clear from the commit message or the discussion you link is > > > > >> where does it crash exactly. Is it in SPDK, in DPDK? > > > > > > > > > > The crash is in SPDK, the poller is still running in the reactor, > > > > > however, the device is freed. > > > > > > > > > > I really don't have a good method to handle this partly initialized virt queues. > > > > > This is another patch I prepared to fix this issue: > > > > > > > > > > From 63142ec60088d08b27b9657640b82e837557b5d4 Mon Sep 17 > > 00:00:00 2001 > > > > > From: Li Feng <fengli@smartx.com> > > > > > Date: Wed, 1 Sep 2021 16:51:44 +0800 > > > > > Subject: [PATCH] vhost: fix vhost session crash > > > > > > > > > > If any vq is inited well, treat the dev is RUNNING status. > > > > > > > > > > Root Cause: > > > > > The session has been created, and added the requestq poller into > > > > > reactor, but when destroying the device, the requestq is not > > > > > unregistered. > > > > > The seabios only initialized the req vq(idx = 2), ignore the controlq > > > > > and eventq vq. > > > > > > > > > > Reproduce: > > > > > 1. Create a VM; > > > > > 2. Mount a ISO to a VM, start the VM, don't install the OS; > > > > > 3. Restart the zbs-chunkd; > > > > > > > > > > Signed-off-by: Li Feng <fengli@smartx.com> > > > > > Change-Id: I21292e58b0b08237b5d105359095ec6a31907752 > > > > > --- > > > > > lib/librte_vhost/vhost_user.c | 6 ++++-- > > > > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c > > > > > index f211ec8..a80e9f4 100644 > > > > > --- a/lib/librte_vhost/vhost_user.c > > > > > +++ b/lib/librte_vhost/vhost_user.c > > > > > @@ -1394,9 +1394,11 @@ virtio_is_ready(struct virtio_net *dev) > > > > > "kickfd: %d callfd: %d enabled: %d\n", > > > > > dev->ifname, vq, i, vq->desc, vq->avail, > > > > > vq->used, vq->kickfd, vq->callfd, vq->enabled); > > > > > - if (!vq_is_ready(dev, vq)) > > > > > - return 0; > > > > > + if (vq_is_ready(dev, vq)) > > > > > + break; > > > > > } > > > > > + if (i == nr_vring) > > > > > + return 0; > > > > > > > > > > /* If supported, ensure the frontend is really done with config */ > > > > > if (dev->protocol_features & (1ULL << > > VHOST_USER_PROTOCOL_F_STATUS)) > > > > > > > > > > > > > Above patch will also cause regression, as networking backends work > > > > with queue pairs. > > > > > > > > So your issue is that SPDK is processing vrings while DPDK considers the > > > > device as not running. Instead of working around that issue, maybe what > > > > you should do is to introduce a new API and mechanism to help DPDK > > > > determine whether it should consider the device ready based on the > > > > backend type. IIUC, in your Vhost-scsi case, it should be as soon as VQ > > > > 2 is ready? > > > > > > Yes, VQ 2 is ready, the vhost-iscsi device should be ready. > > > I had this idea months ago, if it's acceptable, I will try to do it. > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-11-05 4:39 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-03 7:56 [dpdk-dev] [PATCH] vhost: call destroy_device always in vhost_destroy_device_notify Li Feng 2021-09-03 8:02 ` [dpdk-dev] [PATCH v2] " Li Feng 2021-10-14 12:00 ` Maxime Coquelin 2021-10-14 12:12 ` Li Feng 2021-10-14 12:28 ` Maxime Coquelin 2021-10-14 12:49 ` Li Feng 2021-11-05 3:18 ` Li Feng 2021-11-05 4:21 ` Liu, Changpeng 2021-11-05 4:38 ` Li Feng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).