* [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
@ 2018-03-05 16:11 Tomasz Kulasek
  2018-03-06 16:26 ` Maxime Coquelin
  0 siblings, 1 reply; 6+ messages in thread
From: Tomasz Kulasek @ 2018-03-05 16:11 UTC (permalink / raw)
  To: yliu
  Cc: daniel.verkamp, james.r.harris, pawelx.wodkowski, dev, Dariusz Stojaczyk
For now DPDK assumes that callfd, kickfd and last_idx are being set just
once during vring initialization and device cannot be running while DPDK
receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
However, that assumption is wrong. For Vhost SCSI messages might arrive
at any point of time, possibly multiple times, one after another.
QEMU issues SET_VRING_CALL once during device initialization, then again
during device start. The second message will close previous callfd,
which is still being used by the user-implementation of vhost device.
This results in writing to invalid (closed) callfd.
Other messages like SET_FEATURES, SET_VRING_ADDR etc also will change
internal state of VQ or device. To prevent race condition device should
also be stopped before updateing vring data.
Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
Signed-off-by: Pawel Wodkowski <pawelx.wodkowski@intel.com>
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 lib/librte_vhost/vhost_user.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 65ee33919..3895e6edd 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -172,6 +172,10 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 
 		if (dev->notify_ops->features_changed)
 			dev->notify_ops->features_changed(dev->vid, features);
+		else {
+			dev->flags &= ~VIRTIO_DEV_RUNNING;
+			dev->notify_ops->destroy_device(dev->vid);
+		}
 	}
 
 	dev->features = features;
@@ -487,6 +491,12 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
 	if (dev->mem == NULL)
 		return -1;
 
+	/* Remove from the data plane. */
+	if (dev->flags & VIRTIO_DEV_RUNNING) {
+		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		dev->notify_ops->destroy_device(dev->vid);
+	}
+
 	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
 	vq = dev->virtqueue[msg->payload.addr.index];
 
@@ -517,6 +527,12 @@ static int
 vhost_user_set_vring_base(struct virtio_net *dev,
 			  VhostUserMsg *msg)
 {
+	/* Remove from the data plane. */
+	if (dev->flags & VIRTIO_DEV_RUNNING) {
+		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		dev->notify_ops->destroy_device(dev->vid);
+	}
+
 	dev->virtqueue[msg->payload.state.index]->last_used_idx  =
 			msg->payload.state.num;
 	dev->virtqueue[msg->payload.state.index]->last_avail_idx =
@@ -796,6 +812,12 @@ vhost_user_set_vring_call(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	struct vhost_vring_file file;
 	struct vhost_virtqueue *vq;
 
+	/* Remove from the data plane. */
+	if (dev->flags & VIRTIO_DEV_RUNNING) {
+		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		dev->notify_ops->destroy_device(dev->vid);
+	}
+
 	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
 	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
 		file.fd = VIRTIO_INVALID_EVENTFD;
@@ -818,6 +840,12 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *pmsg)
 	struct vhost_virtqueue *vq;
 	struct virtio_net *dev = *pdev;
 
+	/* Remove from the data plane. */
+	if (dev->flags & VIRTIO_DEV_RUNNING) {
+		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		dev->notify_ops->destroy_device(dev->vid);
+	}
+
 	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
 	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
 		file.fd = VIRTIO_INVALID_EVENTFD;
@@ -959,6 +987,12 @@ vhost_user_set_protocol_features(struct virtio_net *dev,
 	if (protocol_features & ~VHOST_USER_PROTOCOL_FEATURES)
 		return;
 
+	/* Remove from the data plane. */
+	if (dev->flags & VIRTIO_DEV_RUNNING) {
+		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		dev->notify_ops->destroy_device(dev->vid);
+	}
+
 	dev->protocol_features = protocol_features;
 }
 
@@ -981,6 +1015,12 @@ vhost_user_set_log_base(struct virtio_net *dev, struct VhostUserMsg *msg)
 		return -1;
 	}
 
+	/* Remove from the data plane. */
+	if (dev->flags & VIRTIO_DEV_RUNNING) {
+		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		dev->notify_ops->destroy_device(dev->vid);
+	}
+
 	size = msg->payload.log.mmap_size;
 	off  = msg->payload.log.mmap_offset;
 	RTE_LOG(INFO, VHOST_CONFIG,
-- 
2.14.1
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
  2018-03-05 16:11 [dpdk-dev] [PATCH] vhost: stop device before updating public vring data Tomasz Kulasek
@ 2018-03-06 16:26 ` Maxime Coquelin
  2018-03-07  9:16   ` Wodkowski, PawelX
  0 siblings, 1 reply; 6+ messages in thread
From: Maxime Coquelin @ 2018-03-06 16:26 UTC (permalink / raw)
  To: Tomasz Kulasek, yliu
  Cc: daniel.verkamp, james.r.harris, pawelx.wodkowski, dev, Dariusz Stojaczyk
Hi Tomasz,
On 03/05/2018 05:11 PM, Tomasz Kulasek wrote:
> For now DPDK assumes that callfd, kickfd and last_idx are being set just
> once during vring initialization and device cannot be running while DPDK
> receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
> However, that assumption is wrong. For Vhost SCSI messages might arrive
> at any point of time, possibly multiple times, one after another.
> 
> QEMU issues SET_VRING_CALL once during device initialization, then again
> during device start. The second message will close previous callfd,
> which is still being used by the user-implementation of vhost device.
> This results in writing to invalid (closed) callfd.
> 
> Other messages like SET_FEATURES, SET_VRING_ADDR etc also will change
> internal state of VQ or device. To prevent race condition device should
> also be stopped before updateing vring data.
> 
> Signed-off-by: Dariusz Stojaczyk<dariuszx.stojaczyk@intel.com>
> Signed-off-by: Pawel Wodkowski<pawelx.wodkowski@intel.com>
> Signed-off-by: Tomasz Kulasek<tomaszx.kulasek@intel.com>
> ---
>   lib/librte_vhost/vhost_user.c | 40 ++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 40 insertions(+)
In last release, we have introduced a per-virtqueue lock to protect
vring handling against asynchronous device changes.
I think that would solve the issue you are facing, but you would need
to export the VQs locking functions to the vhost-user lib API to be
able to use it.
I don't think your current patch is the right solution anyway, because
it destroys the device in case we don't want it to remain alive, like
set_log_base, or set_features when only the logging feature gets
enabled.
Cheers,
Maxime
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
  2018-03-06 16:26 ` Maxime Coquelin
@ 2018-03-07  9:16   ` Wodkowski, PawelX
  2018-03-07 10:09     ` Maxime Coquelin
  0 siblings, 1 reply; 6+ messages in thread
From: Wodkowski, PawelX @ 2018-03-07  9:16 UTC (permalink / raw)
  To: Maxime Coquelin, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Stojaczyk, DariuszX
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, March 6, 2018 5:27 PM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> <james.r.harris@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX
> <dariuszx.stojaczyk@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring
> data
> 
> Hi Tomasz,
> 
> On 03/05/2018 05:11 PM, Tomasz Kulasek wrote:
> > For now DPDK assumes that callfd, kickfd and last_idx are being set just
> > once during vring initialization and device cannot be running while DPDK
> > receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
> > However, that assumption is wrong. For Vhost SCSI messages might arrive
> > at any point of time, possibly multiple times, one after another.
> >
> > QEMU issues SET_VRING_CALL once during device initialization, then again
> > during device start. The second message will close previous callfd,
> > which is still being used by the user-implementation of vhost device.
> > This results in writing to invalid (closed) callfd.
> >
> > Other messages like SET_FEATURES, SET_VRING_ADDR etc also will change
> > internal state of VQ or device. To prevent race condition device should
> > also be stopped before updateing vring data.
> >
> > Signed-off-by: Dariusz Stojaczyk<dariuszx.stojaczyk@intel.com>
> > Signed-off-by: Pawel Wodkowski<pawelx.wodkowski@intel.com>
> > Signed-off-by: Tomasz Kulasek<tomaszx.kulasek@intel.com>
> > ---
> >   lib/librte_vhost/vhost_user.c | 40
> ++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 40 insertions(+)
> 
> In last release, we have introduced a per-virtqueue lock to protect
> vring handling against asynchronous device changes.
> 
> I think that would solve the issue you are facing, but you would need
> to export the VQs locking functions to the vhost-user lib API to be
> able to use it.
> 
> I don't think your current patch is the right solution anyway, because
> it destroys the device in case we don't want it to remain alive, like
> set_log_base, or set_features when only the logging feature gets
> enabled.
Please correct me if I can't see something obvious, but how this lock protect against eg 
SET_MEM_TABLE message? Current flow you are thinking of is:
DPDK vhost-user thread
1.1. vhost_user_lock_all_queue_pairs()
1.2. vhost_user_set_mem_table()
1.3. vhost_user_unlock_all_queue_pairs()
BACKEND: virito-net:
2.1. rte_spinlock_lock(&vq->access_lock);
2.2. Process vrings and copy all data 
2.3. rte_spinlock_unlock(&vq->access_lock);
Yes, it will synchronize access to virtio_net structure but what if the BACKEND  is in
zero copy mode and/or pass buffers to physical device? The request will
not end in 2.2 and you unmap the memory regions in the middle of request.
Even worse, the physical device will just abort the request but BACKEND can segfault
or write random memory because BACKEND try to use invalid memory address
(retrieved at request start).
To use this per-virtqueue lock:
1. the lock need to be held from request start to the end - but this can starve DPDK
vhost-user thread as there might be many request on-the-fly and when one is done
the new one might be started.
2. Becouse we don't know if something changed between requst start and request end
BACKEND need walk through all descriptors chain at the request end and do the
rte_vhost_gpa_to_vva() again.
The SET_MEM_TABLE is most obvious message but the same is true for other like
VHOST_IOTLB_INVALIDATE or SET_FEATURES.
Pawel
> 
> Cheers,
> Maxime
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
  2018-03-07  9:16   ` Wodkowski, PawelX
@ 2018-03-07 10:09     ` Maxime Coquelin
  2018-03-07 10:59       ` Wodkowski, PawelX
  0 siblings, 1 reply; 6+ messages in thread
From: Maxime Coquelin @ 2018-03-07 10:09 UTC (permalink / raw)
  To: Wodkowski, PawelX, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Stojaczyk, DariuszX
On 03/07/2018 10:16 AM, Wodkowski, PawelX wrote:
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Tuesday, March 6, 2018 5:27 PM
>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>> <james.r.harris@intel.com>; Wodkowski, PawelX
>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX
>> <dariuszx.stojaczyk@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring
>> data
>>
>> Hi Tomasz,
>>
>> On 03/05/2018 05:11 PM, Tomasz Kulasek wrote:
>>> For now DPDK assumes that callfd, kickfd and last_idx are being set just
>>> once during vring initialization and device cannot be running while DPDK
>>> receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE messages.
>>> However, that assumption is wrong. For Vhost SCSI messages might arrive
>>> at any point of time, possibly multiple times, one after another.
>>>
>>> QEMU issues SET_VRING_CALL once during device initialization, then again
>>> during device start. The second message will close previous callfd,
>>> which is still being used by the user-implementation of vhost device.
>>> This results in writing to invalid (closed) callfd.
>>>
>>> Other messages like SET_FEATURES, SET_VRING_ADDR etc also will change
>>> internal state of VQ or device. To prevent race condition device should
>>> also be stopped before updateing vring data.
>>>
>>> Signed-off-by: Dariusz Stojaczyk<dariuszx.stojaczyk@intel.com>
>>> Signed-off-by: Pawel Wodkowski<pawelx.wodkowski@intel.com>
>>> Signed-off-by: Tomasz Kulasek<tomaszx.kulasek@intel.com>
>>> ---
>>>    lib/librte_vhost/vhost_user.c | 40
>> ++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 40 insertions(+)
>>
>> In last release, we have introduced a per-virtqueue lock to protect
>> vring handling against asynchronous device changes.
>>
>> I think that would solve the issue you are facing, but you would need
>> to export the VQs locking functions to the vhost-user lib API to be
>> able to use it.
>>
>> I don't think your current patch is the right solution anyway, because
>> it destroys the device in case we don't want it to remain alive, like
>> set_log_base, or set_features when only the logging feature gets
>> enabled.
> 
> Please correct me if I can't see something obvious, but how this lock protect against eg
> SET_MEM_TABLE message? Current flow you are thinking of is:
> 
> DPDK vhost-user thread
> 1.1. vhost_user_lock_all_queue_pairs()
> 1.2. vhost_user_set_mem_table()
> 1.3. vhost_user_unlock_all_queue_pairs()
> 
> BACKEND: virito-net:
> 2.1. rte_spinlock_lock(&vq->access_lock);
> 2.2. Process vrings and copy all data
> 2.3. rte_spinlock_unlock(&vq->access_lock);
> 
> Yes, it will synchronize access to virtio_net structure but what if the BACKEND  is in
> zero copy mode and/or pass buffers to physical device? The request will
> not end in 2.2 and you unmap the memory regions in the middle of request.
> Even worse, the physical device will just abort the request but BACKEND can segfault
> or write random memory because BACKEND try to use invalid memory address
> (retrieved at request start).
Right, it doesn't work with zero-copy.
> To use this per-virtqueue lock:
> 1. the lock need to be held from request start to the end - but this can starve DPDK
> vhost-user thread as there might be many request on-the-fly and when one is done
> the new one might be started.
> 2. Becouse we don't know if something changed between requst start and request end
> BACKEND need walk through all descriptors chain at the request end and do the
> rte_vhost_gpa_to_vva() again.
> 
> The SET_MEM_TABLE is most obvious message but the same is true for other like
> VHOST_IOTLB_INVALIDATE or SET_FEATURES.
SET_FEATURE should never be sent as soon as the device is started,
except to enable logging.
For VHOST_IOTLB_INVALIDATE, the solution might be to have a ref counter
per entry, and to only remove it for the cache once it is zero and send
the reply-ack tothe master once this is done. But the cost would be huge
as with large entries, a lot of threads might increment/decrement the
same variable so there will be contention.
For all other cases, like SET_MEM_TABLE, maybe the solution is to
disable/enable all the queues using the existing ops.
The application or library would have to take care that no guest buffers
are in the wild before returning from the disable.
Do you think that would work?
Cheers,
Maxime
> Pawel
> 
>>
>> Cheers,
>> Maxime
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
  2018-03-07 10:09     ` Maxime Coquelin
@ 2018-03-07 10:59       ` Wodkowski, PawelX
  2018-03-07 11:29         ` Maxime Coquelin
  0 siblings, 1 reply; 6+ messages in thread
From: Wodkowski, PawelX @ 2018-03-07 10:59 UTC (permalink / raw)
  To: Maxime Coquelin, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Stojaczyk, DariuszX
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, March 7, 2018 11:09 AM
> To: Wodkowski, PawelX <pawelx.wodkowski@intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> <james.r.harris@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX
> <dariuszx.stojaczyk@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring
> data
> 
> 
> 
> On 03/07/2018 10:16 AM, Wodkowski, PawelX wrote:
> >> -----Original Message-----
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Tuesday, March 6, 2018 5:27 PM
> >> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> >> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> >> <james.r.harris@intel.com>; Wodkowski, PawelX
> >> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX
> >> <dariuszx.stojaczyk@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public
> vring
> >> data
> >>
> >> Hi Tomasz,
> >>
> >> On 03/05/2018 05:11 PM, Tomasz Kulasek wrote:
> >>> For now DPDK assumes that callfd, kickfd and last_idx are being set just
> >>> once during vring initialization and device cannot be running while DPDK
> >>> receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE
> messages.
> >>> However, that assumption is wrong. For Vhost SCSI messages might arrive
> >>> at any point of time, possibly multiple times, one after another.
> >>>
> >>> QEMU issues SET_VRING_CALL once during device initialization, then again
> >>> during device start. The second message will close previous callfd,
> >>> which is still being used by the user-implementation of vhost device.
> >>> This results in writing to invalid (closed) callfd.
> >>>
> >>> Other messages like SET_FEATURES, SET_VRING_ADDR etc also will
> change
> >>> internal state of VQ or device. To prevent race condition device should
> >>> also be stopped before updateing vring data.
> >>>
> >>> Signed-off-by: Dariusz Stojaczyk<dariuszx.stojaczyk@intel.com>
> >>> Signed-off-by: Pawel Wodkowski<pawelx.wodkowski@intel.com>
> >>> Signed-off-by: Tomasz Kulasek<tomaszx.kulasek@intel.com>
> >>> ---
> >>>    lib/librte_vhost/vhost_user.c | 40
> >> ++++++++++++++++++++++++++++++++++++++++
> >>>    1 file changed, 40 insertions(+)
> >>
> >> In last release, we have introduced a per-virtqueue lock to protect
> >> vring handling against asynchronous device changes.
> >>
> >> I think that would solve the issue you are facing, but you would need
> >> to export the VQs locking functions to the vhost-user lib API to be
> >> able to use it.
> >>
> >> I don't think your current patch is the right solution anyway, because
> >> it destroys the device in case we don't want it to remain alive, like
> >> set_log_base, or set_features when only the logging feature gets
> >> enabled.
> >
> > Please correct me if I can't see something obvious, but how this lock protect
> against eg
> > SET_MEM_TABLE message? Current flow you are thinking of is:
> >
> > DPDK vhost-user thread
> > 1.1. vhost_user_lock_all_queue_pairs()
> > 1.2. vhost_user_set_mem_table()
> > 1.3. vhost_user_unlock_all_queue_pairs()
> >
> > BACKEND: virito-net:
> > 2.1. rte_spinlock_lock(&vq->access_lock);
> > 2.2. Process vrings and copy all data
> > 2.3. rte_spinlock_unlock(&vq->access_lock);
> >
> > Yes, it will synchronize access to virtio_net structure but what if the BACKEND
> is in
> > zero copy mode and/or pass buffers to physical device? The request will
> > not end in 2.2 and you unmap the memory regions in the middle of request.
> > Even worse, the physical device will just abort the request but BACKEND can
> segfault
> > or write random memory because BACKEND try to use invalid memory
> address
> > (retrieved at request start).
> 
> Right, it doesn't work with zero-copy.
> 
> > To use this per-virtqueue lock:
> > 1. the lock need to be held from request start to the end - but this can starve
> DPDK
> > vhost-user thread as there might be many request on-the-fly and when one is
> done
> > the new one might be started.
> > 2. Becouse we don't know if something changed between requst start and
> request end
> > BACKEND need walk through all descriptors chain at the request end and do
> the
> > rte_vhost_gpa_to_vva() again.
> >
> > The SET_MEM_TABLE is most obvious message but the same is true for other
> like
> > VHOST_IOTLB_INVALIDATE or SET_FEATURES.
> 
> SET_FEATURE should never be sent as soon as the device is started,
> except to enable logging.
> 
> For VHOST_IOTLB_INVALIDATE, the solution might be to have a ref counter
> per entry, and to only remove it for the cache once it is zero and send
> the reply-ack tothe master once this is done. But the cost would be huge
> as with large entries, a lot of threads might increment/decrement the
> same variable so there will be contention.
> 
> For all other cases, like SET_MEM_TABLE, maybe the solution is to
> disable/enable all the queues using the existing ops.
> The application or library would have to take care that no guest buffers
> are in the wild before returning from the disable.
> 
> Do you think that would work?
What kind of ops can be used to reliably disable all queues and inform backend what
changed beside new_device/destroy_device? Those informations are very well hidden
inside vhost.c and vhost-user.c files.
I think we need new set of ops/callbacks in vhost_device_ops struct that let the backend
decide how to behave in response on certain messages like SET_MEM_TABLE.
Also I would propose to revert the a3688046995f "vhost: protect active rings from
async ring changes" and replace it by behavior of this patch. I see now that this patch
need to handle other cases like VHOST_USER_IOTLB_MSG but until we have new set of
callbacks I think this is only reliable and fast solution.
Pawel
> 
> Cheers,
> Maxime
> 
> > Pawel
> >
> >>
> >> Cheers,
> >> Maxime
^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring data
  2018-03-07 10:59       ` Wodkowski, PawelX
@ 2018-03-07 11:29         ` Maxime Coquelin
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2018-03-07 11:29 UTC (permalink / raw)
  To: Wodkowski, PawelX, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Stojaczyk, DariuszX
On 03/07/2018 11:59 AM, Wodkowski, PawelX wrote:
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Wednesday, March 7, 2018 11:09 AM
>> To: Wodkowski, PawelX <pawelx.wodkowski@intel.com>; Kulasek, TomaszX
>> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>> <james.r.harris@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX
>> <dariuszx.stojaczyk@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public vring
>> data
>>
>>
>>
>> On 03/07/2018 10:16 AM, Wodkowski, PawelX wrote:
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Tuesday, March 6, 2018 5:27 PM
>>>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>>>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>>>> <james.r.harris@intel.com>; Wodkowski, PawelX
>>>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Stojaczyk, DariuszX
>>>> <dariuszx.stojaczyk@intel.com>
>>>> Subject: Re: [dpdk-dev] [PATCH] vhost: stop device before updating public
>> vring
>>>> data
>>>>
>>>> Hi Tomasz,
>>>>
>>>> On 03/05/2018 05:11 PM, Tomasz Kulasek wrote:
>>>>> For now DPDK assumes that callfd, kickfd and last_idx are being set just
>>>>> once during vring initialization and device cannot be running while DPDK
>>>>> receives SET_VRING_KICK, SET_VRING_CALL and SET_VRING_BASE
>> messages.
>>>>> However, that assumption is wrong. For Vhost SCSI messages might arrive
>>>>> at any point of time, possibly multiple times, one after another.
>>>>>
>>>>> QEMU issues SET_VRING_CALL once during device initialization, then again
>>>>> during device start. The second message will close previous callfd,
>>>>> which is still being used by the user-implementation of vhost device.
>>>>> This results in writing to invalid (closed) callfd.
>>>>>
>>>>> Other messages like SET_FEATURES, SET_VRING_ADDR etc also will
>> change
>>>>> internal state of VQ or device. To prevent race condition device should
>>>>> also be stopped before updateing vring data.
>>>>>
>>>>> Signed-off-by: Dariusz Stojaczyk<dariuszx.stojaczyk@intel.com>
>>>>> Signed-off-by: Pawel Wodkowski<pawelx.wodkowski@intel.com>
>>>>> Signed-off-by: Tomasz Kulasek<tomaszx.kulasek@intel.com>
>>>>> ---
>>>>>     lib/librte_vhost/vhost_user.c | 40
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>     1 file changed, 40 insertions(+)
>>>>
>>>> In last release, we have introduced a per-virtqueue lock to protect
>>>> vring handling against asynchronous device changes.
>>>>
>>>> I think that would solve the issue you are facing, but you would need
>>>> to export the VQs locking functions to the vhost-user lib API to be
>>>> able to use it.
>>>>
>>>> I don't think your current patch is the right solution anyway, because
>>>> it destroys the device in case we don't want it to remain alive, like
>>>> set_log_base, or set_features when only the logging feature gets
>>>> enabled.
>>>
>>> Please correct me if I can't see something obvious, but how this lock protect
>> against eg
>>> SET_MEM_TABLE message? Current flow you are thinking of is:
>>>
>>> DPDK vhost-user thread
>>> 1.1. vhost_user_lock_all_queue_pairs()
>>> 1.2. vhost_user_set_mem_table()
>>> 1.3. vhost_user_unlock_all_queue_pairs()
>>>
>>> BACKEND: virito-net:
>>> 2.1. rte_spinlock_lock(&vq->access_lock);
>>> 2.2. Process vrings and copy all data
>>> 2.3. rte_spinlock_unlock(&vq->access_lock);
>>>
>>> Yes, it will synchronize access to virtio_net structure but what if the BACKEND
>> is in
>>> zero copy mode and/or pass buffers to physical device? The request will
>>> not end in 2.2 and you unmap the memory regions in the middle of request.
>>> Even worse, the physical device will just abort the request but BACKEND can
>> segfault
>>> or write random memory because BACKEND try to use invalid memory
>> address
>>> (retrieved at request start).
>>
>> Right, it doesn't work with zero-copy.
>>
>>> To use this per-virtqueue lock:
>>> 1. the lock need to be held from request start to the end - but this can starve
>> DPDK
>>> vhost-user thread as there might be many request on-the-fly and when one is
>> done
>>> the new one might be started.
>>> 2. Becouse we don't know if something changed between requst start and
>> request end
>>> BACKEND need walk through all descriptors chain at the request end and do
>> the
>>> rte_vhost_gpa_to_vva() again.
>>>
>>> The SET_MEM_TABLE is most obvious message but the same is true for other
>> like
>>> VHOST_IOTLB_INVALIDATE or SET_FEATURES.
>>
>> SET_FEATURE should never be sent as soon as the device is started,
>> except to enable logging.
>>
>> For VHOST_IOTLB_INVALIDATE, the solution might be to have a ref counter
>> per entry, and to only remove it for the cache once it is zero and send
>> the reply-ack tothe master once this is done. But the cost would be huge
>> as with large entries, a lot of threads might increment/decrement the
>> same variable so there will be contention.
>>
>> For all other cases, like SET_MEM_TABLE, maybe the solution is to
>> disable/enable all the queues using the existing ops.
>> The application or library would have to take care that no guest buffers
>> are in the wild before returning from the disable.
>>
>> Do you think that would work?
> 
> What kind of ops can be used to reliably disable all queues and inform backend what
> changed beside new_device/destroy_device? Those informations are very well hidden
> inside vhost.c and vhost-user.c files.
(struct vhost_device_ops).vring_state_changed()
When a queue is disabled, I think we can expect the application won't
use its resources anymore.
> 
> I think we need new set of ops/callbacks in vhost_device_ops struct that let the backend
> decide how to behave in response on certain messages like SET_MEM_TABLE.
Yes, we can think of extending the API with new ops if
vring_state_change is not enough.
> Also I would propose to revert the a3688046995f "vhost: protect active rings from
> async ring changes" and replace it by behavior of this patch. I see now that this patch
> need to handle other cases like VHOST_USER_IOTLB_MSG but until we have new set of
> callbacks I think this is only reliable and fast solution.
Nack, we won't destroy the device every time we receive a vhost-user
request.
And for reverting the patch you propose, it won't happen as long as we
don't have a viable alternative, and as long as the applications like
OVS won't use it.
This patch fixes real issues faced by the user, like live migration with
multiple queues or memory hotplug. The problem is that it does not
take zero-copy case into account.
Maxime
> Pawel
> 
>>
>> Cheers,
>> Maxime
>>
>>> Pawel
>>>
>>>>
>>>> Cheers,
>>>> Maxime
^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-03-07 11:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-05 16:11 [dpdk-dev] [PATCH] vhost: stop device before updating public vring data Tomasz Kulasek
2018-03-06 16:26 ` Maxime Coquelin
2018-03-07  9:16   ` Wodkowski, PawelX
2018-03-07 10:09     ` Maxime Coquelin
2018-03-07 10:59       ` Wodkowski, PawelX
2018-03-07 11:29         ` Maxime Coquelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).