>  >  > Since backend and frontend messages are handled synchronously in the same thread,
>  >  > there is a possibility of message deadlock.
>  >  > Let each driver determine whether to wait for a response.
>  >  >
>  >  > Fixes: d90cf7d111ac ("vhost: support host notifier")
>  >  > Cc: maxime.coquelin@redhat.com
>  >  > Signed-off-by: Rma Ma <rma.ma@jaguarmicro.com>
>  >  > ---
>  >  > v2 - fix format error in commit message
>  >  > v3 - add --in-reply-to
>  >  > ---
>  >
>  > Hi Maxime,
>  >
>  > This patch helps fix a vhost-user message deadlock; could you help
>  > review it?
>  
>  The patch introduces a new device op, but it is used nowhere in vDPA
>  drivers.
>  
>  What vDPA driver is it going to be used with?
>  
>  Regards,
>  Maxime


Hi,

Our company's jmnd vDPA driver, which relies on the rte_vhost_host_notifier_ctrl() interface,
reproduces the problem in the following scenario:

QEMU starts vhost-user with modern net and blk devices, with a dpdk-vdpa process as the backend.
After live migration, the destination QEMU deadlocks with dpdk-vdpa:

- QEMU sends VHOST_USER_SET_VRING_KICK to the dpdk-vdpa net device
- QEMU does not need to wait for a reply to this message
- QEMU then sends VHOST_USER_SET_MEM_TABLE to the dpdk-vdpa blk device
- QEMU needs to wait for a reply to this message
- when dpdk-vdpa receives VHOST_USER_SET_VRING_KICK,
- it sends VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG to QEMU
- dpdk-vdpa needs to wait for a reply to this message
- QEMU and dpdk-vdpa are now deadlocked

Both vhost-user connections are serviced by the same dpdk-vdpa message thread, so while it waits
for QEMU to ack the host notifier message it can never answer the pending SET_MEM_TABLE, and QEMU,
blocked waiting for that SET_MEM_TABLE reply, can never ack the host notifier message (see the
sketch below).
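
On the dpdk-vdpa side, the wait comes from the host notifier path in
lib/vhost/vhost_user.c. A simplified sketch of the current upstream behaviour
(payload/fd details omitted, helper name is mine for illustration):

static int
set_vring_host_notifier_blocking(struct virtio_net *dev, int index)
{
        struct vhu_msg_context ctx = {
                .msg = {
                        .request.backend = VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG,
                        /* NEED_REPLY is always set today... */
                        .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY,
                        .size = sizeof(ctx.msg.payload.area),
                        .payload.area = {
                                .u64 = index & VHOST_USER_VRING_IDX_MASK,
                        },
                },
        };

        /* ...so this blocks the vhost-user message thread until QEMU acks
         * the message, and that same thread is the only one that can answer
         * QEMU's pending VHOST_USER_SET_MEM_TABLE for the blk device. */
        return send_vhost_backend_message_process_reply(dev, &ctx);
}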

I tried submitting a patch to the QEMU community that uses a separate thread to poll the backend
channel, but it ran into some multi-threaded synchronization issues.

I think this is a general issue: other backend messages already take this into account,
so this message also needs a flag to fix it, as shown below.
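
The precedent I have in mind is the config-change backend message, which already lets the
backend choose whether to wait for QEMU's ack (signature as visible in the hunk context of
the patch below):

        int rte_vhost_backend_config_change(int vid, bool need_reply);

The host notifier message currently offers no such choice, which is exactly what the new op adds.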

The jmnd vDPA driver will subsequently be open-sourced to the community.
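
To make the intended usage concrete, here is a minimal, hypothetical sketch of how a vDPA
driver would implement the new op once this patch is applied (the names below are illustrative
only, not the real jmnd code):

#include <stdbool.h>

#include <rte_vdpa.h>
#include <vdpa_driver.h>

/* Tell rte_vhost_host_notifier_ctrl() that this driver does not need the
 * frontend's ack for the host notifier message, so the vhost-user message
 * thread is never blocked on it. */
static int
example_get_notify_reply_flag(int vid, bool *need_reply)
{
        (void)vid;
        *need_reply = false;
        return 0;
}

static struct rte_vdpa_dev_ops example_vdpa_ops = {
        /* ... get_queue_num, get_features, dev_conf, etc. omitted ... */
        .get_notify_reply_flag = example_get_notify_reply_flag,
};

A driver registering these ops behaves exactly as before, except that the notifier message
is sent without VHOST_USER_NEED_REPLY.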



Best wishes,

Rma


From: Maxime Coquelin <maxime.coquelin@redhat.com>
Sent: July 11, 2023, 21:25
To: Rma Ma <rma.ma@jaguarmicro.com>; dpdk-dev <dev@dpdk.org>
Cc: Chenbo Xia <chenbo.xia@intel.com>
Subject: Re: [PATCH v3] vhost: add notify reply ops to fix message deadlock
 
Hi,

On 7/11/23 11:25, Rma Ma wrote:
>  > Since backend and frontend messages are handled synchronously in the same thread,
>  > there is a possibility of message deadlock.
>  > Let each driver determine whether to wait for a response.
>  >
>  > Fixes: d90cf7d111ac ("vhost: support host notifier")
>  > Cc: maxime.coquelin@redhat.com
>  > Signed-off-by: Rma Ma <rma.ma@jaguarmicro.com>
>  > ---
>  > v2 - fix format error in commit message
>  > v3 - add --in-reply-to
>  > ---
>
> Hi Maxime,
>
> This patch helps fix a vhost-user message deadlock; could you help
> review it?

The patch introduces a new device op, but it is used nowhere in vDPA
drivers.

What vDPA driver is it going to be used with?

Regards,
Maxime

> Thanks.
>
> Best wishes,
>
> Rma
>
> ------------------------------------------------------------------------
> *From:* Rma Ma
> *Sent:* July 4, 2023, 10:52
> *To:* dpdk-dev <dev@dpdk.org>
> *Cc:* Maxime Coquelin <maxime.coquelin@redhat.com>; Chenbo Xia
> <chenbo.xia@intel.com>; Rma Ma <rma.ma@jaguarmicro.com>
> *Subject:* [PATCH v3] vhost: add notify reply ops to fix message deadlock
> Since backend and frontend messages are handled synchronously in the same thread,
> there is a possibility of message deadlock.
> Let each driver determine whether to wait for a response.
>
> Fixes: d90cf7d111ac ("vhost: support host notifier")
> Cc: maxime.coquelin@redhat.com
> Signed-off-by: Rma Ma <rma.ma@jaguarmicro.com>
> ---
> v2 - fix format error in commit message
> v3 - add --in-reply-to
> ---
>   lib/vhost/vdpa_driver.h |  3 +++
>   lib/vhost/vhost_user.c  | 23 ++++++++++++++++++-----
>   2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/lib/vhost/vdpa_driver.h b/lib/vhost/vdpa_driver.h
> index 8db4ab9f4d..3d2ea3c90e 100644
> --- a/lib/vhost/vdpa_driver.h
> +++ b/lib/vhost/vdpa_driver.h
> @@ -81,6 +81,9 @@ struct rte_vdpa_dev_ops {
>
>           /** get device type: net device, blk device... */
>           int (*get_dev_type)(struct rte_vdpa_device *dev, uint32_t *type);
> +
> +       /** Get the notify reply flag */
> +       int (*get_notify_reply_flag)(int vid, bool *need_reply);
>   };
>
>   /**
> diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
> index 901a80bbaa..aa61992939 100644
> --- a/lib/vhost/vhost_user.c
> +++ b/lib/vhost/vhost_user.c
> @@ -3365,13 +3365,14 @@ rte_vhost_backend_config_change(int vid, bool need_reply)
>   static int vhost_user_backend_set_vring_host_notifier(struct virtio_net *dev,
>                                                       int index, int fd,
>                                                       uint64_t offset,
> -                                                   uint64_t size)
> +                                                   uint64_t size,
> +                                                       bool need_reply)
>   {
>           int ret;
>           struct vhu_msg_context ctx = {
>                   .msg = {
>                           .request.backend = VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG,
> -                       .flags = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY,
> +                       .flags = VHOST_USER_VERSION,
>                           .size = sizeof(ctx.msg.payload.area),
>                           .payload.area = {
>                                   .u64 = index & VHOST_USER_VRING_IDX_MASK,
> @@ -3388,7 +3389,13 @@ static int vhost_user_backend_set_vring_host_notifier(struct virtio_net *dev,
>                   ctx.fd_num = 1;
>           }
>
> -       ret = send_vhost_backend_message_process_reply(dev, &ctx);
> +       if (!need_reply)
> +               ret = send_vhost_backend_message(dev, &ctx);
> +       else {
> +               ctx.msg.flags |= VHOST_USER_NEED_REPLY;
> +               ret = send_vhost_backend_message_process_reply(dev, &ctx);
> +       }
> +
>           if (ret < 0)
>                   VHOST_LOG_CONFIG(dev->ifname, ERR, "failed to set host notifier (%d)\n", ret);
>
> @@ -3402,6 +3409,7 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
>           int vfio_device_fd, ret = 0;
>           uint64_t offset, size;
>           unsigned int i, q_start, q_last;
> +       bool need_reply;
>
>           dev = get_device(vid);
>           if (!dev)
> @@ -3440,6 +3448,11 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
>           if (vfio_device_fd < 0)
>                   return -ENOTSUP;
>
> +       if (vdpa_dev->ops->get_notify_reply_flag == NULL)
> +               need_reply = true;
> +       else
> +               vdpa_dev->ops->get_notify_reply_flag(vid, &need_reply);
> +
>           if (enable) {
>                   for (i = q_start; i <= q_last; i++) {
>                           if (vdpa_dev->ops->get_notify_area(vid, i, &offset,
> @@ -3449,7 +3462,7 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
>                           }
>
>                           if (vhost_user_backend_set_vring_host_notifier(dev, i,
> -                                       vfio_device_fd, offset, size) < 0) {
> +                                       vfio_device_fd, offset, size, need_reply) < 0) {
>                                   ret = -EFAULT;
>                                   goto disable;
>                           }
> @@ -3458,7 +3471,7 @@ int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
>   disable:
>                   for (i = q_start; i <= q_last; i++) {
>                           vhost_user_backend_set_vring_host_notifier(dev, i, -1,
> -                                       0, 0);
> +                                       0, 0, need_reply);
>                   }
>           }
>
> --
> 2.17.1
>