DPDK patches and discussions
 help / color / mirror / Atom feed
From: Matan Azrad <matan@mellanox.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
	Xiao Wang <xiao.w.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
Date: Tue, 23 Jun 2020 14:52:16 +0000	[thread overview]
Message-ID: <AM0PR0502MB4019FA243C2664F8A16B0BFBD2940@AM0PR0502MB4019.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <7c6bd0b2-8657-3c34-ab1f-f397c39ea2ec@redhat.com>



> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, June 23, 2020 4:56 PM
> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> <xiao.w.wang@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> 
> Hi Matan,
> 
> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >
> >
> > From: Maxime Coquelin:
> >> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>> Cc: dev@dpdk.org
> >>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>> definition
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Maxime
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>> definition
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>> The issue is if you only check ready state only before and
> >>>>>>>>>>>> after the message affecting the ring is handled, it can be
> >>>>>>>>>>>> ready at both stages, while the rings have changed and
> >>>>>>>>>>>> state change callback should
> >>>>>>>>>> have been called.
> >>>>>>>>>>> But in this version I checked twice, before message handler
> >>>>>>>>>>> and after
> >>>>>>>>>> message handler, so it should catch any update.
> >>>>>>>>>>
> >>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>> handlers, so that the ready state is invalidated because
> >>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>> handler but
> >>>> with different values.
> >>>>>>>>>>
> >>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
> >>>>>> virtio_net
> >>>>>>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>
> >>>>>>>>>> ...
> >>>>>>>>>>
> >>>>>>>>>>         if (vq->kickfd >= 0)
> >>>>>>>>>>                 close(vq->kickfd);
> >>>>>>>>>> +
> >>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>> +
> >>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>> +
> >>>>>>>>>>         vq->kickfd = file.fd;
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>>>
> >>>>>>>>> The driver will be notified in the next
> >>>>>>>>> VHOST_USER_SET_VRING_ENABLE
> >>>>>>>> message according to v1.
> >>>>>>>>>
> >>>>>>>>> One of our assumption we agreed on in the design mail is that
> >>>>>>>>> it doesn't
> >>>>>>>> make sense that QEMU will change queue configuration without
> >>>>>>>> enabling the queue again.
> >>>>>>>>> Because of that we decided to force calling state callback
> >>>>>>>>> again when
> >>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even
> if
> >> the
> >>>>>> queue is
> >>>>>>>> already ready.
> >>>>>>>>> So when driver/app see state enable->enable, it should take
> >>>>>>>>> into account
> >>>>>>>> that the queue configuration was probably changed.
> >>>>>>>>>
> >>>>>>>>> I think that this assumption is correct according to the QEMU
> code.
> >>>>>>>>
> >>>>>>>> Yes, this was our initial assumption.
> >>>>>>>> But now looking into the details of the implementation, I find
> >>>>>>>> it is even cleaner & clearer not to do this assumption.
> >>>>>>>>
> >>>>>>>>> That's why I prefer to collect all the ready checks callbacks
> >>>>>>>>> (queue state and
> >>>>>>>> device new\conf) to one function that will be called after the
> >>>>>>>> message
> >>>>>>>> handler:
> >>>>>>>>> Pseudo:
> >>>>>>>>>  vhost_user_update_ready_statuses() {
> >>>>>>>>> 	switch (msg):
> >>>>>>>>> 		case enable:
> >>>>>>>>> 			if(enable is 1)
> >>>>>>>>> 				force queue state =1.
> >>>>>>>>> 		case callfd
> >>>>>>>>> 		case kickfd
> >>>>>>>>> 				.....
> >>>>>>>>> 		Check queue and device ready + call callbacks if
> >> needed..
> >>>>>>>>> 		Default
> >>>>>>>>> 			Return;
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> I find it more natural to "invalidate" ready state where it is
> >>>>>>>> handled (after vring_invalidate(), before setting new FD for
> >>>>>>>> call & kick, ...)
> >>>>>>>
> >>>>>>> I think that if you go with this direction, if the first queue
> >>>>>>> pair is invalidated,
> >>>>>> you need to notify app\driver also about device ready change.
> >>>>>>> Also it will cause 2 notifications to the driver instead of one
> >>>>>>> in case of FD
> >>>>>> change.
> >>>>>>
> >>>>>> You'll always end-up with two notifications, either Qemu has sent
> >>>>>> the disable and so you'll have one notification for the disable
> >>>>>> and one for the enable, or it didn't sent the disable and it will
> >>>>>> happen at old value invalidation time and after new value is
> >>>>>> taken into
> >> account.
> >>>>>>
> >>>>>
> >>>>> I don't see it in current QEMU behavior.
> >>>>> When working MQ I see that some virtqs get configuration message
> >>>>> while
> >>>> they are in enabled state.
> >>>>> Then, enable message is sent again later.
> >>>>
> >>>> I guess you mean the first queue pair? And it would not be in ready
> >>>> state as it would be the initial configuration of the queue?
> >>>
> >>> Even after initialization when queue is ready.
> >>>
> >>>>>
> >>>>>>> Why not to take this correct assumption and update ready state
> >>>>>>> only in one
> >>>>>> point in the code instead of doing it in all the configuration
> >>>>>> handlers
> >>>> around?
> >>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>
> >>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
> >>>>>> sure this is a correct assumption:
> >>>>>>
> >>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>> client must support changing some configuration aspects on the fly."
> >>>>>
> >>>>> Ok, this doesn't explain how configuration is changed on the fly.
> >>>>
> >>>> I agree it lacks a bit of clarity.
> >>>>
> >>>>> As I mentioned, QEMU sends enable message always after
> >>>>> configuration
> >>>> message.
> >>>>
> >>>> Yes, but we should not do assumptions on current Qemu version when
> >>>> possible. Better to be safe and follow the specification, it will
> >>>> be more
> >> robust.
> >>>> There is also the Virtio-user PMD to take into account for example.
> >>>
> >>> I understand your point here but do you really want to be ready for
> >>> any
> >> configuration update in run time?
> >>> What does it mean? How datatpath should handle configuration from
> >> control thread in run time while traffic is on?
> >>> For example, changing queue size \ addresses must stop traffic before...
> >>> Also changing FDs is very sensitive.
> >>>
> >>> It doesn't make sense to me.
> >>>
> >>> Also, according to "on the fly" direction we should not disable the
> >>> queue
> >> unless enable message is coming to disable it.
> >
> > No response, so looks like you agree that it doesn't make sense.
> 
> No, my reply was general to all your comments.
> 
> With SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring because we protect it with a lock, so we
> are sure the ring won't be accessed by another thread while doing the
> change.
> 
> For vDPA case that's more problematic because we have no such locking
> mechanism.
> 
> For example memory hotplug, Qemu does not seem to disable the queues
> so we need to stop the vDPA device one way or another so that it does not
> process the rings while the Vhost lib remaps the memory areas.
> 
> >>> In addition:
> >>> Do you really want to toggle vDPA drivers\app for any configuration
> >> message? It may cause queue recreation for each one (at least for mlx5).
> >>
> >> I want to have something robust and maintainable.
> >
> > Me too.
> >
> >> These messages arriving after a queue have been configured once are
> >> rare events, but this is usually the kind of things that cause maintenance
> burden.
> >
> > In case of guest poll mode (testpmd virtio) we all the time get callfd twice.
> 
> Right.
> 
> >> If you look at my example patch, you will understand that with my
> >> proposal, there won't be any more state change notification than with
> >> your proposal when Qemu or any other Vhost-user master send a disable
> >> request before sending the request that impact the queue state.
> >
> > we didn't talk about disable time - this one is very simple.
> >
> > Yes, In case the queue is disabled your proposal doesn't send extra
> notification as my.
> > But in case the queue is ready, your proposal send extra not ready
> notification for kikfd,callfd,set_vring_base configurations.
> 
> I think this is necessary for synchronization with the Vhost-user master (in
> case the master asks for this synchronization, like set_mem_table for
> instance when reply-ack is enabled).
> 
> >> It just adds more robustness if this unlikely event happens, by
> >> invalidating the ring state to not ready before doing the actual ring
> configuration change.
> >> So that this config change is not missed by the vDPA driver or the
> application.
> >
> > One more issue here is that there is some time that device is ready (already
> configured) and the first vittq-pair is not ready (your invalidate proposal for
> set_vring_base).
> 
> 
> 
> > It doesn’t save the concept that device is ready only in case the first virtq-
> pair is ready.
> 
> I understand the spec as "the device is ready as soon as the first queue pair is
> ready", but I might be wrong.
> 
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the ring of the first
> queue pair receive a disable request or, with my patch, when one of the
> rings receives a request that changes the ring state?

I means, your proposal actually may make first virtq-pair ready state disabled when device ready.
So, yes, it leads to call device close\destroy.

> > I will not insist anymore on waiting for enable for notifying although I not
> fan with it.
> >
> > So, I suggest to create 1 notification function to be called after message
> handler and before reply.
> > This function is the only one which notify ready states in the next options:
> >
> > 1. virtq ready state is changed in the queue.
> > 2. virtq ready state stays on after configuration message handler.
> > 3. device state will be enabled when the first queue pair is ready.
> 
> IIUC, it will not disable the queues when there is a state change, is that
> correct? If so, I think it does not work with memory hotplug case I mentioned
> earlier.

It will do enable again which mean - something was modified.

> Even for the callfd double change it can be problematic as Vhost-lib will close
> the first one while it will still be used by the driver (Btw, I see my example
> patch is also buggy in this regards, it should reset the call_fd value in the
> virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).

Yes, this one leads for different handle for each message.

Maybe it leads for new queue modify operation.
So, queue doesn't send the state - just does configuration change on the fly.

What do you think?

 
> Thanks,
> Maxime
> >
> > Matan
> >
> >
> >
> >> Maxime
> >


  parent reply	other threads:[~2020-06-23 14:52 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19  6:44   ` Maxime Coquelin
2020-06-19 13:28     ` Matan Azrad
2020-06-19 14:01       ` Maxime Coquelin
2020-06-21  6:26         ` Matan Azrad
2020-06-22  8:06           ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-19  6:49   ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
2020-06-19  7:41   ` Maxime Coquelin
2020-06-19 12:04     ` Maxime Coquelin
2020-06-19 13:11     ` Matan Azrad
2020-06-19 13:54       ` Maxime Coquelin
2020-06-21  6:20         ` Matan Azrad
2020-06-22  8:04           ` Maxime Coquelin
2020-06-22  8:41             ` Matan Azrad
2020-06-22  8:56               ` Maxime Coquelin
2020-06-22 10:06                 ` Matan Azrad
2020-06-22 12:32                   ` Maxime Coquelin
2020-06-22 13:43                     ` Matan Azrad
2020-06-22 14:55                       ` Maxime Coquelin
2020-06-22 15:51                         ` Matan Azrad
2020-06-22 16:47                           ` Maxime Coquelin
2020-06-23  9:02                             ` Matan Azrad
2020-06-23  9:19                               ` Maxime Coquelin
2020-06-23 11:53                                 ` Matan Azrad
2020-06-23 13:55                                   ` Maxime Coquelin
2020-06-23 14:33                                     ` Maxime Coquelin
2020-06-23 14:52                                     ` Matan Azrad [this message]
2020-06-23 15:18                                       ` Maxime Coquelin
2020-06-24  5:54                                         ` Matan Azrad
2020-06-24  7:22                                           ` Maxime Coquelin
2020-06-24  8:38                                             ` Matan Azrad
2020-06-24  9:12                                               ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-28  3:06     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
2020-06-26 12:10     ` Maxime Coquelin
2020-06-28  3:08     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-26 12:15     ` Maxime Coquelin
2020-06-28  3:18     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
2020-06-26 12:19     ` Maxime Coquelin
2020-06-28  3:19     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
2020-06-26 12:29     ` Maxime Coquelin
2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
2020-07-17  1:41       ` Wang, Yinan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM0PR0502MB4019FA243C2664F8A16B0BFBD2940@AM0PR0502MB4019.eurprd05.prod.outlook.com \
    --to=matan@mellanox.com \
    --cc=dev@dpdk.org \
    --cc=maxime.coquelin@redhat.com \
    --cc=xiao.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).