DPDK patches and discussions
 help / color / mirror / Atom feed
From: Matan Azrad <matan@mellanox.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
	Xiao Wang <xiao.w.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
Date: Wed, 24 Jun 2020 05:54:09 +0000
Message-ID: <DB3PR0502MB40283E1DF652937C21319354D2950@DB3PR0502MB4028.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <dcf32bbe-9034-870c-5f1f-b5646742e1b3@redhat.com>

Ho Maxime

Good morning

From: Maxime Coquelin:
> On 6/23/20 4:52 PM, Matan Azrad wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Tuesday, June 23, 2020 4:56 PM
> >> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >> <xiao.w.wang@intel.com>
> >> Cc: dev@dpdk.org
> >> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
> >>
> >> Hi Matan,
> >>
> >> On 6/23/20 1:53 PM, Matan Azrad wrote:
> >>>
> >>>
> >>> From: Maxime Coquelin:
> >>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
> >>>>>
> >>>>>
> >>>>> From: Maxime Coquelin:
> >>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Maxime Coquelin:
> >>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> From: Maxime Coquelin:
> >>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
> >>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>> definition
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Maxime
> >>>>>>>>>>>
> >>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
> >>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
> >>>>>>>>>>>> <xiao.w.wang@intel.com>
> >>>>>>>>>>>> Cc: dev@dpdk.org
> >>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
> >>>>>>>>>>>> definition
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
> >>>>>>>>>>>>>> The issue is if you only check ready state only before
> >>>>>>>>>>>>>> and after the message affecting the ring is handled, it
> >>>>>>>>>>>>>> can be ready at both stages, while the rings have changed
> >>>>>>>>>>>>>> and state change callback should
> >>>>>>>>>>>> have been called.
> >>>>>>>>>>>>> But in this version I checked twice, before message
> >>>>>>>>>>>>> handler and after
> >>>>>>>>>>>> message handler, so it should catch any update.
> >>>>>>>>>>>>
> >>>>>>>>>>>> No, this is not enough, we have to check also during some
> >>>>>>>>>>>> handlers, so that the ready state is invalidated because
> >>>>>>>>>>>> sometimes it will be ready before and after the message
> >>>>>>>>>>>> handler but
> >>>>>> with different values.
> >>>>>>>>>>>>
> >>>>>>>>>>>> That's what I did in my example patch:
> >>>>>>>>>>>> @@ -1847,15 +1892,16 @@
> vhost_user_set_vring_kick(struct
> >>>>>>>> virtio_net
> >>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
> >>>>>>>>>>>>
> >>>>>>>>>>>> ...
> >>>>>>>>>>>>
> >>>>>>>>>>>>         if (vq->kickfd >= 0)
> >>>>>>>>>>>>                 close(vq->kickfd);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>         vq->kickfd = file.fd;
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Without that, the ready check will return ready before and
> >>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
> >>>>>>>>>>>
> >>>>>>>>>>> The driver will be notified in the next
> >>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
> >>>>>>>>>> message according to v1.
> >>>>>>>>>>>
> >>>>>>>>>>> One of our assumption we agreed on in the design mail is
> >>>>>>>>>>> that it doesn't
> >>>>>>>>>> make sense that QEMU will change queue configuration
> without
> >>>>>>>>>> enabling the queue again.
> >>>>>>>>>>> Because of that we decided to force calling state callback
> >>>>>>>>>>> again when
> >>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message
> even
> >> if
> >>>> the
> >>>>>>>> queue is
> >>>>>>>>>> already ready.
> >>>>>>>>>>> So when driver/app see state enable->enable, it should take
> >>>>>>>>>>> into account
> >>>>>>>>>> that the queue configuration was probably changed.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that this assumption is correct according to the
> >>>>>>>>>>> QEMU
> >> code.
> >>>>>>>>>>
> >>>>>>>>>> Yes, this was our initial assumption.
> >>>>>>>>>> But now looking into the details of the implementation, I
> >>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
> >>>>>>>>>>
> >>>>>>>>>>> That's why I prefer to collect all the ready checks
> >>>>>>>>>>> callbacks (queue state and
> >>>>>>>>>> device new\conf) to one function that will be called after
> >>>>>>>>>> the message
> >>>>>>>>>> handler:
> >>>>>>>>>>> Pseudo:
> >>>>>>>>>>>  vhost_user_update_ready_statuses() {
> >>>>>>>>>>> 	switch (msg):
> >>>>>>>>>>> 		case enable:
> >>>>>>>>>>> 			if(enable is 1)
> >>>>>>>>>>> 				force queue state =1.
> >>>>>>>>>>> 		case callfd
> >>>>>>>>>>> 		case kickfd
> >>>>>>>>>>> 				.....
> >>>>>>>>>>> 		Check queue and device ready + call callbacks if
> >>>> needed..
> >>>>>>>>>>> 		Default
> >>>>>>>>>>> 			Return;
> >>>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> I find it more natural to "invalidate" ready state where it
> >>>>>>>>>> is handled (after vring_invalidate(), before setting new FD
> >>>>>>>>>> for call & kick, ...)
> >>>>>>>>>
> >>>>>>>>> I think that if you go with this direction, if the first queue
> >>>>>>>>> pair is invalidated,
> >>>>>>>> you need to notify app\driver also about device ready change.
> >>>>>>>>> Also it will cause 2 notifications to the driver instead of
> >>>>>>>>> one in case of FD
> >>>>>>>> change.
> >>>>>>>>
> >>>>>>>> You'll always end-up with two notifications, either Qemu has
> >>>>>>>> sent the disable and so you'll have one notification for the
> >>>>>>>> disable and one for the enable, or it didn't sent the disable
> >>>>>>>> and it will happen at old value invalidation time and after new
> >>>>>>>> value is taken into
> >>>> account.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I don't see it in current QEMU behavior.
> >>>>>>> When working MQ I see that some virtqs get configuration
> message
> >>>>>>> while
> >>>>>> they are in enabled state.
> >>>>>>> Then, enable message is sent again later.
> >>>>>>
> >>>>>> I guess you mean the first queue pair? And it would not be in
> >>>>>> ready state as it would be the initial configuration of the queue?
> >>>>>
> >>>>> Even after initialization when queue is ready.
> >>>>>
> >>>>>>>
> >>>>>>>>> Why not to take this correct assumption and update ready state
> >>>>>>>>> only in one
> >>>>>>>> point in the code instead of doing it in all the configuration
> >>>>>>>> handlers
> >>>>>> around?
> >>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
> >>>>>>>>
> >>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
> >>>>>>>> sure this is a correct assumption:
> >>>>>>>>
> >>>>>>>> "While processing the rings (whether they are enabled or not),
> >>>>>>>> client must support changing some configuration aspects on the
> fly."
> >>>>>>>
> >>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
> >>>>>>
> >>>>>> I agree it lacks a bit of clarity.
> >>>>>>
> >>>>>>> As I mentioned, QEMU sends enable message always after
> >>>>>>> configuration
> >>>>>> message.
> >>>>>>
> >>>>>> Yes, but we should not do assumptions on current Qemu version
> >>>>>> when possible. Better to be safe and follow the specification, it
> >>>>>> will be more
> >>>> robust.
> >>>>>> There is also the Virtio-user PMD to take into account for example.
> >>>>>
> >>>>> I understand your point here but do you really want to be ready
> >>>>> for any
> >>>> configuration update in run time?
> >>>>> What does it mean? How datatpath should handle configuration from
> >>>> control thread in run time while traffic is on?
> >>>>> For example, changing queue size \ addresses must stop traffic
> before...
> >>>>> Also changing FDs is very sensitive.
> >>>>>
> >>>>> It doesn't make sense to me.
> >>>>>
> >>>>> Also, according to "on the fly" direction we should not disable
> >>>>> the queue
> >>>> unless enable message is coming to disable it.
> >>>
> >>> No response, so looks like you agree that it doesn't make sense.
> >>
> >> No, my reply was general to all your comments.
> >>
> >> With SW backend, I agree we don't need to disable the rings in case
> >> of asynchronous changes to the ring because we protect it with a
> >> lock, so we are sure the ring won't be accessed by another thread
> >> while doing the change.
> >>
> >> For vDPA case that's more problematic because we have no such locking
> >> mechanism.
> >>
> >> For example memory hotplug, Qemu does not seem to disable the
> queues
> >> so we need to stop the vDPA device one way or another so that it does
> >> not process the rings while the Vhost lib remaps the memory areas.
> >>
> >>>>> In addition:
> >>>>> Do you really want to toggle vDPA drivers\app for any
> >>>>> configuration
> >>>> message? It may cause queue recreation for each one (at least for
> mlx5).
> >>>>
> >>>> I want to have something robust and maintainable.
> >>>
> >>> Me too.
> >>>
> >>>> These messages arriving after a queue have been configured once are
> >>>> rare events, but this is usually the kind of things that cause
> >>>> maintenance
> >> burden.
> >>>
> >>> In case of guest poll mode (testpmd virtio) we all the time get callfd
> twice.
> >>
> >> Right.
> >>
> >>>> If you look at my example patch, you will understand that with my
> >>>> proposal, there won't be any more state change notification than
> >>>> with your proposal when Qemu or any other Vhost-user master send a
> >>>> disable request before sending the request that impact the queue
> state.
> >>>
> >>> we didn't talk about disable time - this one is very simple.
> >>>
> >>> Yes, In case the queue is disabled your proposal doesn't send extra
> >> notification as my.
> >>> But in case the queue is ready, your proposal send extra not ready
> >> notification for kikfd,callfd,set_vring_base configurations.
> >>
> >> I think this is necessary for synchronization with the Vhost-user
> >> master (in case the master asks for this synchronization, like
> >> set_mem_table for instance when reply-ack is enabled).
> >>
> >>>> It just adds more robustness if this unlikely event happens, by
> >>>> invalidating the ring state to not ready before doing the actual
> >>>> ring
> >> configuration change.
> >>>> So that this config change is not missed by the vDPA driver or the
> >> application.
> >>>
> >>> One more issue here is that there is some time that device is ready
> >>> (already
> >> configured) and the first vittq-pair is not ready (your invalidate
> >> proposal for set_vring_base).
> >>
> >>
> >>
> >>> It doesn’t save the concept that device is ready only in case the
> >>> first virtq-
> >> pair is ready.
> >>
> >> I understand the spec as "the device is ready as soon as the first
> >> queue pair is ready", but I might be wrong.
> >>
> >> Do you suggest to call the dev_close() vDPA callback and the
> >> destroy_device() application callback as soon as one of the ring of
> >> the first queue pair receive a disable request or, with my patch,
> >> when one of the rings receives a request that changes the ring state?
> >
> > I means, your proposal actually may make first virtq-pair ready state
> disabled when device ready.
> > So, yes, it leads to call device close\destroy.
> 
> No it doesn't, there is no call to .dev_close()/.destroy_device() with my
> patch if first queue pair gets disabled.
> 
> >>> I will not insist anymore on waiting for enable for notifying
> >>> although I not
> >> fan with it.
> >>>
> >>> So, I suggest to create 1 notification function to be called after
> >>> message
> >> handler and before reply.
> >>> This function is the only one which notify ready states in the next
> options:
> >>>
> >>> 1. virtq ready state is changed in the queue.
> >>> 2. virtq ready state stays on after configuration message handler.
> >>> 3. device state will be enabled when the first queue pair is ready.
> >>
> >> IIUC, it will not disable the queues when there is a state change, is
> >> that correct? If so, I think it does not work with memory hotplug
> >> case I mentioned earlier.
> >
> > It will do enable again which mean - something was modified.
> 
> Ok, thanks for the clarification.
> 
> I think it is not enough for the examples I gave below. For set_mem_table,
> we need to stop the device from processing the vrings before the
> set_mem_table handler calls the munmap(), and re-enable it after the
> mmap() (I did that wrong in my example patch, I just did that after the
> munmap/mmap happened, which is too late).
> 
> >> Even for the callfd double change it can be problematic as Vhost-lib
> >> will close the first one while it will still be used by the driver
> >> (Btw, I see my example patch is also buggy in this regards, it should
> >> reset the call_fd value in the virtqueue, then call
> >> vhost_user_update_vring_state() and finally close the FD).
> >
> > Yes, this one leads for different handle for each message.
> >
> > Maybe it leads for new queue modify operation.
> > So, queue doesn't send the state - just does configuration change on the
> fly.
> >
> > What do you think?
> 
> I think that configuration on the fly doesn't fly.
> We would at least need to stop the device from processing the rings for
> memory hotplug case, so why not just send a disable notification?

Yes, driver need notification here.

> And for the double callfd, that does not look right to me not to request the
> driver to stop using it before it is closed, isn't it?

Yes, and some drivers (include mlx5) may stop the traffic in this case too.

modify\update operation will solve all:

For example:

In memory hotplug:
Do new mmap
Call modify
Do munmup for old.

In callfd\kickfd change:

Set new FD.
Call modify.
Close old FD.

Modify is clearer, save calls and faster (datapath will back faster).


>  Thanks,
> Maxime
> 
> >
> >> Thanks,
> >> Maxime
> >>>
> >>> Matan
> >>>
> >>>
> >>>
> >>>> Maxime
> >>>
> >


  reply	other threads:[~2020-06-24  5:54 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19  6:44   ` Maxime Coquelin
2020-06-19 13:28     ` Matan Azrad
2020-06-19 14:01       ` Maxime Coquelin
2020-06-21  6:26         ` Matan Azrad
2020-06-22  8:06           ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-19  6:49   ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
2020-06-19  7:41   ` Maxime Coquelin
2020-06-19 12:04     ` Maxime Coquelin
2020-06-19 13:11     ` Matan Azrad
2020-06-19 13:54       ` Maxime Coquelin
2020-06-21  6:20         ` Matan Azrad
2020-06-22  8:04           ` Maxime Coquelin
2020-06-22  8:41             ` Matan Azrad
2020-06-22  8:56               ` Maxime Coquelin
2020-06-22 10:06                 ` Matan Azrad
2020-06-22 12:32                   ` Maxime Coquelin
2020-06-22 13:43                     ` Matan Azrad
2020-06-22 14:55                       ` Maxime Coquelin
2020-06-22 15:51                         ` Matan Azrad
2020-06-22 16:47                           ` Maxime Coquelin
2020-06-23  9:02                             ` Matan Azrad
2020-06-23  9:19                               ` Maxime Coquelin
2020-06-23 11:53                                 ` Matan Azrad
2020-06-23 13:55                                   ` Maxime Coquelin
2020-06-23 14:33                                     ` Maxime Coquelin
2020-06-23 14:52                                     ` Matan Azrad
2020-06-23 15:18                                       ` Maxime Coquelin
2020-06-24  5:54                                         ` Matan Azrad [this message]
2020-06-24  7:22                                           ` Maxime Coquelin
2020-06-24  8:38                                             ` Matan Azrad
2020-06-24  9:12                                               ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-28  3:06     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
2020-06-26 12:10     ` Maxime Coquelin
2020-06-28  3:08     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-26 12:15     ` Maxime Coquelin
2020-06-28  3:18     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
2020-06-26 12:19     ` Maxime Coquelin
2020-06-28  3:19     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
2020-06-26 12:29     ` Maxime Coquelin
2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
2020-07-17  1:41       ` Wang, Yinan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DB3PR0502MB40283E1DF652937C21319354D2950@DB3PR0502MB4028.eurprd05.prod.outlook.com \
    --to=matan@mellanox.com \
    --cc=dev@dpdk.org \
    --cc=maxime.coquelin@redhat.com \
    --cc=xiao.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git