DPDK patches and discussions
 help / color / mirror / Atom feed
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Matan Azrad <matan@mellanox.com>, Xiao Wang <xiao.w.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
Date: Wed, 24 Jun 2020 09:22:02 +0200
Message-ID: <0cde445d-7008-b4e0-9f3c-8e7d59f2751d@redhat.com> (raw)
In-Reply-To: <DB3PR0502MB40283E1DF652937C21319354D2950@DB3PR0502MB4028.eurprd05.prod.outlook.com>

Good morning Matan,

On 6/24/20 7:54 AM, Matan Azrad wrote:
> Ho Maxime
> 
> Good morning
> 
> From: Maxime Coquelin:
>> On 6/23/20 4:52 PM, Matan Azrad wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>> Sent: Tuesday, June 23, 2020 4:56 PM
>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>> <xiao.w.wang@intel.com>
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>
>>>> Hi Matan,
>>>>
>>>> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>>>>
>>>>>
>>>>> From: Maxime Coquelin:
>>>>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>>>>
>>>>>>>
>>>>>>> From: Maxime Coquelin:
>>>>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>> definition
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maxime
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>>>>> The issue is if you only check ready state only before
>>>>>>>>>>>>>>>> and after the message affecting the ring is handled, it
>>>>>>>>>>>>>>>> can be ready at both stages, while the rings have changed
>>>>>>>>>>>>>>>> and state change callback should
>>>>>>>>>>>>>> have been called.
>>>>>>>>>>>>>>> But in this version I checked twice, before message
>>>>>>>>>>>>>>> handler and after
>>>>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>>>>> sometimes it will be ready before and after the message
>>>>>>>>>>>>>> handler but
>>>>>>>> with different values.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>>>>> @@ -1847,15 +1892,16 @@
>> vhost_user_set_vring_kick(struct
>>>>>>>>>> virtio_net
>>>>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>>>>> message according to v1.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One of our assumption we agreed on in the design mail is
>>>>>>>>>>>>> that it doesn't
>>>>>>>>>>>> make sense that QEMU will change queue configuration
>> without
>>>>>>>>>>>> enabling the queue again.
>>>>>>>>>>>>> Because of that we decided to force calling state callback
>>>>>>>>>>>>> again when
>>>>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message
>> even
>>>> if
>>>>>> the
>>>>>>>>>> queue is
>>>>>>>>>>>> already ready.
>>>>>>>>>>>>> So when driver/app see state enable->enable, it should take
>>>>>>>>>>>>> into account
>>>>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that this assumption is correct according to the
>>>>>>>>>>>>> QEMU
>>>> code.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>>>>> But now looking into the details of the implementation, I
>>>>>>>>>>>> find it is even cleaner & clearer not to do this assumption.
>>>>>>>>>>>>
>>>>>>>>>>>>> That's why I prefer to collect all the ready checks
>>>>>>>>>>>>> callbacks (queue state and
>>>>>>>>>>>> device new\conf) to one function that will be called after
>>>>>>>>>>>> the message
>>>>>>>>>>>> handler:
>>>>>>>>>>>>> Pseudo:
>>>>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>>>>> 	switch (msg):
>>>>>>>>>>>>> 		case enable:
>>>>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>>>>> 				force queue state =1.
>>>>>>>>>>>>> 		case callfd
>>>>>>>>>>>>> 		case kickfd
>>>>>>>>>>>>> 				.....
>>>>>>>>>>>>> 		Check queue and device ready + call callbacks if
>>>>>> needed..
>>>>>>>>>>>>> 		Default
>>>>>>>>>>>>> 			Return;
>>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> I find it more natural to "invalidate" ready state where it
>>>>>>>>>>>> is handled (after vring_invalidate(), before setting new FD
>>>>>>>>>>>> for call & kick, ...)
>>>>>>>>>>>
>>>>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>>>>> pair is invalidated,
>>>>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>>>>> Also it will cause 2 notifications to the driver instead of
>>>>>>>>>>> one in case of FD
>>>>>>>>>> change.
>>>>>>>>>>
>>>>>>>>>> You'll always end-up with two notifications, either Qemu has
>>>>>>>>>> sent the disable and so you'll have one notification for the
>>>>>>>>>> disable and one for the enable, or it didn't sent the disable
>>>>>>>>>> and it will happen at old value invalidation time and after new
>>>>>>>>>> value is taken into
>>>>>> account.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don't see it in current QEMU behavior.
>>>>>>>>> When working MQ I see that some virtqs get configuration
>> message
>>>>>>>>> while
>>>>>>>> they are in enabled state.
>>>>>>>>> Then, enable message is sent again later.
>>>>>>>>
>>>>>>>> I guess you mean the first queue pair? And it would not be in
>>>>>>>> ready state as it would be the initial configuration of the queue?
>>>>>>>
>>>>>>> Even after initialization when queue is ready.
>>>>>>>
>>>>>>>>>
>>>>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>>>>> only in one
>>>>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>>>>> handlers
>>>>>>>> around?
>>>>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>>>>
>>>>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>>>>> sure this is a correct assumption:
>>>>>>>>>>
>>>>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>>>>> client must support changing some configuration aspects on the
>> fly."
>>>>>>>>>
>>>>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>>>>
>>>>>>>> I agree it lacks a bit of clarity.
>>>>>>>>
>>>>>>>>> As I mentioned, QEMU sends enable message always after
>>>>>>>>> configuration
>>>>>>>> message.
>>>>>>>>
>>>>>>>> Yes, but we should not do assumptions on current Qemu version
>>>>>>>> when possible. Better to be safe and follow the specification, it
>>>>>>>> will be more
>>>>>> robust.
>>>>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>>>>
>>>>>>> I understand your point here but do you really want to be ready
>>>>>>> for any
>>>>>> configuration update in run time?
>>>>>>> What does it mean? How datatpath should handle configuration from
>>>>>> control thread in run time while traffic is on?
>>>>>>> For example, changing queue size \ addresses must stop traffic
>> before...
>>>>>>> Also changing FDs is very sensitive.
>>>>>>>
>>>>>>> It doesn't make sense to me.
>>>>>>>
>>>>>>> Also, according to "on the fly" direction we should not disable
>>>>>>> the queue
>>>>>> unless enable message is coming to disable it.
>>>>>
>>>>> No response, so looks like you agree that it doesn't make sense.
>>>>
>>>> No, my reply was general to all your comments.
>>>>
>>>> With SW backend, I agree we don't need to disable the rings in case
>>>> of asynchronous changes to the ring because we protect it with a
>>>> lock, so we are sure the ring won't be accessed by another thread
>>>> while doing the change.
>>>>
>>>> For vDPA case that's more problematic because we have no such locking
>>>> mechanism.
>>>>
>>>> For example memory hotplug, Qemu does not seem to disable the
>> queues
>>>> so we need to stop the vDPA device one way or another so that it does
>>>> not process the rings while the Vhost lib remaps the memory areas.
>>>>
>>>>>>> In addition:
>>>>>>> Do you really want to toggle vDPA drivers\app for any
>>>>>>> configuration
>>>>>> message? It may cause queue recreation for each one (at least for
>> mlx5).
>>>>>>
>>>>>> I want to have something robust and maintainable.
>>>>>
>>>>> Me too.
>>>>>
>>>>>> These messages arriving after a queue have been configured once are
>>>>>> rare events, but this is usually the kind of things that cause
>>>>>> maintenance
>>>> burden.
>>>>>
>>>>> In case of guest poll mode (testpmd virtio) we all the time get callfd
>> twice.
>>>>
>>>> Right.
>>>>
>>>>>> If you look at my example patch, you will understand that with my
>>>>>> proposal, there won't be any more state change notification than
>>>>>> with your proposal when Qemu or any other Vhost-user master send a
>>>>>> disable request before sending the request that impact the queue
>> state.
>>>>>
>>>>> we didn't talk about disable time - this one is very simple.
>>>>>
>>>>> Yes, In case the queue is disabled your proposal doesn't send extra
>>>> notification as my.
>>>>> But in case the queue is ready, your proposal send extra not ready
>>>> notification for kikfd,callfd,set_vring_base configurations.
>>>>
>>>> I think this is necessary for synchronization with the Vhost-user
>>>> master (in case the master asks for this synchronization, like
>>>> set_mem_table for instance when reply-ack is enabled).
>>>>
>>>>>> It just adds more robustness if this unlikely event happens, by
>>>>>> invalidating the ring state to not ready before doing the actual
>>>>>> ring
>>>> configuration change.
>>>>>> So that this config change is not missed by the vDPA driver or the
>>>> application.
>>>>>
>>>>> One more issue here is that there is some time that device is ready
>>>>> (already
>>>> configured) and the first vittq-pair is not ready (your invalidate
>>>> proposal for set_vring_base).
>>>>
>>>>
>>>>
>>>>> It doesn’t save the concept that device is ready only in case the
>>>>> first virtq-
>>>> pair is ready.
>>>>
>>>> I understand the spec as "the device is ready as soon as the first
>>>> queue pair is ready", but I might be wrong.
>>>>
>>>> Do you suggest to call the dev_close() vDPA callback and the
>>>> destroy_device() application callback as soon as one of the ring of
>>>> the first queue pair receive a disable request or, with my patch,
>>>> when one of the rings receives a request that changes the ring state?
>>>
>>> I means, your proposal actually may make first virtq-pair ready state
>> disabled when device ready.
>>> So, yes, it leads to call device close\destroy.
>>
>> No it doesn't, there is no call to .dev_close()/.destroy_device() with my
>> patch if first queue pair gets disabled.
>>
>>>>> I will not insist anymore on waiting for enable for notifying
>>>>> although I not
>>>> fan with it.
>>>>>
>>>>> So, I suggest to create 1 notification function to be called after
>>>>> message
>>>> handler and before reply.
>>>>> This function is the only one which notify ready states in the next
>> options:
>>>>>
>>>>> 1. virtq ready state is changed in the queue.
>>>>> 2. virtq ready state stays on after configuration message handler.
>>>>> 3. device state will be enabled when the first queue pair is ready.
>>>>
>>>> IIUC, it will not disable the queues when there is a state change, is
>>>> that correct? If so, I think it does not work with memory hotplug
>>>> case I mentioned earlier.
>>>
>>> It will do enable again which mean - something was modified.
>>
>> Ok, thanks for the clarification.
>>
>> I think it is not enough for the examples I gave below. For set_mem_table,
>> we need to stop the device from processing the vrings before the
>> set_mem_table handler calls the munmap(), and re-enable it after the
>> mmap() (I did that wrong in my example patch, I just did that after the
>> munmap/mmap happened, which is too late).
>>
>>>> Even for the callfd double change it can be problematic as Vhost-lib
>>>> will close the first one while it will still be used by the driver
>>>> (Btw, I see my example patch is also buggy in this regards, it should
>>>> reset the call_fd value in the virtqueue, then call
>>>> vhost_user_update_vring_state() and finally close the FD).
>>>
>>> Yes, this one leads for different handle for each message.
>>>
>>> Maybe it leads for new queue modify operation.
>>> So, queue doesn't send the state - just does configuration change on the
>> fly.
>>>
>>> What do you think?
>>
>> I think that configuration on the fly doesn't fly.
>> We would at least need to stop the device from processing the rings for
>> memory hotplug case, so why not just send a disable notification?
> 
> Yes, driver need notification here.
> 
>> And for the double callfd, that does not look right to me not to request the
>> driver to stop using it before it is closed, isn't it?
> 
> Yes, and some drivers (include mlx5) may stop the traffic in this case too.
> 
> modify\update operation will solve all:
> 
> For example:
> 
> In memory hotplug:
> Do new mmap
> Call modify
> Do munmup for old.
> 
> In callfd\kickfd change:
> 
> Set new FD.
> Call modify.
> Close old FD.
> 
> Modify is clearer, save calls and faster (datapath will back faster).

It should work, but that is not light modifications to do in
set_mem_table handler (the function is quite complex already with
postcopy live-migration support).

With a modify callback, won't the driver part be more complex? Since it
would have to check which state has changed in the ring, and based on
that decide whether it should stop the ring or not.

As you says that in case of memory hotplug and double callfd, the driver
may stop processing the rings anyway, so would it be that much faster
than disabling/enabling the vring?

These events having a very rare occurrence, does it really matter if
it is a bit longer?

Thanks,
Maxime

> 
>>  Thanks,
>> Maxime
>>
>>>
>>>> Thanks,
>>>> Maxime
>>>>>
>>>>> Matan
>>>>>
>>>>>
>>>>>
>>>>>> Maxime
>>>>>
>>>
> 


  reply	other threads:[~2020-06-24  7:22 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19  6:44   ` Maxime Coquelin
2020-06-19 13:28     ` Matan Azrad
2020-06-19 14:01       ` Maxime Coquelin
2020-06-21  6:26         ` Matan Azrad
2020-06-22  8:06           ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-19  6:49   ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
2020-06-19  7:41   ` Maxime Coquelin
2020-06-19 12:04     ` Maxime Coquelin
2020-06-19 13:11     ` Matan Azrad
2020-06-19 13:54       ` Maxime Coquelin
2020-06-21  6:20         ` Matan Azrad
2020-06-22  8:04           ` Maxime Coquelin
2020-06-22  8:41             ` Matan Azrad
2020-06-22  8:56               ` Maxime Coquelin
2020-06-22 10:06                 ` Matan Azrad
2020-06-22 12:32                   ` Maxime Coquelin
2020-06-22 13:43                     ` Matan Azrad
2020-06-22 14:55                       ` Maxime Coquelin
2020-06-22 15:51                         ` Matan Azrad
2020-06-22 16:47                           ` Maxime Coquelin
2020-06-23  9:02                             ` Matan Azrad
2020-06-23  9:19                               ` Maxime Coquelin
2020-06-23 11:53                                 ` Matan Azrad
2020-06-23 13:55                                   ` Maxime Coquelin
2020-06-23 14:33                                     ` Maxime Coquelin
2020-06-23 14:52                                     ` Matan Azrad
2020-06-23 15:18                                       ` Maxime Coquelin
2020-06-24  5:54                                         ` Matan Azrad
2020-06-24  7:22                                           ` Maxime Coquelin [this message]
2020-06-24  8:38                                             ` Matan Azrad
2020-06-24  9:12                                               ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-28  3:06     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
2020-06-26 12:10     ` Maxime Coquelin
2020-06-28  3:08     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-26 12:15     ` Maxime Coquelin
2020-06-28  3:18     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
2020-06-26 12:19     ` Maxime Coquelin
2020-06-28  3:19     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
2020-06-26 12:29     ` Maxime Coquelin
2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
2020-07-17  1:41       ` Wang, Yinan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0cde445d-7008-b4e0-9f3c-8e7d59f2751d@redhat.com \
    --to=maxime.coquelin@redhat.com \
    --cc=dev@dpdk.org \
    --cc=matan@mellanox.com \
    --cc=xiao.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git