DPDK patches and discussions
 help / color / mirror / Atom feed
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Matan Azrad <matan@mellanox.com>, Xiao Wang <xiao.w.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
Date: Tue, 23 Jun 2020 16:33:57 +0200	[thread overview]
Message-ID: <ffc2afcf-b30b-a854-53bd-d42abf3d8fb6@redhat.com> (raw)
In-Reply-To: <7c6bd0b2-8657-3c34-ab1f-f397c39ea2ec@redhat.com>



On 6/23/20 3:55 PM, Maxime Coquelin wrote:
> Hi Matan,
> 
> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>
>>
>> From: Maxime Coquelin:
>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>
>>>>
>>>> From: Maxime Coquelin:
>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>
>>>>>>
>>>>>> From: Maxime Coquelin:
>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Maxime Coquelin:
>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>> definition
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Maxime
>>>>>>>>>>
>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>> definition
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>>>>> ready at both stages, while the rings have changed and state
>>>>>>>>>>>>> change callback should
>>>>>>>>>>> have been called.
>>>>>>>>>>>> But in this version I checked twice, before message handler
>>>>>>>>>>>> and after
>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>
>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>> sometimes it will be ready before and after the message handler
>>>>>>>>>>> but
>>>>> with different values.
>>>>>>>>>>>
>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>>> virtio_net
>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>         if (vq->kickfd >= 0)
>>>>>>>>>>>                 close(vq->kickfd);
>>>>>>>>>>> +
>>>>>>>>>>> +       vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>> +
>>>>>>>>>>> +       vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>> +
>>>>>>>>>>>         vq->kickfd = file.fd;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>
>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>> message according to v1.
>>>>>>>>>>
>>>>>>>>>> One of our assumption we agreed on in the design mail is that it
>>>>>>>>>> doesn't
>>>>>>>>> make sense that QEMU will change queue configuration without
>>>>>>>>> enabling the queue again.
>>>>>>>>>> Because of that we decided to force calling state callback again
>>>>>>>>>> when
>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if
>>> the
>>>>>>> queue is
>>>>>>>>> already ready.
>>>>>>>>>> So when driver/app see state enable->enable, it should take into
>>>>>>>>>> account
>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>>>>
>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>> But now looking into the details of the implementation, I find it
>>>>>>>>> is even cleaner & clearer not to do this assumption.
>>>>>>>>>
>>>>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>>>>> (queue state and
>>>>>>>>> device new\conf) to one function that will be called after the
>>>>>>>>> message
>>>>>>>>> handler:
>>>>>>>>>> Pseudo:
>>>>>>>>>>  vhost_user_update_ready_statuses() {
>>>>>>>>>> 	switch (msg):
>>>>>>>>>> 		case enable:
>>>>>>>>>> 			if(enable is 1)
>>>>>>>>>> 				force queue state =1.
>>>>>>>>>> 		case callfd
>>>>>>>>>> 		case kickfd
>>>>>>>>>> 				.....
>>>>>>>>>> 		Check queue and device ready + call callbacks if
>>> needed..
>>>>>>>>>> 		Default
>>>>>>>>>> 			Return;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>>>>> handled (after vring_invalidate(), before setting new FD for call
>>>>>>>>> & kick, ...)
>>>>>>>>
>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>> pair is invalidated,
>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>> Also it will cause 2 notifications to the driver instead of one in
>>>>>>>> case of FD
>>>>>>> change.
>>>>>>>
>>>>>>> You'll always end-up with two notifications, either Qemu has sent
>>>>>>> the disable and so you'll have one notification for the disable and
>>>>>>> one for the enable, or it didn't sent the disable and it will
>>>>>>> happen at old value invalidation time and after new value is taken into
>>> account.
>>>>>>>
>>>>>>
>>>>>> I don't see it in current QEMU behavior.
>>>>>> When working MQ I see that some virtqs get configuration message
>>>>>> while
>>>>> they are in enabled state.
>>>>>> Then, enable message is sent again later.
>>>>>
>>>>> I guess you mean the first queue pair? And it would not be in ready
>>>>> state as it would be the initial configuration of the queue?
>>>>
>>>> Even after initialization when queue is ready.
>>>>
>>>>>>
>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>> only in one
>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>> handlers
>>>>> around?
>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>
>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>> sure this is a correct assumption:
>>>>>>>
>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>> client must support changing some configuration aspects on the fly."
>>>>>>
>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>
>>>>> I agree it lacks a bit of clarity.
>>>>>
>>>>>> As I mentioned, QEMU sends enable message always after configuration
>>>>> message.
>>>>>
>>>>> Yes, but we should not do assumptions on current Qemu version when
>>>>> possible. Better to be safe and follow the specification, it will be more
>>> robust.
>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>
>>>> I understand your point here but do you really want to be ready for any
>>> configuration update in run time?
>>>> What does it mean? How datatpath should handle configuration from
>>> control thread in run time while traffic is on?
>>>> For example, changing queue size \ addresses must stop traffic before...
>>>> Also changing FDs is very sensitive.
>>>>
>>>> It doesn't make sense to me.
>>>>
>>>> Also, according to "on the fly" direction we should not disable the queue
>>> unless enable message is coming to disable it.
>>
>> No response, so looks like you agree that it doesn't make sense.
> 
> No, my reply was general to all your comments.
> 
> With SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring because we protect it with a lock, so
> we are sure the ring won't be accessed by another thread while doing the
> change.
> 
> For vDPA case that's more problematic because we have no such locking
> mechanism.
> 
> For example memory hotplug, Qemu does not seem to disable the queues so
> we need to stop the vDPA device one way or another so that it does not
> process the rings while the Vhost lib remaps the memory areas.
> 
>>>> In addition:
>>>> Do you really want to toggle vDPA drivers\app for any configuration
>>> message? It may cause queue recreation for each one (at least for mlx5).
>>>
>>> I want to have something robust and maintainable.
>>
>> Me too.
>>
>>> These messages arriving after a queue have been configured once are rare
>>> events, but this is usually the kind of things that cause maintenance burden.
>>
>> In case of guest poll mode (testpmd virtio) we all the time get callfd twice.
> 
> Right.
> 
>>> If you look at my example patch, you will understand that with my proposal,
>>> there won't be any more state change notification than with your proposal
>>> when Qemu or any other Vhost-user master send a disable request before
>>> sending the request that impact the queue state.
>>
>> we didn't talk about disable time - this one is very simple.
>>
>> Yes, In case the queue is disabled your proposal doesn't send extra notification as my.
>> But in case the queue is ready, your proposal send extra not ready notification for kikfd,callfd,set_vring_base configurations.
> 
> I think this is necessary for synchronization with the Vhost-user
> master (in case the master asks for this synchronization, like
> set_mem_table for instance when reply-ack is enabled).
> 
>>> It just adds more robustness if this unlikely event happens, by invalidating
>>> the ring state to not ready before doing the actual ring configuration change.
>>> So that this config change is not missed by the vDPA driver or the application.
>>
>> One more issue here is that there is some time that device is ready (already configured) and the first vittq-pair is not ready (your invalidate proposal for set_vring_base).
> 

Sorry, I forgot to reply here.
I am not sure about what do you mean about my invalidate proposal for
set_vring_base?

> 
>> It doesn’t save the concept that device is ready only in case the first virtq-pair is ready.
> 
> I understand the spec as "the device is ready as soon as the first queue
> pair is ready", but I might be wrong.
> 
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the ring of the
> first queue pair receive a disable request or, with my patch, when one
> of the rings receives a request that changes the ring state?
> 
>>
>> I will not insist anymore on waiting for enable for notifying although I not fan with it.
>>
>> So, I suggest to create 1 notification function to be called after message handler and before reply.
>> This function is the only one which notify ready states in the next options:
>>
>> 1. virtq ready state is changed in the queue.
>> 2. virtq ready state stays on after configuration message handler.
>> 3. device state will be enabled when the first queue pair is ready.
> 
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with memory hotplug case I
> mentioned earlier.
> 
> Even for the callfd double change it can be problematic as Vhost-lib
> will close the first one while it will still be used by the driver (Btw,
> I see my example patch is also buggy in this regards, it should reset
> the call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).
> 
> Thanks,
> Maxime
>>
>> Matan
>>
>>
>>
>>> Maxime
>>
> 


  reply	other threads:[~2020-06-23 14:34 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19  6:44   ` Maxime Coquelin
2020-06-19 13:28     ` Matan Azrad
2020-06-19 14:01       ` Maxime Coquelin
2020-06-21  6:26         ` Matan Azrad
2020-06-22  8:06           ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-19  6:49   ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
2020-06-19  7:41   ` Maxime Coquelin
2020-06-19 12:04     ` Maxime Coquelin
2020-06-19 13:11     ` Matan Azrad
2020-06-19 13:54       ` Maxime Coquelin
2020-06-21  6:20         ` Matan Azrad
2020-06-22  8:04           ` Maxime Coquelin
2020-06-22  8:41             ` Matan Azrad
2020-06-22  8:56               ` Maxime Coquelin
2020-06-22 10:06                 ` Matan Azrad
2020-06-22 12:32                   ` Maxime Coquelin
2020-06-22 13:43                     ` Matan Azrad
2020-06-22 14:55                       ` Maxime Coquelin
2020-06-22 15:51                         ` Matan Azrad
2020-06-22 16:47                           ` Maxime Coquelin
2020-06-23  9:02                             ` Matan Azrad
2020-06-23  9:19                               ` Maxime Coquelin
2020-06-23 11:53                                 ` Matan Azrad
2020-06-23 13:55                                   ` Maxime Coquelin
2020-06-23 14:33                                     ` Maxime Coquelin [this message]
2020-06-23 14:52                                     ` Matan Azrad
2020-06-23 15:18                                       ` Maxime Coquelin
2020-06-24  5:54                                         ` Matan Azrad
2020-06-24  7:22                                           ` Maxime Coquelin
2020-06-24  8:38                                             ` Matan Azrad
2020-06-24  9:12                                               ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-28  3:06     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
2020-06-26 12:10     ` Maxime Coquelin
2020-06-28  3:08     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-26 12:15     ` Maxime Coquelin
2020-06-28  3:18     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
2020-06-26 12:19     ` Maxime Coquelin
2020-06-28  3:19     ` Xia, Chenbo
2020-06-25 13:38   ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
2020-06-26 12:29     ` Maxime Coquelin
2020-06-29 14:08   ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
2020-06-29 17:24     ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
2020-07-17  1:41       ` Wang, Yinan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ffc2afcf-b30b-a854-53bd-d42abf3d8fb6@redhat.com \
    --to=maxime.coquelin@redhat.com \
    --cc=dev@dpdk.org \
    --cc=matan@mellanox.com \
    --cc=xiao.w.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).