From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Matan Azrad <matan@mellanox.com>, Xiao Wang <xiao.w.wang@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
Date: Tue, 23 Jun 2020 16:33:57 +0200 [thread overview]
Message-ID: <ffc2afcf-b30b-a854-53bd-d42abf3d8fb6@redhat.com> (raw)
In-Reply-To: <7c6bd0b2-8657-3c34-ab1f-f397c39ea2ec@redhat.com>
On 6/23/20 3:55 PM, Maxime Coquelin wrote:
> Hi Matan,
>
> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>
>>
>> From: Maxime Coquelin:
>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>
>>>>
>>>> From: Maxime Coquelin:
>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>
>>>>>>
>>>>>> From: Maxime Coquelin:
>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Maxime Coquelin:
>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>> definition
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Maxime
>>>>>>>>>>
>>>>>>>>>> From: Maxime Coquelin <maxime.coquelin@redhat.com>
>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>> To: Matan Azrad <matan@mellanox.com>; Xiao Wang
>>>>>>>>>>> <xiao.w.wang@intel.com>
>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>> definition
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>> The issue is if you only check ready state only before and
>>>>>>>>>>>>> after the message affecting the ring is handled, it can be
>>>>>>>>>>>>> ready at both stages, while the rings have changed and state
>>>>>>>>>>>>> change callback should
>>>>>>>>>>> have been called.
>>>>>>>>>>>> But in this version I checked twice, before message handler
>>>>>>>>>>>> and after
>>>>>>>>>>> message handler, so it should catch any update.
>>>>>>>>>>>
>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>> handlers, so that the ready state is invalidated because
>>>>>>>>>>> sometimes it will be ready before and after the message handler
>>>>>>>>>>> but
>>>>> with different values.
>>>>>>>>>>>
>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>>> virtio_net
>>>>>>>>>>> **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> if (vq->kickfd >= 0)
>>>>>>>>>>> close(vq->kickfd);
>>>>>>>>>>> +
>>>>>>>>>>> + vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>> +
>>>>>>>>>>> + vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>> +
>>>>>>>>>>> vq->kickfd = file.fd;
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>
>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE
>>>>>>>>> message according to v1.
>>>>>>>>>>
>>>>>>>>>> One of our assumption we agreed on in the design mail is that it
>>>>>>>>>> doesn't
>>>>>>>>> make sense that QEMU will change queue configuration without
>>>>>>>>> enabling the queue again.
>>>>>>>>>> Because of that we decided to force calling state callback again
>>>>>>>>>> when
>>>>>>>>> QEMU send VHOST_USER_SET_VRING_ENABLE(1) message even if
>>> the
>>>>>>> queue is
>>>>>>>>> already ready.
>>>>>>>>>> So when driver/app see state enable->enable, it should take into
>>>>>>>>>> account
>>>>>>>>> that the queue configuration was probably changed.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>>>>
>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>> But now looking into the details of the implementation, I find it
>>>>>>>>> is even cleaner & clearer not to do this assumption.
>>>>>>>>>
>>>>>>>>>> That's why I prefer to collect all the ready checks callbacks
>>>>>>>>>> (queue state and
>>>>>>>>> device new\conf) to one function that will be called after the
>>>>>>>>> message
>>>>>>>>> handler:
>>>>>>>>>> Pseudo:
>>>>>>>>>> vhost_user_update_ready_statuses() {
>>>>>>>>>> switch (msg):
>>>>>>>>>> case enable:
>>>>>>>>>> if(enable is 1)
>>>>>>>>>> force queue state =1.
>>>>>>>>>> case callfd
>>>>>>>>>> case kickfd
>>>>>>>>>> .....
>>>>>>>>>> Check queue and device ready + call callbacks if
>>> needed..
>>>>>>>>>> Default
>>>>>>>>>> Return;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I find it more natural to "invalidate" ready state where it is
>>>>>>>>> handled (after vring_invalidate(), before setting new FD for call
>>>>>>>>> & kick, ...)
>>>>>>>>
>>>>>>>> I think that if you go with this direction, if the first queue
>>>>>>>> pair is invalidated,
>>>>>>> you need to notify app\driver also about device ready change.
>>>>>>>> Also it will cause 2 notifications to the driver instead of one in
>>>>>>>> case of FD
>>>>>>> change.
>>>>>>>
>>>>>>> You'll always end-up with two notifications, either Qemu has sent
>>>>>>> the disable and so you'll have one notification for the disable and
>>>>>>> one for the enable, or it didn't sent the disable and it will
>>>>>>> happen at old value invalidation time and after new value is taken into
>>> account.
>>>>>>>
>>>>>>
>>>>>> I don't see it in current QEMU behavior.
>>>>>> When working MQ I see that some virtqs get configuration message
>>>>>> while
>>>>> they are in enabled state.
>>>>>> Then, enable message is sent again later.
>>>>>
>>>>> I guess you mean the first queue pair? And it would not be in ready
>>>>> state as it would be the initial configuration of the queue?
>>>>
>>>> Even after initialization when queue is ready.
>>>>
>>>>>>
>>>>>>>> Why not to take this correct assumption and update ready state
>>>>>>>> only in one
>>>>>>> point in the code instead of doing it in all the configuration
>>>>>>> handlers
>>>>> around?
>>>>>>>> IMO, It is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>
>>>>>>> I just looked closer at the Vhost-user spec, and I'm no more so
>>>>>>> sure this is a correct assumption:
>>>>>>>
>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>> client must support changing some configuration aspects on the fly."
>>>>>>
>>>>>> Ok, this doesn't explain how configuration is changed on the fly.
>>>>>
>>>>> I agree it lacks a bit of clarity.
>>>>>
>>>>>> As I mentioned, QEMU sends enable message always after configuration
>>>>> message.
>>>>>
>>>>> Yes, but we should not do assumptions on current Qemu version when
>>>>> possible. Better to be safe and follow the specification, it will be more
>>> robust.
>>>>> There is also the Virtio-user PMD to take into account for example.
>>>>
>>>> I understand your point here but do you really want to be ready for any
>>> configuration update in run time?
>>>> What does it mean? How datatpath should handle configuration from
>>> control thread in run time while traffic is on?
>>>> For example, changing queue size \ addresses must stop traffic before...
>>>> Also changing FDs is very sensitive.
>>>>
>>>> It doesn't make sense to me.
>>>>
>>>> Also, according to "on the fly" direction we should not disable the queue
>>> unless enable message is coming to disable it.
>>
>> No response, so looks like you agree that it doesn't make sense.
>
> No, my reply was general to all your comments.
>
> With SW backend, I agree we don't need to disable the rings in case of
> asynchronous changes to the ring because we protect it with a lock, so
> we are sure the ring won't be accessed by another thread while doing the
> change.
>
> For vDPA case that's more problematic because we have no such locking
> mechanism.
>
> For example memory hotplug, Qemu does not seem to disable the queues so
> we need to stop the vDPA device one way or another so that it does not
> process the rings while the Vhost lib remaps the memory areas.
>
>>>> In addition:
>>>> Do you really want to toggle vDPA drivers\app for any configuration
>>> message? It may cause queue recreation for each one (at least for mlx5).
>>>
>>> I want to have something robust and maintainable.
>>
>> Me too.
>>
>>> These messages arriving after a queue have been configured once are rare
>>> events, but this is usually the kind of things that cause maintenance burden.
>>
>> In case of guest poll mode (testpmd virtio) we all the time get callfd twice.
>
> Right.
>
>>> If you look at my example patch, you will understand that with my proposal,
>>> there won't be any more state change notification than with your proposal
>>> when Qemu or any other Vhost-user master send a disable request before
>>> sending the request that impact the queue state.
>>
>> we didn't talk about disable time - this one is very simple.
>>
>> Yes, In case the queue is disabled your proposal doesn't send extra notification as my.
>> But in case the queue is ready, your proposal send extra not ready notification for kikfd,callfd,set_vring_base configurations.
>
> I think this is necessary for synchronization with the Vhost-user
> master (in case the master asks for this synchronization, like
> set_mem_table for instance when reply-ack is enabled).
>
>>> It just adds more robustness if this unlikely event happens, by invalidating
>>> the ring state to not ready before doing the actual ring configuration change.
>>> So that this config change is not missed by the vDPA driver or the application.
>>
>> One more issue here is that there is some time that device is ready (already configured) and the first vittq-pair is not ready (your invalidate proposal for set_vring_base).
>
Sorry, I forgot to reply here.
I am not sure about what do you mean about my invalidate proposal for
set_vring_base?
>
>> It doesn’t save the concept that device is ready only in case the first virtq-pair is ready.
>
> I understand the spec as "the device is ready as soon as the first queue
> pair is ready", but I might be wrong.
>
> Do you suggest to call the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the ring of the
> first queue pair receive a disable request or, with my patch, when one
> of the rings receives a request that changes the ring state?
>
>>
>> I will not insist anymore on waiting for enable for notifying although I not fan with it.
>>
>> So, I suggest to create 1 notification function to be called after message handler and before reply.
>> This function is the only one which notify ready states in the next options:
>>
>> 1. virtq ready state is changed in the queue.
>> 2. virtq ready state stays on after configuration message handler.
>> 3. device state will be enabled when the first queue pair is ready.
>
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with memory hotplug case I
> mentioned earlier.
>
> Even for the callfd double change it can be problematic as Vhost-lib
> will close the first one while it will still be used by the driver (Btw,
> I see my example patch is also buggy in this regards, it should reset
> the call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD).
>
> Thanks,
> Maxime
>>
>> Matan
>>
>>
>>
>>> Maxime
>>
>
next prev parent reply other threads:[~2020-06-23 14:34 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-18 16:28 [dpdk-dev] [PATCH v1 0/4] vhost: improve ready state Matan Azrad
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 1/4] vhost: support host notifier queue configuration Matan Azrad
2020-06-19 6:44 ` Maxime Coquelin
2020-06-19 13:28 ` Matan Azrad
2020-06-19 14:01 ` Maxime Coquelin
2020-06-21 6:26 ` Matan Azrad
2020-06-22 8:06 ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 2/4] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-19 6:49 ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition Matan Azrad
2020-06-19 7:41 ` Maxime Coquelin
2020-06-19 12:04 ` Maxime Coquelin
2020-06-19 13:11 ` Matan Azrad
2020-06-19 13:54 ` Maxime Coquelin
2020-06-21 6:20 ` Matan Azrad
2020-06-22 8:04 ` Maxime Coquelin
2020-06-22 8:41 ` Matan Azrad
2020-06-22 8:56 ` Maxime Coquelin
2020-06-22 10:06 ` Matan Azrad
2020-06-22 12:32 ` Maxime Coquelin
2020-06-22 13:43 ` Matan Azrad
2020-06-22 14:55 ` Maxime Coquelin
2020-06-22 15:51 ` Matan Azrad
2020-06-22 16:47 ` Maxime Coquelin
2020-06-23 9:02 ` Matan Azrad
2020-06-23 9:19 ` Maxime Coquelin
2020-06-23 11:53 ` Matan Azrad
2020-06-23 13:55 ` Maxime Coquelin
2020-06-23 14:33 ` Maxime Coquelin [this message]
2020-06-23 14:52 ` Matan Azrad
2020-06-23 15:18 ` Maxime Coquelin
2020-06-24 5:54 ` Matan Azrad
2020-06-24 7:22 ` Maxime Coquelin
2020-06-24 8:38 ` Matan Azrad
2020-06-24 9:12 ` Maxime Coquelin
2020-06-18 16:28 ` [dpdk-dev] [PATCH v1 4/4] vdpa/mlx5: support queue update Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 1/5] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-28 3:06 ` Xia, Chenbo
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 2/5] vhost: improve device readiness notifications Matan Azrad
2020-06-26 12:10 ` Maxime Coquelin
2020-06-28 3:08 ` Xia, Chenbo
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 3/5] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-26 12:15 ` Maxime Coquelin
2020-06-28 3:18 ` Xia, Chenbo
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 4/5] vhost: notify virtq file descriptor update Matan Azrad
2020-06-26 12:19 ` Maxime Coquelin
2020-06-28 3:19 ` Xia, Chenbo
2020-06-25 13:38 ` [dpdk-dev] [PATCH v2 5/5] vdpa/mlx5: support queue update Matan Azrad
2020-06-26 12:29 ` Maxime Coquelin
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Matan Azrad
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 2/6] vhost: skip access lock when vDPA is configured Matan Azrad
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 3/6] vhost: improve device readiness notifications Matan Azrad
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 4/6] vhost: handle memory hotplug with vDPA devices Matan Azrad
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 5/6] vhost: notify virtq file descriptor update Matan Azrad
2020-06-29 14:08 ` [dpdk-dev] [PATCH v3 6/6] vdpa/mlx5: support queue update Matan Azrad
2020-06-29 17:24 ` [dpdk-dev] [PATCH v3 0/6] vhost: improve ready state Maxime Coquelin
2020-07-17 1:41 ` Wang, Yinan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ffc2afcf-b30b-a854-53bd-d42abf3d8fb6@redhat.com \
--to=maxime.coquelin@redhat.com \
--cc=dev@dpdk.org \
--cc=matan@mellanox.com \
--cc=xiao.w.wang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).