To: Matan Azrad; Xiao Wang
Cc: dev@dpdk.org
From: Maxime Coquelin
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition
Date: Wed, 24 Jun 2020 11:12:59 +0200
Message-ID: <6c7a4198-2048-e573-da33-2d5f9025870a@redhat.com>
List-Id: DPDK patches and discussions

On 6/24/20 10:38 AM, Matan Azrad wrote:
>
>
>> -----Original Message-----
>> From: Maxime Coquelin
>> Sent: Wednesday, June 24, 2020 10:22 AM
>> To: Matan Azrad; Xiao Wang
>> Cc: dev@dpdk.org
>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>
>> Good morning Matan,
>>
>> On 6/24/20 7:54 AM, Matan Azrad wrote:
>>> Hi Maxime
>>>
>>> Good morning
>>>
>>> From: Maxime Coquelin:
>>>> On 6/23/20 4:52 PM, Matan Azrad wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin
>>>>>> Sent: Tuesday, June 23, 2020 4:56 PM
>>>>>> To: Matan Azrad; Xiao Wang
>>>>>> Cc: dev@dpdk.org
>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>>>
>>>>>> Hi Matan,
>>>>>>
>>>>>> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>>>>>>
>>>>>>> From: Maxime Coquelin:
>>>>>>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>>>>>>
>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>>>>>>
>>>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: Maxime Coquelin:
>>>>>>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>>>>>>> To: Matan Azrad; Xiao Wang
>>>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Maxime
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: Maxime Coquelin
>>>>>>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>>>>>>> To: Matan Azrad; Xiao Wang
>>>>>>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready
>>>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>>>>>>> The issue is if you only check the ready state before
>>>>>>>>>>>>>>>>>> and after the message affecting the ring is handled,
>>>>>>>>>>>>>>>>>> it can be ready at both stages, while the rings have
>>>>>>>>>>>>>>>>>> changed and the state change callback should have been
>>>>>>>>>>>>>>>>>> called.
>>>>>>>>>>>>>>>>> But in this version I checked twice, before the message
>>>>>>>>>>>>>>>>> handler and after the message handler, so it should
>>>>>>>>>>>>>>>>> catch any update.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>>>>>>> handlers, so that the ready state is invalidated, because
>>>>>>>>>>>>>>>> sometimes it will be ready before and after the message
>>>>>>>>>>>>>>>> handler but with different values.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct
>>>>>>>>>>>>>>>> virtio_net **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 	if (vq->kickfd >= 0)
>>>>>>>>>>>>>>>> 		close(vq->kickfd);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +	vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> 	vq->kickfd = file.fd;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without that, the ready check will return ready before
>>>>>>>>>>>>>>>> and after the kickfd changed and the driver won't be
>>>>>>>>>>>>>>>> notified.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE message according to v1.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One of the assumptions we agreed on in the design mail is
>>>>>>>>>>>>>>> that it doesn't make sense that QEMU will change the queue
>>>>>>>>>>>>>>> configuration without enabling the queue again.
>>>>>>>>>>>>>>> Because of that we decided to force calling the state
>>>>>>>>>>>>>>> callback again when QEMU sends the
>>>>>>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE(1) message even if the queue
>>>>>>>>>>>>>>> is already ready.
>>>>>>>>>>>>>>> So when the driver/app sees state enable->enable, it
>>>>>>>>>>>>>>> should take into account that the queue configuration was
>>>>>>>>>>>>>>> probably changed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think that this assumption is correct according to the
>>>>>>>>>>>>>>> QEMU code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>>>>>>> But now looking into the details of the implementation, I
>>>>>>>>>>>>>> find it is even cleaner & clearer not to make this
>>>>>>>>>>>>>> assumption.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That's why I prefer to collect all the ready check
>>>>>>>>>>>>>>> callbacks (queue state and device new\conf) into one
>>>>>>>>>>>>>>> function that will be called after the message handler:
>>>>>>>>>>>>>>> Pseudo:
>>>>>>>>>>>>>>> vhost_user_update_ready_statuses() {
>>>>>>>>>>>>>>> 	switch (msg):
>>>>>>>>>>>>>>> 	case enable:
>>>>>>>>>>>>>>> 		if (enable is 1)
>>>>>>>>>>>>>>> 			force queue state = 1.
>>>>>>>>>>>>>>> 	case callfd
>>>>>>>>>>>>>>> 	case kickfd
>>>>>>>>>>>>>>> 	.....
>>>>>>>>>>>>>>> 		Check queue and device ready + call callbacks if needed.
>>>>>>>>>>>>>>> 	default
>>>>>>>>>>>>>>> 		return;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I find it more natural to "invalidate" the ready state where
>>>>>>>>>>>>>> it is handled (after vring_invalidate(), before setting a
>>>>>>>>>>>>>> new FD for call & kick, ...)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think that if you go in this direction, if the first
>>>>>>>>>>>>> queue pair is invalidated, you need to notify the app\driver
>>>>>>>>>>>>> also about the device ready change.
>>>>>>>>>>>>> Also it will cause 2 notifications to the driver instead of
>>>>>>>>>>>>> one in case of an FD change.
>>>>>>>>>>>>
>>>>>>>>>>>> You'll always end up with two notifications: either QEMU has
>>>>>>>>>>>> sent the disable, and so you'll have one notification for the
>>>>>>>>>>>> disable and one for the enable, or it didn't send the disable
>>>>>>>>>>>> and it will happen at old value invalidation time and after
>>>>>>>>>>>> the new value is taken into account.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I don't see it in current QEMU behavior.
>>>>>>>>>>> When working MQ I see that some virtqs get a configuration
>>>>>>>>>>> message while they are in enabled state.
>>>>>>>>>>> Then, the enable message is sent again later.
>>>>>>>>>>
>>>>>>>>>> I guess you mean the first queue pair? And it would not be in
>>>>>>>>>> ready state as it would be the initial configuration of the
>>>>>>>>>> queue?
>>>>>>>>>
>>>>>>>>> Even after initialization, when the queue is ready.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Why not take this correct assumption and update the ready
>>>>>>>>>>>>> state only in one point in the code instead of doing it in
>>>>>>>>>>>>> all the configuration handlers around?
>>>>>>>>>>>>> IMO, it is correct, less intrusive, simpler, clearer and
>>>>>>>>>>>>> cleaner.
>>>>>>>>>>>>
>>>>>>>>>>>> I just looked closer at the Vhost-user spec, and I'm no
>>>>>>>>>>>> longer so sure this is a correct assumption:
>>>>>>>>>>>>
>>>>>>>>>>>> "While processing the rings (whether they are enabled or
>>>>>>>>>>>> not), client must support changing some configuration aspects
>>>>>>>>>>>> on the fly."
>>>>>>>>>>>
>>>>>>>>>>> Ok, this doesn't explain how configuration is changed on the
>>>>>>>>>>> fly.
>>>>>>>>>>
>>>>>>>>>> I agree it lacks a bit of clarity.
>>>>>>>>>>
>>>>>>>>>>> As I mentioned, QEMU always sends the enable message after a
>>>>>>>>>>> configuration message.
>>>>>>>>>>
>>>>>>>>>> Yes, but we should not make assumptions on the current QEMU
>>>>>>>>>> version when possible. Better to be safe and follow the
>>>>>>>>>> specification, it will be more robust.
>>>>>>>>>> There is also the Virtio-user PMD to take into account, for
>>>>>>>>>> example.
>>>>>>>>>
>>>>>>>>> I understand your point here, but do you really want to be ready
>>>>>>>>> for any configuration update at run time?
>>>>>>>>> What does it mean? How should the datapath handle configuration
>>>>>>>>> from the control thread at run time while traffic is on?
>>>>>>>>> For example, changing queue size \ addresses must stop traffic
>>>>>>>>> before...
>>>>>>>>> Also changing FDs is very sensitive.
>>>>>>>>>
>>>>>>>>> It doesn't make sense to me.
>>>>>>>>>
>>>>>>>>> Also, according to the "on the fly" direction we should not
>>>>>>>>> disable the queue unless an enable message is coming to disable
>>>>>>>>> it.
>>>>>>>
>>>>>>> No response, so looks like you agree that it doesn't make sense.
>>>>>>
>>>>>> No, my reply was general to all your comments.
>>>>>>
>>>>>> With the SW backend, I agree we don't need to disable the rings in
>>>>>> case of asynchronous changes to the ring, because we protect it
>>>>>> with a lock, so we are sure the ring won't be accessed by another
>>>>>> thread while doing the change.
>>>>>>
>>>>>> For the vDPA case that's more problematic, because we have no such
>>>>>> locking mechanism.
>>>>>>
>>>>>> For example memory hotplug: QEMU does not seem to disable the
>>>>>> queues, so we need to stop the vDPA device one way or another so
>>>>>> that it does not process the rings while the Vhost lib remaps the
>>>>>> memory areas.
>>>>>>
>>>>>>>>> In addition:
>>>>>>>>> Do you really want to toggle vDPA drivers\app for any
>>>>>>>>> configuration message? It may cause queue recreation for each
>>>>>>>>> one (at least for mlx5).
>>>>>>>>
>>>>>>>> I want to have something robust and maintainable.
>>>>>>>
>>>>>>> Me too.
>>>>>>>
>>>>>>>> These messages arriving after a queue has been configured once
>>>>>>>> are rare events, but this is usually the kind of thing that
>>>>>>>> causes maintenance burden.
>>>>>>>
>>>>>>> In case of guest poll mode (testpmd virtio) we get the callfd
>>>>>>> twice all the time.
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>>>> If you look at my example patch, you will understand that with my
>>>>>>>> proposal, there won't be any more state change notifications than
>>>>>>>> with your proposal when QEMU or any other Vhost-user master sends
>>>>>>>> a disable request before sending the request that impacts the
>>>>>>>> queue state.
>>>>>>>
>>>>>>> We didn't talk about disable time - this one is very simple.
>>>>>>>
>>>>>>> Yes, in case the queue is disabled your proposal doesn't send an
>>>>>>> extra notification, just like mine.
>>>>>>> But in case the queue is ready, your proposal sends an extra
>>>>>>> not-ready notification for kickfd, callfd and set_vring_base
>>>>>>> configurations.
>>>>>>
>>>>>> I think this is necessary for synchronization with the Vhost-user
>>>>>> master (in case the master asks for this synchronization, like
>>>>>> set_mem_table for instance when reply-ack is enabled).
>>>>>>
>>>>>>>> It just adds more robustness if this unlikely event happens, by
>>>>>>>> invalidating the ring state to not ready before doing the actual
>>>>>>>> ring configuration change.
>>>>>>>> So that this config change is not missed by the vDPA driver or
>>>>>>>> the application.
>>>>>>>
>>>>>>> One more issue here is that there is some time during which the
>>>>>>> device is ready (already configured) and the first virtq-pair is
>>>>>>> not ready (your invalidate proposal for set_vring_base).
>>>>>>>
>>>>>>> It doesn't preserve the concept that the device is ready only in
>>>>>>> case the first virtq-pair is ready.
>>>>>>
>>>>>> I understand the spec as "the device is ready as soon as the first
>>>>>> queue pair is ready", but I might be wrong.
>>>>>>
>>>>>> Do you suggest calling the dev_close() vDPA callback and the
>>>>>> destroy_device() application callback as soon as one of the rings
>>>>>> of the first queue pair receives a disable request or, with my
>>>>>> patch, when one of the rings receives a request that changes the
>>>>>> ring state?
>>>>>
>>>>> I mean, your proposal actually may make the first virtq-pair ready
>>>>> state disabled while the device is ready.
>>>>> So, yes, it leads to calling device close\destroy.
>>>>
>>>> No it doesn't, there is no call to .dev_close()/.destroy_device()
>>>> with my patch if the first queue pair gets disabled.
>>>>
>>>>>>> I will not insist anymore on waiting for enable before notifying,
>>>>>>> although I am not a fan of it.
>>>>>>>
>>>>>>> So, I suggest to create one notification function to be called
>>>>>>> after the message handler and before the reply.
>>>>>>> This function is the only one which notifies ready states, in the
>>>>>>> following cases:
>>>>>>>
>>>>>>> 1. virtq ready state is changed in the queue.
>>>>>>> 2. virtq ready state stays on after a configuration message
>>>>>>>    handler.
>>>>>>> 3. device state will be enabled when the first queue pair is
>>>>>>>    ready.
>>>>>>
>>>>>> IIUC, it will not disable the queues when there is a state change,
>>>>>> is that correct? If so, I think it does not work with the memory
>>>>>> hotplug case I mentioned earlier.
>>>>>
>>>>> It will do enable again, which means something was modified.
>>>>
>>>> Ok, thanks for the clarification.
>>>>
>>>> I think it is not enough for the examples I gave below. For
>>>> set_mem_table, we need to stop the device from processing the vrings
>>>> before the set_mem_table handler calls munmap(), and re-enable it
>>>> after the mmap() (I did that wrong in my example patch, I just did
>>>> that after the munmap/mmap happened, which is too late).
>>>>
>>>>>> Even for the callfd double change it can be problematic, as the
>>>>>> Vhost lib will close the first one while it will still be used by
>>>>>> the driver (BTW, I see my example patch is also buggy in this
>>>>>> regard, it should reset the call_fd value in the virtqueue, then
>>>>>> call vhost_user_update_vring_state() and finally close the FD).
>>>>>
>>>>> Yes, this one leads to a different handler for each message.
>>>>>
>>>>> Maybe it leads to a new queue modify operation.
>>>>> So, the queue doesn't send the state - it just does the
>>>>> configuration change on the fly.
>>>>>
>>>>> What do you think?
>>>>
>>>> I think that configuration on the fly doesn't fly.
>>>> We would at least need to stop the device from processing the rings
>>>> for the memory hotplug case, so why not just send a disable
>>>> notification?
>>>
>>> Yes, the driver needs a notification here.
>>>
>>>> And for the double callfd, it does not look right to me not to
>>>> request the driver to stop using it before it is closed, does it?
>>>
>>> Yes, and some drivers (including mlx5) may stop the traffic in this
>>> case too.
>>>
>>> A modify\update operation will solve all:
>>>
>>> For example:
>>>
>>> In memory hotplug:
>>> Do the new mmap.
>>> Call modify.
>>> Do munmap for the old one.
>>>
>>> In callfd\kickfd change:
>>> Set the new FD.
>>> Call modify.
>>> Close the old FD.
>>>
>>> Modify is clearer, saves calls and is faster (the datapath will be
>>> back faster).
>>
>> It should work, but those are not light modifications to do in the
>> set_mem_table handler (the function is quite complex already with
>> postcopy live-migration support).
>>
>> With a modify callback, won't the driver part be more complex? Since
>> it would have to check which state has changed in the ring, and based
>> on that decide whether it should stop the ring or not.
>>
>> As you say that in case of memory hotplug and double callfd the driver
>> may stop processing the rings anyway, would it be that much faster
>> than disabling/enabling the vring?
>>
>> These events having a very rare occurrence, does it really matter if
>> it is a bit longer?
>
>
> Just thinking again about memory hotplug:
>
> The mlx5 driver device needs to be reinitialized in this case because
> the NIC has a memory translation which must be updated before the
> virtqs creation.
>
> So, maybe we need to close and re-configure the vDPA device in this
> case.

Right, disabling vrings is not enough for memory hotplug.
It would make sense to call dev_close and dev_conf here, that's the
most conservative approach.

> @Xiao Wang, can you comment on the IFC behavior here?
>
> Matan
>
>
>> Thanks,
>> Maxime
>>
>>>
>>>> Thanks,
>>>> Maxime
>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Maxime
>>>>>>>
>>>>>>> Matan