From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: Matan Azrad, Xiao Wang
Cc: "dev@dpdk.org"
References: <1592497686-433697-1-git-send-email-matan@mellanox.com>
 <74f35cd8-3543-ba1c-0dc7-1502599e6cf7@redhat.com>
 <197079f4-7436-4f9e-7b8e-b8cb42042ffb@redhat.com>
 <62311463-2bc1-7f8b-1ec0-da31418e1d75@redhat.com>
 <6a2d8143-e48d-cdef-5b16-1c340d99b1ae@redhat.com>
 <3572c65c-394c-496f-483c-a57019550339@redhat.com>
 <7c6bd0b2-8657-3c34-ab1f-f397c39ea2ec@redhat.com>
Date: Tue, 23 Jun 2020 16:33:57 +0200
In-Reply-To: <7c6bd0b2-8657-3c34-ab1f-f397c39ea2ec@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v1 3/4] vhost: improve device ready definition

On 6/23/20 3:55 PM, Maxime Coquelin wrote:
> Hi Matan,
>
> On 6/23/20 1:53 PM, Matan Azrad wrote:
>>
>> From: Maxime Coquelin:
>>> On 6/23/20 11:02 AM, Matan Azrad wrote:
>>>>
>>>> From: Maxime Coquelin:
>>>>> On 6/22/20 5:51 PM, Matan Azrad wrote:
>>>>>>
>>>>>> From: Maxime Coquelin:
>>>>>>> On 6/22/20 3:43 PM, Matan Azrad wrote:
>>>>>>>>
>>>>>>>> From: Maxime Coquelin:
>>>>>>>>> Sent: Monday, June 22, 2020 3:33 PM
>>>>>>>>> To: Matan Azrad; Xiao Wang
>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>>>>>>
>>>>>>>>> On 6/22/20 12:06 PM, Matan Azrad wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Maxime
>>>>>>>>>>
>>>>>>>>>> From: Maxime Coquelin
>>>>>>>>>>> Sent: Monday, June 22, 2020 11:56 AM
>>>>>>>>>>> To: Matan Azrad; Xiao Wang
>>>>>>>>>>> Cc: dev@dpdk.org
>>>>>>>>>>> Subject: Re: [PATCH v1 3/4] vhost: improve device ready definition
>>>>>>>>>>>
>>>>>>>>>>> On 6/22/20 10:41 AM, Matan Azrad wrote:
>>>>>>>>>>>>> The issue is that if you only check the ready state before
>>>>>>>>>>>>> and after the message affecting the ring is handled, it can
>>>>>>>>>>>>> be ready at both stages while the rings have changed, and
>>>>>>>>>>>>> the state change callback should have been called.
>>>>>>>>>>>>
>>>>>>>>>>>> But in this version I checked twice, before the message
>>>>>>>>>>>> handler and after the message handler, so it should catch
>>>>>>>>>>>> any update.
>>>>>>>>>>>
>>>>>>>>>>> No, this is not enough, we have to check also during some
>>>>>>>>>>> handlers, so that the ready state is invalidated, because
>>>>>>>>>>> sometimes it will be ready before and after the message
>>>>>>>>>>> handler but with different values.
>>>>>>>>>>>
>>>>>>>>>>> That's what I did in my example patch:
>>>>>>>>>>> @@ -1847,15 +1892,16 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg,
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>     if (vq->kickfd >= 0)
>>>>>>>>>>>         close(vq->kickfd);
>>>>>>>>>>> +
>>>>>>>>>>> +   vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
>>>>>>>>>>> +
>>>>>>>>>>> +   vhost_user_update_vring_state(dev, file.index);
>>>>>>>>>>> +
>>>>>>>>>>>     vq->kickfd = file.fd;
>>>>>>>>>>>
>>>>>>>>>>> Without that, the ready check will return ready before and
>>>>>>>>>>> after the kickfd changed and the driver won't be notified.
>>>>>>>>>>
>>>>>>>>>> The driver will be notified in the next
>>>>>>>>>> VHOST_USER_SET_VRING_ENABLE message according to v1.
>>>>>>>>>>
>>>>>>>>>> One of the assumptions we agreed on in the design mail is that
>>>>>>>>>> it doesn't make sense for QEMU to change the queue
>>>>>>>>>> configuration without enabling the queue again.
>>>>>>>>>> Because of that, we decided to force calling the state callback
>>>>>>>>>> again when QEMU sends the VHOST_USER_SET_VRING_ENABLE(1)
>>>>>>>>>> message, even if the queue is already ready.
>>>>>>>>>> So when the driver/app sees the state go enable->enable, it
>>>>>>>>>> should take into account that the queue configuration was
>>>>>>>>>> probably changed.
>>>>>>>>>>
>>>>>>>>>> I think that this assumption is correct according to the QEMU code.
>>>>>>>>>
>>>>>>>>> Yes, this was our initial assumption.
>>>>>>>>> But now, looking into the details of the implementation, I find
>>>>>>>>> it even cleaner and clearer not to make this assumption.
>>>>>>>>>
>>>>>>>>>> That's why I prefer to collect all the ready checks and
>>>>>>>>>> callbacks (queue state and device new/conf) into one function
>>>>>>>>>> that will be called after the message handler:
>>>>>>>>>> Pseudo:
>>>>>>>>>> vhost_user_update_ready_statuses() {
>>>>>>>>>>     switch (msg):
>>>>>>>>>>     case enable:
>>>>>>>>>>         if (enable is 1)
>>>>>>>>>>             force queue state = 1;
>>>>>>>>>>     case callfd:
>>>>>>>>>>     case kickfd:
>>>>>>>>>>     .....
>>>>>>>>>>         check queue and device ready + call callbacks if needed;
>>>>>>>>>>     default:
>>>>>>>>>>         return;
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> I find it more natural to "invalidate" the ready state where it
>>>>>>>>> is handled (after vring_invalidate(), before setting a new FD
>>>>>>>>> for call & kick, ...)
>>>>>>>>
>>>>>>>> I think that if you go in this direction, when the first queue
>>>>>>>> pair is invalidated you also need to notify the app/driver about
>>>>>>>> the device ready change.
>>>>>>>> Also, it will cause 2 notifications to the driver instead of one
>>>>>>>> in case of an FD change.
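To make the example patch above a bit more concrete, here is a rough,
untested sketch of the vhost_user_update_vring_state() helper it relies
on. The helper itself does not exist in the tree; vq_is_ready() and the
vring_state_changed() callback in vhost_device_ops do exist, but the
cached per-ring "ready" flag used below is an assumption made only for
the sake of the sketch:

/*
 * Hypothetical helper (lib/librte_vhost/vhost_user.c): re-evaluate the
 * ring readiness and notify the application/driver only on transitions.
 * Assumes a cached 'ready' flag added to struct vhost_virtqueue.
 */
static void
vhost_user_update_vring_state(struct virtio_net *dev, int index)
{
	struct vhost_virtqueue *vq = dev->virtqueue[index];
	bool ready = !!vq_is_ready(dev, vq);

	/* Nothing changed, no notification needed. */
	if (vq->ready == ready)
		return;

	vq->ready = ready;

	/* Let the application (and so the vDPA driver) know about it. */
	if (dev->notify_ops->vring_state_changed)
		dev->notify_ops->vring_state_changed(dev->vid, index, ready);
}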
>>>>>>>
>>>>>>> You'll always end up with two notifications: either QEMU has sent
>>>>>>> the disable, and so you'll have one notification for the disable
>>>>>>> and one for the enable, or it didn't send the disable and the
>>>>>>> notifications will happen at old-value invalidation time and after
>>>>>>> the new value is taken into account.
>>>>>>
>>>>>> I don't see it in the current QEMU behavior.
>>>>>> When working with MQ, I see that some virtqs get configuration
>>>>>> messages while they are in enabled state.
>>>>>> Then, an enable message is sent again later.
>>>>>
>>>>> I guess you mean the first queue pair? And it would not be in ready
>>>>> state, as it would be the initial configuration of the queue?
>>>>
>>>> Even after initialization, when the queue is ready.
>>>>
>>>>>>>> Why not take this correct assumption and update the ready state
>>>>>>>> in only one point in the code, instead of doing it in all the
>>>>>>>> configuration handlers around?
>>>>>>>> IMO, it is correct, less intrusive, simpler, clearer and cleaner.
>>>>>>>
>>>>>>> I just looked closer at the Vhost-user spec, and I'm no longer so
>>>>>>> sure this is a correct assumption:
>>>>>>>
>>>>>>> "While processing the rings (whether they are enabled or not),
>>>>>>> client must support changing some configuration aspects on the fly."
>>>>>>
>>>>>> Ok, this doesn't explain how the configuration is changed on the fly.
>>>>>
>>>>> I agree it lacks a bit of clarity.
>>>>>
>>>>>> As I mentioned, QEMU always sends the enable message after a
>>>>>> configuration message.
>>>>>
>>>>> Yes, but we should not make assumptions about the current QEMU
>>>>> version when possible. Better to be safe and follow the
>>>>> specification; it will be more robust. There is also the Virtio-user
>>>>> PMD to take into account, for example.
>>>>
>>>> I understand your point here, but do you really want to be ready for
>>>> any configuration update at run time?
>>>> What does it mean? How should the datapath handle configuration from
>>>> the control thread at run time while traffic is on?
>>>> For example, changing queue size / addresses must stop traffic before...
>>>> Also, changing FDs is very sensitive.
>>>>
>>>> It doesn't make sense to me.
>>>>
>>>> Also, according to the "on the fly" direction, we should not disable
>>>> the queue unless an enable message arrives to disable it.
>>
>> No response, so it looks like you agree that it doesn't make sense.
>
> No, my reply was general to all your comments.
>
> With the SW backend, I agree we don't need to disable the rings in case
> of asynchronous changes to the ring, because we protect it with a lock,
> so we are sure the ring won't be accessed by another thread while doing
> the change.
>
> For the vDPA case that's more problematic, because we have no such
> locking mechanism.
>
> For example with memory hotplug, QEMU does not seem to disable the
> queues, so we need to stop the vDPA device one way or another so that
> it does not process the rings while the Vhost lib remaps the memory
> areas.
>
>>>> In addition:
>>>> Do you really want to toggle vDPA drivers/apps for any configuration
>>>> message? It may cause queue recreation for each one (at least for mlx5).
>>>
>>> I want to have something robust and maintainable.
>>
>> Me too.
>>
>>> These messages arriving after a queue has been configured once are
>>> rare events, but this is usually the kind of thing that causes
>>> maintenance burden.
>>
>> In the case of guest poll mode (testpmd virtio) we always get the
>> callfd twice.
>
> Right.
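Coming back to the memory hotplug point above, what I have in mind is
roughly the fragment below in the SET_MEM_TABLE handler, before the old
regions get unmapped. vring_invalidate() is the existing vhost library
function; vhost_user_update_vring_state() is still the hypothetical
helper from the example patch:

	/*
	 * Sketch only: force every ring to not-ready so that a vDPA
	 * device stops processing descriptors while the guest memory is
	 * being remapped, and let the helper notify the driver/app.
	 */
	uint32_t i;

	for (i = 0; i < dev->nr_vring; i++) {
		struct vhost_virtqueue *vq = dev->virtqueue[i];

		vring_invalidate(dev, vq);
		vhost_user_update_vring_state(dev, i);
	}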
>
>>> If you look at my example patch, you will understand that with my
>>> proposal there won't be any more state change notifications than with
>>> your proposal when QEMU or any other Vhost-user master sends a disable
>>> request before sending the request that impacts the queue state.
>>
>> We didn't talk about disable time - this one is very simple.
>>
>> Yes, in case the queue is disabled, your proposal doesn't send an
>> extra notification, same as mine.
>> But in case the queue is ready, your proposal sends extra not-ready
>> notifications for the kickfd, callfd and set_vring_base configurations.
>
> I think this is necessary for synchronization with the Vhost-user
> master (in case the master asks for this synchronization, like
> set_mem_table for instance when reply-ack is enabled).
>
>>> It just adds more robustness if this unlikely event happens, by
>>> invalidating the ring state to not-ready before doing the actual ring
>>> configuration change, so that this config change is not missed by the
>>> vDPA driver or the application.
>>
>> One more issue here is that there is some time during which the device
>> is ready (already configured) while the first virtq-pair is not ready
>> (your invalidate proposal for set_vring_base).
>

Sorry, I forgot to reply here. I am not sure what you mean by my
invalidate proposal for set_vring_base?

>
>> It doesn't preserve the concept that the device is ready only in case
>> the first virtq-pair is ready.
>
> I understand the spec as "the device is ready as soon as the first
> queue pair is ready", but I might be wrong.
>
> Do you suggest calling the dev_close() vDPA callback and the
> destroy_device() application callback as soon as one of the rings of
> the first queue pair receives a disable request or, with my patch, when
> one of the rings receives a request that changes the ring state?
>
>>
>> I will not insist anymore on waiting for enable before notifying,
>> although I am not a fan of it.
>>
>> So, I suggest creating one notification function to be called after
>> the message handler and before the reply.
>> This function is the only one that notifies ready states, in the
>> following cases:
>>
>> 1. The virtq ready state is changed in the queue.
>> 2. The virtq ready state stays on after a configuration message handler.
>> 3. The device state will be enabled when the first queue pair is ready.
>
> IIUC, it will not disable the queues when there is a state change, is
> that correct? If so, I think it does not work with the memory hotplug
> case I mentioned earlier.
>
> Even for the callfd double change it can be problematic, as the Vhost
> lib will close the first one while it will still be used by the driver.
> (Btw, I see my example patch is also buggy in this regard: it should
> reset the call_fd value in the virtqueue, then call
> vhost_user_update_vring_state() and finally close the FD.)
>
> Thanks,
> Maxime
>>
>> Matan
>>
>>> Maxime
>
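To spell out the call FD fix mentioned above: the idea is that the
SET_VRING_CALL handler invalidates and notifies before closing the old
FD, so the driver is never left using an FD the Vhost library has
already closed. A sketch, reusing the handler's existing 'file' and 'vq'
locals and, again, the hypothetical vhost_user_update_vring_state()
helper:

	/* Keep the old FD open until the driver has been told that the
	 * ring is no longer ready. */
	int old_callfd = vq->callfd;

	/* Invalidate first and let the driver/application react... */
	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
	vhost_user_update_vring_state(dev, file.index);

	/* ...then it is safe to close the FD the driver was using... */
	if (old_callfd >= 0)
		close(old_callfd);

	/* ...and finally plug in the new FD and re-evaluate readiness. */
	vq->callfd = file.fd;
	vhost_user_update_vring_state(dev, file.index);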