Message-ID:
<98c1642c-c32d-46b5-a34c-4bfcc530905f@redhat.com>
Date: Wed, 3 Apr 2024 12:19:46 +0200
Subject: Re: [PATCH] vhost: optimize mbuf allocation in virtio Tx packed path
To: Andrey Ignatov, dev@dpdk.org
Cc: Chenbo Xia, Wei Shen
References: <20240328233338.56544-1-rdna@apple.com>
From: Maxime Coquelin
In-Reply-To: <20240328233338.56544-1-rdna@apple.com>
List-Id: DPDK patches and discussions

On 3/29/24 00:33, Andrey Ignatov wrote:
>
> Currently virtio_dev_tx_packed() always allocates requested @count of
> packets, no matter how many packets are really available on the virtio
> Tx ring. Later it has to free all packets it didn't use and if, for
> example, there were zero available packets on the ring, then all @count
> mbufs would be allocated just to be freed afterwards.
>
> This wastes CPU cycles since rte_pktmbuf_alloc_bulk() /
> rte_pktmbuf_free_bulk() do quite a lot of work.
>
> Optimize it by using the same idea as the virtio_dev_tx_split() uses on
> the Tx split path: estimate the number of available entries on the ring
> and allocate only that number of mbufs.
>
> On the split path it's pretty easy to estimate.
>
> On the packed path it's more work since it requires checking flags for
> up to @count of descriptors. Still it's much less expensive than the
> alloc/free pair.
>
> The new get_nb_avail_entries_packed() function doesn't change how
> virtio_dev_tx_packed() works with regard to memory barriers since the
> barrier between checking flags and other descriptor fields is still in
> place later in virtio_dev_tx_batch_packed() and
> virtio_dev_tx_single_packed().
>
> The difference for a guest transmitting ~17Gbps with MTU 1500 on a `perf
> record` / `perf report` (on lower pps the savings will be bigger):
>
> * Before the change:
>
> Samples: 18K of event 'cycles:P', Event count (approx.): 19206831288
> Children Self Pid:Command
> - 100.00% 100.00% 798808:dpdk-worker1
> <... skip ...>
> - 99.09% pkt_burst_io_forward
> - 90.26% common_fwd_stream_receive
> - 90.04% rte_eth_rx_burst
> - 75.53% eth_vhost_rx
> - 74.29% rte_vhost_dequeue_burst
> - 71.48% virtio_dev_tx_packed_compliant
> + 17.11% rte_pktmbuf_alloc_bulk
> + 11.80% rte_pktmbuf_free_bulk
> + 2.11% vhost_user_inject_irq
> 0.75% rte_pktmbuf_reset
> 0.53% __rte_pktmbuf_free_seg_via_array
> 0.88% vhost_queue_stats_update
> + 13.66% mlx5_rx_burst_vec
> + 8.69% common_fwd_stream_transmit
>
> * After:
>
> Samples: 18K of event 'cycles:P', Event count (approx.): 19225310840
> Children Self Pid:Command
> - 100.00% 100.00% 859754:dpdk-worker1
> <... skip ...>
> - 98.61% pkt_burst_io_forward
> - 86.29% common_fwd_stream_receive
> - 85.84% rte_eth_rx_burst
> - 61.94% eth_vhost_rx
> - 60.05% rte_vhost_dequeue_burst
> - 55.98% virtio_dev_tx_packed_compliant
> + 3.43% rte_pktmbuf_alloc_bulk
> + 2.50% vhost_user_inject_irq
> 1.17% vhost_queue_stats_update
> 0.76% rte_rwlock_read_unlock
> 0.54% rte_rwlock_read_trylock
> + 22.21% mlx5_rx_burst_vec
> + 12.00% common_fwd_stream_transmit
>
> It can be seen that virtio_dev_tx_packed_compliant() goes from 71.48% to
> 55.98% with rte_pktmbuf_alloc_bulk() going from 17.11% to 3.43% and
> rte_pktmbuf_free_bulk() going away completely.
>
> Signed-off-by: Andrey Ignatov
> ---
> lib/vhost/virtio_net.c | 33 +++++++++++++++++++++++++++++++++
> 1 file changed, 33 insertions(+)
>

Thanks for the contribution and the detailed commit message.

Reviewed-by: Maxime Coquelin

Maxime
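
For readers without the diff at hand, here is a rough, standalone sketch of the idea the commit message describes: walk the packed ring from the last consumed index, check only the descriptor flags, and count up to @count available entries before allocating any mbufs. It is not the code from the patch. The struct, the SKETCH_* macros and sketch_nb_avail_packed() are hypothetical names made up for this illustration, and the flag bit positions are assumed to follow the virtio packed ring layout (NEXT = bit 0, AVAIL = bit 7, USED = bit 15). Memory barriers are deliberately left out, on the assumption stated in the commit message that the later batch/single dequeue paths still order flags against the other descriptor fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the packed ring flag bits (assumed layout). */
#define SKETCH_DESC_F_NEXT  (1u << 0)
#define SKETCH_DESC_F_AVAIL (1u << 7)
#define SKETCH_DESC_F_USED  (1u << 15)

/* Hypothetical descriptor type, only the flags field is read below. */
struct sketch_packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

/*
 * Estimate how many entries (descriptor chains) are available, capped at
 * max_entries. Only flags are inspected, so the result is an estimate to
 * size the mbuf allocation, not a guarantee about descriptor contents.
 */
static uint16_t
sketch_nb_avail_packed(const struct sketch_packed_desc *descs,
                       uint16_t ring_size, uint16_t last_avail_idx,
                       bool avail_wrap, uint16_t max_entries)
{
    uint16_t idx = last_avail_idx;
    uint16_t nb_avail = 0;

    while (nb_avail < max_entries) {
        uint16_t flags = descs[idx].flags;
        bool avail = (flags & SKETCH_DESC_F_AVAIL) != 0;
        bool used = (flags & SKETCH_DESC_F_USED) != 0;

        /* Available means AVAIL matches the wrap counter and USED does not. */
        if (avail != avail_wrap || used == avail_wrap)
            break;

        /* A chain counts once, at its last descriptor (no NEXT flag). */
        if (!(flags & SKETCH_DESC_F_NEXT))
            nb_avail++;

        /* Wrapping around the ring flips the expected wrap counter. */
        if (++idx >= ring_size) {
            idx = 0;
            avail_wrap = !avail_wrap;
        }
    }

    return nb_avail;
}
```

A caller would then allocate only the returned number of mbufs instead of @count, which is where the rte_pktmbuf_alloc_bulk() / rte_pktmbuf_free_bulk() savings in the profile above would come from.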