From: Morten Brørup
To: "Wenwu Ma", "Bruce Richardson"
Subject: RE: [PATCH v4] vhost: support CPU copy for small packets
Date: Wed, 7 Sep 2022 16:47:13 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87300@smartserver.smartshare.dk>
In-Reply-To: <20220829005658.84590-1-wenwux.ma@intel.com>
References: <20220812064517.272530-1-wenwux.ma@intel.com> <20220829005658.84590-1-wenwux.ma@intel.com>
List-Id: DPDK patches and discussions

> From: Wenwu Ma [mailto:wenwux.ma@intel.com]
> Sent: Monday, 29 August 2022 02.57
>
> Offloading small packets to DMA degrades throughput 10%~20%,
> and this is because DMA offloading is not free and DMA is not
> good at processing small packets. In addition, control plane
> packets are usually small, and assign those packets to DMA will
> significantly increase latency, which may cause timeout like
> TCP handshake packets.
> Therefore, this patch use CPU to perform
> small copies in vhost.
>
> Signed-off-by: Wenwu Ma
> ---

[...]

> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 35fa4670fd..cf796183a0 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -26,6 +26,8 @@
>
>  #define MAX_BATCH_LEN 256
>
> +#define CPU_COPY_THRESHOLD_LEN 256

This threshold may not be optimal for all CPU architectures and/or DMA engines.

Could you please provide a test application to compare the performance of DMA copy with CPU rte_memcpy?

The performance metric should be simple: How many cycles does the CPU spend copying various packet sizes using each of the two methods.

You could provide test_dmadev_perf.c in addition to the existing test_dmadev.c. You can probably copy some of the concepts and code from test_memcpy_perf.c. Alternatively, you might be able to add DMA copy to test_memcpy_perf.c.

I'm sorry to push this on you - it should have been done as part of DMAdev development already.

-Morten
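
To illustrate the kind of comparison requested above, here is a minimal sketch in plain C. It is not DPDK code and not the proposed test_dmadev_perf.c: memcpy() and clock() stand in for rte_memcpy() and a cycle counter such as rte_rdtsc(), and the names copy_fn, cpu_copy, bench_copy_ns, pick_threshold and BENCH_MAX_LEN are hypothetical. A real test would add a second copy_fn that measures the CPU time spent on the DMA path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define BENCH_MAX_LEN 4096 /* hypothetical upper bound on the tested copy size */

/* Hypothetical common signature for the two copy methods being compared. */
typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/* CPU copy; memcpy() is a stand-in for rte_memcpy(). */
static void cpu_copy(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/*
 * Average CPU time, in nanoseconds, spent on one copy of len bytes.
 * clock() is a portable stand-in for a cycle counter.
 */
static double bench_copy_ns(copy_fn fn, size_t len, unsigned int iters)
{
	static uint8_t src[BENCH_MAX_LEN], dst[BENCH_MAX_LEN];
	volatile uint8_t sink = 0;
	clock_t start, end;
	unsigned int i;

	assert(fn != NULL && len <= BENCH_MAX_LEN && iters > 0);
	memset(src, 0xa5, sizeof(src));

	start = clock();
	for (i = 0; i < iters; i++) {
		fn(dst, src, len);
		sink = dst[0]; /* keep the copy observable to the compiler */
	}
	end = clock();
	(void)sink;

	return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / iters;
}

/*
 * Given per-size costs for both methods (sizes in ascending order), return
 * the smallest measured size at which the DMA path costs the CPU less than
 * the CPU copy. Returns SIZE_MAX if DMA never wins in the measured range.
 */
static size_t pick_threshold(const size_t *sizes, const double *cpu_cost,
			     const double *dma_cost, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (dma_cost[i] < cpu_cost[i])
			return sizes[i];
	return SIZE_MAX;
}
```

Running bench_copy_ns() over a range of packet sizes for each method and feeding both result arrays to pick_threshold() would give a per-platform crossover point to compare against the fixed value of 256.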