Date: Wed, 26 Jun 2024 07:58:41 -0700
From: Stephen Hemminger
To: Maxime Coquelin
Cc: Mattias Rönnblom, Mattias Rönnblom, dev@dpdk.org, Morten Brørup,
 Abdullah Sevincer, Pavan Nikhilesh, David Hunt, Vladimir Medvedkin,
 Bruce Richardson
Subject: Re: [PATCH v4 00/13] Optionally have rte_memcpy delegate to compiler memcpy
Message-ID: <20240626075841.5e63e7c0@hermes.local>
In-Reply-To: <3eebd7f7-9ba2-424c-80d1-6efa8945641d@redhat.com>
References: <20240620115027.420304-2-mattias.ronnblom@ericsson.com>
 <20240620175731.420639-1-mattias.ronnblom@ericsson.com>
 <3eebd7f7-9ba2-424c-80d1-6efa8945641d@redhat.com>
List-Id: DPDK patches and discussions

On Wed, 26 Jun 2024 10:37:31 +0200
Maxime Coquelin wrote:

> On 6/25/24 21:27, Mattias Rönnblom wrote:
> > On Tue, Jun 25, 2024 at 05:29:35PM
+0200, Maxime Coquelin wrote:
> >> Hi Mattias,
> >>
> >> On 6/20/24 19:57, Mattias Rönnblom wrote:
> >>> This patch set make DPDK library, driver, and application code use the
> >>> compiler/libc memcpy() by default when functions in are
> >>> invoked.
> >>>
> >>> The various custom DPDK rte_memcpy() implementations may be retained
> >>> by means of a build-time option.
> >>>
> >>> This patch set only make a difference on x86, PPC and ARM. Loongarch
> >>> and RISCV already used compiler/libc memcpy().
> >>
> >> It indeed makes a difference on x86!
> >>
> >> Just tested latest main with and without your series on
> >> Intel(R) Xeon(R) Gold 6438N.
> >>
> >> The test is a simple IO loop between a Vhost PMD and a Virtio-user PMD:
> >> # dpdk-testpmd -l 4-6 --file-prefix=virtio1 --no-pci --vdev 'net_virtio_user0,mac=00:01:02:03:04:05,path=./vhost-net,server=1,mrg_rxbuf=1,in_order=1'
> >> --single-file-segments -- -i
> >> testpmd> start
> >>
> >> # dpdk-testpmd -l 8-10 --file-prefix=vhost1 --no-pci --vdev
> >> 'net_vhost0,iface=vhost-net,client=1' --single-file-segments -- -i
> >> testpmd> start tx_first 32
> >>
> >> Latest main: 14.5Mpps
> >> Latest main + this series: 10Mpps
> >>
> >
> > I ran the above benchmark on my Raptor Lake desktop (locked to 3,2
> > GHz). GCC 12.3.0.
> >
> > Core  use_cc_memcpy  Mpps
> > E     false           9.5
> > E     true            9.7
> > P     false          16.4
> > P     true           13.5
> >
> > On the P-cores, there's a significant performance regression, although
> > not as bad as the one you see on your Sapphire Rapids Xeon. On the
> > E-cores, there's actually a slight performance gain.
> >
> > The virtio PMD does not directly invoke rte_memcpy() or anything else
> > from , but rather use memcpy(), so I'm not sure I
> > understand what's going on here. Does the virtio driver delegate some
> > performance-critical task to some module that in turns uses
> > rte_memcpy()?
>
> This is because Vhost is the bottleneck here, not Virtio driver.
> Indeed, the virtqueues memory belongs to the Virtio driver and the
> descriptors buffers are Virtio's mbufs, so not much memcpy's are done
> there.
>
> Vhost however, is a heavy memcpy user, as all the descriptors buffers
> are copied to/from its mbufs.

It would be good to know the copy size (if it is small, inlining is what
matters, or maybe alignment matters), and to have test results for multiple
compiler versions.

Ideally, feed the results back and get GCC and Clang updated. DPDK doesn't
need to be in the optimized C library space.
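
A size and alignment sweep of the kind asked for above could be sketched roughly as follows. This is only an illustration, not code from DPDK or from the patch set: the names copy_ns() and copy_sweep(), the buffer limits, and the chosen sizes and offsets are all made up for the example, and the empty asm statement is a GCC/Clang-specific barrier. It times plain memcpy() only; whether the compiler expands a given copy inline depends on the compiler version and flags, which is exactly what building it with several compilers would show.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_COPY 2048u	/* largest copy size accepted (arbitrary choice) */
#define MAX_OFF  64u	/* largest pointer offset accepted (arbitrary choice) */

/* 64-byte aligned buffers, so an offset of 0 means a cache-line aligned pointer. */
static _Alignas(64) unsigned char src_buf[MAX_COPY + MAX_OFF];
static _Alignas(64) unsigned char dst_buf[MAX_COPY + MAX_OFF];

/*
 * Hypothetical helper: average CPU time, in nanoseconds, of one memcpy() of
 * `size` bytes with the source shifted by `src_off` and the destination by
 * `dst_off` bytes. Returns -1.0 if the request does not fit the buffers or
 * `iters` is zero.
 */
double
copy_ns(size_t size, size_t src_off, size_t dst_off, unsigned long iters)
{
	if (size > MAX_COPY || src_off > MAX_OFF || dst_off > MAX_OFF ||
	    iters == 0)
		return -1.0;

	const unsigned char *src = src_buf + src_off;
	unsigned char *dst = dst_buf + dst_off;

	clock_t start = clock();
	for (unsigned long i = 0; i < iters; i++) {
		memcpy(dst, src, size);
		/* GCC/Clang barrier: stops the copy being hoisted or dropped. */
		__asm__ volatile("" : : "r"(dst) : "memory");
	}
	clock_t end = clock();

	return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}

/* Hypothetical driver: one output line per (size, source offset) pair. */
void
copy_sweep(unsigned long iters)
{
	static const size_t sizes[] = { 16, 64, 256, 1024, 2048 };
	static const size_t src_offs[] = { 0, 1, 32 };

	for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		for (size_t j = 0; j < sizeof(src_offs) / sizeof(src_offs[0]); j++)
			printf("size=%4zu src_off=%2zu: %.2f ns/copy\n",
			       sizes[i], src_offs[j],
			       copy_ns(sizes[i], src_offs[j], 0, iters));
}
```

Calling copy_sweep() from main() with a large iteration count, and building the same file with each GCC and Clang version of interest, would give per-compiler numbers to compare. The sizes that actually matter are the ones Vhost copies in the test above, which this sketch does not know.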