Date: Mon, 7 Jul 2025 16:04:09 -0700
From: Stephen Hemminger
To: Ivan Malov
Cc: "Lombardo, Ed", users
Subject: Re: dpdk Tx falling short
Message-ID: <20250707160409.75fbc2f1@hermes.local>
In-Reply-To: <9ae56e38-0d29-4c7c-0bc2-f92912146da2@arknetworks.am>
References: <20250704074957.5848175a@hermes.local>
 <20250705120834.78849e56@hermes.local>
 <20250706090232.635bd36e@hermes.local>
 <9ae56e38-0d29-4c7c-0bc2-f92912146da2@arknetworks.am>

On Tue, 8 Jul 2025 01:49:44 +0400 (+04)
Ivan Malov wrote:

> Hi Ed,
>
> On Mon, 7 Jul 2025, Lombardo, Ed wrote:
>
> > Hi Stephen,
> > I ran a perf diff on two perf records and it reveals the real problem with
> > the tx thread in transmitting packets.
> >
> > The comparison is traffic received on ifn3 and transmitted on ifn4 versus
> > traffic received on ifn3, ifn5 and transmitted on ifn4, ifn6.
> > When transmitting packets on one port the performance is better; however,
> > when transmitting on two ports the performance across the two drops
> > dramatically.
> >
> > There is an increase of 55.29% of the CPU time spent in
> > common_ring_mp_enqueue and 54.18% less time in i40e_xmit_pkts (was E810,
> > tried x710).
> > common_ring_mp_enqueue is multi-producer; does the enqueue of mbuf
> > pointers passed in to rte_eth_tx_burst() have to be multi-producer?
>
> I may be wrong, but rte_eth_tx_burst(), as part of what is known as the
> "reap" process, should check for "done" Tx descriptors resulting from
> previous invocations and free (enqueue) the associated mbufs into the
> respective mempools. In your case, you say you only have a single mempool
> shared between the port pairs, which, as I understand, are served by
> concurrent threads, so it might be logical to use a multi-producer mempool
> in this case. Or am I missing something?
>
> The pktmbuf API for mempool allocation is a wrapper around the generic API,
> and it might request multi-producer multi-consumer by default (see [1],
> 'flags'). According to your original mempool monitor printout, the per-lcore
> cache size is 512. On the premise that separate lcores serve the two port
> pairs, and taking into account the burst size, it should be OK, yet you may
> want to play with the per-lcore cache size argument when creating the pool.
> Does it change anything?
>
> Regarding separate mempools -- I saw Stephen's response about those making
> CPU cache behaviour worse and not better. Makes sense and I won't argue.
> And yet, why not just try and make sure this indeed holds in this
> particular case? Also, since you're seeking single-producer behaviour,
> having separate per-port-pair mempools might allow creating such (again,
> see 'flags' at [1]), provided that API [1] is used for mempool creation.
> Please correct me in case I'm mistaken.
>
> Also, PMDs can support the "fast free" Tx offload. Please see [2] to check
> whether the application asks for this offload flag or not. It may be worth
> enabling.
>
> [1] https://doc.dpdk.org/api-25.03/rte__mempool_8h.html#a0b64d611bc140a4d2a0c94911580efd5
> [2] https://doc.dpdk.org/api-25.03/rte__ethdev_8h.html#a43f198c6b59d965130d56fd8f40ceac1
>
> Thank you.
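As a concrete illustration of the 'flags' point above: rte_pktmbuf_pool_create()
does not expose mempool flags, but the generic rte_mempool_create() does. A
minimal, untested sketch of a per-port-pair pool that selects the
single-producer/single-consumer ring ops follows; the pool name, mbuf count
and cache size are illustrative, not taken from the application.

    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    /* Sketch only: safe solely if exactly one lcore ever gets mbufs from
     * this pool (Rx refill) and exactly one lcore ever puts them back
     * (the Tx reap done inside rte_eth_tx_burst()). With these flags the
     * pool is backed by the "ring_sp_sc" ops, so the enqueue hot spot
     * becomes common_ring_sp_enqueue instead of common_ring_mp_enqueue. */
    static struct rte_mempool *
    create_sp_sc_pktmbuf_pool(const char *name, unsigned int n_mbufs)
    {
            return rte_mempool_create(name, n_mbufs,
                    sizeof(struct rte_mbuf) + RTE_MBUF_DEFAULT_BUF_SIZE,
                    512,                          /* per-lcore cache size */
                    sizeof(struct rte_pktmbuf_pool_private),
                    rte_pktmbuf_pool_init, NULL,  /* data room taken from elt_size */
                    rte_pktmbuf_init, NULL,
                    rte_socket_id(),
                    RTE_MEMPOOL_F_SP_PUT | RTE_MEMPOOL_F_SC_GET);
    }

rte_pktmbuf_pool_create_by_ops() with the "ring_sp_sc" ops name is another
route to the same behaviour. If any other lcore can allocate from or free
into the same pool, the flags have to stay multi-producer/multi-consumer.
The "fast free" offload is likewise only a request made at configuration
time, and it assumes all mbufs freed on a Tx queue come from one mempool and
are not reference-counted; a sketch, with port_id standing in for the port
being configured:

    /* needs <rte_ethdev.h> */
    struct rte_eth_dev_info dev_info;
    struct rte_eth_conf port_conf = { 0 };   /* plus the existing rx/tx settings */

    if (rte_eth_dev_info_get(port_id, &dev_info) == 0 &&
        (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE))
            port_conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;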
> > Is there a way to change dpdk to use single-producer?
> >
> > # Event 'cycles'
> > #
> > # Baseline  Delta Abs  Shared Object      Symbol
> > # ........  .........  .................  ......................................
> > #
> >     36.37%    +55.29%  test               [.] common_ring_mp_enqueue
> >     62.36%    -54.18%  test               [.] i40e_xmit_pkts
> >      1.10%     -0.94%  test               [.] dpdk_tx_thread
> >      0.01%     -0.01%  [kernel.kallsyms]  [k] native_sched_clock
> >                +0.00%  [kernel.kallsyms]  [k] fill_pmd
> >                +0.00%  [kernel.kallsyms]  [k] perf_sample_event_took
> >      0.00%     +0.00%  [kernel.kallsyms]  [k] __flush_smp_call_function_queue
> >      0.02%             [kernel.kallsyms]  [k] __intel_pmu_enable_all.constprop.0
> >      0.02%             [kernel.kallsyms]  [k] native_irq_return_iret
> >      0.02%             [kernel.kallsyms]  [k] native_tss_update_io_bitmap
> >      0.01%             [kernel.kallsyms]  [k] ktime_get
> >      0.01%             [kernel.kallsyms]  [k] perf_adjust_freq_unthr_context
> >      0.01%             [kernel.kallsyms]  [k] __update_blocked_fair
> >      0.01%             [kernel.kallsyms]  [k] perf_adjust_freq_unthr_events
> >
> > Thanks,
> > Ed
> >
> > -----Original Message-----
> > From: Lombardo, Ed
> > Sent: Sunday, July 6, 2025 1:45 PM
> > To: Stephen Hemminger
> > Cc: Ivan Malov; users
> > Subject: RE: dpdk Tx falling short
> >
> > Hi Stephen,
> > If using dpdk rings comes with this penalty, then what should I use? Is
> > there an alternative to rings? We do not want to use shared memory and do
> > buffer copies.
> >
> > Thanks,
> > Ed
> >
> > -----Original Message-----
> > From: Stephen Hemminger
> > Sent: Sunday, July 6, 2025 12:03 PM
> > To: Lombardo, Ed
> > Cc: Ivan Malov; users
> > Subject: Re: dpdk Tx falling short
> >
> > External Email: This message originated outside of NETSCOUT. Do not click
> > links or open attachments unless you recognize the sender and know the
> > content is safe.
> >
> > On Sun, 6 Jul 2025 00:03:16 +0000
> > "Lombardo, Ed" wrote:
> >
> >> Hi Stephen,
> >> Here are comments on the list of obvious causes of cache misses you
> >> mentioned.
> >>
> >> Obvious cache misses.
> >> - passing packets to worker with ring - we use lots of rings to pass mbuf
> >>   pointers. If I skip the rte_eth_tx_burst() and just free the mbufs in
> >>   bulk, the tx ring does not fill up.
> >> - using spinlocks (cost 16ns) - The driver does not use spinlocks, other
> >>   than what dpdk uses.
> >> - fetching TSC - We don't do this; we let Rx offload timestamp packets.
> >> - syscalls? - No syscalls are done in our driver fast path.
> >>
> >> You mention "passing packets to worker with ring"; do you mean using
> >> rings to pass mbuf pointers causes cache misses and should be avoided?
> >
> > Rings do cause data to be modified by one core and examined by another, so
> > they are a cache miss.

How many packets is your application seeing per burst? Ideally it should be
getting chunks, not a single packet at a time. And then the driver can use
deferred free to put back bursts. If you have a multi-stage pipeline, it
helps if you pass a burst to each stage rather than looping over the burst in
the outer loop. Imagine getting a burst of 16 packets. If you pass an array
down the pipeline, then there is one call per burst. If you process packets
one at a time, it can mean 16 calls, and if the pipeline exceeds the
instruction cache it can mean 16 cache misses. The point is that bursting is
a big win in data and instruction cache. If you really want to tune,
investigate prefetching like VPP does.
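To make the burst point concrete, a Tx stage that is handed whole bursts
might look roughly like the sketch below. The ring, port and queue arguments
are placeholders, and dropping what the NIC refuses is just one possible
policy.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_ring.h>

    #define BURST 32

    /* Sketch of a Tx stage that processes one burst per call: one ring
     * dequeue, one rte_eth_tx_burst(), one bulk free, instead of pushing
     * packets through the pipeline one at a time. */
    static void
    tx_stage(struct rte_ring *in_ring, uint16_t port_id, uint16_t queue_id)
    {
            struct rte_mbuf *pkts[BURST];
            unsigned int n;
            uint16_t sent;

            n = rte_ring_dequeue_burst(in_ring, (void **)pkts, BURST, NULL);
            if (n == 0)
                    return;

            sent = rte_eth_tx_burst(port_id, queue_id, pkts, n);
            if (sent < n)
                    rte_pktmbuf_free_bulk(&pkts[sent], n - sent);

The intermediate stages would take the same shape: one call per array of
mbufs handed down the pipeline, not one call per packet.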