From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Stephen Hemminger" <stephen@networkplumber.org>,
"Lukáš Šišmiš" <sismis@cesnet.cz>
Cc: <anatoly.burakov@intel.com>, <ian.stokes@intel.com>,
<dev@dpdk.org>, <bruce.richardson@intel.com>
Subject: RE: [PATCH] net: increase the maximum of RX/TX descriptors
Date: Tue, 5 Nov 2024 09:49:39 +0100 [thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F876@smartserver.smartshare.dk> (raw)
In-Reply-To: <20241030090643.66af553f@hermes.local>
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 30 October 2024 17.07
>
> On Wed, 30 Oct 2024 16:40:10 +0100
> Lukáš Šišmiš <sismis@cesnet.cz> wrote:
>
> > On 30. 10. 24 16:20, Stephen Hemminger wrote:
> > > On Wed, 30 Oct 2024 14:58:40 +0100
> > > Lukáš Šišmiš <sismis@cesnet.cz> wrote:
> > >
> > >> On 29. 10. 24 15:37, Morten Brørup wrote:
> > >>>> From: Lukas Sismis [mailto:sismis@cesnet.cz]
> > >>>> Sent: Tuesday, 29 October 2024 13.49
> > >>>>
> > >>>> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> > >>>> This can be limiting for applications requiring bigger buffering
> > >>>> capabilities. The cap prevented applications from configuring
> > >>>> more descriptors. By buffering more packets in RX/TX
> > >>>> descriptors, applications can better handle processing peaks.
> > >>>>
> > >>>> Signed-off-by: Lukas Sismis <sismis@cesnet.cz>
> > >>>> ---
> > >>> Seems like a good idea.
> > >>>
> > >>> Has the max number of descriptors been checked against the
> > >>> datasheets for all the affected NIC chips?
> > >>>
> > >> I was hoping to get some feedback on this from the Intel folks.
> > >>
> > >> But it seems like I can change it only for ixgbe (82599) to 32k
> > >> (possibly to 64k - 8); the others - ice (E810) and i40e (X710) -
> > >> are capped at 8k - 32.
> > >>
> > >> I have neither experience with the other drivers nor hardware
> > >> available to test them, so I will leave them as-is in the
> > >> follow-up version of this patch.
> > >>
> > >> Lukas
> > >>
> > > Having a large number of descriptors, especially at lower speeds,
> > > will increase buffer bloat. For real-life applications, we do not
> > > want to increase latency by more than 1 ms.
> > >
> > > 10 Gbps has 7.62 Gbps of effective bandwidth due to overhead.
> > > The rate for 1500-byte MTU is 7.62 Gbps / (1500 * 8) = 635 Kpps
> > > (i.e. ~1.5 us per packet).
> > > A ring of 4096 descriptors can hold ~6 ms worth of full-size packets.
> > >
> > > Be careful: optimizing for 64-byte benchmarks can be a disaster in
> > > the real world.
> > >
> > Thanks for the info, Stephen; however, I am not trying to optimize
> > for 64-byte benchmarks. The work was initiated by an IO problem with
> > Intel NICs. A Suricata IDS worker (1 core per queue) receives a burst
> > of packets and then sequentially processes them one by one. It seems
> > like having 4k buffers is not enough. NVIDIA NICs allow e.g. 32k
> > descriptors and it works fine. In the end it also worked fine when
> > the ixgbe descriptor limit was increased. I am not sure why AF_PACKET
> > handles this much better than DPDK; AF_PACKET doesn't have a
> > crazy-high number of descriptors configured (<= 4096), yet it works
> > better. At the moment I assume there is internal buffering in the
> > kernel which allows it to handle processing spikes.
> >
> > To give more context, here is the forum discussion:
> > https://forum.suricata.io/t/high-packet-drop-rate-with-dpdk-compared-to-af-packet-in-suricata-7-0-7/4896
> >
> >
> >
>
> I suspect AF_PACKET provides an intermediate step which can buffer more
> or spread out the work.
Agree. It's a Linux scheduling issue.
With DPDK polling, there is no interrupt to notify the kernel scheduler.
If the CPU core running the DPDK polling thread is running some other thread when the packets arrive at the hardware, the DPDK polling thread is NOT scheduled immediately; it has to wait for the kernel scheduler to switch from the other thread back to it.
Quite a lot of time can pass before this happens - the kernel scheduler does not know that the DPDK polling thread has urgent work pending.
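To illustrate the polling model, here is a minimal sketch of such an RX polling loop (port 0, queue 0 and the burst size are assumptions for illustration):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    /* Minimal RX polling loop; it only makes progress while this
     * thread is actually running on its CPU core. */
    static int rx_poll_loop(void *arg __rte_unused)
    {
        struct rte_mbuf *pkts[BURST];
        uint16_t i, n;

        for (;;) {
            n = rte_eth_rx_burst(0 /* port */, 0 /* queue */, pkts, BURST);
            for (i = 0; i < n; i++) {
                /* ... process pkts[i] ... */
                rte_pktmbuf_free(pkts[i]);
            }
            /* If the scheduler runs another thread here, packets keep
             * arriving into the RX ring; once it is full, the NIC
             * drops them. */
        }
        return 0;
    }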
And the number of RX descriptors needs to be big enough to absorb all packets arriving during the scheduling delay.
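A back-of-the-envelope sketch of that sizing, reusing Stephen's 10 Gbps / 1500 byte figures from above (the scheduling delay is an assumed value):

    #include <stdio.h>

    int main(void)
    {
        double pps = 7.62e9 / (1500 * 8);   /* ~635 Kpps at full size */
        double delay_ms = 10.0;             /* assumed scheduling delay */
        double needed = pps * delay_ms / 1e3;

        /* 10 ms at ~635 Kpps needs ~6350 descriptors - already more
         * than the current 4096 cap this patch raises. */
        printf("descriptors needed: %.0f\n", needed);
        return 0;
    }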
It is not well described how to *guarantee* that nothing but the DPDK polling thread runs on a dedicated CPU core.
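For what it's worth, the commonly used recipe - a sketch under deployment-specific assumptions (here, cores 2-3 dedicated), not a formal guarantee - combines kernel core isolation with EAL core pinning:

    # Kernel boot parameters, assuming cores 2-3 are reserved for DPDK:
    isolcpus=2,3 nohz_full=2,3 rcu_nocbs=2,3
    # Then pin the DPDK application's polling threads to those cores:
    ./dpdk-app -l 2-3 ...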
With AF_PACKET, the hardware generates an interrupt, and the kernel immediately calls the driver's interrupt handler - regardless of what the CPU core is currently doing.
The driver's interrupt handler acknowledges the interrupt to the hardware and informs the kernel that the softirq handler is pending.
AFAIU, the kernel executes pending softirq handlers immediately after returning from an interrupt handler - regardless of what the CPU core was doing when the interrupt occurred.
The softirq handler then dequeues the packets from the hardware RX descriptors into SKBs and, once all of them have been dequeued from the hardware, re-enables interrupts. Then the CPU core resumes the work it was doing when interrupted.
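A pseudocode-style sketch of that flow (napi_schedule() and napi_complete_done() are real kernel APIs; struct nic_priv and the nic_* helpers are illustrative placeholders):

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    /* Interrupt handler: ack/mask the IRQ and flag softirq work. */
    static irqreturn_t nic_irq(int irq, void *data)
    {
        struct nic_priv *priv = data;

        nic_disable_rx_irq(priv);       /* illustrative helper */
        napi_schedule(&priv->napi);     /* softirq work now pending */
        return IRQ_HANDLED;
    }

    /* Softirq (NAPI poll): dequeue RX descriptors into SKBs. */
    static int nic_poll(struct napi_struct *napi, int budget)
    {
        struct nic_priv *priv = container_of(napi, struct nic_priv, napi);
        int done = nic_rx_to_skbs(priv, budget);  /* illustrative helper */

        if (done < budget) {            /* ring fully drained */
            napi_complete_done(napi, done);
            nic_enable_rx_irq(priv);    /* re-enable the interrupt */
        }
        return done;
    }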
Thread overview: 22+ messages
2024-10-29 12:48 Lukas Sismis
2024-10-29 14:37 ` Morten Brørup
2024-10-30 13:58 ` Lukáš Šišmiš
2024-10-30 15:20 ` Stephen Hemminger
2024-10-30 15:40 ` Lukáš Šišmiš
2024-10-30 15:58 ` Bruce Richardson
2024-10-30 16:06 ` Stephen Hemminger
2024-11-05 8:49 ` Morten Brørup [this message]
2024-11-05 15:55 ` Stephen Hemminger
2024-11-05 16:50 ` Morten Brørup
2024-11-05 21:20 ` Lukáš Šišmiš
2024-10-30 15:06 ` [PATCH v2 1/2] net/ixgbe: " Lukas Sismis
2024-10-30 15:06 ` [PATCH v2 2/2] net/ice: " Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/1] net/bonding: make bonding functions stable Lukas Sismis
2024-10-30 15:42 ` [PATCH v3 1/2] net/ixgbe: increase the maximum of RX/TX descriptors Lukas Sismis
2024-10-30 16:26 ` Morten Brørup
2024-11-01 11:16 ` Bruce Richardson
2024-10-30 15:42 ` [PATCH v3 2/2] net/ice: " Lukas Sismis
2024-10-30 16:26 ` Morten Brørup
2024-10-31 2:24 ` [PATCH v3 1/1] net/bonding: make bonding functions stable lihuisong (C)
2024-11-06 2:14 ` Ferruh Yigit