DPDK patches and discussions
From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: Stephen Hemminger <stephen@networkplumber.org>,
	Ferruh Yigit <ferruh.yigit@amd.com>
Cc: "Mattias Rönnblom" <mattias.ronnblom@ericsson.com>,
	"John W . Linville" <linville@tuxdriver.com>,
	dev@dpdk.org, "Tyler Retzlaff" <roretzla@linux.microsoft.com>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>
Subject: Re: [PATCH] net/af_packet: cache align Rx/Tx structs
Date: Thu, 25 Apr 2024 00:27:36 +0200
Message-ID: <2371b1a8-bdc5-4184-8491-54e2e3a64211@lysator.liu.se>
In-Reply-To: <20240424121330.7547e290@hermes.local>

On 2024-04-24 21:13, Stephen Hemminger wrote:
> On Wed, 24 Apr 2024 18:50:50 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>>> I don't know how slow af_packet is, but if you care about performance,
>>> you don't want to use atomic add for statistics.
>>>    
>>
>> There are a few soft drivers already using atomics adds for updating stats.
>> If we document expectations from 'rte_eth_stats_reset()', we can update
>> those usages.
> 
> Using atomic add is lots of extra overhead. The statistics are not guaranteed
> to be perfect.  If nothing else, the bytes and packets can be skewed.
> 

The sad thing here is that if the counters are reset within the
load-modify-store cycle of the lcore counter update, the reset may end
up being a nop. So it's not that you missed a packet or two, or suffered
some transient inconsistency; you completely and permanently ignored
the reset request.
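
To make the interleaving above concrete, here is a minimal sketch that
replays it in a single thread, so the outcome is deterministic. The
struct and function names are hypothetical and are not taken from the
af_packet PMD; the only assumption is a plain (non-atomic) 64-bit
counter updated with a load, an add, and a store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-queue counter (not the PMD's struct). */
struct rx_stats {
	volatile uint64_t rx_pkts;
};

/*
 * Replay one specific interleaving, step by step, in a single thread:
 * the lcore loads the counter, the control thread resets it, and the
 * lcore then stores its stale value plus the new burst.
 */
static uint64_t
replay_lost_reset(uint64_t initial, uint64_t burst)
{
	struct rx_stats s = { .rx_pkts = initial };

	uint64_t tmp = s.rx_pkts;  /* lcore: load */
	s.rx_pkts = 0;             /* control thread: reset */
	s.rx_pkts = tmp + burst;   /* lcore: store, overwriting the reset */

	return s.rx_pkts;
}

/* Same operations, but the reset happens before the lcore's load. */
static uint64_t
replay_reset_before_update(uint64_t initial, uint64_t burst)
{
	struct rx_stats s = { .rx_pkts = initial };

	s.rx_pkts = 0;             /* control thread: reset */
	uint64_t tmp = s.rx_pkts;  /* lcore: load */
	s.rx_pkts = tmp + burst;   /* lcore: store */

	return s.rx_pkts;
}
```

With an initial count of 1000 and a burst of 4 packets, the first
replay leaves the counter at 1004, as if the reset never happened. The
second leaves it at 4.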

> The soft drivers af_xdp, af_packet, and tun performance is dominated by the
> overhead of the kernel system call and copies. Yes, alignment is good
> but won't be noticeable.

There aren't any syscalls in the RX path in the af_packet PMD.

I added the same statistics updates as the af_packet PMD uses to a
benchmark app which consumes ~1000 clock cycles (cc) between stats
updates.

If the equivalent of the RX queue struct was cache aligned, the 
statistics overhead was so small it was difficult to measure. Less than 
3-4 cc per update. This was with volatile, but without atomics.

If the RX queue struct wasn't cache aligned, and sized so a cache line 
generally was used by two (neighboring) cores, the stats incurred a cost 
of ~55 cc per update.
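
A minimal sketch of the two layouts being compared, assuming a 64-byte
cache line and a queue struct that holds only two counters. The struct
names and the line_index() helper are hypothetical; in DPDK itself the
alignment would normally come from the cache-alignment macro in
rte_common.h rather than a bare alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Unaligned variant: 16 bytes, so four of these fit in one cache line. */
struct rxq_unaligned {
	volatile uint64_t rx_pkts;
	volatile uint64_t rx_bytes;
};

/*
 * Aligned variant: aligning the first member raises the alignment of
 * the whole struct, and the compiler pads its size to a full line.
 */
struct rxq_aligned {
	alignas(CACHE_LINE_SIZE) volatile uint64_t rx_pkts;
	volatile uint64_t rx_bytes;
};

static_assert(sizeof(struct rxq_aligned) == CACHE_LINE_SIZE,
	      "aligned queue struct should occupy exactly one cache line");

/*
 * Cache line index of element 'idx' in an array of 'elem_size'-byte
 * elements, assuming the array itself starts on a cache line boundary.
 */
static size_t
line_index(size_t idx, size_t elem_size)
{
	return (idx * elem_size) / CACHE_LINE_SIZE;
}
```

In an array of the unaligned struct, queues 0 and 1 land on the same
line, so two lcores updating their own counters keep stealing that line
from each other. In an array of the aligned struct, each queue gets its
own line.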

Shaving off 55 cc should translate to a couple of hundred percent 
increased performance for an empty af_packet poll. If your lcore has 
some other primary source of work than the af_packet RX queue, and the 
RX queue is polled often, then this may well be a noticeable gain.

The benchmark was run on 16 Gracemont cores, which in my experience
seem to have somewhat lower core-to-core latency than many other
systems, provided the remote core/cache line owner is located in the
same cluster.
