Message-ID: <2371b1a8-bdc5-4184-8491-54e2e3a64211@lysator.liu.se>
Date: Thu, 25 Apr 2024 00:27:36 +0200
From: Mattias Rönnblom
Subject: Re: [PATCH] net/af_packet: cache align Rx/Tx structs
To: Stephen Hemminger, Ferruh Yigit
Cc: Mattias Rönnblom, "John W. Linville", dev@dpdk.org, Tyler Retzlaff, Honnappa Nagarahalli
In-Reply-To: <20240424121330.7547e290@hermes.local>
List-Id: DPDK patches and discussions

On 2024-04-24 21:13, Stephen Hemminger wrote:
> On Wed, 24 Apr 2024 18:50:50 +0100
> Ferruh Yigit wrote:
>
>>> I don't know how slow af_packet is, but if you care about performance,
>>> you don't want to use atomic add for statistics.
>>>
>>
>> There are a few soft drivers already using atomic adds for updating stats.
>> If we document the expectations on 'rte_eth_stats_reset()', we can update
>> those usages.
>
> Using atomic add brings a lot of extra overhead. The statistics are not
> guaranteed to be perfect. If nothing else, the bytes and packets can be
> skewed.
>

The sad thing here is that if the counters are reset within the
load-modify-store cycle of the lcore's counter update, the reset may end up
being a nop. So it's not that you missed a packet or two, or suffered some
transient inconsistency; the reset request was completed and then permanently
ignored.

> The performance of the soft drivers af_xdp, af_packet, and tun is dominated
> by the overhead of the kernel system calls and copies. Yes, alignment is
> good, but it won't be noticeable.

There aren't any syscalls in the RX path of the af_packet PMD.
I added the same statistics updates as the af_packet PMD uses into a benchmark app which consumes ~1000 cc between stats updates. If the equivalent of the RX queue struct was cache-aligned, the statistics overhead was so small it was difficult to measure: less than 3-4 cc per update. This was with volatile, but without atomics. If the RX queue struct wasn't cache-aligned, and was sized so that a cache line was generally shared by two (neighboring) cores, the stats incurred a cost of ~55 cc per update.

Shaving off 55 cc should translate into a couple of hundred percent higher performance for an empty af_packet poll. If your lcore has some other primary source of work than the af_packet RX queue, and the RX queue is polled often, then this may well be a noticeable gain.

The benchmark was run on 16 Gracemont cores, which in my experience seem to have somewhat shorter core-to-core latency than many other systems, provided the remote core (the cache line owner) is located in the same cluster.