Message-ID: <233ac43d-d1f2-4d59-fea6-017f1c1520ba@lysator.liu.se>
Date: Wed, 10 Aug 2022 13:59:39 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.11.0
Subject: Re: [RFC v2] non-temporal memcpy
Content-Language: en-US
To: =?UTF-8?Q?Morten_Br=c3=b8rup?= <mb@smartsharesystems.com>,
 Stephen Hemminger <stephen@networkplumber.org>
Cc: dev@dpdk.org, Bruce Richardson <bruce.richardson@intel.com>,
 Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
 Jan Viktorin <viktorin@rehivetech.com>, Ruifeng Wang <ruifeng.wang@arm.com>,
 David Christensen <drc@linux.vnet.ibm.com>,
 Stanislaw Kardach <kda@semihalf.com>
References: <98CBD80474FA8B44BF855DF32C47DC35D871D4@smartserver.smartshare.dk>
 <9ac934d2-ad05-6ec9-3bb6-63986d68d5d3@lysator.liu.se>
 <98CBD80474FA8B44BF855DF32C47DC35D87247@smartserver.smartshare.dk>
 <20220809082602.1fe5bd89@hermes.local>
 <98CBD80474FA8B44BF855DF32C47DC35D8724C@smartserver.smartshare.dk>
From: =?UTF-8?Q?Mattias_R=c3=b6nnblom?= <hofors@lysator.liu.se>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D8724C@smartserver.smartshare.dk>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2022-08-09 19:24, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 9 August 2022 17.26
>>
>> On Tue, 9 Aug 2022 11:46:19 +0200
>> Morten Brørup <mb@smartsharesystems.com> wrote:
>>
>>>>
>>>> I don't think memcpy() functions should have alignment requirements.
>>>> That's not very practical, and violates the principle of least
>>>> surprise.
>>>
>>> I didn't make the CPUs with these alignment requirements.
>>>
>>> However, I will offer optimized performance in a generic NT memcpy()
>>> function in the cases where the individual alignment requirements of
>>> various CPUs happen to be met.
>>
>> Rather than making a generic equivalent memcpy function, why not have
>> something which only takes aligned data?
> 
> Our application is copying data that does not meet the x86 NT load alignment requirement (16 bytes), so the function must support that. Specifically, our application is copying complete or truncated IP packets, excluding the Ethernet and VLAN headers, i.e. offset by 14, 18 or 22 bytes from the cache line aligned packet buffer.
> 

Sure, but you can use regular loads for the non-aligned parts, and then 
continue to use NT loads for the rest of the data. I suspect there is no 
point in doing NT loads for data on the same cache line that you've 
already done regular loads for, so you might as well treat the alignment 
requirement as 64 bytes, not 16.
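A minimal sketch of that head/body/tail split, in C. It assumes only the 
source side uses NT loads (the SSE4.1 _mm_stream_load_si128() intrinsic, 
i.e. MOVNTDQA), while the destination is written with regular unaligned 
stores. copy_nt_load_sketch() and copy_block64_nt_load() are hypothetical 
names, not DPDK APIs. When the compiler does not target SSE4.1, the body 
falls back to plain memcpy(), so this shows the control flow rather than 
a tuned implementation. Whether the non-temporal hint has any effect also 
depends on the CPU and the memory type, so it would need to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE4_1__)
#include <smmintrin.h> /* _mm_stream_load_si128 (MOVNTDQA), _mm_storeu_si128 */
#endif

/* Treat the NT load alignment requirement as one cache line, not 16 bytes. */
#define NT_ALIGN 64

/* Hypothetical helper: copy one 64-byte block whose source is 64-byte aligned. */
static inline void
copy_block64_nt_load(uint8_t *dst, const uint8_t *src)
{
#if defined(__SSE4_1__)
	/* MOVNTDQA needs a 16-byte aligned source; 64-byte alignment implies it. */
	for (int i = 0; i < 4; i++) {
		__m128i v = _mm_stream_load_si128((__m128i *)(src + 16 * i));
		_mm_storeu_si128((__m128i *)(dst + 16 * i), v); /* dst may be unaligned */
	}
#else
	/* Portable fallback: ordinary copy, no non-temporal hint. */
	memcpy(dst, src, NT_ALIGN);
#endif
}

/* Hypothetical NT-load copy: regular loads for the unaligned head and the
 * partial tail, NT loads only for whole, cache line aligned source blocks. */
static void
copy_nt_load_sketch(void *dst, const void *src, size_t len)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	/* Head: bytes up to the next 64-byte boundary of the source. */
	size_t head = (NT_ALIGN - ((uintptr_t)s & (NT_ALIGN - 1))) & (NT_ALIGN - 1);
	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Invariant: if anything is left, the source is now cache line aligned. */
	assert(len == 0 || ((uintptr_t)s & (NT_ALIGN - 1)) == 0);

	/* Body: whole cache lines using NT loads. */
	while (len >= NT_ALIGN) {
		copy_block64_nt_load(d, s);
		d += NT_ALIGN;
		s += NT_ALIGN;
		len -= NT_ALIGN;
	}

	/* Tail: remaining partial cache line with regular loads. */
	memcpy(d, s, len);
}
```

With the 14, 18 and 22 byte offsets mentioned above, the head is 50, 46 
and 42 bytes respectively, after which every NT load in the body starts 
on a 64-byte boundary.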

>> And to avoid user confusion
>> change the name to be something not suggestive of memcpy.
>>
>> Maybe rte_non_cache_copy()?
>>
>> Want to avoid the naive user just doing s/memcpy/rte_memcpy_nt/ and
>> expecting everything to work.
> 
> I see the risk you point out here. But unlike rte_memcpy(), which is advertised in presentations, whitepapers and elsewhere as having much better performance than classic memcpy(), the NT memcpy() is not promoted that way, and such promotion is what might lead to that misconception. So the probability should be low.
>