From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4ef33229-c9dd-3043-7f2d-25102b823cac@lysator.liu.se>
Date: Sun, 7 Aug 2022 22:40:43 +0200
Subject: Re: [RFC v2] non-temporal memcpy
To: dev@dpdk.org
References: <98CBD80474FA8B44BF855DF32C47DC35D871D4@smartserver.smartshare.dk>
 <98CBD80474FA8B44BF855DF32C47DC35D871DB@smartserver.smartshare.dk>
 <262c214b-7870-a221-2621-6684dce42823@yandex.ru>
 <98CBD80474FA8B44BF855DF32C47DC35D871E6@smartserver.smartshare.dk>
 <2c646d01-14d0-e5cb-2d7c-50c8456fc3e5@yandex.ru>
 <98CBD80474FA8B44BF855DF32C47DC35D8720C@smartserver.smartshare.dk>
 <5e1567fb744841a0915348397a81b99d@huawei.com>
 <20220729090548.2cdffd4e@hermes.local>
From: Mattias Rönnblom
In-Reply-To: <20220729090548.2cdffd4e@hermes.local>
List-Id: DPDK patches and discussions

On 2022-07-29 18:05, Stephen Hemminger wrote:
> On Fri, 29 Jul 2022 12:13:52 +0000
> Konstantin Ananyev wrote:
>
>> Sorry, missed that part.
>>
>>>
>>>> Another question - who will do 'sfence' after the copying?
>>>> Would it be inside memcpy_nt (seems quite costly), or would
>>>> it be another API function for that: memcpy_nt_flush() or so?
>>>
>>> Outside. Only the developer knows when it is required, so it wouldn't make any sense to add the cost inside memcpy_nt().
>>>
>>> I don't think we should add a flush function; it would just be another name for an already existing function. Referring to the required
>>> operation in the memcpy_nt() function documentation should suffice.
>>>
>>
>> Ok, but again wouldn't it be arch specific?
>> AFAIK for x86 it needs to boil down to sfence, for other architectures - I don't know.
>> If you think there already is some generic one (rte_wmb?) that would always produce
>> correct instructions - sure let's use it.
>>
>>
>
> It makes sense in a few select places to use non-temporal copy.
> But it would add unnecessary complexity to DPDK if every function in DPDK that could
> cause a copy had a non-temporal variant.

An NT load and NT store variant, plus an NT load+store variant. :)

>
> Maybe just having rte_memcpy have a threshold (config value?) that if copy is larger than
> a certain size, then it would automatically be non-temporal.
> Small copies wouldn't matter,
> the optimization is more about not stopping cache size issues with large streams of data.

I don't think there's any way for rte_memcpy() to know if the application
plans to use the source, the destination, both, or neither of the buffers
in the immediate future. For huge copies (MBs or more) the size heuristic
makes sense, but for medium-sized copies (say, a packet's worth of data),
I'm not so sure.

What is unclear to me is whether there is a benefit (or drawback) to using
the imaginary rte_memcpy_nt(), compared to doing rte_memcpy() followed by
clflushopt or cldemote, in the typical use case (if there is such a thing).
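
To make the size-threshold idea concrete, here is a rough sketch, not a
proposed implementation. All names in it (copy_nt_sketch(),
copy_nt_fence_sketch(), copy_with_threshold(), NT_THRESHOLD) are made up
for illustration and are not DPDK APIs, and the 1 MB cut-off is an
arbitrary assumption. It only uses NT stores (regular loads), assumes
SSE2 on x86, and falls back to plain memcpy() elsewhere. The fence is
left to the caller, as discussed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical cut-off; a real value would have to come from benchmarks. */
#define NT_THRESHOLD ((size_t)1024 * 1024)

/*
 * Hypothetical helper (not a DPDK API): copy using non-temporal stores
 * where SSE2 is available. It does NOT fence; the caller decides when
 * the stores must be ordered.
 */
static void
copy_nt_sketch(void *dst, const void *src, size_t len)
{
#if defined(__SSE2__)
	uint8_t *d = dst;
	const uint8_t *s = src;

	/* Ordinary copy until the destination is 16-byte aligned. */
	size_t head = (size_t)(-(uintptr_t)d & 15);
	if (head > len)
		head = len;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	/* Regular (possibly unaligned) loads, non-temporal 16-byte stores. */
	for (; len >= 16; d += 16, s += 16, len -= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_stream_si128((__m128i *)d, v);
	}

	/* Remaining tail bytes with an ordinary copy. */
	memcpy(d, s, len);
#else
	/* No NT stores in this sketch for other architectures. */
	memcpy(dst, src, len);
#endif
}

/* Hypothetical helper: order earlier NT stores before later stores. */
static void
copy_nt_fence_sketch(void)
{
#if defined(__SSE2__)
	_mm_sfence();
#endif
}

/* The size heuristic: only large copies bypass the cache. */
static void
copy_with_threshold(void *dst, const void *src, size_t len)
{
	if (len >= NT_THRESHOLD)
		copy_nt_sketch(dst, src, len);
	else
		memcpy(dst, src, len);
}
```

A caller would do copy_with_threshold(dst, src, len) and then call
copy_nt_fence_sketch() before publishing the buffer to another core or
a device. Note that the sketch has exactly the limitation described
above: it decides on size alone and knows nothing about whether the
application will touch either buffer again soon.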