From: Aman Kumar
Date: Wed, 27 Oct 2021 19:04:57 +0530
To: "Ananyev, Konstantin"
Cc: "Van Haaren, Harry", "mattias.ronnblom", Thomas Monjalon, "dev@dpdk.org", "viacheslavo@nvidia.com", "Burakov, Anatoly", "Song, Keesang", "jerinjacobk@gmail.com", "Richardson, Bruce", "honnappa.nagarahalli@arm.com", Ruifeng Wang, David Christensen, "david.marchand@redhat.com", "stephen@networkplumber.org"
Subject: Re: [dpdk-dev] [PATCH v4 2/2] lib/eal: add temporal store memcpy support for AMD platform

On Wed, Oct 27, 2021 at 5:53 PM Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:

> >
> > Hi Mattias,
> >
> > > > 6) What is the use-case for this? When would a user *want* to use this instead of rte_memcpy()?
> > > > If the data being loaded is relevant to datapath/packets, presumably other packets might require the
> > > > loaded data, so temporal (normal) loads should be used to cache the source data?
> > >
> > >
> > > I'm not sure if your first question is rhetorical or not, but a memcpy()
> > > in a NT variant is certainly useful.
> > > One use case for a memcpy() with
> > > temporal loads and non-temporal stores is if you need to archive packet
> > > payload for (distant, potential) future use, and want to avoid causing
> > > unnecessary LLC evictions while doing so.
> >
> > Yes I agree that there are certainly benefits in using cache-locality hints.
> > There is an open question around if the src or dst or both are non-temporal.
> >
> > In the implementation of this patch, the NT/T type of store is reversed from your use-case:
> > 1) Loads are NT (so loaded data is not cached for future packets)
> > 2) Stores are T (so copied/dst data is now resident in L1/L2)
> >
> > In theory there might even be valid uses for this type of memcpy where loaded
> > data is not needed again soon and stored data is referenced again soon,
> > although I cannot think of any here while typing this mail.
> >
> > I think some use-case examples, and clear documentation on when/how to choose
> > between rte_memcpy() or any (potential future) rte_memcpy_nt() variants is required
> > to progress this patch.
> >
> > Assuming a strong use-case exists, and it can be clearly indicated to users of DPDK APIs which
> > rte_memcpy() to use, we can look at technical details around enabling the implementation.
> >
>
> +1 here.
> Function behaviour and restrictions (src parameter needs to be 16/32 B aligned, etc.),
> along with expected usage scenarios have to be documented properly.
> Again, as Harry pointed out, I don't see any AMD specific instructions in this function,
> so presumably such function can go into __AVX2__ code block and no new defines will
> be required.
>
>
Agreed that the APIs are generic, but we've kept them under an AMD flag for a simple reason: they have NOT been tested on any other platform.
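For readers weighing the two directions discussed above, here is a minimal C sketch (not the code in this patch) of both variants: non-temporal loads with ordinary stores, which is how Harry describes the patch, and ordinary loads with non-temporal stores, which matches Mattias's archival case. The function names copy_ntload_tstore() and copy_tload_ntstore() are hypothetical. It assumes a GCC-compatible compiler on x86-64 and uses the 16-byte SSE intrinsics _mm_stream_load_si128() and _mm_stream_si128() rather than the 32-byte AVX2 forms, with a plain memcpy() fallback elsewhere. The 16-byte alignment and length checks stand in for the kind of restriction Konstantin says must be documented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h> /* SSE2 / SSE4.1 intrinsics */
#define NT_SKETCH_X86 1
/* Enable SSE4.1 for one function without needing -msse4.1 on the command line. */
#define NT_SKETCH_TARGET __attribute__((target("sse4.1")))
#else
#define NT_SKETCH_X86 0
#define NT_SKETCH_TARGET
#endif

/* Restriction assumed by both sketches: 16-byte aligned pointer, 16-byte multiple length. */
static int nt_args_ok(const void *p, size_t len)
{
    return ((uintptr_t)p & 15u) == 0 && (len & 15u) == 0;
}

/*
 * Hypothetical helper: non-temporal loads, temporal stores.
 * src must be 16-byte aligned; dst need not be.
 * Returns -1 and copies nothing if the restriction is violated.
 */
NT_SKETCH_TARGET
static int copy_ntload_tstore(void *dst, const void *src, size_t len)
{
    if (!nt_args_ok(src, len))
        return -1;
#if NT_SKETCH_X86
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    for (size_t i = 0; i < len / 16; i++) {
        /* MOVNTDQA: load with a non-temporal hint */
        __m128i v = _mm_stream_load_si128((__m128i *)(s + i));
        /* Ordinary store: destination lines end up in cache */
        _mm_storeu_si128(d + i, v);
    }
#else
    memcpy(dst, src, len); /* portable fallback, no cache hints */
#endif
    return 0;
}

/*
 * Hypothetical helper: temporal loads, non-temporal stores.
 * dst must be 16-byte aligned; src need not be.
 */
static int copy_tload_ntstore(void *dst, const void *src, size_t len)
{
    if (!nt_args_ok(dst, len))
        return -1;
#if NT_SKETCH_X86
    const __m128i *s = (const __m128i *)src;
    __m128i *d = (__m128i *)dst;
    for (size_t i = 0; i < len / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary load */
        _mm_stream_si128(d + i, v);         /* MOVNTDQ: store bypassing the cache */
    }
    _mm_sfence(); /* order the weakly-ordered NT stores before returning */
#else
    memcpy(dst, src, len);
#endif
    return 0;
}
```

On ordinary write-back memory the memory type may override the non-temporal hint on MOVNTDQA (Intel's SDM says so), so whether the load-side variant actually avoids cache pollution is platform-dependent and would need measuring on each target.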
A use-case showing how to use this was planned earlier for the mlx5 PMD, but it was dropped from this version of the patch because the mlx5 data path is going to be refactored soon, so it may not be useful for future versions of mlx5 (>22.02).
Ref link: adaptation to mlx5 mprq (*we plan to adapt this into a future version*)
The patch in the link enhances the mlx5 mprq implementation for our specific use-case, and with a 128B packet size we achieve ~60% better performance.
We understand that the use of this copy function should be documented, and we plan to do so along with a few other platform-specific optimizations in future versions of DPDK. As this does not conflict with other platforms, can we still keep it under the AMD flag for now, as suggested by Thomas?