From: Morten Brørup
To: "Bruce Richardson", "Manish Sharma"
Subject: Re: [dpdk-dev] rte_memcpy - fence and stream
Date: Thu, 27 May 2021 20:15:19 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35C617E4@smartserver.smartshare.dk>
References: <98CBD80474FA8B44BF855DF32C47DC35C617E1@smartserver.smartshare.dk>

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, 27 May 2021 19.22
>
> On Thu, May 27, 2021 at 10:39:59PM +0530, Manish Sharma wrote:
> > For the case I have, hardly 2% of the data buffers which are being
> > copied get looked at - mostly its for DMA. Having a version of DPDK
> > memcopy that does non temporal copies would definitely be good.
> > If in my case, I have a lot of CPUs doing the copy in parallel, would
> > I/OAT driver copy accelerator still help?
> >
>
> It will depend upon the size of the copies being done. For bigger packets
> the accelerator can help free up CPU cycles for other things.
>
> However, if only 2% of the data which is being copied gets looked at, why
> does it need to be copied? Can the original buffers not be used in that
> case?

I can only speak for myself here...

Our firmware has a packet capture feature with a filter.

If a packet matches the capture filter, a metadata header and the relevant part of the packet contents ("snap length" in tcpdump terminology) are appended to a large memory area (the "capture buffer") using rte_pktmbuf_read/rte_memcpy. This capture buffer is only read through the GUI or management API by the network administrator, i.e. it will only be read minutes or hours later, so there is no need to put any of it in any CPU cache.

It does not make sense to clone and hold on to many thousands of mbufs when we only need some of their contents. So we copy the contents instead of increasing the mbuf refcount.

We currently only use our packet capture feature for R&D purposes, so we have not optimized it yet. However, we will need to optimize it for production use at some point. So I find this discussion initiated by Manish very interesting.

-Morten
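
For illustration only, the capture-buffer append described above could be sketched roughly as follows. This is not the firmware's code: the names cap_hdr, cap_buf and cap_append, and the header layout, are hypothetical, and plain memcpy stands in for rte_pktmbuf_read/rte_memcpy so that the sketch builds without DPDK. The second memcpy marks the spot where a non-temporal copy would be substituted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-packet metadata header; the real layout is not
 * given in the thread. */
struct cap_hdr {
    uint64_t timestamp; /* capture time, units chosen by the caller */
    uint32_t wire_len;  /* original packet length */
    uint32_t cap_len;   /* bytes actually stored (<= snap length) */
};

/* Hypothetical capture buffer: a flat memory area plus a write offset.
 * Invariant: used <= size. */
struct cap_buf {
    uint8_t *base;
    size_t size;
    size_t used;
};

/* Append a metadata header and the first min(pkt_len, snaplen) bytes of
 * the packet. Returns 0 on success, -1 if the record does not fit. */
static int cap_append(struct cap_buf *cb, uint64_t ts,
                      const void *pkt, uint32_t pkt_len, uint32_t snaplen)
{
    struct cap_hdr hdr;
    uint32_t cap_len = pkt_len < snaplen ? pkt_len : snaplen;
    size_t need = sizeof(hdr) + cap_len;

    if (need > cb->size - cb->used)
        return -1;

    hdr.timestamp = ts;
    hdr.wire_len = pkt_len;
    hdr.cap_len = cap_len;

    memcpy(cb->base + cb->used, &hdr, sizeof(hdr));
    /* Stand-in for rte_memcpy; a non-temporal copy would go here, since
     * the destination is not read again for minutes or hours. */
    memcpy(cb->base + cb->used + sizeof(hdr), pkt, cap_len);

    cb->used += need;
    return 0;
}
```

A caller would initialise a cap_buf over a preallocated memory area and call cap_append once per packet that matches the filter. Copying only cap_len bytes is what lets the mbuf be released immediately rather than held by refcount.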