Date: Fri, 14 Jan 2022 10:22:36 +0000
From: Bruce Richardson
To: Morten Brørup
Cc: Jan Viktorin, Ruifeng Wang, David Christensen, Konstantin Ananyev, dev@dpdk.org
Subject: Re: rte_memcpy alignment
References: <98CBD80474FA8B44BF855DF32C47DC35D86E00@smartserver.smartshare.dk>
 <98CBD80474FA8B44BF855DF32C47DC35D86E02@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86E02@smartserver.smartshare.dk>
List-Id: DPDK patches and discussions

On Fri, Jan 14, 2022 at 10:53:54AM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Friday, 14 January 2022 10.11
> >
> > On Fri, Jan 14, 2022 at 09:56:50AM +0100, Morten Brørup wrote:
> > > Dear ARM/POWER/x86 maintainers,
> > >
> > > The architecture specific rte_memcpy() provides optimized variants to
> > > copy aligned data. However, the alignment requirements depend on the
> > > hardware architecture, and there is no common definition for the
> > > alignment.
> > >
> > > DPDK provides __rte_cache_aligned for cache optimization purposes,
> > > with architecture specific values. Would you consider providing an
> > > __rte_memcpy_aligned for rte_memcpy() optimization purposes?
> > >
> > > Or should I just use __rte_cache_aligned, although it is overkill?
> > >
> > >
> > > Specifically, I am working on a mempool optimization where the objs
> > > field in the rte_mempool_cache structure may benefit by being aligned
> > > for optimized rte_memcpy().
> > >
> > For me the difficulty with such a memcpy proposal - apart from probably
> > adding to the amount of memcpy code we have to maintain - is the
> > specific meaning of "aligned" in the memcpy case.
> > Unlike for a struct definition, the possible meaning of aligned in
> > memcpy could be:
> > * the source address is aligned
> > * the destination address is aligned
> > * both source and destination are aligned
> > * both source and destination are aligned and the copy length is a
> >   multiple of the alignment length
> > * the data is aligned to a cacheline boundary
> > * the data is aligned to the largest load-store size for the system
> > * the data is aligned to the boundary suitable for the copy size, e.g.
> >   memcpy of 8 bytes is 8-byte aligned etc.
> >
> > Can you clarify a bit more on your own thinking here? Personally, I am a
> > little dubious of the benefit of general memcpy optimization, but I do
> > believe that for specific usecases there is value in having their own
> > copy operations which include constraints for that specific usecase. For
> > example, in the AVX-512 ice/i40e PMD code, we fold the memcpy from the
> > mempool cache into the descriptor rearm function because we know we can
> > always do 64-byte loads and stores, and also because we know that for
> > each load in the copy, we can reuse the data just after storing it
> > (giving a good perf boost). Perhaps something similar could work for you
> > in your mempool optimization.
> >
> > /Bruce
>
> I'm going to copy an array of pointers, specifically the 'objs' array in
> the rte_mempool_cache structure.
>
> The 'objs' array starts at byte 24, which is only 8 byte aligned. So it
> always fails the ALIGNMENT_MASK test in the x86 specific rte_memcpy(),
> and thus cannot ever use the optimized rte_memcpy_aligned() function to
> copy the array, but will use the rte_memcpy_generic() function.
>
> If the 'objs' array was optimally aligned, and the other array that is
> being copied to/from is also optimally aligned, rte_memcpy() would use
> the optimized rte_memcpy_aligned() function.
>
> Please also note that the value of ALIGNMENT_MASK depends on which vector
> instruction set DPDK is being compiled with.
>
> The other CPU architectures have similar stuff in their rte_memcpy()
> implementations, and their alignment requirements are also different.
>
> Please also note that rte_memcpy() becomes even more optimized when the
> size of the memcpy() operation is known at compile time.
>
> So I am asking for a public #define __rte_memcpy_aligned I can use to meet
> the alignment requirements for optimal rte_memcpy().
>

Thanks for that, I misunderstood your original ask. Things are clearer now,
and it seems reasonable.
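
For illustration, the alignment test discussed in this thread can be sketched
in plain C as below. This is not DPDK code: every sketch_/SKETCH_ name, the
24-byte mock header, and the 0x1F mask (which assumes 32-byte vector copies)
are assumptions made up for the example. The real test and mask live in the
architecture-specific rte_memcpy headers, and SKETCH_MEMCPY_ALIGNED only
stands in for the __rte_memcpy_aligned macro being requested.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mask for 32-byte vector copies; other builds would differ. */
#define SKETCH_ALIGNMENT_MASK 0x1F

/* Hypothetical stand-in for the requested macro, not an existing definition. */
#define SKETCH_MEMCPY_ALIGNED alignas(SKETCH_ALIGNMENT_MASK + 1)

#define SKETCH_CACHE_OBJS 512

/* Mock layout: a 24-byte header puts 'objs' at byte 24, as in the thread. */
struct sketch_cache_today {
    alignas(64) uint8_t header[24];
    void *objs[SKETCH_CACHE_OBJS];
};

/* Same layout, but 'objs' is pushed to the next copy-friendly boundary. */
struct sketch_cache_proposed {
    alignas(64) uint8_t header[24];
    SKETCH_MEMCPY_ALIGNED void *objs[SKETCH_CACHE_OBJS];
};

static_assert(offsetof(struct sketch_cache_today, objs) == 24,
              "objs is only pointer-aligned in the mock of today's layout");
static_assert(offsetof(struct sketch_cache_proposed, objs) == 32,
              "objs starts on a 32-byte boundary in the proposed layout");

/* Returns 1 when both addresses pass the mask test, i.e. when a copy
 * between them would be eligible for the aligned fast path. */
static int sketch_takes_aligned_path(const void *dst, const void *src)
{
    return (((uintptr_t)dst | (uintptr_t)src) & SKETCH_ALIGNMENT_MASK) == 0;
}
```

With the mock of today's layout, sketch_takes_aligned_path() returns 0 for any
copy to or from objs, however well aligned the other buffer is. With the
annotated layout it returns 1 whenever the other buffer is 32-byte aligned
too, which is the case the requested macro is meant to enable.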