Date: Tue, 28 Jul 2020 15:50:31 +0200
From: Olivier Matz
To: Sarosh Arif
Cc: Stephen Hemminger, dev@dpdk.org
Message-ID: <20200728135031.GX5869@platinum>
Subject: Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy() code semantics with optimized rte_memcpy()

Hi Sarosh,

On Tue, Jul 28, 2020 at 06:30:46PM +0500, Sarosh Arif wrote:
> Hello,
> The following things made me think that rte_memcpy() is more optimized
> than memcpy():
> 1. The DPDK documentation recommends using rte_memcpy() instead of memcpy():
>    https://doc.dpdk.org/guides/prog_guide/writing_efficient_code.html
> 2. Some benchmarks are available here:
>    https://software.intel.com/content/www/us/en/develop/articles/performance-optimization-of-memcpy-in-dpdk.html
> 3. rte_memcpy() has __attribute__((always_inline)) associated with it,
>    so the compiler also tries to inline it.
>
> Using rte_memcpy() everywhere ensures consistency in the code base.
> Here are the performance numbers I measured using "perf":
>
> rte_memcpy()
>
> Performance counter stats
>        1.573864  task-clock (msec)      #    0.898 CPUs utilized
>               0  context-switches       #    0.000 K/sec
>               0  cpu-migrations         #    0.000 K/sec
>             342  page-faults            #    0.217 M/sec
>       5,483,016  cycles                 #    3.484 GHz
>       5,554,017  instructions           #    1.01  insn per cycle
>       1,114,593  branches               #  708.189 M/sec
>          33,796  branch-misses          #    3.03% of all branches
>       1,369,247  L1-dcache-loads        #  869.991 M/sec
>                  L1-dcache-load-misses                                  (0.00%)
>                  LLC-loads                                              (0.00%)
>                  LLC-load-misses                                        (0.00%)
>
>     0.001753373 seconds time elapsed
>
>
> memcpy()
>
> Performance counter stats
>        1.631135  task-clock (msec)      #    0.902 CPUs utilized
>               0  context-switches       #    0.000 K/sec
>               0  cpu-migrations         #    0.000 K/sec
>             342  page-faults            #    0.210 M/sec
>       5,676,549  cycles                 #    3.480 GHz                    (73.99%)
>       5,739,593  instructions           #    1.01  insn per cycle
>       1,141,121  branches               #  699.587 M/sec
>          34,553  branch-misses          #    3.03% of all branches
>       1,417,494  L1-dcache-loads        #  869.023 M/sec
>          67,312  L1-dcache-load-misses  #    4.75% of all L1-dcache hits  (26.01%)
>                  LLC-loads                                              (0.00%)
>                  LLC-load-misses                                        (0.00%)
>
>     0.001808500 seconds time elapsed

Can you give more details about your use case? In particular, what code
are you running for this benchmark?

I tend to agree with Stephen: a memcpy() with a constant (small) size
should be replaced directly by the compiler with the optimal code for
this architecture. rte_memcpy() uses vector instructions and is probably
better than libc's memcpy() for larger copies.

Thanks,
Olivier

>
>
> On Thu, Jul 23, 2020 at 8:47 PM Stephen Hemminger wrote:
> >
> > On Thu, 23 Jul 2020 12:02:40 +0500, Sarosh Arif wrote:
> >
> > > Since rte_memcpy is more optimized, it should be used instead of memcpy.
> > >
> > > Signed-off-by: Sarosh Arif
> >
> > Really? Did you measure this?
> > For fixed-size structures, the compiler can inline memcpy as a small set of instructions.