From: Sarosh Arif
Date: Tue, 28 Jul 2020 18:30:46 +0500
To: Stephen Hemminger
Cc: Olivier Matz, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy() code semantics with optimized rte_memcpy()
References: <20200723070240.14749-1-sarosh.arif@emumba.com> <20200723084715.1f055aff@hermes.lan>
In-Reply-To: <20200723084715.1f055aff@hermes.lan>

Hello,

The following things made me think that rte_memcpy() is more optimized than memcpy():

1. The DPDK documentation recommends using rte_memcpy() instead of memcpy():
   https://doc.dpdk.org/guides/prog_guide/writing_efficient_code.html
2. Some benchmarks are available here:
   https://software.intel.com/content/www/us/en/develop/articles/performance-optimization-of-memcpy-in-dpdk.html
3. rte_memcpy() has __attribute__((always_inline)) associated with it, so the compiler also tries to inline it.

Using rte_memcpy() everywhere ensures consistency in the code-base.
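As a rough illustration of point 3, the sketch below shows what forcing inlining looks like with the GCC/Clang attribute. The helper name copy_always_inline() is hypothetical and its body simply defers to memcpy(); it is not DPDK's rte_memcpy() implementation, which is architecture-specific.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only; not part of DPDK. */
static inline __attribute__((always_inline)) void *
copy_always_inline(void *dst, const void *src, size_t n)
{
	/* The attribute asks GCC/Clang to inline this wrapper at every call site. */
	return memcpy(dst, src, n);
}

/* Example caller: copies a fixed-size array through the wrapper
 * and returns the last copied element. */
static int
copy_four_ints(int dst[4], const int src[4])
{
	copy_always_inline(dst, src, 4 * sizeof(int));
	return dst[3];
}
```

The attribute only controls whether the wrapper itself is inlined; how the copy inside it is lowered is still up to the compiler and the C library.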
Here are the results of the performance number measurement using "perf":

rte_memcpy()

 Performance counter stats

          1.573864      task-clock (msec)         #    0.898 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               342      page-faults               #    0.217 M/sec
         5,483,016      cycles                    #    3.484 GHz
         5,554,017      instructions              #    1.01  insn per cycle
         1,114,593      branches                  #  708.189 M/sec
            33,796      branch-misses             #    3.03% of all branches
         1,369,247      L1-dcache-loads           #  869.991 M/sec
                        L1-dcache-load-misses                                     (0.00%)
                        LLC-loads                                                 (0.00%)
                        LLC-load-misses                                           (0.00%)

       0.001753373 seconds time elapsed

memcpy()

 Performance counter stats

          1.631135      task-clock (msec)         #    0.902 CPUs utilized
                 0      context-switches          #    0.000 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               342      page-faults               #    0.210 M/sec
         5,676,549      cycles                    #    3.480 GHz                  (73.99%)
         5,739,593      instructions              #    1.01  insn per cycle
         1,141,121      branches                  #  699.587 M/sec
            34,553      branch-misses             #    3.03% of all branches
         1,417,494      L1-dcache-loads           #  869.023 M/sec
            67,312      L1-dcache-load-misses     #    4.75% of all L1-dcache hits (26.01%)
                        LLC-loads                                                 (0.00%)
                        LLC-load-misses                                           (0.00%)

       0.001808500 seconds time elapsed

On Thu, Jul 23, 2020 at 8:47 PM Stephen Hemminger wrote:
>
> On Thu, 23 Jul 2020 12:02:40 +0500
> Sarosh Arif wrote:
>
> > Since rte_memcpy is more optimized it should be used instead of memcpy
> >
> > Signed-off-by: Sarosh Arif
>
> Really did you measure this.
> For fixed size structures, compiler can inline memcpy small set of instructions.
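
To make the quoted point about fixed-size structures concrete, here is a minimal sketch in plain C. The structure pkt_meta and the function copy_meta() are hypothetical names chosen for illustration; they do not come from the patch or from the mbuf library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size structure, for illustration only. */
struct pkt_meta {
	uint64_t timestamp;
	uint32_t flow_id;
	uint16_t port;
	uint16_t flags;
};

/* The size is a compile-time constant, so an optimizing compiler is
 * free to replace this memcpy() call with a few load/store instructions. */
static void
copy_meta(struct pkt_meta *dst, const struct pkt_meta *src)
{
	memcpy(dst, src, sizeof(*dst));
}
```

Inspecting the generated assembly for copy_meta() (for example with the compiler's -S option at -O2) is one way to check whether a call to memcpy() actually remains on a given toolchain.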