From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 72BA443C39; Sat, 2 Mar 2024 13:01:58 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 13A824025C; Sat, 2 Mar 2024 13:01:58 +0100 (CET) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id 2A586400D5 for ; Sat, 2 Mar 2024 13:01:56 +0100 (CET) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 17F2F125F for ; Sat, 2 Mar 2024 13:01:55 +0100 (CET) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id E3C4613C2; Sat, 2 Mar 2024 13:01:54 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-1.4 required=5.0 tests=ALL_TRUSTED,AWL, T_SCC_BODY_TEXT_LINE autolearn=disabled version=4.0.0 X-Spam-Score: -1.4 Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id A356B125E; Sat, 2 Mar 2024 13:01:52 +0100 (CET) Message-ID: Date: Sat, 2 Mar 2024 13:01:51 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 00/71] replace use of fixed size rte_mempcy From: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= To: Stephen Hemminger , dev@dpdk.org References: <20240229225936.483472-1-stephen@networkplumber.org> <20240301171707.95242-1-stephen@networkplumber.org> <91b581bb-e181-4eac-879c-afd20b7bd6c4@lysator.liu.se> Content-Language: en-US In-Reply-To: <91b581bb-e181-4eac-879c-afd20b7bd6c4@lysator.liu.se> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024-03-02 12:14, Mattias Rönnblom wrote: > On 2024-03-01 18:14, Stephen Hemminger wrote: >> The DPDK has a lot of "cargo cult" usage of rte_memcpy. >> This patch set replaces cases where rte_memcpy is used with a fixed >> size constant size. >> >> Typical example is: >>     rte_memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN); >> which can be replaced with: >>     memcpy(mac_addrs, mac.addr_bytes, RTE_ETHER_ADDR_LEN); >> >> This has two benefits. Gcc (and clang) are smart enough that for >> all small fixed size values, they just generate the necessary >> instructions >> to do it inline. It also means that fortify, Coverity, and ASAN >> analyzers can check these memcpy's. >> > > Instead of smearing out the knowledge of when to use rte_memcpy(), and > when to use memcpy() across the code base, wouldn't it be better to > *always* call rte_memcpy() in the fast path, and leave the policy > decision to the rte_memcpy() implementation? > > In rte_memcpy(), add: > if (__builtin_constant_p(n) && n < RTE_LIBC_MEMCPY_SIZE_THRESHOLD) >     memcpy(/../); > ..or something to that effect. > > Could you have a #ifdef for dumb static analysis tools? To make it look > like you are always using memcpy()? > >> So faster, better, safer. >> > > What is "faster" based on? > I ran some DSW benchmarks, and if you add diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h index 72a92290e0..64cd82d78d 100644 --- a/lib/eal/x86/include/rte_memcpy.h +++ b/lib/eal/x86/include/rte_memcpy.h @@ -862,6 +862,11 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n) static __rte_always_inline void * rte_memcpy(void *dst, const void *src, size_t n) { + if (__builtin_constant_p(n) && n <= 32) { + memcpy(dst, src, n); + return dst; + } + if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK)) return rte_memcpy_aligned(dst, src, n); else ...the overhead increases from roughly 48 core clock cycles/event to 59 cc/event. The same for "n < 128". (I'm not sure what counts as "small" here.) So something rte_memcpy() does for small and constant memory copies does make things go *significantly* faster, at least in certain cases. (Linux, GCC 11.4, Intel Gracemont.) > My experience with replacing rte_memcpy() with memcpy() (or vice versa) > is mixed. > > I've also tried just dropping the DPDK-custom memcpy() implementation > altogether, and that caused a performance drop (in a particular app, on > a particular compiler and CPU). > >> The first patch is a simple coccinelle script to do the replacement >> and the rest are the results broken out by module. >> >> The coccinelle script can be used again to make sure more bad >> usage doesn't creep in with new drivers. >> >> v2 - fix CI failure on some OS by adding string.h >>       remove rte_memcpy.h if no longer used >> >> Stephen Hemminger (71): >>    cocci/rte_memcpy: add script to eliminate fixed size rte_memcpy >>    eal: replace use of fixed size rte_memcpy >>    ethdev: replace use of fixed size rte_memcpy >>    eventdev: replace use of fixed size rte_memcpy >>    cryptodev: replace use of fixed size rte_memcpy >>    ip_frag: replace use of fixed size rte_memcpy >>    net: replace use of fixed size rte_memcpy >>    lpm: replace use of fixed size rte_memcpy >>    node: replace use of fixed size rte_memcpy >>    pdcp: replace use of fixed size rte_memcpy >>    pipeline: replace use of fixed size rte_memcpy >>    rib: replace use of fixed size rte_memcpy >>    security: replace use of fixed size rte_memcpy >>    net/mlx5: replace use of fixed size rte_memcpy >>    net/nfp: replace use of fixed size rte_memcpy >>    net/ngbe: replace use of fixed size rte_memcpy >>    net/null: replace use of fixed size rte_memcpy >>    net/pcap: replace use of fixed size rte_memcpy >>    net/sfc: replace use of fixed size rte_memcpy >>    net/tap: replace use of fixed size rte_memcpy >>    net/txgbe: replace use of fixed size rte_memcpy >>    raw/ifpga: replace use of fixed size rte_memcpy >>    raw/skeleton: replace use of fixed size rte_memcpy >>    net/hns3: replace use of fixed size rte_memcpy >>    net/i40e: replace use of fixed size rte_memcpy >>    net/iavf: replace use of fixed size rte_memcpy >>    net/ice: replace use of fixed size rte_memcpy >>    net/idpf: replace use of fixed size rte_memcpy >>    net/ipn3ke: replace use of fixed size rte_memcpy >>    net/ixgbe: replace use of fixed size rte_memcpy >>    net/memif: replace use of fixed size rte_memcpy >>    net/qede: replace use of fixed size rte_memcpy >>    baseband/acc: replace use of fixed size rte_memcpy >>    baseband/la12xx: replace use of fixed size rte_memcpy >>    common/idpf: replace use of fixed size rte_memcpy >>    common/qat: replace use of fixed size rte_memcpy >>    compress/qat: replace use of fixed size rte_memcpy >>    crypto/ccp: replace use of fixed size rte_memcpy >>    crypto/cnxk: replace use of fixed size rte_memcpy >>    crypto/dpaa_sec: replace use of fixed size rte_memcpy >>    crypto/ipsec_mb: replace use of fixed size rte_memcpy >>    crypto/qat: replace use of fixed size rte_memcpy >>    crypto/scheduler: replace use of fixed size rte_memcpy >>    event/cnxk: replace use of fixed size rte_memcpy >>    event/dlb2: replace use of fixed size rte_memcpy >>    event/dpaa2: replace use of fixed size rte_memcpy >>    event/octeontx: replace use of fixed size rte_memcpy >>    mempool/dpaa: replace use of fixed size rte_memcpy >>    mempool/dpaa2: replace use of fixed size rte_memcpy >>    ml/cnxk: replace use of fixed size rte_memcpy >>    net/af_xdp: replace use of fixed size rte_memcpy >>    net/avp: replace use of fixed size rte_memcpy >>    net/axgbe: replace use of fixed size rte_memcpy >>    net/bnx2x: replace use of fixed size rte_memcpy >>    net/bnxt: replace use of fixed size rte_memcpy >>    net/bonding: replace use of fixed size rte_memcpy >>    net/cnxk: replace use of fixed size rte_memcpy >>    net/cpfl: replace use of fixed size rte_memcpy >>    net/cxgbe: replace use of fixed size rte_memcpy >>    net/dpaa2: replace use of fixed size rte_memcpy >>    net/e1000: replace use of fixed size rte_memcpy >>    net/enic: replace use of fixed size rte_memcpy >>    net/failsafe: replace use of fixed size rte_memcpy >>    net/gve/base: replace use of fixed size rte_memcpy >>    net/hinic: replace use of fixed size rte_memcpy >>    net/mvpp2: replace use of fixed size rte_memcpy >>    app/test-pmd: replace use of fixed size rte_memcpy >>    app/graph: replace use of fixed size rte_memcpy >>    app/test-eventdev: replace use of fixed size rte_memcpy >>    app/test: replace use of fixed size rte_memcpy >>    examples: replace use of fixed size rte_memcpy >> >>   app/graph/neigh.c                             |   8 +- >>   app/test-eventdev/test_pipeline_common.c      |  19 ++- >>   app/test-pmd/cmdline.c                        |  48 ++++---- >>   app/test-pmd/cmdline_flow.c                   |  24 ++-- >>   app/test-pmd/config.c                         |   8 +- >>   app/test-pmd/csumonly.c                       |   1 - >>   app/test-pmd/flowgen.c                        |   1 - >>   app/test-pmd/iofwd.c                          |   1 - >>   app/test-pmd/macfwd.c                         |   1 - >>   app/test-pmd/macswap.c                        |   1 - >>   app/test-pmd/noisy_vnf.c                      |   1 - >>   app/test-pmd/rxonly.c                         |   1 - >>   app/test-pmd/testpmd.c                        |   1 - >>   app/test/commands.c                           |   1 - >>   app/test/packet_burst_generator.c             |   4 +- >>   app/test/test_crc.c                           |   5 +- >>   app/test/test_cryptodev.c                     |  18 ++- >>   app/test/test_cryptodev_asym.c                |   1 - >>   app/test/test_cryptodev_security_pdcp.c       |   1 - >>   app/test/test_efd.c                           |   1 - >>   app/test/test_efd_perf.c                      |   1 - >>   app/test/test_event_crypto_adapter.c          |  12 +- >>   app/test/test_event_dma_adapter.c             |   4 +- >>   app/test/test_eventdev.c                      |   1 - >>   app/test/test_ipsec.c                         |   6 +- >>   app/test/test_link_bonding_mode4.c            |   8 +- >>   app/test/test_mbuf.c                          |   1 - >>   app/test/test_member.c                        |   1 - >>   app/test/test_member_perf.c                   |   1 - >>   app/test/test_rawdev.c                        |   1 - >>   app/test/test_security_inline_proto.c         |  36 +++--- >>   app/test/test_service_cores.c                 |   1 - >>   app/test/virtual_pmd.c                        |   3 +- >>   devtools/cocci/rte_memcpy.cocci               |  11 ++ >>   drivers/baseband/acc/rte_acc100_pmd.c         |  17 ++- >>   drivers/baseband/acc/rte_vrb_pmd.c            |  21 ++-- >>   drivers/baseband/la12xx/bbdev_la12xx.c        |   4 +- >>   drivers/common/idpf/idpf_common_device.c      |   4 +- >>   drivers/common/idpf/idpf_common_virtchnl.c    |   8 +- >>   drivers/common/qat/qat_qp.c                   |  10 +- >>   drivers/compress/qat/qat_comp.c               |   8 +- >>   drivers/crypto/ccp/ccp_crypto.c               |  14 +-- >>   drivers/crypto/cnxk/cnxk_cryptodev_ops.c      |   2 +- >>   drivers/crypto/cnxk/cnxk_se.h                 |   2 +- >>   drivers/crypto/dpaa_sec/dpaa_sec.c            |   2 +- >>   drivers/crypto/ipsec_mb/pmd_snow3g.c          |   4 +- >>   drivers/crypto/qat/qat_sym_session.c          |  52 ++++----- >>   .../scheduler/rte_cryptodev_scheduler.c       |   6 +- >>   drivers/crypto/scheduler/scheduler_failover.c |  12 +- >>   drivers/event/cnxk/cnxk_tim_evdev.c           |   4 +- >>   drivers/event/dlb2/dlb2.c                     |   6 +- >>   drivers/event/dpaa2/dpaa2_eventdev.c          |   6 +- >>   drivers/event/octeontx/timvf_evdev.c          |   4 +- >>   drivers/mempool/dpaa/dpaa_mempool.c           |   4 +- >>   drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |   4 +- >>   drivers/ml/cnxk/cn10k_ml_model.c              |   8 +- >>   drivers/ml/cnxk/cn10k_ml_ops.c                |  11 +- >>   drivers/ml/cnxk/cnxk_ml_ops.c                 |   2 +- >>   drivers/ml/cnxk/mvtvm_ml_model.c              |   8 +- >>   drivers/ml/cnxk/mvtvm_ml_ops.c                |   8 +- >>   drivers/net/af_xdp/rte_eth_af_xdp.c           |   2 +- >>   drivers/net/avp/avp_ethdev.c                  |   4 +- >>   drivers/net/axgbe/axgbe_ethdev.c              |   4 +- >>   drivers/net/bnx2x/bnx2x.c                     |  32 +++-- >>   drivers/net/bnx2x/bnx2x_stats.c               |  10 +- >>   drivers/net/bnx2x/bnx2x_vfpf.c                |  19 +-- >>   drivers/net/bnxt/bnxt_flow.c                  |  34 +++--- >>   drivers/net/bonding/rte_eth_bond_8023ad.c     |   4 +- >>   drivers/net/bonding/rte_eth_bond_flow.c       |   2 +- >>   drivers/net/cnxk/cnxk_ethdev_ops.c            |   2 +- >>   drivers/net/cnxk/cnxk_tm.c                    |   5 +- >>   drivers/net/cpfl/cpfl_ethdev.c                |   3 +- >>   drivers/net/cpfl/cpfl_vchnl.c                 |   4 +- >>   drivers/net/cxgbe/clip_tbl.c                  |   2 +- >>   drivers/net/cxgbe/cxgbe_filter.c              |   8 +- >>   drivers/net/cxgbe/l2t.c                       |   4 +- >>   drivers/net/cxgbe/smt.c                       |  20 ++-- >>   drivers/net/dpaa2/base/dpaa2_hw_dpni.c        |   1 - >>   drivers/net/dpaa2/dpaa2_ethdev.c              |   1 - >>   drivers/net/dpaa2/dpaa2_recycle.c             |   1 - >>   drivers/net/dpaa2/dpaa2_rxtx.c                |   1 - >>   drivers/net/dpaa2/dpaa2_sparser.c             |   1 - >>   drivers/net/dpaa2/dpaa2_tm.c                  |   2 +- >>   drivers/net/e1000/em_rxtx.c                   |   1 - >>   drivers/net/e1000/igb_flow.c                  |  22 ++-- >>   drivers/net/e1000/igb_pf.c                    |   7 +- >>   drivers/net/e1000/igb_rxtx.c                  |   1 - >>   drivers/net/enic/enic_main.c                  |   8 +- >>   drivers/net/failsafe/failsafe_ops.c           |   6 +- >>   drivers/net/gve/base/gve_adminq.c             |   2 +- >>   drivers/net/hinic/hinic_pmd_flow.c            |  40 +++---- >>   drivers/net/hns3/hns3_fdir.c                  |   2 +- >>   drivers/net/hns3/hns3_flow.c                  |   4 +- >>   drivers/net/i40e/i40e_ethdev.c                | 109 ++++++++---------- >>   drivers/net/i40e/i40e_fdir.c                  |  28 +++-- >>   drivers/net/i40e/i40e_flow.c                  |  56 +++++---- >>   drivers/net/i40e/i40e_pf.c                    |   3 +- >>   drivers/net/i40e/i40e_tm.c                    |  11 +- >>   drivers/net/i40e/rte_pmd_i40e.c               |  34 +++--- >>   drivers/net/iavf/iavf_fdir.c                  |  93 +++++++-------- >>   drivers/net/iavf/iavf_fsub.c                  |  50 ++++---- >>   drivers/net/iavf/iavf_generic_flow.c          |   2 +- >>   drivers/net/iavf/iavf_tm.c                    |  11 +- >>   drivers/net/iavf/iavf_vchnl.c                 |   9 +- >>   drivers/net/ice/ice_dcf.c                     |   5 +- >>   drivers/net/ice/ice_dcf_parent.c              |   2 +- >>   drivers/net/ice/ice_dcf_sched.c               |  11 +- >>   drivers/net/ice/ice_diagnose.c                |   4 +- >>   drivers/net/ice/ice_ethdev.c                  |  14 +-- >>   drivers/net/ice/ice_fdir_filter.c             |  37 +++--- >>   drivers/net/ice/ice_generic_flow.c            |   2 +- >>   drivers/net/ice/ice_hash.c                    |   2 +- >>   drivers/net/ice/ice_tm.c                      |  11 +- >>   drivers/net/idpf/idpf_ethdev.c                |   7 +- >>   drivers/net/idpf/idpf_rxtx.c                  |  10 +- >>   drivers/net/ipn3ke/ipn3ke_flow.c              |  32 +++-- >>   drivers/net/ipn3ke/ipn3ke_representor.c       |  16 +-- >>   drivers/net/ipn3ke/ipn3ke_tm.c                |   6 +- >>   drivers/net/ixgbe/ixgbe_ethdev.c              |   9 +- >>   drivers/net/ixgbe/ixgbe_fdir.c                |   7 +- >>   drivers/net/ixgbe/ixgbe_flow.c                |  65 +++++------ >>   drivers/net/ixgbe/ixgbe_ipsec.c               |   8 +- >>   drivers/net/ixgbe/ixgbe_pf.c                  |   5 +- >>   drivers/net/ixgbe/ixgbe_tm.c                  |  11 +- >>   drivers/net/ixgbe/rte_pmd_ixgbe.c             |   4 +- >>   drivers/net/memif/memif_socket.c              |   4 +- >>   drivers/net/mlx5/mlx5_devx.c                  |   4 +- >>   drivers/net/mlx5/mlx5_flow.c                  |  38 +++--- >>   drivers/net/mlx5/mlx5_flow_aso.c              |   6 +- >>   drivers/net/mlx5/mlx5_flow_hw.c               |  16 +-- >>   drivers/net/mlx5/mlx5_rx.c                    |   6 +- >>   drivers/net/mlx5/mlx5_rxtx_vec.c              |   8 +- >>   drivers/net/mvpp2/mrvl_tm.c                   |   2 +- >>   drivers/net/nfp/flower/nfp_conntrack.c        |   2 +- >>   drivers/net/nfp/flower/nfp_flower_flow.c      |  16 +-- >>   .../net/nfp/flower/nfp_flower_representor.c   |   2 +- >>   drivers/net/nfp/nfp_mtr.c                     |  10 +- >>   drivers/net/ngbe/ngbe_pf.c                    |   4 +- >>   drivers/net/null/rte_eth_null.c               |   6 +- >>   drivers/net/pcap/pcap_ethdev.c                |   2 +- >>   drivers/net/pcap/pcap_osdep_freebsd.c         |   2 +- >>   drivers/net/pcap/pcap_osdep_linux.c           |   2 +- >>   drivers/net/qede/qede_main.c                  |   2 +- >>   drivers/net/sfc/sfc.c                         |   2 +- >>   drivers/net/sfc/sfc_ef10_tx.c                 |   2 +- >>   drivers/net/sfc/sfc_ethdev.c                  |  11 +- >>   drivers/net/sfc/sfc_flow.c                    |  20 ++-- >>   drivers/net/sfc/sfc_flow_rss.c                |   2 +- >>   drivers/net/sfc/sfc_mae.c                     |   2 +- >>   drivers/net/sfc/sfc_rx.c                      |   2 +- >>   drivers/net/sfc/sfc_tso.c                     |   2 +- >>   drivers/net/sfc/sfc_tso.h                     |   9 +- >>   drivers/net/tap/rte_eth_tap.c                 |  14 +-- >>   drivers/net/txgbe/txgbe_ethdev.c              |   9 +- >>   drivers/net/txgbe/txgbe_fdir.c                |   6 +- >>   drivers/net/txgbe/txgbe_flow.c                |  65 +++++------ >>   drivers/net/txgbe/txgbe_ipsec.c               |   8 +- >>   drivers/net/txgbe/txgbe_pf.c                  |   4 +- >>   drivers/net/txgbe/txgbe_tm.c                  |  11 +- >>   drivers/raw/ifpga/afu_pmd_he_hssi.c           |   3 +- >>   drivers/raw/ifpga/afu_pmd_he_lpbk.c           |   3 +- >>   drivers/raw/ifpga/afu_pmd_he_mem.c            |   3 +- >>   drivers/raw/ifpga/afu_pmd_n3000.c             |   8 +- >>   drivers/raw/ifpga/ifpga_rawdev.c              |  11 +- >>   drivers/raw/skeleton/skeleton_rawdev.c        |   7 +- >>   examples/bbdev_app/main.c                     |   2 +- >>   examples/l2fwd-cat/cat.c                      |   4 +- >>   examples/ptpclient/ptpclient.c                |  11 +- >>   examples/vhost/main.c                         |   5 +- >>   examples/vmdq/main.c                          |   6 +- >>   examples/vmdq_dcb/main.c                      |  15 +-- >>   lib/cryptodev/rte_cryptodev.c                 |   2 +- >>   lib/eal/common/eal_common_options.c           |   7 +- >>   lib/ethdev/rte_ethdev.c                       |   3 +- >>   lib/ethdev/rte_flow.c                         |   5 +- >>   lib/eventdev/rte_event_crypto_adapter.c       |   2 +- >>   lib/eventdev/rte_event_dma_adapter.c          |   4 +- >>   lib/eventdev/rte_event_timer_adapter.c        |   2 +- >>   lib/fib/trie.c                                |   2 +- >>   lib/ip_frag/rte_ipv6_fragmentation.c          |   4 +- >>   lib/ip_frag/rte_ipv6_reassembly.c             |   6 +- >>   lib/lpm/rte_lpm6.c                            |   3 +- >>   lib/net/rte_ether.c                           |   2 +- >>   lib/node/ip6_lookup.c                         |   8 +- >>   lib/pdcp/pdcp_process.c                       |  36 +++--- >>   lib/pipeline/rte_table_action.c               |   8 +- >>   lib/rib/rte_rib6.h                            |   5 +- >>   lib/security/rte_security.c                   |   4 +- >>   188 files changed, 881 insertions(+), 998 deletions(-) >>   create mode 100644 devtools/cocci/rte_memcpy.cocci >>