Date: Sat, 2 Mar 2024 21:58:07 -0800
From: Stephen Hemminger
To: Morten Brørup
Cc: bruce.richardson@intel.com, konstantin.v.ananyev@yandex.ru, mattias.ronnblom@ericsson.com, dev@dpdk.org
Subject: Re: [PATCH] eal/x86: improve rte_memcpy const size 16 performance
Message-ID: <20240302215807.6d7c3cd9@hermes.local>
In-Reply-To: <20240302214003.15c37310@hermes.local>
References: <20240302234812.9137-1-mb@smartsharesystems.com> <20240302214003.15c37310@hermes.local>

On Sat, 2 Mar 2024 21:40:03 -0800
Stephen Hemminger wrote:

> On Sun, 3 Mar 2024 00:48:12 +0100
> Morten Brørup wrote:
>
> > When the rte_memcpy() size is 16, the same 16 bytes are copied twice.
> > In the case where the size is known to be 16 at build time, omit the
> > duplicate copy.
> >
> > Reduced the amount of effectively copy-pasted code by using #ifdef
> > inside functions instead of outside functions.
> >
> > Suggested-by: Stephen Hemminger
> > Signed-off-by: Morten Brørup
> > ---
>
> Looks good, let me see how it looks in godbolt vs GCC.
>
> One other issue is that for the non-constant case, rte_memcpy has an excessively
> large inline code footprint. That is one of the reasons GCC doesn't always
> inline. For > 128 bytes, it really should be a function.

For sizes of 4, 6, 8, 16, 32, and 64 (anything below 128), the code GCC
generates inline and the rte_memcpy code match.
For size 128, the GCC output looks simpler.

rte_copy_addr:
        vmovdqu ymm0, YMMWORD PTR [rsi]
        vextracti128 XMMWORD PTR [rdi+16], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi], xmm0
        vmovdqu ymm0, YMMWORD PTR [rsi+32]
        vextracti128 XMMWORD PTR [rdi+48], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi+32], xmm0
        vmovdqu ymm0, YMMWORD PTR [rsi+64]
        vextracti128 XMMWORD PTR [rdi+80], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi+64], xmm0
        vmovdqu ymm0, YMMWORD PTR [rsi+96]
        vextracti128 XMMWORD PTR [rdi+112], ymm0, 0x1
        vmovdqu XMMWORD PTR [rdi+96], xmm0
        vzeroupper
        ret

copy_addr:
        vmovdqu ymm0, YMMWORD PTR [rsi]
        vmovdqu YMMWORD PTR [rdi], ymm0
        vmovdqu ymm1, YMMWORD PTR [rsi+32]
        vmovdqu YMMWORD PTR [rdi+32], ymm1
        vmovdqu ymm2, YMMWORD PTR [rsi+64]
        vmovdqu YMMWORD PTR [rdi+64], ymm2
        vmovdqu ymm3, YMMWORD PTR [rsi+96]
        vmovdqu YMMWORD PTR [rdi+96], ymm3
        vzeroupper
        ret
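
For readers following the thread, the idea in the quoted patch description (skip the duplicate copy when the size is known to be 16 at build time) could be sketched roughly as below. This is only an illustration, not the DPDK code: mov16() and copy_16_to_32() are hypothetical names, plain memcpy() stands in for the 16-byte SIMD load/store, and the use of GCC's __builtin_constant_p() to detect a build-time constant is an assumption about how such a check might be written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy exactly 16 bytes.
 * Stands in for a single 16-byte SIMD load/store pair. */
static inline void mov16(uint8_t *dst, const uint8_t *src)
{
	memcpy(dst, src, 16);
}

/* Hypothetical small-copy path for 16 <= n <= 32.
 * The general case does two 16-byte copies that may overlap:
 * one from the start and one ending at the last byte. */
static inline void *copy_16_to_32(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	assert(n >= 16 && n <= 32);

	mov16(d, s);

	/* Assumption: when the compiler can prove n == 16 at build time,
	 * the second copy would rewrite the same 16 bytes, so skip it.
	 * Without optimization this test is simply false and the
	 * (redundant but correct) second copy still runs. */
	if (__builtin_constant_p(n) && n == 16)
		return dst;

	mov16(d + (n - 16), s + (n - 16));
	return dst;
}
```

With n == 16 the second mov16() targets offset 0 again, which is the duplicate copy the patch description refers to; for any other n in range the two copies together cover all n bytes.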