From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4FADBA0524; Thu, 4 Feb 2021 15:14:58 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 021D62405B2; Thu, 4 Feb 2021 15:14:58 +0100 (CET) Received: from hqnvemgate26.nvidia.com (hqnvemgate26.nvidia.com [216.228.121.65]) by mails.dpdk.org (Postfix) with ESMTP id 0A6B8240596 for ; Thu, 4 Feb 2021 15:14:55 +0100 (CET) Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate26.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Thu, 04 Feb 2021 06:14:55 -0800 Received: from HQMAIL105.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Thu, 04 Feb 2021 06:14:55 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Thu, 04 Feb 2021 06:14:55 -0800 Received: from HKMAIL104.nvidia.com (10.18.16.13) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 4 Feb 2021 14:14:53 +0000 Received: from HKMAIL101.nvidia.com (10.18.16.10) by HKMAIL104.nvidia.com (10.18.16.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 4 Feb 2021 14:14:48 +0000 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (104.47.58.176) by HKMAIL101.nvidia.com (10.18.16.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Thu, 4 Feb 2021 14:14:48 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=MYQ+uq2J26O/YLwv1goNsMQoQtI+jkFaSKaTWBCI4XAV4QZTTed04WcOjVC6yLbVglveEqWFaw7ojwqv2FTqPesdDFwY7g/KntClgtTdjcZZlqxEhGBCZ7MyHvJcnxQO+XbF2RTJbTsLozKF9x9bldS9cpyTBZ1uNKTm1R5aptQJHqmAbsMZIIeQEihLYqb7ntbuvtgA3T+FSZspHYlWUv14JaWKzezC/MJcy21F1ZrBfwY+Hgz85LIPySaShRoxPjX/pXIgH3HC7vsg9g8Yz4Van+afgoMybUfnsETGjSG8j0Vfcz0qKFX0ctwuYUts7QQ5iycIwBb2jJN+0pme1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=/re3eV/C8g/zdqEj0X3ebuP4BqBCIwSFB66zVnqzkgI=; b=Mo9eoKD6mGfMof9G05fods68DpUaxPNWcAM5w494fcrKnweGbTxhBljVe7kL/QleakL0kzG7k+M/YgsU+pkq32iaNl5LJBeQSt3FHbtSFrTAHArtUpJlm5XMqtRWX0WSmiSE+gdcGFyJF/x223YAxxcZB2SIxuuM1ry/z6mfGDDH0aBYMl1RnmsT6bZI0vcBixwbZIYO9IkcDNFIHYSfjkxQiSlPzN4yYVzqfmAhwK6N151HjlsSpEuN6UnDuuRuOyWIv5uCOCwtXqRqt6zJhEPx8eXYRVbRsOYfT2+qvwSDooSClmj7mGH1WKKbxbOocI2aAzD4f5aaEN4ncnl7OA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none Received: from DM6PR12MB3753.namprd12.prod.outlook.com (2603:10b6:5:1c7::18) by DM6PR12MB3948.namprd12.prod.outlook.com (2603:10b6:5:1c4::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3825.20; Thu, 4 Feb 2021 14:14:45 +0000 Received: from DM6PR12MB3753.namprd12.prod.outlook.com ([fe80::e4a9:f9a1:d873:d07a]) by DM6PR12MB3753.namprd12.prod.outlook.com ([fe80::e4a9:f9a1:d873:d07a%5]) with mapi id 15.20.3805.028; Thu, 4 Feb 2021 14:14:45 +0000 From: Slava Ovsiienko To: Aman Kumar , "dev@dpdk.org" CC: Raslan Darawsheh , "keesang.song@amd.com" , Asaf Penso , Shy Shyman , Alexander Kozyrev , Matan Azrad Thread-Topic: [PATCH v3 1/2] net/mlx5: optimize mprq memcpy Thread-Index: AQHWnuPqm2wuO9+dXk6uGoCZkuHIo6pIucag Date: Thu, 4 Feb 2021 14:14:45 +0000 Message-ID: References: <20200925031658.50476-1-aman.kumar@vvdntech.in> <20201010090034.1797958-1-aman.kumar@vvdntech.in> In-Reply-To: <20201010090034.1797958-1-aman.kumar@vvdntech.in> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: vvdntech.in; dkim=none (message not signed) header.d=none;vvdntech.in; dmarc=none action=none header.from=nvidia.com; x-originating-ip: [95.164.10.10] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: e7d08e79-4557-4b4d-5753-08d8c917393b x-ms-traffictypediagnostic: DM6PR12MB3948: x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-header: ProcessedBy-CMR-outbound x-ms-oob-tlc-oobclassifiers: OLM:9508; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: BAIEDrdqYUc/CRV/Pef1jCF7yIP4fIpdUP1k9l3JyLgggtdKLR6ovXdza/EBJkUuLJNFKvU/T5B19AG2Tq1UAnt/E6CaFLaoiTghD4h2c4qjPr0eaPjnFaJPofoRGnBZ0+MsC8fEnxdw4MG/sZF6oQNA4a8h9TafSc6mHssE+CtUcKtaG4Sbb9yq2IE2/+zhSPi07E+gOYyZ+1MGcewX0p8q0nEsDBub0MuUQJGUezPqca4rUkq/AdE1wtukku5ctSWP5b/BYA74vu/mIJB+dHAbAYc5Suk9sM2Ys/gciZtBq9kiWt+W7wiwSAnvoP2mAK9+OGq9c67aPQI0mxG+9kRAfZJHu/6nRTe9DfBVl1RlgXXeiDaQIE9y2mGu0zQwP2eMM5W/FLU36LO079624ahbpOOYD2k7rPliKKQQxA3NPrdP+/x5I5r+y3Uv8+Giy3TH45GyMhBurV4nPV/jau7h18Q0jD2Bgu4LE4g4GGUdFxUVu/OwZTqKp4+y2UKtUEBgTDw1LbBlkTeuBXmfXmGNeAIZaqAhyVFLOD/KtfITZENbboqyzfwFqf5gVyXDT9ZQIpCpYjf8yYM0savex4sdv9TmgsPtjmhysm3Mtb0= x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:DM6PR12MB3753.namprd12.prod.outlook.com; PTR:; CAT:NONE; SFS:(4636009)(396003)(136003)(366004)(346002)(376002)(39860400002)(83380400001)(33656002)(86362001)(55016002)(2906002)(52536014)(66946007)(66556008)(64756008)(66446008)(66476007)(8936002)(478600001)(186003)(26005)(5660300002)(76116006)(316002)(7696005)(6506007)(53546011)(54906003)(71200400001)(8676002)(110136005)(9686003)(107886003)(4326008)(309714004)(554374003); DIR:OUT; SFP:1101; x-ms-exchange-antispam-messagedata: =?us-ascii?Q?sKGCco4JuLi+w60f755tHpBOlk6wiOr4NziP5dvzYTKEFNmpZ1T1L4Ygb3ag?= =?us-ascii?Q?R+zHYZff2kLk49leq4IJrjgGB8hIRsX0+n6u4tprvprnxwWU8qIL5POxuXqJ?= =?us-ascii?Q?46K1T7N/TsquDm15zmxfGGrUg7hbD8ttZwiBbrCPlO12GGcsgdTosvuFIGvq?= =?us-ascii?Q?S30L1aLIrxRxHlyJjzxyP6ATe0MnJ5klrHglW3qSuMksKZXAJw67evzEH8xW?= =?us-ascii?Q?fksIdIKRtMrMYHh7/36fwiRRJ8G28UPWrRLe827UxnvDl+UerwqwMhT1kHw6?= =?us-ascii?Q?/aNi23qR3aAaja3SfjOJCmafB3htyzWysHmIvwhJok61YU9SH8XbB0GsYES5?= =?us-ascii?Q?7wwjS7vqAJZobD7ZY8ExRLuehN4D/oqtuEqZAaqhUc3lUhkiyiLyTcID6Rt9?= =?us-ascii?Q?o/4GMhMShp7P90WaQzjfPeYNqrQRyy1YEsKnWCIxHJ9DsyKTXcjtmKB3W4Ri?= =?us-ascii?Q?10AuIS8d+R8GRz5Hh2HXUMMuLqmelm4fXWmWfByq1NhTOgV8veu/uVydUhI3?= =?us-ascii?Q?ywW/AekoZfZGJ914iKIWWiE/Cb33dNJhzknl7KjZEOXHjo4w51CV5+CU2AeM?= =?us-ascii?Q?ftzjj74yNiQa6tk5Z4zPxuI9Dn832uyEZpwQBp7EGY9cNKw7NvJREVR8gF5N?= =?us-ascii?Q?g0squdmHEpuLbKhS5QlOaOzAdPWTZosKHc7rkgLbP+nlaYlwHuH0FYv6Bo1D?= =?us-ascii?Q?QJB56lL32a/9E3VJwe5VMrJD5xxa0GCe4HiCZtFZzPaeFmKYtLEw6LK5SY1S?= =?us-ascii?Q?iXi6HWfKK7LTByvJpWWWIGgGxAhFgfzLCi3jqTCcF75bYkfDcSiPgcViSJME?= =?us-ascii?Q?N0j3UuqyPVBvTE1vseYATU1wTp5klMpKMhOZa7r3+xnBfaKAGSnTeb0XdFgl?= =?us-ascii?Q?GZkQ/Uu6ryOH7/ogqebDoloHgZK8cFt800cdyQBM1qE+3FWyhWRpT3Toumh0?= =?us-ascii?Q?1k8HuQfDrj77e6+oAIszL01HfTikfawx0vmoghPsEmURnyK5Ij0PftapboOf?= =?us-ascii?Q?VqSK2Ze8/JHsSCZfeoEEMtQUAV62KFtX5qoLzwcU8qQK2s21Egr2uR74wER/?= =?us-ascii?Q?g04vXk01?= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: DM6PR12MB3753.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: e7d08e79-4557-4b4d-5753-08d8c917393b X-MS-Exchange-CrossTenant-originalarrivaltime: 04 Feb 2021 14:14:45.3876 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: rE0j63K20qEaxlGrgOJHsL9MrpsAJKMWLu67OPj/QiOu6dJIWLw/FZIC3TuZspYs0V9ZaIdJU2rSuu+6kwbjzA== X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB3948 X-OriginatorOrg: Nvidia.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1612448095; bh=/re3eV/C8g/zdqEj0X3ebuP4BqBCIwSFB66zVnqzkgI=; h=X-PGP-Universal:ARC-Seal:ARC-Message-Signature: ARC-Authentication-Results:From:To:CC:Subject:Thread-Topic: Thread-Index:Date:Message-ID:References:In-Reply-To: Accept-Language:Content-Language:X-MS-Has-Attach: X-MS-TNEF-Correlator:authentication-results:x-originating-ip: x-ms-publictraffictype:x-ms-office365-filtering-correlation-id: x-ms-traffictypediagnostic:x-ms-exchange-transport-forked: x-microsoft-antispam-prvs:x-header:x-ms-oob-tlc-oobclassifiers: x-ms-exchange-senderadcheck:x-microsoft-antispam: x-microsoft-antispam-message-info:x-forefront-antispam-report: x-ms-exchange-antispam-messagedata:Content-Type: Content-Transfer-Encoding:MIME-Version: X-MS-Exchange-CrossTenant-AuthAs: X-MS-Exchange-CrossTenant-AuthSource: X-MS-Exchange-CrossTenant-Network-Message-Id: X-MS-Exchange-CrossTenant-originalarrivaltime: X-MS-Exchange-CrossTenant-fromentityheader: X-MS-Exchange-CrossTenant-id:X-MS-Exchange-CrossTenant-mailboxtype: X-MS-Exchange-CrossTenant-userprincipalname: X-MS-Exchange-Transport-CrossTenantHeadersStamped:X-OriginatorOrg; b=g4Xpk0GLi1SdJatNh6CibwdyYus1cSYsHGBY9xJIvhBJnLXY4GW6NTJW7+3DOfpun uu/LTS35SV4oGEYshhIFV/XPFtdwUk9sOlCAkXZt34QvIqJ6bu1DqFZzsrhqx6xpeu VQqRoMushaf0s53O86JYSdTKPEG7MchMfDw6xgrfzIQHbdCLWlwGpdwLivxyIDf80c q3FNC6CSM2Eja6j3qCsGpmlycL794uak22U0hCd3zebSVoGUveqmGBA9zukCYDm9mM BClKBLFhrA0hmQLvNXinSU3Tm+FwLRlrkcnn0hNa3TU2ctL5fEsLXbcdnDZPQb43jH Al+pXLUUvq8JQ== Subject: Re: [dpdk-dev] [PATCH v3 1/2] net/mlx5: optimize mprq memcpy X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi, I'm sorry for asking the questions very late. Is still this patch set actual and should it be updated and considered?=20 As I can understand this one optimizes the memory writes in some way using = the instructions with the hints. Is this specific for some CPU families? Is this more common? I suppose it s= hould we considered and discussed more widely, possible on EAL level. I would propose to introduce these spec= ial memory routines on EAL level to give advantage to all PMDs, not specifically to mlx5. With best regards, Slava > -----Original Message----- > From: Aman Kumar > Sent: Saturday, October 10, 2020 12:01 > To: dev@dpdk.org > Cc: Raslan Darawsheh ; keesang.song@amd.com; > Asaf Penso ; Shy Shyman ; Slava > Ovsiienko ; Alexander Kozyrev > ; Matan Azrad ; > aman.kumar@vvdntech.in > Subject: [PATCH v3 1/2] net/mlx5: optimize mprq memcpy >=20 > add non temporal load and temporal store for mprq memcpy. > define RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY in build > configuration to enable this optimization. >=20 > Signed-off-by: Aman Kumar > --- > drivers/net/mlx5/meson.build | 1 + > drivers/net/mlx5/mlx5.c | 12 ++++ > drivers/net/mlx5/mlx5.h | 3 + > drivers/net/mlx5/mlx5_rxq.c | 3 + > drivers/net/mlx5/mlx5_rxtx.c | 116 > ++++++++++++++++++++++++++++++++++- > drivers/net/mlx5/mlx5_rxtx.h | 3 + > meson_options.txt | 2 + > 7 files changed, 138 insertions(+), 2 deletions(-) >=20 > diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build > index 9a97bb9c8..38e93fdc1 100644 > --- a/drivers/net/mlx5/meson.build > +++ b/drivers/net/mlx5/meson.build > @@ -47,6 +47,7 @@ foreach option:cflags_options > cflags +=3D option > endif > endforeach > +dpdk_conf.set('RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY', > +get_option('mlx5_ntload_tstore')) > if get_option('buildtype').contains('debug') > cflags +=3D [ '-pedantic', '-DPEDANTIC' ] else diff --git > a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index > 01ead6e6a..a2796eaa5 100644 > --- a/drivers/net/mlx5/mlx5.c > +++ b/drivers/net/mlx5/mlx5.c > @@ -160,6 +160,11 @@ > /* Configure timeout of LRO session (in microseconds). */ #define > MLX5_LRO_TIMEOUT_USEC "lro_timeout_usec" >=20 > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > +/* mprq_tstore_memcpy */ > +#define MLX5_MPRQ_TSTORE_MEMCPY "mprq_tstore_memcpy" > +#endif > + > /* > * Device parameter to configure the total data buffer size for a single > * hairpin queue (logarithm value). > @@ -1623,6 +1628,10 @@ mlx5_args_check(const char *key, const char > *val, void *opaque) > config->sys_mem_en =3D !!tmp; > } else if (strcmp(MLX5_DECAP_EN, key) =3D=3D 0) { > config->decap_en =3D !!tmp; > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + } else if (strcmp(MLX5_MPRQ_TSTORE_MEMCPY, key) =3D=3D 0) { > + config->mprq_tstore_memcpy =3D tmp; > +#endif > } else { > DRV_LOG(WARNING, "%s: unknown parameter", key); > rte_errno =3D EINVAL; > @@ -1683,6 +1692,9 @@ mlx5_args(struct mlx5_dev_config *config, struct > rte_devargs *devargs) > MLX5_RECLAIM_MEM, > MLX5_SYS_MEM_EN, > MLX5_DECAP_EN, > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + MLX5_MPRQ_TSTORE_MEMCPY, > +#endif > NULL, > }; > struct rte_kvargs *kvlist; > diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index > 43da9a1fb..1eb305650 100644 > --- a/drivers/net/mlx5/mlx5.h > +++ b/drivers/net/mlx5/mlx5.h > @@ -234,6 +234,9 @@ struct mlx5_dev_config { > int tx_skew; /* Tx scheduling skew between WQE and data on wire. > */ > struct mlx5_hca_attr hca_attr; /* HCA attributes. */ > struct mlx5_lro_config lro; /* LRO configuration. */ > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + unsigned int mprq_tstore_memcpy:1; > +#endif > }; >=20 >=20 > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c in= dex > c059e216d..c8db59a12 100644 > --- a/drivers/net/mlx5/mlx5_rxq.c > +++ b/drivers/net/mlx5/mlx5_rxq.c > @@ -1380,6 +1380,9 @@ mlx5_rxq_new(struct rte_eth_dev *dev, uint16_t > idx, uint16_t desc, > tmpl->socket =3D socket; > if (dev->data->dev_conf.intr_conf.rxq) > tmpl->irq =3D 1; > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + tmpl->rxq.mprq_tstore_memcpy =3D config->mprq_tstore_memcpy; > #endif > mprq_stride_nums =3D config->mprq.stride_num_n ? > config->mprq.stride_num_n : MLX5_MPRQ_STRIDE_NUM_N; > mprq_stride_size =3D non_scatter_min_mbuf_size <=3D diff --git > a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index > 0b87be15b..f59e30d82 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.c > +++ b/drivers/net/mlx5/mlx5_rxtx.c > @@ -123,6 +123,97 @@ uint8_t mlx5_swp_types_table[1 << 10] > __rte_cache_aligned; uint64_t rte_net_mlx5_dynf_inline_mask; #define > PKT_TX_DYNF_NOINLINE rte_net_mlx5_dynf_inline_mask >=20 > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > +static void copy16B_ts(void *dst, void *src) { > + __m128i var128; > + > + var128 =3D _mm_stream_load_si128((__m128i *)src); > + _mm_storeu_si128((__m128i *)dst, var128); } > + > +static void copy32B_ts(void *dst, void *src) { > + __m256i ymm0; > + > + ymm0 =3D _mm256_stream_load_si256((const __m256i *)src); > + _mm256_storeu_si256((__m256i *)dst, ymm0); } > + > +static void copy64B_ts(void *dst, void *src) { > + __m256i ymm0, ymm1; > + > + ymm0 =3D _mm256_stream_load_si256((const __m256i *)src); > + ymm1 =3D _mm256_stream_load_si256((const __m256i *)((uint8_t > *)src + 32)); > + _mm256_storeu_si256((__m256i *)dst, ymm0); > + _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 32), ymm1); } > + > +static void copy128B_ts(void *dst, void *src) { > + __m256i ymm0, ymm1, ymm2, ymm3; > + > + ymm0 =3D _mm256_stream_load_si256((const __m256i *)src); > + ymm1 =3D _mm256_stream_load_si256((const __m256i *)((uint8_t > *)src + 32)); > + ymm2 =3D _mm256_stream_load_si256((const __m256i *)((uint8_t > *)src + 64)); > + ymm3 =3D _mm256_stream_load_si256((const __m256i *)((uint8_t > *)src + 96)); > + _mm256_storeu_si256((__m256i *)dst, ymm0); > + _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 32), ymm1); > + _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 64), ymm2); > + _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 96), ymm3); } > + > +static void *memcpy_aligned_rx_tstore_16B(void *dst, void *src, int > +len) { > + void *dest =3D dst; > + > + while (len >=3D 128) { > + copy128B_ts(dst, src); > + dst =3D (uint8_t *)dst + 128; > + src =3D (uint8_t *)src + 128; > + len -=3D 128; > + } > + while (len >=3D 64) { > + copy64B_ts(dst, src); > + dst =3D (uint8_t *)dst + 64; > + src =3D (uint8_t *)src + 64; > + len -=3D 64; > + } > + while (len >=3D 32) { > + copy32B_ts(dst, src); > + dst =3D (uint8_t *)dst + 32; > + src =3D (uint8_t *)src + 32; > + len -=3D 32; > + } > + if (len >=3D 16) { > + copy16B_ts(dst, src); > + dst =3D (uint8_t *)dst + 16; > + src =3D (uint8_t *)src + 16; > + len -=3D 16; > + } > + if (len >=3D 8) { > + *(uint64_t *)dst =3D *(const uint64_t *)src; > + dst =3D (uint8_t *)dst + 8; > + src =3D (uint8_t *)src + 8; > + len -=3D 8; > + } > + if (len >=3D 4) { > + *(uint32_t *)dst =3D *(const uint32_t *)src; > + dst =3D (uint8_t *)dst + 4; > + src =3D (uint8_t *)src + 4; > + len -=3D 4; > + } > + if (len !=3D 0) { > + dst =3D (uint8_t *)dst - (4 - len); > + src =3D (uint8_t *)src - (4 - len); > + *(uint32_t *)dst =3D *(const uint32_t *)src; > + } > + > + return dest; > +} > +#endif > + > /** > * Build a table to translate Rx completion flags to packet type. > * > @@ -1707,6 +1798,9 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct > rte_mbuf **pkts, uint16_t pkts_n) > int32_t hdrm_overlap; > volatile struct mlx5_mini_cqe8 *mcqe =3D NULL; > uint32_t rss_hash_res =3D 0; > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + uintptr_t data_addr; > +#endif >=20 > if (consumed_strd =3D=3D strd_n) { > /* Replace WQE only if the buffer is still in use. */ > @@ -1772,12 +1866,30 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct > rte_mbuf **pkts, uint16_t pkts_n) > * - Out of buffer in the Mempool for Multi-Packet RQ. > * - The packet's stride overlaps a headroom and scatter is > off. > */ > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + if (unlikely(!rxq->mprq_tstore_memcpy) && > + len <=3D rxq->mprq_max_memcpy_len) { > + rte_prefetch1(addr); > + if (len > RTE_CACHE_LINE_SIZE) > + rte_prefetch2((void *)((uintptr_t)addr + > RTE_CACHE_LINE_SIZE)); > + } > +#endif > if (len <=3D rxq->mprq_max_memcpy_len || > rxq->mprq_repl =3D=3D NULL || > (hdrm_overlap > 0 && !rxq->strd_scatter_en)) { > if (likely(rte_pktmbuf_tailroom(pkt) >=3D len)) { > - rte_memcpy(rte_pktmbuf_mtod(pkt, void *), > - addr, len); > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + data_addr =3D > (uintptr_t)rte_pktmbuf_mtod(pkt, void *); > + if (!(rxq->mprq_tstore_memcpy)) > + rte_memcpy((void *)data_addr, > addr, len); > + else if ((rxq->mprq_tstore_memcpy) && > + !((data_addr | (uintptr_t)addr) & > ALIGNMENT_MASK)) > + > memcpy_aligned_rx_tstore_16B((void *)data_addr, > + addr, len); > + else > +#endif > + rte_memcpy(rte_pktmbuf_mtod(pkt, > void *), > + addr, len); > DATA_LEN(pkt) =3D len; > } else if (rxq->strd_scatter_en) { > struct rte_mbuf *prev =3D pkt; > diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h > index 9ffa028d2..a8ea1a795 100644 > --- a/drivers/net/mlx5/mlx5_rxtx.h > +++ b/drivers/net/mlx5/mlx5_rxtx.h > @@ -153,6 +153,9 @@ struct mlx5_rxq_data { > uint32_t tunnel; /* Tunnel information. */ > uint64_t flow_meta_mask; > int32_t flow_meta_offset; > +#ifdef RTE_LIBRTE_MLX5_NTLOAD_TSTORE_ALIGN_COPY > + unsigned int mprq_tstore_memcpy:1; > +#endif > } __rte_cache_aligned; >=20 > enum mlx5_rxq_type { > diff --git a/meson_options.txt b/meson_options.txt index > 9bf18ab6b..a4bc565d2 100644 > --- a/meson_options.txt > +++ b/meson_options.txt > @@ -30,6 +30,8 @@ option('max_lcores', type: 'integer', value: 128, > description: 'maximum number of cores/threads supported by EAL') > option('max_numa_nodes', type: 'integer', value: 4, > description: 'maximum number of NUMA nodes supported by EAL') > +option('mlx5_ntload_tstore', type: 'boolean', value: false, > + description: 'to enable optimized MPRQ in RX datapath') > option('enable_trace_fp', type: 'boolean', value: false, > description: 'enable fast path trace points.') option('tests', type: > 'boolean', value: true, > -- > 2.25.1