From: Jiawen Wu
Subject: RE: [PATCH 1/2] net/txgbe: add vectorized functions for Rx/Tx
Date: Tue, 5 Mar 2024 16:10:51 +0800
Message-ID: <07f901da6ed4$a3914660$eab3d320$@trustnetic.com>
In-Reply-To: <4a0e5000-3ae3-4894-a23d-715801f3c3b7@amd.com>
References: <20240201030019.21336-1-jiawenwu@trustnetic.com> <20240201030019.21336-2-jiawenwu@trustnetic.com> <4a0e5000-3ae3-4894-a23d-715801f3c3b7@amd.com>
List-Id: DPDK patches and discussions

On Wed, Feb 7, 2024 11:13 AM, Ferruh.Yigit@amd.com wrote:
> On 2/1/2024 3:00 AM, Jiawen Wu wrote:
> > To optimize Rx/Tx burst process, add SSE/NEON vector instructions on
> > x86/arm architecture.
> >
>
> Do you have any performance improvement number with vector
> implementation, if so can you put it into commit log for record?

On our local x86 platforms, performance was already at full speed without
the vector path, so we do not have an improvement figure for SSE yet.
I will add the test results for Arm.

> > @@ -2198,8 +2220,15 @@ txgbe_set_tx_function(struct rte_eth_dev *dev, struct txgbe_tx_queue *txq)
> >  #endif
> >  			txq->tx_free_thresh >= RTE_PMD_TXGBE_TX_MAX_BURST) {
> >  		PMD_INIT_LOG(DEBUG, "Using simple tx code path");
> > -		dev->tx_pkt_burst = txgbe_xmit_pkts_simple;
> >  		dev->tx_pkt_prepare = NULL;
> > +		if (txq->tx_free_thresh <= RTE_TXGBE_TX_MAX_FREE_BUF_SZ &&
> > +				(rte_eal_process_type() != RTE_PROC_PRIMARY ||
> >
>
> Why vector Tx enable only for secondary process?

It is not only for secondary processes. The constraint is

	(rte_eal_process_type() != RTE_PROC_PRIMARY || txgbe_txq_vec_setup(txq) == 0)

so in a primary process this part of the check passes whenever
txgbe_txq_vec_setup(txq) returns 0, and in a secondary process it passes
without calling the setup function.

This code is modeled on ixgbe, which explains:

"When using multiple processes, the TX function used in all processes
should be the same, otherwise the secondary processes cannot transmit
more than tx-ring-size - 1 packets.
To achieve this, we extract out the code to select the ixgbe TX function
to be used into a separate function inside the ixgbe driver, and call
that from a secondary process when it is attaching to an
already-configured NIC."

> > +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c
> > @@ -0,0 +1,604 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2015-2024 Beijing WangXun Technology Co., Ltd.
> > + * Copyright(c) 2010-2015 Intel Corporation
> > + */
> > +
> > +#include
> > +#include
> > +#include
> > +
> > +#include "txgbe_ethdev.h"
> > +#include "txgbe_rxtx.h"
> > +#include "txgbe_rxtx_vec_common.h"
> > +
> > +#pragma GCC diagnostic ignored "-Wcast-qual"
> > +
>
> Is this pragma really required?

Yes.
Otherwise, there are warnings in the compilation:

[1909/2921] Compiling C object drivers/libtmp_rte_net_txgbe.a.p/net_txgbe_txgbe_rxtx_vec_neon.c.o
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c: In function ‘txgbe_rxq_rearm’:
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c:37:15: warning: cast discards ‘volatile’ qualifier from pointer target type [-Wcast-qual]
    vst1q_u64((uint64_t *)&rxdp[i], zero);
    ^
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c:60:13: warning: cast discards ‘volatile’ qualifier from pointer target type [-Wcast-qual]
    vst1q_u64((uint64_t *)rxdp++, dma_addr0);
    ^
../drivers/net/txgbe/txgbe_rxtx_vec_neon.c:65:13: warning: cast discards ‘volatile’ qualifier from pointer target type [-Wcast-qual]
    vst1q_u64((uint64_t *)rxdp++, dma_addr1);
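
To illustrate where these warnings come from, here is a minimal sketch in
plain C, not the driver code. The descriptor type and the store helper are
hypothetical stand-ins (all names prefixed demo_ are assumptions); the helper
plays the role of vst1q_u64(), which takes a plain uint64_t * destination.
The sketch also shows one possible way to limit the suppression to a single
function with push/pop instead of a file-wide pragma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte descriptor; not the real txgbe layout. */
struct demo_rx_desc {
	uint64_t qw[2];
};

/* Stand-in for a vector store such as vst1q_u64(): plain uint64_t * target. */
static void demo_store_u64x2(uint64_t *dst, uint64_t lo, uint64_t hi)
{
	assert(dst != NULL);
	dst[0] = lo;
	dst[1] = hi;
}

/* Suppress -Wcast-qual only around the function that needs the cast. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
static void demo_rearm_one(volatile struct demo_rx_desc *rxdp, uint64_t dma_addr)
{
	/* This cast drops 'volatile'; without the pragma, -Wcast-qual reports it. */
	demo_store_u64x2((uint64_t *)rxdp, dma_addr, 0);
}
#pragma GCC diagnostic pop
```

Calling demo_rearm_one() on a descriptor writes dma_addr into the first
quadword and zero into the second. Whether a scoped pragma fits the driver
better than the file-wide one is a judgment call for the patch author.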
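
Coming back to the Tx path condition discussed earlier in this reply, the
short-circuit behaviour of the || can be sketched as follows. This models
only the quoted sub-expression; the tx_free_thresh check is left out, and
the enum, struct, and function names are hypothetical stand-ins for
rte_eal_process_type() and txgbe_txq_vec_setup(), not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical process types standing in for RTE_PROC_PRIMARY and others. */
enum demo_proc_type { DEMO_PROC_PRIMARY, DEMO_PROC_SECONDARY };

/* Hypothetical queue state used only to observe the control flow. */
struct demo_txq {
	bool vec_setup_ok; /* whether vector setup would succeed */
	int setup_calls;   /* how many times setup actually ran */
};

/* Stand-in for txgbe_txq_vec_setup(): returns 0 on success. */
static int demo_txq_vec_setup(struct demo_txq *txq)
{
	txq->setup_calls++;
	return txq->vec_setup_ok ? 0 : -1;
}

/* Mirrors: proc != PRIMARY || vec_setup(txq) == 0 */
static bool demo_use_vector_tx(enum demo_proc_type proc, struct demo_txq *txq)
{
	assert(txq != NULL);
	/* For a secondary process the left side is true, so setup is skipped. */
	return proc != DEMO_PROC_PRIMARY || demo_txq_vec_setup(txq) == 0;
}
```

In this sketch a secondary process selects the vector path without running
setup, while a primary process selects it only if setup succeeds.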