Subject: RE: [PATCH v3 0/3] Direct re-arming of buffers on receive side
Date: Wed, 22 Mar 2023 13:56:39 +0100
From: Morten Brørup
To: "Feifei Wang"
Cc: "Yuying Zhang", "Beilei Xing", "Ruifeng Wang"
List-Id: DPDK patches and discussions

> From: Feifei Wang [mailto:feifei.wang2@arm.com]
> Sent: Wednesday, 4 January 2023 08.31
>
> Currently, the transmit side frees the buffers into the lcore cache and
> the receive side allocates buffers from the lcore cache. The transmit
> side typically frees 32 buffers resulting in 32*8=256B of stores to
> lcore cache.
> The receive side allocates 32 buffers and stores them in
> the receive side software ring, resulting in 32*8=256B of stores and
> 256B of loads from the lcore cache.
>
> This patch proposes a mechanism to avoid freeing to/allocating from
> the lcore cache. That is, the receive side will free the buffers from
> the transmit side directly into its software ring. This will avoid the
> 256B of loads and stores introduced by the lcore cache. It also frees
> up the cache lines used by the lcore cache.

I am starting to wonder if we have been adding unnecessary feature creep
by trying to make this feature too generic.

Could you please describe some of the most important high-volume use
cases from real life? It would help to set the scope correctly.

>
> However, this solution poses several constraints:
>
> 1)The receive queue needs to know which transmit queue it should take
> the buffers from. The application logic decides which transmit port to
> use to send out the packets. In many use cases the NIC might have a
> single port ([1], [2], [3]), in which case a given transmit queue is
> always mapped to a single receive queue (1:1 Rx queue: Tx queue). This
> is easy to configure.
>
> If the NIC has 2 ports (there are several references), then we will
> have a 1:2 (Rx queue: Tx queue) mapping, which is still easy to
> configure. However, if this is generalized to 'N' ports, the
> configuration can be long. Moreover, the PMD would have to scan a list
> of transmit queues to pull the buffers from.
>
> 2)The other factor that needs to be considered is 'run-to-completion'
> vs 'pipeline' models. In the run-to-completion model, the receive side
> and the transmit side are running on the same lcore serially. In the
> pipeline model, the receive side and transmit side might be running on
> different lcores in parallel. This requires locking, which is not
> supported at this point.
>
> 3)Tx and Rx buffers must be from the same mempool.
> We must also ensure that the Tx buffer free number is equal to the Rx
> buffer free number, so that 'tx_next_dd' can be updated correctly in
> direct-rearm mode. This is because 'tx_next_dd' is the variable used to
> compute the free location in the Tx sw-ring. Its value is one round
> beyond the position where the next free starts.
>
> Current status in this patch:
> 1)Two APIs are added for users to enable direct-rearm mode:
>   In the control plane, users can call
>   'rte_eth_rx_queue_rearm_data_get' to get the Rx sw_ring pointer and
>   its rxq_info. (This avoids Tx loading Rx data directly.)
>
>   In the data plane, users can call 'rte_eth_dev_direct_rearm' to rearm
>   Rx buffers and free Tx buffers at the same time. Specifically, this
>   API wraps two separate APIs for Rx and Tx.
>   For Tx, 'rte_eth_tx_fill_sw_ring' can fill a given sw_ring with
>   buffers freed by Tx.
>   For Rx, 'rte_eth_rx_flush_descriptor' can flush its descriptors based
>   on the rearm buffers.
>   This separates the Rx and Tx operations, so users can re-arm an Rx
>   queue not only from the same driver's Tx queue, but from different
>   sources too.
> -----------------------------------------------------------------------
> control plane:
>     rte_eth_rx_queue_rearm_data_get(*rxq_rearm_data);
> data plane:
>     loop {
>         rte_eth_dev_direct_rearm(*rxq_rearm_data){
>
>             rte_eth_tx_fill_sw_ring{
>                 for (i = 0; i < 32; i++) {
>                     sw_ring.mbuf[i] = tx.mbuf[i];
>                 }
>             }
>
>             rte_eth_rx_flush_descriptor{
>                 for (i = 0; i < 32; i++) {
>                     flush descs[i];
>                 }
>             }
>         }
>         rte_eth_rx_burst;
>         rte_eth_tx_burst;
>     }
> -----------------------------------------------------------------------
> 2)The i40e driver is changed to do the direct re-arm of the receive
> side.
> 3)The ixgbe driver is changed to do the direct re-arm of the receive
> side.
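
To make the quoted data-plane flow easier to discuss, here is a minimal sketch in plain C of what a direct re-arm step might look like. It is not the patch's implementation and does not use the DPDK API: every type and name in it (struct pool, struct mbuf, struct txq, struct rxq, direct_rearm, direct_rearm_selftest, BURST, RING_SIZE) is a hypothetical stand-in. It assumes the run-to-completion model on a single lcore, so there is no locking, and it only models the software-ring bookkeeping, not the hardware descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST 32       /* buffers moved per re-arm, as in the cover letter */
#define RING_SIZE 128  /* hypothetical software ring size */

/* Hypothetical stand-ins for a mempool and a packet buffer; not DPDK types. */
struct pool { int id; };
struct mbuf { struct pool *pool; };

/* Hypothetical Tx queue view: software ring plus free bookkeeping. */
struct txq {
    struct mbuf *sw_ring[RING_SIZE];
    uint16_t next_free; /* first transmitted buffer waiting to be freed */
    uint16_t nb_done;   /* buffers the NIC has finished sending */
};

/* Hypothetical Rx queue view: software ring plus re-arm bookkeeping. */
struct rxq {
    struct mbuf *sw_ring[RING_SIZE];
    uint16_t rearm_start; /* first slot waiting for a fresh buffer */
    uint16_t rearm_nb;    /* number of slots waiting for a fresh buffer */
    struct pool *pool;    /* pool this queue normally allocates from */
};

/*
 * Move one burst of transmitted buffers straight into the Rx software ring.
 * Returns BURST on success, or 0 when a constraint is not met and the
 * caller would have to fall back to the normal mempool path.
 */
static uint16_t direct_rearm(struct rxq *rx, struct txq *tx)
{
    uint16_t i;

    /* Require a full burst on both sides so both free counts stay equal. */
    if (tx->nb_done < BURST || rx->rearm_nb < BURST)
        return 0;

    /* Constraint 3: every Tx buffer must come from the Rx queue's pool. */
    for (i = 0; i < BURST; i++) {
        if (tx->sw_ring[(tx->next_free + i) % RING_SIZE]->pool != rx->pool)
            return 0;
    }

    /* Tx step: fill the Rx software ring with the buffers Tx is done with. */
    for (i = 0; i < BURST; i++) {
        uint16_t t = (tx->next_free + i) % RING_SIZE;
        uint16_t r = (rx->rearm_start + i) % RING_SIZE;

        rx->sw_ring[r] = tx->sw_ring[t];
        tx->sw_ring[t] = NULL;
    }

    /* Rx step: a real driver would write descriptors here; the sketch
     * only advances the bookkeeping on both queues. */
    tx->next_free = (tx->next_free + BURST) % RING_SIZE;
    tx->nb_done -= BURST;
    rx->rearm_start = (rx->rearm_start + BURST) % RING_SIZE;
    rx->rearm_nb -= BURST;
    return BURST;
}

/* Self-check: one Tx and one Rx queue sharing a pool; returns buffers moved. */
static uint16_t direct_rearm_selftest(void)
{
    struct pool p = { 1 };
    struct mbuf bufs[BURST];
    struct txq tx = { .next_free = 0, .nb_done = BURST };
    struct rxq rx = { .rearm_start = 0, .rearm_nb = BURST, .pool = &p };
    uint16_t i, moved;

    for (i = 0; i < BURST; i++) {
        bufs[i].pool = &p;
        tx.sw_ring[i] = &bufs[i];
    }
    moved = direct_rearm(&rx, &tx);
    assert(rx.sw_ring[0] == &bufs[0] && tx.sw_ring[0] == NULL);
    assert(direct_rearm(&rx, &tx) == 0); /* nothing left to move */
    return moved;
}
```

In this sketch a return value of 0 means the caller falls back to the ordinary free/alloc path through the mempool. That is only one possible way to handle the same-mempool and equal-count constraints; how the real API reports this is for the patch to define.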
>
> Testing status:
> (1) dpdk l3fwd test with multiple drivers:
>     port 0: 82599 NIC   port 1: XL710 NIC
>     -------------------------------------------------------------
>                     Without fast free    With fast free
>     Thunderx2:      +9.44%               +7.14%
>     -------------------------------------------------------------
>
> (2) dpdk l3fwd test with same driver:
>     port 0 && 1: XL710 NIC
>     -------------------------------------------------------------
>     *Direct rearm with exposing rx_sw_ring:
>                     Without fast free    With fast free
>     Ampere altra:   +14.98%              +15.77%
>     n1sdp:          +6.47%               +0.52%
>     -------------------------------------------------------------
>
> (3) VPP test with same driver:
>     port 0 && 1: XL710 NIC
>     -------------------------------------------------------------
>     *Direct rearm with exposing rx_sw_ring:
>     Ampere altra:   +4.59%
>     n1sdp:          +5.4%
>     -------------------------------------------------------------
>
> Reference:
> [1] https://store.nvidia.com/en-us/networking/store/product/MCX623105AN-CDAT/NVIDIAMCX623105ANCDATConnectX6DxENAdapterCard100GbECryptoDisabled/
> [2] https://www.intel.com/content/www/us/en/products/sku/192561/intel-ethernet-network-adapter-e810cqda1/specifications.html
> [3] https://www.broadcom.com/products/ethernet-connectivity/network-adapters/100gb-nic-ocp/n1100g
>
> V2:
> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
> 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
> 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
> 4. Add condition detection for direct rearm API (Morten, Andrew
>    Rybchenko)
>
> V3:
> 1. Separate Rx and Tx operation with two APIs in direct-rearm
>    (Konstantin)
> 2. Delete l3fwd change for direct rearm (Jerin)
> 3. Enable direct rearm in ixgbe driver on Arm
>
> Feifei Wang (3):
>   ethdev: enable direct rearm with separate API
>   net/i40e: enable direct rearm with separate API
>   net/ixgbe: enable direct rearm with separate API
>
>  drivers/net/i40e/i40e_ethdev.c            |   1 +
>  drivers/net/i40e/i40e_ethdev.h            |   2 +
>  drivers/net/i40e/i40e_rxtx.c              |  19 +++
>  drivers/net/i40e/i40e_rxtx.h              |   4 +
>  drivers/net/i40e/i40e_rxtx_vec_common.h   |  54 +++++++
>  drivers/net/i40e/i40e_rxtx_vec_neon.c     |  42 ++++++
>  drivers/net/ixgbe/ixgbe_ethdev.c          |   1 +
>  drivers/net/ixgbe/ixgbe_ethdev.h          |   3 +
>  drivers/net/ixgbe/ixgbe_rxtx.c            |  19 +++
>  drivers/net/ixgbe/ixgbe_rxtx.h            |   4 +
>  drivers/net/ixgbe/ixgbe_rxtx_vec_common.h |  48 ++++++
>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c   |  52 +++++++
>  lib/ethdev/ethdev_driver.h                |  10 ++
>  lib/ethdev/ethdev_private.c               |   2 +
>  lib/ethdev/rte_ethdev.c                   |  52 +++++++
>  lib/ethdev/rte_ethdev.h                   | 174 ++++++++++++++++++++++
>  lib/ethdev/rte_ethdev_core.h              |  11 ++
>  lib/ethdev/version.map                    |   6 +
>  18 files changed, 504 insertions(+)
>
> --
> 2.25.1