From: Chengwen Feng
Subject: [PATCH v3 1/3] net/bonding: support Tx prepare
Date: Sat, 17 Sep 2022 04:15:35 +0000
Message-ID: <20220917041537.6821-2-fengchengwen@huawei.com>
In-Reply-To: <20220917041537.6821-1-fengchengwen@huawei.com>
References: <1619171202-28486-2-git-send-email-tangchengchang@huawei.com> <20220917041537.6821-1-fengchengwen@huawei.com>
List-Id: DPDK patches and discussions

Normally, to use the HW offload capabilities (e.g. checksum and TSO) in
the Tx direction, the application needs to call rte_eth_tx_prepare() to
make some adjustments to the packets before sending them (e.g. processing
pseudo headers when Tx checksum offload is enabled). However, the
tx_prepare callback of the bonding driver is not implemented, so the sent
packets may have errors (e.g. checksum errors).

It is difficult to design a tx_prepare callback for the bonding driver,
because when a bonded device sends packets, it distributes them to
different slave devices based on the real-time link status and the
bonding mode. That is, it is very difficult for the bonded device to
determine which slave device's prepare function should be invoked.

So, in this patch, the tx_prepare callback of the bonding driver is still
not implemented. Instead, rte_eth_tx_prepare() will be called for all the
fast path packets in modes 0, 1, 2, 4, 5 and 6 (mode 3 is not included,
see [1]). In this way, all Tx offloads can be processed correctly for all
NIC devices in these modes.

As previously discussed (see [2]), for user input mbufs, if tx_prepare
fails, the bonding driver will free the corresponding packets internally,
and only the packets that pass tx_prepare are transmitted.

To minimize the performance impact, this patch adds a new
'tx_prepare_enabled' field, together with the corresponding control and
query APIs: rte_eth_bond_tx_prepare_set() and rte_eth_bond_tx_prepare_get().

[1]: In bond mode 3 (broadcast), a packet needs to be sent by all slave
ports. Different slave PMDs process the packets differently in tx_prepare.
If tx_prepare were called before transmission on each slave port, the sent
packets could be incorrect.
[2]: https://inbox.dpdk.org/dev/1618571071-5927-2-git-send-email-tangchengchang@huawei.com/

Signed-off-by: Chengchang Tang
Signed-off-by: Chengwen Feng
Reviewed-by: Min Hu (Connor)
---
 .../link_bonding_poll_mode_drv_lib.rst | 5 +
 doc/guides/rel_notes/release_22_11.rst | 6 ++
 drivers/net/bonding/eth_bond_private.h | 6 ++
 drivers/net/bonding/rte_eth_bond.h | 24 +++++
 drivers/net/bonding/rte_eth_bond_8023ad.c | 8 +-
 drivers/net/bonding/rte_eth_bond_api.c | 32 ++++++
 drivers/net/bonding/rte_eth_bond_pmd.c | 101 +++++++++++++++---
 drivers/net/bonding/version.map | 5 +
 8 files changed, 169 insertions(+), 18 deletions(-)
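For context, a minimal application-side usage sketch of the new API (not part
of the diff below; the names app_tx_on_bonded_port, bonded_port_id, queue_id,
pkts and nb_pkts are illustrative placeholders, and error handling is omitted):

/*
 * Sketch only: shows the intended calling sequence from an application.
 */
#include <stdbool.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_eth_bond.h>

static void
app_tx_on_bonded_port(uint16_t bonded_port_id, uint16_t queue_id,
		      struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t sent;

	/* Normally done once at configuration time; not supported for
	 * mode 3 (broadcast). */
	if (rte_eth_bond_tx_prepare_get(bonded_port_id) != 1)
		(void)rte_eth_bond_tx_prepare_set(bonded_port_id, true);

	/* No explicit rte_eth_tx_prepare() call on the bonded port: the
	 * bonding PMD invokes it per slave when the feature is enabled. */
	sent = rte_eth_tx_burst(bonded_port_id, queue_id, pkts, nb_pkts);

	/* Mbufs rejected by a slave's tx_prepare are freed inside the
	 * bonding PMD and counted in 'sent'; the application only owns
	 * the untransmitted tail. */
	while (sent < nb_pkts)
		rte_pktmbuf_free(pkts[sent++]);
}

The key point of the sketch is that tx_prepare failures are absorbed inside
the bonding PMD, so the application's free/retry logic only has to deal with
packets the burst call did not accept.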
diff --git a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst index 9510368103..a3d91b2091 100644 --- a/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst +++ b/doc/guides/prog_guide/link_bonding_poll_mode_drv_lib.rst @@ -359,6 +359,11 @@ The link status of a bonded device is dictated by that of its slaves, if all slave device link status are down or if all slaves are removed from the link bonding device then the link status of the bonding device will go down. +Unlike normal PMD drivers, Tx prepare for the bonding driver is controlled +by ``rte_eth_bond_tx_prepare_set`` (all bond modes except mode 3 (broadcast) +are supported). ``rte_eth_bond_tx_prepare_get`` is provided to query whether +Tx prepare is enabled. + It is also possible to configure / query the configuration of the control parameters of a bonded device using the provided APIs ``rte_eth_bond_mode_set/ get``, ``rte_eth_bond_primary_set/get``,
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst index 8c021cf050..6e28a6c0af 100644 --- a/doc/guides/rel_notes/release_22_11.rst +++ b/doc/guides/rel_notes/release_22_11.rst @@ -55,6 +55,12 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **Added Tx prepare for bonding driver.** + + * Added ``rte_eth_bond_tx_prepare_set`` to enable or disable Tx prepare for a bonded port. + All bond modes except mode 3 (broadcast) are supported. + * Added ``rte_eth_bond_tx_prepare_get`` to query whether Tx prepare is enabled for a bonded port.
+ Removed Items ------------- diff --git a/drivers/net/bonding/eth_bond_private.h b/drivers/net/bonding/eth_bond_private.h index 8222e3cd38..976163b06b 100644 --- a/drivers/net/bonding/eth_bond_private.h +++ b/drivers/net/bonding/eth_bond_private.h @@ -117,6 +117,7 @@ struct bond_dev_private { uint16_t user_defined_primary_port; /**< Flag for whether primary port is user defined or not */ + uint8_t tx_prepare_enabled; uint8_t balance_xmit_policy; /**< Transmit policy - l2 / l23 / l34 for operation in balance mode */ burst_xmit_hash_t burst_xmit_hash; @@ -258,6 +259,11 @@ void slave_add(struct bond_dev_private *internals, struct rte_eth_dev *slave_eth_dev); +uint16_t +bond_ethdev_tx_ctrl_wrap(struct bond_dev_private *internals, + uint16_t slave_port_id, uint16_t queue_id, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts); + void burst_xmit_l2_hash(struct rte_mbuf **buf, uint16_t nb_pkts, uint16_t slave_count, uint16_t *slaves); diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h index 874aa91a5f..deae2dd9ad 100644 --- a/drivers/net/bonding/rte_eth_bond.h +++ b/drivers/net/bonding/rte_eth_bond.h @@ -343,6 +343,30 @@ rte_eth_bond_link_up_prop_delay_set(uint16_t bonded_port_id, int rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id); +/** + * Enable or disable Tx-prepare (rte_eth_tx_prepare) for a bonded port + * + * @param bonded_port_id Bonded device id + * @param en Enable flag + * + * @return + * 0 on success, negative value otherwise. + */ +__rte_experimental +int +rte_eth_bond_tx_prepare_set(uint16_t bonded_port_id, bool en); + +/** + * Get whether Tx-prepare (rte_eth_tx_prepare) is enabled for a bonded port + * + * @param bonded_port_id Bonded device id + * + * @return + * 0-disabled, 1-enabled, negative value otherwise.
+ */ +__rte_experimental +int +rte_eth_bond_tx_prepare_get(uint16_t bonded_port_id); #ifdef __cplusplus } diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c index b3cddd8a20..d5951d684e 100644 --- a/drivers/net/bonding/rte_eth_bond_8023ad.c +++ b/drivers/net/bonding/rte_eth_bond_8023ad.c @@ -636,8 +636,8 @@ tx_machine(struct bond_dev_private *internals, uint16_t slave_id) return; } } else { - uint16_t pkts_sent = rte_eth_tx_burst(slave_id, - internals->mode4.dedicated_queues.tx_qid, + uint16_t pkts_sent = bond_ethdev_tx_ctrl_wrap(internals, + slave_id, internals->mode4.dedicated_queues.tx_qid, &lacp_pkt, 1); if (pkts_sent != 1) { rte_pktmbuf_free(lacp_pkt); @@ -1371,8 +1371,8 @@ bond_mode_8023ad_handle_slow_pkt(struct bond_dev_private *internals, } } else { /* Send packet directly to the slow queue */ - uint16_t tx_count = rte_eth_tx_burst(slave_id, - internals->mode4.dedicated_queues.tx_qid, + uint16_t tx_count = bond_ethdev_tx_ctrl_wrap(internals, + slave_id, internals->mode4.dedicated_queues.tx_qid, &pkt, 1); if (tx_count != 1) { /* reset timer */ diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c index 4ac191c468..47841289f4 100644 --- a/drivers/net/bonding/rte_eth_bond_api.c +++ b/drivers/net/bonding/rte_eth_bond_api.c @@ -1070,3 +1070,35 @@ rte_eth_bond_link_up_prop_delay_get(uint16_t bonded_port_id) return internals->link_up_delay_ms; } + +int +rte_eth_bond_tx_prepare_set(uint16_t bonded_port_id, bool en) +{ + struct bond_dev_private *internals; + + if (valid_bonded_port_id(bonded_port_id) != 0) + return -1; + + internals = rte_eth_devices[bonded_port_id].data->dev_private; + if (internals->mode == BONDING_MODE_BROADCAST) { + RTE_BOND_LOG(ERR, "Tx-prepare is not supported in broadcast mode"); + return -ENOTSUP; + } + + internals->tx_prepare_enabled = en ? 1 : 0; + + return 0; +} + +int +rte_eth_bond_tx_prepare_get(uint16_t bonded_port_id) +{ + struct bond_dev_private *internals; + + if (valid_bonded_port_id(bonded_port_id) != 0) + return -1; + + internals = rte_eth_devices[bonded_port_id].data->dev_private; + + return internals->tx_prepare_enabled; +} diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c index 73e6972035..ec9d7d7bab 100644 --- a/drivers/net/bonding/rte_eth_bond_pmd.c +++ b/drivers/net/bonding/rte_eth_bond_pmd.c @@ -559,6 +559,76 @@ bond_ethdev_rx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return nb_recv_pkts; } +/** This function is used to transmit internally generated mbufs. */ +uint16_t +bond_ethdev_tx_ctrl_wrap(struct bond_dev_private *internals, + uint16_t slave_port_id, uint16_t queue_id, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts) +{ + uint16_t nb_pre = nb_pkts; + + if (internals->tx_prepare_enabled) + nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id, tx_pkts, + nb_pkts); + + return rte_eth_tx_burst(slave_port_id, queue_id, tx_pkts, nb_pre); +} + +/** + * This function is used to transmit mbufs provided by the user. + * If Tx-prepare is enabled, the mbufs that fail Tx-prepare will be + * freed internally.
+ */ +static inline uint16_t +bond_ethdev_tx_user_wrap(struct bond_tx_queue *bd_tx_q, uint16_t slave_port_id, + struct rte_mbuf **tx_pkts, uint16_t nb_pkts) +{ + struct bond_dev_private *internals = bd_tx_q->dev_private; + uint16_t queue_id = bd_tx_q->queue_id; + struct rte_mbuf *fail_pkts[nb_pkts]; + uint8_t fail_mark[nb_pkts]; + uint16_t nb_pre, index; + uint16_t fail_cnt = 0; + int i; + + if (!internals->tx_prepare_enabled) + goto tx_burst; + + nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id, tx_pkts, nb_pkts); + if (nb_pre == nb_pkts) + goto tx_burst; + + fail_pkts[fail_cnt++] = tx_pkts[nb_pre]; + memset(fail_mark, 0, sizeof(fail_mark)); + fail_mark[nb_pre] = 1; + for (i = nb_pre + 1; i < nb_pkts; /* i updated inside the loop */) { + nb_pre = rte_eth_tx_prepare(slave_port_id, queue_id, + tx_pkts + i, nb_pkts - i); + if (nb_pre == nb_pkts - i) + break; + fail_pkts[fail_cnt++] = tx_pkts[i + nb_pre]; + fail_mark[i + nb_pre] = 1; + i += nb_pre + 1; + } + + /* move tx-prepare OK mbufs to the end */ + for (i = index = nb_pkts - 1; i >= 0; i--) { + if (!fail_mark[i]) + tx_pkts[index--] = tx_pkts[i]; + } + /* move tx-prepare failed mbufs to the beginning, and free them */ + for (i = 0; i < fail_cnt; i++) { + tx_pkts[i] = fail_pkts[i]; + rte_pktmbuf_free(fail_pkts[i]); + } + + if (fail_cnt == nb_pkts) + return nb_pkts; +tx_burst: + return fail_cnt + rte_eth_tx_burst(slave_port_id, queue_id, + tx_pkts + fail_cnt, nb_pkts - fail_cnt); +} + static uint16_t bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) @@ -602,8 +672,9 @@ bond_ethdev_tx_burst_round_robin(void *queue, struct rte_mbuf **bufs, /* Send packet burst on each slave device */ for (i = 0; i < num_of_slaves; i++) { if (slave_nb_pkts[i] > 0) { - num_tx_slave = rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, - slave_bufs[i], slave_nb_pkts[i]); + num_tx_slave = bond_ethdev_tx_user_wrap(bd_tx_q, + slaves[i], slave_bufs[i], + slave_nb_pkts[i]); /* if tx burst fails move packets to end of bufs */ if (unlikely(num_tx_slave < slave_nb_pkts[i])) { @@ -635,8 +706,8 @@ bond_ethdev_tx_burst_active_backup(void *queue, if (internals->active_slave_count < 1) return 0; - return rte_eth_tx_burst(internals->current_primary_port, bd_tx_q->queue_id, - bufs, nb_pkts); + return bond_ethdev_tx_user_wrap(bd_tx_q, + internals->current_primary_port, bufs, nb_pkts); } static inline uint16_t @@ -951,8 +1022,8 @@ bond_ethdev_tx_burst_tlb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) #endif } - num_tx_total += rte_eth_tx_burst(slaves[i], bd_tx_q->queue_id, - bufs + num_tx_total, nb_pkts - num_tx_total); + num_tx_total += bond_ethdev_tx_user_wrap(bd_tx_q, slaves[i], + bufs + num_tx_total, nb_pkts - num_tx_total); if (num_tx_total == nb_pkts) break; @@ -1064,8 +1135,9 @@ bond_ethdev_tx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) /* Send ARP packets on proper slaves */ for (i = 0; i < RTE_MAX_ETHPORTS; i++) { if (slave_bufs_pkts[i] > 0) { - num_send = rte_eth_tx_burst(i, bd_tx_q->queue_id, - slave_bufs[i], slave_bufs_pkts[i]); + num_send = bond_ethdev_tx_ctrl_wrap(internals, i, + bd_tx_q->queue_id, slave_bufs[i], + slave_bufs_pkts[i]); for (j = 0; j < slave_bufs_pkts[i] - num_send; j++) { bufs[nb_pkts - 1 - num_not_send - j] = slave_bufs[i][nb_pkts - 1 - j]; @@ -1088,7 +1160,8 @@ bond_ethdev_tx_burst_alb(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) /* Send update packets on proper slaves */ for (i = 0; i < RTE_MAX_ETHPORTS; i++) { if (update_bufs_pkts[i] > 0) { - num_send = rte_eth_tx_burst(i,
bd_tx_q->queue_id, update_bufs[i], + num_send = bond_ethdev_tx_ctrl_wrap(internals, i, + bd_tx_q->queue_id, update_bufs[i], update_bufs_pkts[i]); for (j = num_send; j < update_bufs_pkts[i]; j++) { rte_pktmbuf_free(update_bufs[i][j]); @@ -1158,9 +1231,8 @@ tx_burst_balance(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs, if (slave_nb_bufs[i] == 0) continue; - slave_tx_count = rte_eth_tx_burst(slave_port_ids[i], - bd_tx_q->queue_id, slave_bufs[i], - slave_nb_bufs[i]); + slave_tx_count = bond_ethdev_tx_user_wrap(bd_tx_q, + slave_port_ids[i], slave_bufs[i], slave_nb_bufs[i]); total_tx_count += slave_tx_count; @@ -1243,8 +1315,9 @@ tx_burst_8023ad(void *queue, struct rte_mbuf **bufs, uint16_t nb_bufs, if (rte_ring_dequeue(port->tx_ring, (void **)&ctrl_pkt) != -ENOENT) { - slave_tx_count = rte_eth_tx_burst(slave_port_ids[i], - bd_tx_q->queue_id, &ctrl_pkt, 1); + slave_tx_count = bond_ethdev_tx_ctrl_wrap(internals, + slave_port_ids[i], bd_tx_q->queue_id, + &ctrl_pkt, 1); /* * re-enqueue LAG control plane packets to buffering * ring if transmission fails so the packet isn't lost. diff --git a/drivers/net/bonding/version.map b/drivers/net/bonding/version.map index 9333923b4e..2c121f2559 100644 --- a/drivers/net/bonding/version.map +++ b/drivers/net/bonding/version.map @@ -31,3 +31,8 @@ DPDK_23 { local: *; }; + +EXPERIMENTAL { + rte_eth_bond_tx_prepare_get; + rte_eth_bond_tx_prepare_set; +}; \ No newline at end of file -- 2.17.1
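As background for the first paragraph of the commit message, a short sketch
(not part of the patch) of the kind of Tx offload request that gives the
per-slave rte_eth_tx_prepare() call work to do. The function name and the
assumed untagged Ethernet/IPv4/TCP layout are illustrative only:

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_tcp.h>

/* Request IPv4 and TCP checksum offload on an mbuf. With these flags set,
 * a slave PMD's tx_prepare typically has to fill in the TCP pseudo-header
 * checksum before the hardware can complete the offload, which is the kind
 * of per-slave adjustment this patch lets the bonding PMD perform. */
static void
request_tcp_cksum_offload(struct rte_mbuf *m)
{
	m->l2_len = sizeof(struct rte_ether_hdr);
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->l4_len = sizeof(struct rte_tcp_hdr);
	m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM |
			RTE_MBUF_F_TX_TCP_CKSUM;
}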