From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Min Hu (Connor)"
Subject: [PATCH v2 1/2] net/hns3: optimized Tx performance by mbuf fast free
Date: Tue, 16 Nov 2021 09:22:11 +0800
Message-ID: <20211116012212.64819-2-humin29@huawei.com>
In-Reply-To: <20211116012212.64819-1-humin29@huawei.com>
References: <20211111133859.13705-1-humin29@huawei.com>
 <20211116012212.64819-1-humin29@huawei.com>
List-Id: DPDK patches and discussions

From: Chengwen Feng

Currently the vector and simple xmit algorithms don't support multi_segs,
so if the Tx offload supports MBUF_FAST_FREE, the driver can invoke
rte_mempool_put_bulk() to free Tx mbufs in
this situation.

In the testpmd single-core MAC forwarding scenario, performance improves
by 8% at 64B on the Kunpeng920 platform.

Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng
---
 doc/guides/nics/features/hns3.ini |  1 +
 drivers/net/hns3/hns3_rxtx.c      | 11 +++++++++++
 drivers/net/hns3/hns3_rxtx.h      |  2 ++
 drivers/net/hns3/hns3_rxtx_vec.h  |  9 +++++++++
 4 files changed, 23 insertions(+)

diff --git a/doc/guides/nics/features/hns3.ini b/doc/guides/nics/features/hns3.ini
index c3464c8396..405b94f05c 100644
--- a/doc/guides/nics/features/hns3.ini
+++ b/doc/guides/nics/features/hns3.ini
@@ -12,6 +12,7 @@ Queue start/stop = Y
 Runtime Rx queue setup = Y
 Runtime Tx queue setup = Y
 Burst mode info = Y
+Fast mbuf free = Y
 Free Tx mbuf on demand = Y
 MTU update = Y
 Scattered Rx = Y
diff --git a/drivers/net/hns3/hns3_rxtx.c b/drivers/net/hns3/hns3_rxtx.c
index d26e262335..f0a57611ec 100644
--- a/drivers/net/hns3/hns3_rxtx.c
+++ b/drivers/net/hns3/hns3_rxtx.c
@@ -3059,6 +3059,8 @@ hns3_tx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t nb_desc,
 	txq->min_tx_pkt_len = hw->min_tx_pkt_len;
 	txq->tso_mode = hw->tso_mode;
 	txq->udp_cksum_mode = hw->udp_cksum_mode;
+	txq->mbuf_fast_free_en = !!(dev->data->dev_conf.txmode.offloads &
+				    RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE);
 	memset(&txq->basic_stats, 0, sizeof(struct hns3_tx_basic_stats));
 	memset(&txq->dfx_stats, 0, sizeof(struct hns3_tx_dfx_stats));
 
@@ -3991,6 +3993,14 @@ hns3_tx_free_buffer_simple(struct hns3_tx_queue *txq)
 
 	tx_entry = &txq->sw_ring[txq->next_to_clean];
 
+	if (txq->mbuf_fast_free_en) {
+		rte_mempool_put_bulk(tx_entry->mbuf->pool,
+					(void **)tx_entry, txq->tx_rs_thresh);
+		for (i = 0; i < txq->tx_rs_thresh; i++)
+			tx_entry[i].mbuf = NULL;
+		goto update_field;
+	}
+
 	for (i = 0; i < txq->tx_rs_thresh; i++)
 		rte_prefetch0((tx_entry + i)->mbuf);
 	for (i = 0; i < txq->tx_rs_thresh; i++, tx_entry++) {
@@ -3998,6 +4008,7 @@ hns3_tx_free_buffer_simple(struct hns3_tx_queue *txq)
 		tx_entry->mbuf = NULL;
 	}
 
+update_field:
 	txq->next_to_clean = (tx_next_clean + 1) % txq->nb_tx_desc;
 	txq->tx_bd_ready += txq->tx_rs_thresh;
 }
diff --git a/drivers/net/hns3/hns3_rxtx.h b/drivers/net/hns3/hns3_rxtx.h
index 63bafc68b6..df731856ef 100644
--- a/drivers/net/hns3/hns3_rxtx.h
+++ b/drivers/net/hns3/hns3_rxtx.h
@@ -495,6 +495,8 @@ struct hns3_tx_queue {
 	 * this point.
 	 */
 	uint16_t pvid_sw_shift_en:1;
+	/* check whether the mbuf fast free offload is enabled */
+	uint16_t mbuf_fast_free_en:1;
 
 	/*
 	 * For better performance in tx datapath, releasing mbuf in batches is
diff --git a/drivers/net/hns3/hns3_rxtx_vec.h b/drivers/net/hns3/hns3_rxtx_vec.h
index 67c75e44ef..4985a7cae8 100644
--- a/drivers/net/hns3/hns3_rxtx_vec.h
+++ b/drivers/net/hns3/hns3_rxtx_vec.h
@@ -18,6 +18,14 @@ hns3_tx_bulk_free_buffers(struct hns3_tx_queue *txq)
 	int i;
 
 	tx_entry = &txq->sw_ring[txq->next_to_clean];
+	if (txq->mbuf_fast_free_en) {
+		rte_mempool_put_bulk(tx_entry->mbuf->pool, (void **)tx_entry,
+				     txq->tx_rs_thresh);
+		for (i = 0; i < txq->tx_rs_thresh; i++)
+			tx_entry[i].mbuf = NULL;
+		goto update_field;
+	}
+
 	for (i = 0; i < txq->tx_rs_thresh; i++, tx_entry++) {
 		m = rte_pktmbuf_prefree_seg(tx_entry->mbuf);
 		tx_entry->mbuf = NULL;
@@ -36,6 +44,7 @@ hns3_tx_bulk_free_buffers(struct hns3_tx_queue *txq)
 	if (nb_free)
 		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
 
+update_field:
 	/* Update numbers of available descriptor due to buffer freed */
 	txq->tx_bd_ready += txq->tx_rs_thresh;
 	txq->next_to_clean += txq->tx_rs_thresh;
-- 
2.33.0
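
Note on the fast path: the new branch passes the software ring itself to
rte_mempool_put_bulk() via a (void **) cast. That appears to rely on two
things: each ring entry holds only one mbuf pointer, and the fast-free
offload means every mbuf in the batch is single-segment, has a reference
count of 1, and comes from the same mempool. The sketch below illustrates
the idea outside DPDK. It is not the driver code: mock_pool, mock_mbuf,
mock_entry and the mock_* functions are hypothetical stand-ins, and the
pool is a plain array-backed stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_mempool and rte_mbuf; not the DPDK types. */
#define POOL_CAP 64

struct mock_pool {
	void *objs[POOL_CAP];	/* stack of free objects */
	unsigned int count;	/* number of free objects */
};

struct mock_mbuf {
	struct mock_pool *pool;	/* pool the buffer belongs to */
	uint16_t refcnt;	/* the fast path assumes this is 1 */
};

/* Assumed to mirror the driver's ring entry: exactly one pointer. */
struct mock_entry {
	struct mock_mbuf *mbuf;
};

/* The cast below is only safe if an entry is the size of one pointer. */
_Static_assert(sizeof(struct mock_entry) == sizeof(void *),
	       "ring entry must hold a single pointer");

/* Return n objects to the pool in one call. */
static void
mock_pool_put_bulk(struct mock_pool *p, void * const *objs, unsigned int n)
{
	unsigned int i;

	assert(p->count + n <= POOL_CAP);
	for (i = 0; i < n; i++)
		p->objs[p->count++] = objs[i];
}

/*
 * Fast path: one bulk put for the whole batch. Correct only when every
 * entry holds a single-segment mbuf with refcnt == 1 from the same pool.
 */
static void
mock_free_fast(struct mock_entry *ring, unsigned int n)
{
	unsigned int i;

	/* Same style of cast as in the patch: the ring is viewed as void *[] */
	mock_pool_put_bulk(ring[0].mbuf->pool, (void **)ring, n);
	for (i = 0; i < n; i++)
		ring[i].mbuf = NULL;
}

/* Slow path for comparison: look up the pool per mbuf, one put each. */
static void
mock_free_slow(struct mock_entry *ring, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		void *obj = ring[i].mbuf;

		mock_pool_put_bulk(ring[i].mbuf->pool, &obj, 1);
		ring[i].mbuf = NULL;
	}
}
```

Both helpers leave the pool in the same state when the preconditions hold.
The fast path reads only the first entry's pool pointer and makes a single
put call for the batch. If the mbufs came from different pools, the fast
path would return some of them to the wrong pool, which is why the branch
is gated on the offload flag.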