From: Yongseok Koh
To: Yongseok Koh
Cc: Shahaf Shuler, dpdk stable
Date: Mon, 22 Jul 2019 17:59:29 -0700
Message-Id: <20190723010115.6446-2-yskoh@mellanox.com>
In-Reply-To: <20190723010115.6446-1-yskoh@mellanox.com>
References: <20190723010115.6446-1-yskoh@mellanox.com>
Subject: [dpdk-stable] patch 'net/mlx5: fix instruction hotspot on replenishing Rx buffer' has been queued to LTS release 17.11.7

Hi,

FYI, your patch has been queued to LTS release 17.11.7.

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objection by 07/27/19. So please shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the patch applied
to the branch. This will indicate if there was any rebasing needed to apply to the stable branch.
If there were code changes for rebasing (i.e., not only metadata diffs), please
double-check that the rebase was done correctly.

Thanks.

Yongseok

---
From 0ef4e25f6173b692ca138ab0aba42d80a8645e28 Mon Sep 17 00:00:00 2001
From: Yongseok Koh
Date: Mon, 14 Jan 2019 13:16:22 -0800
Subject: [PATCH] net/mlx5: fix instruction hotspot on replenishing Rx buffer

[ backported from upstream commit 9c55c6bd86156d17df93bf947dc620222ee9f7e4 ]

On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
to be accessed as it is static and easily calculated from the mbuf address.
Accessing the mbuf content causes unnecessary load stall. non-x86
processors (mostly RISC such as ARM and Power) are more vulnerable to load
stall. For x86, reducing the number of instructions seems to matter most.

Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")

Signed-off-by: Yongseok Koh
Acked-by: Shahaf Shuler
---
 drivers/net/mlx5/mlx5_rxtx_vec.h | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
index 750559b8d1..e7367b74d8 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
@@ -116,7 +116,22 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, uint16_t n)
 		return;
 	}
 	for (i = 0; i < n; ++i) {
-		wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr +
+		void *buf_addr;
+
+		/*
+		 * Load the virtual address for Rx WQE. non-x86 processors
+		 * (mostly RISC such as ARM and Power) are more vulnerable to
+		 * load stall. For x86, reducing the number of instructions
+		 * seems to matter most.
+		 */
+#ifdef RTE_ARCH_X86_64
+		buf_addr = elts[i]->buf_addr;
+#else
+		buf_addr = (char *)elts[i] + sizeof(struct rte_mbuf) +
+			   rte_pktmbuf_priv_size(rxq->mp);
+		assert(buf_addr == elts[i]->buf_addr);
+#endif
+		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
 					      RTE_PKTMBUF_HEADROOM);
 		/* If there's only one MR, no need to replace LKEY in WQEs. */
 		if (unlikely(!IS_SINGLE_MR(rxq->mr_ctrl.bh_n)))
-- 
2.21.0

---
Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- -	2019-07-22 17:55:06.640266135 -0700
+++ 0002-net-mlx5-fix-instruction-hotspot-on-replenishing-Rx-.patch	2019-07-22 17:55:05.698471000 -0700
@@ -1,37 +1,35 @@
-From 9c55c6bd86156d17df93bf947dc620222ee9f7e4 Mon Sep 17 00:00:00 2001
+From 0ef4e25f6173b692ca138ab0aba42d80a8645e28 Mon Sep 17 00:00:00 2001
 From: Yongseok Koh
-Date: Mon, 25 Mar 2019 12:13:10 -0700
-Subject: [PATCH] net/mlx5: revert mbuf address calculation for x86
+Date: Mon, 14 Jan 2019 13:16:22 -0800
+Subject: [PATCH] net/mlx5: fix instruction hotspot on replenishing Rx buffer
 
-When replenishing mbufs on Rx, buffer address (mbuf->buf_addr) should be
-loaded. non-x86 processors (mostly RISC such as ARM and Power) are more
-vulnerable to load stall. For x86, reducing the number of instructions
-seems to matter most.
-
-For x86, this is simply a load but for other architectures, it is
-calculated from the address of mbuf structure by rte_mbuf_buf_addr()
-without having to load the first cacheline of the mbuf.
+[ backported from upstream commit 9c55c6bd86156d17df93bf947dc620222ee9f7e4 ]
 
-Fixes: 12d468a62bc1 ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")
-Cc: stable@dpdk.org
+On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
+to be accessed as it is static and easily calculated from the mbuf address.
+Accessing the mbuf content causes unnecessary load stall. non-x86
+processors (mostly RISC such as ARM and Power) are more vulnerable to load
+stall. For x86, reducing the number of instructions seems to matter most.
+
+Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
 
 Signed-off-by: Yongseok Koh
 Acked-by: Shahaf Shuler
 ---
- drivers/net/mlx5/mlx5_rxtx_vec.h | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
+ drivers/net/mlx5/mlx5_rxtx_vec.h | 17 ++++++++++++++++-
+ 1 file changed, 16 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
-index 5df8e291e6..4220b08dd2 100644
+index 750559b8d1..e7367b74d8 100644
 --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
 +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
-@@ -102,9 +102,21 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, uint16_t n)
+@@ -116,7 +116,22 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, uint16_t n)
  		return;
  	}
  	for (i = 0; i < n; ++i) {
--		void *buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
+-		wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr +
 +		void *buf_addr;
- 
++
 +		/*
 +		 * Load the virtual address for Rx WQE. non-x86 processors
 +		 * (mostly RISC such as ARM and Power) are more vulnerable to
@@ -40,14 +38,15 @@
 +		 */
 +#ifdef RTE_ARCH_X86_64
 +		buf_addr = elts[i]->buf_addr;
-+		assert(buf_addr == rte_mbuf_buf_addr(elts[i], rxq->mp));
 +#else
-+		buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
- 		assert(buf_addr == elts[i]->buf_addr);
++		buf_addr = (char *)elts[i] + sizeof(struct rte_mbuf) +
++			   rte_pktmbuf_priv_size(rxq->mp);
++		assert(buf_addr == elts[i]->buf_addr);
 +#endif
- 		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
++		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
  					      RTE_PKTMBUF_HEADROOM);
- 		/* If there's only one MR, no need to replace LKey in WQE. */
+ 		/* If there's only one MR, no need to replace LKEY in WQEs. */
+ 		if (unlikely(!IS_SINGLE_MR(rxq->mr_ctrl.bh_n)))
 -- 
 2.21.0
 
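The non-x86 branch of the patch relies on each mbuf and its data buffer sitting contiguously in one mempool element (mbuf header, then the per-mbuf private area, then the buffer), so the buffer address can be derived from the mbuf pointer plus a pool-wide constant instead of loading mbuf->buf_addr. Below is a minimal, self-contained C sketch of that arithmetic under stated assumptions: struct toy_mbuf, struct toy_pool, TOY_HEADROOM and all toy_* helpers are hypothetical stand-ins, not DPDK APIs; the real struct rte_mbuf has many more fields, and the byte swap done by rte_cpu_to_be_64() is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical headroom reserved at the start of each data buffer
 * (plays the role of RTE_PKTMBUF_HEADROOM in this sketch). */
#define TOY_HEADROOM 128

/* Hypothetical, simplified stand-in for struct rte_mbuf: only buf_addr is
 * modelled. Reading it means touching the mbuf's own memory. */
struct toy_mbuf {
	void *buf_addr;
};

/* Hypothetical stand-in for the mempool parameters the driver consults. */
struct toy_pool {
	uint16_t priv_size; /* per-mbuf private area size */
	uint16_t data_room; /* data buffer size, headroom included */
};

/* Allocate one element laid out as [mbuf header][private area][buffer]
 * and initialise buf_addr the way a pool constructor would. */
static inline struct toy_mbuf *toy_mbuf_alloc(const struct toy_pool *mp)
{
	struct toy_mbuf *m = malloc(sizeof(struct toy_mbuf) +
				    mp->priv_size + mp->data_room);

	if (m == NULL)
		return NULL;
	m->buf_addr = (char *)m + sizeof(struct toy_mbuf) + mp->priv_size;
	return m;
}

/* x86-style path: a plain load of the stored field. */
static inline void *toy_buf_addr_load(struct toy_mbuf *m)
{
	return m->buf_addr;
}

/* Non-x86-style path: pointer arithmetic on the mbuf address and a
 * pool-wide constant; the mbuf's contents are never read. */
static inline void *toy_buf_addr_calc(struct toy_mbuf *m,
				      const struct toy_pool *mp)
{
	return (char *)m + sizeof(struct toy_mbuf) + mp->priv_size;
}

/* Address that would be posted to the Rx WQE (before byte swapping). */
static inline uintptr_t toy_wqe_addr(struct toy_mbuf *m,
				     const struct toy_pool *mp)
{
	void *buf_addr = toy_buf_addr_calc(m, mp);

	/* Debug-only cross-check, like the assert() in the patch; it is
	 * compiled out when NDEBUG is defined. */
	assert(buf_addr == m->buf_addr);
	return (uintptr_t)buf_addr + TOY_HEADROOM;
}
```

For any element returned by toy_mbuf_alloc(), toy_buf_addr_calc() and toy_buf_addr_load() yield the same pointer; what differs is only which memory the CPU must read to obtain it, which is the trade-off the commit message describes for x86 versus RISC targets.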