From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Wei Hu (Xavier)"
To: stable@dpdk.org
Date: Mon, 1 Jun 2020 16:48:56 +0800
Message-ID: <1591001336-6162-6-git-send-email-xavier.huwei@huawei.com>
In-Reply-To: <1591001336-6162-1-git-send-email-xavier.huwei@huawei.com>
References: <1591001336-6162-1-git-send-email-xavier.huwei@huawei.com>
Subject: [dpdk-stable] [PATCH 19.11 5/5] net/hns3: replace memory barrier with data dependency order
List-Id: patches for DPDK stable branches

From: Chengwen Feng

[ upstream commit 95ea1a00cedd8e752ef6968bbff675633a00db74 ]

This patch optimizes Rx performance by using data dependency ordering
instead of a memory barrier (rte_cio_rmb) in the '.rx_pkt_burst' ops
implementation function, hns3_recv_pkts.
Signed-off-by: Chengwen Feng
Signed-off-by: Wei Hu (Xavier)
---
 drivers/net/hns3/hns3_rxtx.c | 85 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 73 insertions(+), 12 deletions(-)

diff --git a/drivers/net/hns3/hns3_rxtx.c b/drivers/net/hns3/hns3_rxtx.c
index 34e7448..36d7a7b 100644
--- a/drivers/net/hns3/hns3_rxtx.c
+++ b/drivers/net/hns3/hns3_rxtx.c
@@ -1453,13 +1453,14 @@ hns3_rx_set_cksum_flag(struct rte_mbuf *rxm, uint64_t packet_type,
 uint16_t
 hns3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 {
+	volatile struct hns3_desc *rx_ring;  /* RX ring (desc) */
+	volatile struct hns3_desc *rxdp;     /* pointer of the current desc */
 	struct hns3_rx_queue *rxq;      /* RX queue */
-	struct hns3_desc *rx_ring;      /* RX ring (desc) */
 	struct hns3_entry *sw_ring;
 	struct hns3_entry *rxe;
-	struct hns3_desc *rxdp;         /* pointer of the current desc */
 	struct rte_mbuf *first_seg;
 	struct rte_mbuf *last_seg;
+	struct hns3_desc rxd;
 	struct rte_mbuf *nmb;           /* pointer of the new mbuf */
 	struct rte_mbuf *rxm;
 	struct rte_eth_dev *dev;
@@ -1491,6 +1492,67 @@ hns3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		bd_base_info = rte_le_to_cpu_32(rxdp->rx.bd_base_info);
 		if (unlikely(!hns3_get_bit(bd_base_info, HNS3_RXD_VLD_B)))
 			break;
+		/*
+		 * The interactive process between software and hardware of
+		 * receiving a new packet in the hns3 network engine:
+		 * 1. Hardware first writes the packet content to the memory
+		 *    pointed to by the 'addr' field of the Rx Buffer
+		 *    Descriptor, then fills the result of parsing the packet,
+		 *    including the valid field, into the Rx Buffer Descriptor
+		 *    in one write operation.
+		 * 2. The driver reads the Rx BD's valid field in a loop to
+		 *    check whether it is valid; if valid, it assigns a new
+		 *    address to the addr field, clears the valid field, gets
+		 *    the other information of the packet by parsing the Rx
+		 *    BD's other fields, and finally writes back the number of
+		 *    Rx BDs processed by the driver to the
+		 *    HNS3_RING_RX_HEAD_REG register to inform hardware.
+		 * In the above process, the ordering is very important. We
+		 * must make sure the CPU reads the Rx BD's other fields only
+		 * after the Rx BD is valid.
+		 *
+		 * There are two types of re-ordering: compiler re-ordering
+		 * and CPU re-ordering under the ARMv8 architecture.
+		 * 1. We use volatile to deal with compiler re-ordering, which
+		 *    is why rx_ring/rxdp are defined with volatile.
+		 * 2. We commonly use a memory barrier to deal with CPU
+		 *    re-ordering, but its cost is high.
+		 *
+		 * To avoid the high cost of a memory barrier, we use the data
+		 * dependency order under the ARMv8 architecture, for example:
+		 *      instr01: load A
+		 *      instr02: load B <- A
+		 * instr02 will always execute after instr01.
+		 *
+		 * To construct the data dependency ordering, we use the
+		 * following assignment:
+		 *      rxd = rxdp[(bd_base_info & (1u << HNS3_RXD_VLD_B)) -
+		 *                 (1u << HNS3_RXD_VLD_B)];
+		 * which evaluates to rxdp[0] when the valid bit is set, while
+		 * making the load of rxd depend on bd_base_info.
+		 */
+		rxd = rxdp[(bd_base_info & (1u << HNS3_RXD_VLD_B)) -
+			   (1u << HNS3_RXD_VLD_B)];
+
 		nmb = rte_mbuf_raw_alloc(rxq->mb_pool);
 		if (unlikely(nmb == NULL)) {
@@ -1514,14 +1576,13 @@ hns3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			rxe->mbuf = nmb;
 
 			dma_addr = rte_cpu_to_le_64(rte_mbuf_data_iova_default(nmb));
-			rxdp->addr = dma_addr;
 			rxdp->rx.bd_base_info = 0;
+			rxdp->addr = dma_addr;
 
-			rte_cio_rmb();
 			/* Load remained descriptor data and extract necessary fields */
-			data_len = (uint16_t)(rte_le_to_cpu_16(rxdp->rx.size));
-			l234_info = rte_le_to_cpu_32(rxdp->rx.l234_info);
-			ol_info = rte_le_to_cpu_32(rxdp->rx.ol_info);
+			data_len = (uint16_t)(rte_le_to_cpu_16(rxd.rx.size));
+			l234_info = rte_le_to_cpu_32(rxd.rx.l234_info);
+			ol_info = rte_le_to_cpu_32(rxd.rx.ol_info);
 
 			if (first_seg == NULL) {
 				first_seg = rxm;
@@ -1540,14 +1601,14 @@ hns3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			}
 
 			/* The last buffer of the received packet */
-			pkt_len = (uint16_t)(rte_le_to_cpu_16(rxdp->rx.pkt_len));
+			pkt_len = (uint16_t)(rte_le_to_cpu_16(rxd.rx.pkt_len));
 			first_seg->pkt_len = pkt_len;
 			first_seg->port = rxq->port_id;
-			first_seg->hash.rss = rte_le_to_cpu_32(rxdp->rx.rss_hash);
+			first_seg->hash.rss = rte_le_to_cpu_32(rxd.rx.rss_hash);
 			first_seg->ol_flags = PKT_RX_RSS_HASH;
 			if (unlikely(hns3_get_bit(bd_base_info, HNS3_RXD_LUM_B))) {
 				first_seg->hash.fdir.hi =
-					rte_le_to_cpu_32(rxdp->rx.fd_id);
+					rte_le_to_cpu_32(rxd.rx.fd_id);
 				first_seg->ol_flags |= PKT_RX_FDIR | PKT_RX_FDIR_ID;
 			}
 			rxm->next = NULL;
@@ -1565,9 +1626,9 @@ hns3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 						       first_seg->packet_type,
 						       cksum_err);
 
-			first_seg->vlan_tci = rte_le_to_cpu_16(rxdp->rx.vlan_tag);
+			first_seg->vlan_tci = rte_le_to_cpu_16(rxd.rx.vlan_tag);
 			first_seg->vlan_tci_outer =
-				rte_le_to_cpu_16(rxdp->rx.ot_vlan_tag);
+				rte_le_to_cpu_16(rxd.rx.ot_vlan_tag);
 			rx_pkts[nb_rx++] = first_seg;
 			first_seg = NULL;
 			continue;
-- 
2.7.4