From: "Wangxiaoyun (Cloud, Network Chip Application Development Dept)"
To: "Gavin Hu (Arm Technology China)", "ferruh.yigit@intel.com"
CC: "dev@dpdk.org", "xuanziyang2@huawei.com", "shahar.belkar@huawei.com", "luoxianjun@huawei.com", "tanya.brokhman@huawei.com", "zhouguoyang@huawei.com", "wulike1@huawei.com", nd
Date: Mon, 30 Sep 2019 22:41:07 +0800
Subject: Re: [dpdk-dev] [PATCH v2 17/17] net/hinic: optimize tx&rx performance
Message-ID: <39715600-b11d-fd20-b5d5-098259454201@huawei.com>
References: <8fa4210f9ba33fe2db2a66f0c16fd01b1c7a57f5.1569421287.git.cloud.wangxiaoyun@huawei.com>

Hi Gavin,

Thanks for your
comments.

> +#if defined(__ARM64_NEON__)
> No NEON intrinsics used, maybe RTE_ARCH_ARM64 is better.
> In the following line __rte_always_inline is commonly used in DPDK, the effect is same.
> /Gavin

This patch itself doesn't use NEON intrinsics, but the tx & rx paths do use NEON intrinsics for the wqebb big-endian conversion on the Arm platform. To keep things consistent, we guard all Arm intrinsics optimizations with the __ARM64_NEON__ macro.

> I understand your intention is the reading of the status is observed before the following reads.
> This can be fulfilled by __atomic_load_n(...) with __ATOMIC_ACQUIRE semantics.
> This C11 way applies to all the arches, and you don't need the differentiation of arches.
> /Gavin

Thanks, I have changed it to __atomic_load_n(...) with __ATOMIC_ACQUIRE semantics and have sent a new V3 patch.

Best regards
Xiaoyun Wang

On 2019/9/27 10:08, Gavin Hu (Arm Technology China) wrote:
> Hi Xiaoyun,
>
>> -----Original Message-----
>> From: dev On Behalf Of Xiaoyun wang
>> Sent: Wednesday, September 25, 2019 10:31 PM
>> To: ferruh.yigit@intel.com
>> Cc: dev@dpdk.org; xuanziyang2@huawei.com; shahar.belkar@huawei.com;
>> luoxianjun@huawei.com; tanya.brokhman@huawei.com;
>> zhouguoyang@huawei.com; wulike1@huawei.com; Xiaoyun wang
>> Subject: [dpdk-dev] [PATCH v2 17/17] net/hinic: optimize tx&rx
>> performance
>>
>> This patch optimizes receive packets performance
>> in arm platform.
>>
>> Signed-off-by: Xiaoyun wang
>> ---
>>  drivers/net/hinic/hinic_pmd_rx.c | 17 +++++++++++++++++
>>  drivers/net/hinic/hinic_pmd_rx.h | 11 +++++++++++
>>  2 files changed, 28 insertions(+)
>>
>> diff --git a/drivers/net/hinic/hinic_pmd_rx.c
>> b/drivers/net/hinic/hinic_pmd_rx.c
>> index 37b4f5c..94071ee 100644
>> --- a/drivers/net/hinic/hinic_pmd_rx.c
>> +++ b/drivers/net/hinic/hinic_pmd_rx.c
>> @@ -950,6 +950,19 @@ void hinic_rx_alloc_pkts(struct hinic_rxq *rxq)
>>  	}
>>  }
>>
>> +#if defined(__ARM64_NEON__)
> No NEON intrinsics used, maybe RTE_ARCH_ARM64 is better.
> In the following line __rte_always_inline is commonly used in DPDK, the effect is same.
> /Gavin
>
>> +static inline uint32_t __attribute__((always_inline))
>> +hinic_read_cqe_status(uintptr_t addr)
>> +{
>> +	uint32_t val;
>> +
>> +	asm volatile("ldar %x[val], [%x[addr]]"
>> +		     : [val] "=r" (val)
>> +		     : [addr] "r" (addr));
>> +	return val;
>> +}
>> +#endif
> I understand your intention is the reading of the status is observed before the following reads.
> This can be fulfilled by __atomic_load_n(...) with __ATOMIC_ACQUIRE semantics.
> This C11 way applies to all the arches, and you don't need the differentiation of arches.
> /Gavin
>> +
>>  u16 hinic_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, u16
>> nb_pkts)
>>  {
>>  	struct rte_mbuf *rxm;
>> @@ -972,7 +985,11 @@ u16 hinic_recv_pkts(void *rx_queue, struct
>> rte_mbuf **rx_pkts, u16 nb_pkts)
>>  	while (pkts < nb_pkts) {
>>  		/* 2. current ci is done */
>>  		rx_cqe = &rxq->rx_cqe[sw_ci];
>> +#if defined(__X86_64_SSE__)
>>  		status = rx_cqe->status;
>> +#elif defined(__ARM64_NEON__)
>> +		status = hinic_read_cqe_status((uintptr_t)&rxq->rx_cqe[sw_ci]);
>> +#endif
>>  		if (!HINIC_GET_RX_DONE_BE(status))
>>  			break;
>>
>> diff --git a/drivers/net/hinic/hinic_pmd_rx.h
>> b/drivers/net/hinic/hinic_pmd_rx.h
>> index fe2735b..fa27e91 100644
>> --- a/drivers/net/hinic/hinic_pmd_rx.h
>> +++ b/drivers/net/hinic/hinic_pmd_rx.h
>> @@ -28,6 +28,7 @@ struct hinic_rq_ctrl {
>>  	u32 ctrl_fmt;
>>  };
>>
>> +#if defined(__X86_64_SSE__)
>>  struct hinic_rq_cqe {
>>  	u32 status;
>>  	u32 vlan_len;
>> @@ -36,6 +37,16 @@ struct hinic_rq_cqe {
>>
>>  	u32 rsvd[4];
>>  };
>> +#elif defined(__ARM64_NEON__)
>> +struct hinic_rq_cqe {
>> +	u32 status;
>> +	u32 vlan_len;
>> +	u32 offload_type;
>> +	u32 rss_hash;
>> +
>> +	u32 rsvd[4];
>> +} __rte_cache_aligned;
>> +#endif
>>
>>  struct hinic_rq_cqe_sect {
>>  	struct hinic_sge sge;
>> --
>> 1.8.3.1
>
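For reference, Gavin's suggestion can be sketched roughly as below. This is a minimal illustration assuming a GCC- or Clang-compatible compiler (which provides the `__atomic` builtins). It is not the code that went into V3. `struct hinic_rq_cqe` is copied from the quoted patch with `u32` spelled `uint32_t`. The `__rte_always_inline` fallback exists only so the snippet builds outside a DPDK tree, since DPDK itself defines that macro in `rte_common.h`. `hinic_poll_cqe()` and `fake_device_complete()` are hypothetical helpers, and treating any non-zero status as "done" is a simplification of the driver's `HINIC_GET_RX_DONE_BE()` check. One acquire load replaces both the plain x86 read and the hand-written `ldar`, so the `__X86_64_SSE__`/`__ARM64_NEON__` split is not needed for this read.

```c
#include <stdint.h>

/* DPDK defines this in rte_common.h; fallback only for building standalone. */
#ifndef __rte_always_inline
#define __rte_always_inline inline __attribute__((always_inline))
#endif

/* CQE layout from the quoted patch, with u32 spelled as uint32_t. */
struct hinic_rq_cqe {
	uint32_t status;
	uint32_t vlan_len;
	uint32_t offload_type;
	uint32_t rss_hash;

	uint32_t rsvd[4];
};

/*
 * Load-acquire of the status word: later loads in program order (e.g. of
 * vlan_len or rss_hash) cannot be satisfied before this load. The same
 * source builds on x86 and arm64, so no per-arch #if is required.
 */
static __rte_always_inline uint32_t
hinic_read_cqe_status(const struct hinic_rq_cqe *cqe)
{
	return __atomic_load_n(&cqe->status, __ATOMIC_ACQUIRE);
}

/*
 * Hypothetical reader: returns 1 and copies vlan_len once the CQE has
 * completed. "status != 0 means done" is a simplification for this sketch.
 */
static int
hinic_poll_cqe(const struct hinic_rq_cqe *cqe, uint32_t *vlan_len)
{
	if (hinic_read_cqe_status(cqe) == 0)
		return 0;
	/* Ordered after the acquire load above. */
	*vlan_len = cqe->vlan_len;
	return 1;
}

/*
 * Hypothetical, test-only stand-in for the NIC: fill the payload field
 * first, then publish the status with a release store.
 */
static void
fake_device_complete(struct hinic_rq_cqe *cqe, uint32_t status,
		     uint32_t vlan_len)
{
	cqe->vlan_len = vlan_len;
	__atomic_store_n(&cqe->status, status, __ATOMIC_RELEASE);
}
```

On arm64, compilers typically lower the acquire load to `ldar` (or `ldapr` when RCpc support is enabled), which is what the original inline asm used. On x86 it becomes an ordinary load, because x86 already keeps loads in order. In the real driver the writer is the device via DMA rather than a CPU release store, so `fake_device_complete()` only exists to exercise the reader.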