DPDK patches and discussions
 help / color / Atom feed
* [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
@ 2019-08-13 10:02 Ruifeng Wang
  2019-08-13 10:02 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ruifeng Wang @ 2019-08-13 10:02 UTC (permalink / raw)
  To: jerinj, gavin.hu; +Cc: dev, honnappa.nagarahalli, nd, Ruifeng Wang

Couple of changes to IXGBE vector PMD on aarch64 platform.
An unnecessary memory barrier was identified and removed.
Also part of processing was replaced with NEON intrinsics.
Both of the changes will help to improve performance.

Ruifeng Wang (2):
  net/ixgbe: remove barrier in vPMD for aarch64
  net/ixgbe: use neon intrinsics to count packet for aarch64

 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32 ++++++++++++-------------
 1 file changed, 16 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH 1/2] net/ixgbe: remove barrier in vPMD for aarch64
  2019-08-13 10:02 [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64 Ruifeng Wang
@ 2019-08-13 10:02 ` " Ruifeng Wang
  2019-08-13 10:02 ` [dpdk-dev] [PATCH 2/2] net/ixgbe: use neon intrinsics to count packet " Ruifeng Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Ruifeng Wang @ 2019-08-13 10:02 UTC (permalink / raw)
  To: jerinj, gavin.hu; +Cc: dev, honnappa.nagarahalli, nd, Ruifeng Wang

The memory barrier was intended for descriptor data integrity (see
comments in [1]). However, since NEON loads are atomic, there is
no need for the memory barrier. Remove it accordingly.

Corrected couple of code comments.

In terms of performance, observed slightly higher average throughput
in tests with 82599ES NIC.

[1] http://patches.dpdk.org/patch/18153/

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index edb138354..86fb3afdb 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -214,13 +214,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint32_t var = 0;
 		uint32_t stat;
 
-		/* B.1 load 1 mbuf point */
+		/* B.1 load 2 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
 
-		/* B.1 load 1 mbuf point */
+		/* B.1 load 2 mbuf point */
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
 
 		/* A. load 4 pkts descs */
@@ -228,7 +228,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
 		descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
 		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
-		rte_smp_rmb();
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH 2/2] net/ixgbe: use neon intrinsics to count packet for aarch64
  2019-08-13 10:02 [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64 Ruifeng Wang
  2019-08-13 10:02 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
@ 2019-08-13 10:02 ` " Ruifeng Wang
  2019-08-25  1:33 ` [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes " Ye Xiaolong
  2019-08-28  8:24 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
  3 siblings, 0 replies; 11+ messages in thread
From: Ruifeng Wang @ 2019-08-13 10:02 UTC (permalink / raw)
  To: jerinj, gavin.hu; +Cc: dev, honnappa.nagarahalli, nd, Ruifeng Wang

vPMD for aarch64 calculates the number of received packets using a loop.
Change to use NEON intrinsics for calculation. This saves CPU cycles
and has slightly better performance.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 27 +++++++++++++------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 86fb3afdb..eeb825911 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -144,6 +144,7 @@ desc_to_olflags_v(uint8x16x2_t sterr_tmp1, uint8x16x2_t sterr_tmp2,
 
 #define IXGBE_VPMD_DESC_DD_MASK		0x01010101
 #define IXGBE_VPMD_DESC_EOP_MASK	0x02020202
+#define IXGBE_UINT8_BIT			(CHAR_BIT * sizeof(uint8_t))
 
 static inline uint16_t
 _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
@@ -211,7 +212,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint64x2_t mbp1, mbp2;
 		uint8x16_t staterr;
 		uint16x8_t tmp;
-		uint32_t var = 0;
 		uint32_t stat;
 
 		/* B.1 load 2 mbuf point */
@@ -256,7 +256,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 
 		/* C.2 get 4 pkts staterr value  */
 		staterr = vzipq_u8(sterr_tmp1.val[1], sterr_tmp2.val[1]).val[0];
-		stat = vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0);
 
 		/* set ol_flags with vlan packet type */
 		desc_to_olflags_v(sterr_tmp1, sterr_tmp2, staterr,
@@ -282,12 +281,20 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 
 		/* C* extract and record EOP bit */
 		if (split_packet) {
+			stat = vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0);
 			/* and with mask to extract bits, flipping 1-0 */
 			*(int *)split_packet = ~stat & IXGBE_VPMD_DESC_EOP_MASK;
 
 			split_packet += RTE_IXGBE_DESCS_PER_LOOP;
 		}
 
+		/* C.4 expand DD bit to saturate UINT8 */
+		staterr = vshlq_n_u8(staterr, IXGBE_UINT8_BIT - 1);
+		staterr = vreinterpretq_u8_s8
+				(vshrq_n_s8(vreinterpretq_s8_u8(staterr),
+					IXGBE_UINT8_BIT - 1));
+		stat = ~vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0);
+
 		rte_prefetch_non_temporal(rxdp + RTE_IXGBE_DESCS_PER_LOOP);
 
 		/* D.3 copy final 1,2 data to rx_pkts */
@@ -296,18 +303,12 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1,
 			 pkt_mb1);
 
-		stat &= IXGBE_VPMD_DESC_DD_MASK;
-
-		/* C.4 calc avaialbe number of desc */
-		if (likely(stat != IXGBE_VPMD_DESC_DD_MASK)) {
-			while (stat & 0x01) {
-				++var;
-				stat = stat >> 8;
-			}
-			nb_pkts_recd += var;
-			break;
-		} else {
+		/* C.5 calc available number of desc */
+		if (unlikely(stat == 0)) {
 			nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP;
+		} else {
+			nb_pkts_recd += __builtin_ctz(stat) / IXGBE_UINT8_BIT;
+			break;
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
  2019-08-13 10:02 [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64 Ruifeng Wang
  2019-08-13 10:02 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
  2019-08-13 10:02 ` [dpdk-dev] [PATCH 2/2] net/ixgbe: use neon intrinsics to count packet " Ruifeng Wang
@ 2019-08-25  1:33 ` " Ye Xiaolong
  2019-08-26  2:52   ` Ruifeng Wang (Arm Technology China)
  2019-08-28  8:24 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
  3 siblings, 1 reply; 11+ messages in thread
From: Ye Xiaolong @ 2019-08-25  1:33 UTC (permalink / raw)
  To: Ruifeng Wang; +Cc: jerinj, gavin.hu, dev, honnappa.nagarahalli, nd

Hi, 

Thanks for the patches, could you also provide the Fixes tag and cc stable?
The patchset looks good to me.

Thanks,
Xiaolong

On 08/13, Ruifeng Wang wrote:
>Couple of changes to IXGBE vector PMD on aarch64 platform.
>An unnecessary memory barrier was identified and removed.
>Also part of processing was replaced with NEON intrinsics.
>Both of the changes will help to improve performance.
>
>Ruifeng Wang (2):
>  net/ixgbe: remove barrier in vPMD for aarch64
>  net/ixgbe: use neon intrinsics to count packet for aarch64
>
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32 ++++++++++++-------------
> 1 file changed, 16 insertions(+), 16 deletions(-)
>
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
  2019-08-25  1:33 ` [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes " Ye Xiaolong
@ 2019-08-26  2:52   ` Ruifeng Wang (Arm Technology China)
  2019-08-26 10:39     ` Ferruh Yigit
  0 siblings, 1 reply; 11+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-08-26  2:52 UTC (permalink / raw)
  To: Ye Xiaolong
  Cc: jerinj, Gavin Hu (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, nd

Hi Xiaolong,

> -----Original Message-----
> From: Ye Xiaolong <xiaolong.ye@intel.com>
> Sent: Sunday, August 25, 2019 09:34
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
> Cc: jerinj@marvell.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
> 
> Hi,
> 
> Thanks for the patches, could you also provide the Fixes tag and cc stable?
> The patchset looks good to me.

Code changes in both patches are not for bug fixing.
Patch 1/2 includes fix for code comments. I don't think it deserves a Fixes tag or backporting. Can we skip the Fixes tag?

> 
> Thanks,
> Xiaolong
> 
> On 08/13, Ruifeng Wang wrote:
> >Couple of changes to IXGBE vector PMD on aarch64 platform.
> >An unnecessary memory barrier was identified and removed.
> >Also part of processing was replaced with NEON intrinsics.
> >Both of the changes will help to improve performance.
> >
> >Ruifeng Wang (2):
> >  net/ixgbe: remove barrier in vPMD for aarch64
> >  net/ixgbe: use neon intrinsics to count packet for aarch64
> >
> > drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32 ++++++++++++-------------
> > 1 file changed, 16 insertions(+), 16 deletions(-)
> >
> >--
> >2.17.1
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
  2019-08-26  2:52   ` Ruifeng Wang (Arm Technology China)
@ 2019-08-26 10:39     ` Ferruh Yigit
  2019-08-26 10:53       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 1 reply; 11+ messages in thread
From: Ferruh Yigit @ 2019-08-26 10:39 UTC (permalink / raw)
  To: Ruifeng Wang (Arm Technology China), Ye Xiaolong
  Cc: jerinj, Gavin Hu (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, Kevin Traynor, Luca Boccassi

On 8/26/2019 3:52 AM, Ruifeng Wang (Arm Technology China) wrote:
> Hi Xiaolong,
> 
>> -----Original Message-----
>> From: Ye Xiaolong <xiaolong.ye@intel.com>
>> Sent: Sunday, August 25, 2019 09:34
>> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
>> Cc: jerinj@marvell.com; Gavin Hu (Arm Technology China)
>> <Gavin.Hu@arm.com>; dev@dpdk.org; Honnappa Nagarahalli
>> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
>> Subject: Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
>>
>> Hi,
>>
>> Thanks for the patches, could you also provide the Fixes tag and cc stable?
>> The patchset looks good to me.
> 
> Code changes in both patches are not for bug fixing.
> Patch 1/2 includes fix for code comments. I don't think it deserves a Fixes tag or backporting. Can we skip the Fixes tag?

In 1/2 a memory barrier is removed, it means it was wrong to add it at first
place and you are fixing it, no?


Performance improvements are in gray are, but if there is no ABI/API break why
not take is performance fix and backport and have the performance improvement in
LTS?
Also I think taking as much as possible may help to maintain LTS, since it
reduces the chance of conflict in later commits, LTS is two years and these
small things can accumulate and make getting important fixes hard by time.

Is there any specific reason not to backport these patches to LTS releases?


> 
>>
>> Thanks,
>> Xiaolong
>>
>> On 08/13, Ruifeng Wang wrote:
>>> Couple of changes to IXGBE vector PMD on aarch64 platform.
>>> An unnecessary memory barrier was identified and removed.
>>> Also part of processing was replaced with NEON intrinsics.
>>> Both of the changes will help to improve performance.
>>>
>>> Ruifeng Wang (2):
>>>  net/ixgbe: remove barrier in vPMD for aarch64
>>>  net/ixgbe: use neon intrinsics to count packet for aarch64
>>>
>>> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32 ++++++++++++-------------
>>> 1 file changed, 16 insertions(+), 16 deletions(-)
>>>
>>> --
>>> 2.17.1
>>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
  2019-08-26 10:39     ` Ferruh Yigit
@ 2019-08-26 10:53       ` Ruifeng Wang (Arm Technology China)
  0 siblings, 0 replies; 11+ messages in thread
From: Ruifeng Wang (Arm Technology China) @ 2019-08-26 10:53 UTC (permalink / raw)
  To: Ferruh Yigit, Ye Xiaolong
  Cc: jerinj, Gavin Hu (Arm Technology China),
	dev, Honnappa Nagarahalli, nd, Kevin Traynor, Luca Boccassi, nd


> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Monday, August 26, 2019 18:40
> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Ye
> Xiaolong <xiaolong.ye@intel.com>
> Cc: jerinj@marvell.com; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Kevin Traynor
> <ktraynor@redhat.com>; Luca Boccassi <bluca@debian.org>
> Subject: Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
> 
> On 8/26/2019 3:52 AM, Ruifeng Wang (Arm Technology China) wrote:
> > Hi Xiaolong,
> >
> >> -----Original Message-----
> >> From: Ye Xiaolong <xiaolong.ye@intel.com>
> >> Sent: Sunday, August 25, 2019 09:34
> >> To: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>
> >> Cc: jerinj@marvell.com; Gavin Hu (Arm Technology China)
> >> <Gavin.Hu@arm.com>; dev@dpdk.org; Honnappa Nagarahalli
> >> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> >> Subject: Re: [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64
> >>
> >> Hi,
> >>
> >> Thanks for the patches, could you also provide the Fixes tag and cc stable?
> >> The patchset looks good to me.
> >
> > Code changes in both patches are not for bug fixing.
> > Patch 1/2 includes fix for code comments. I don't think it deserves a Fixes
> tag or backporting. Can we skip the Fixes tag?
> 
> In 1/2 a memory barrier is removed, it means it was wrong to add it at first
> place and you are fixing it, no?
> 
> 
> Performance improvements are in gray are, but if there is no ABI/API break
> why not take is performance fix and backport and have the performance
> improvement in LTS?
> Also I think taking as much as possible may help to maintain LTS, since it
> reduces the chance of conflict in later commits, LTS is two years and these
> small things can accumulate and make getting important fixes hard by time.
> 
> Is there any specific reason not to backport these patches to LTS releases?
> 
Thanks for your explanation.
Understand that. No objection to backporting.
I'll send out new version.

> 
> >
> >>
> >> Thanks,
> >> Xiaolong
> >>
> >> On 08/13, Ruifeng Wang wrote:
> >>> Couple of changes to IXGBE vector PMD on aarch64 platform.
> >>> An unnecessary memory barrier was identified and removed.
> >>> Also part of processing was replaced with NEON intrinsics.
> >>> Both of the changes will help to improve performance.
> >>>
> >>> Ruifeng Wang (2):
> >>>  net/ixgbe: remove barrier in vPMD for aarch64
> >>>  net/ixgbe: use neon intrinsics to count packet for aarch64
> >>>
> >>> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32
> >>> ++++++++++++-------------
> >>> 1 file changed, 16 insertions(+), 16 deletions(-)
> >>>
> >>> --
> >>> 2.17.1
> >>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v2 0/2] IXGBE vPMD changes for aarch64
  2019-08-13 10:02 [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64 Ruifeng Wang
                   ` (2 preceding siblings ...)
  2019-08-25  1:33 ` [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes " Ye Xiaolong
@ 2019-08-28  8:24 ` " Ruifeng Wang
  2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
                     ` (2 more replies)
  3 siblings, 3 replies; 11+ messages in thread
From: Ruifeng Wang @ 2019-08-28  8:24 UTC (permalink / raw)
  To: xiaolong.ye, ferruh.yigit, jerinj, gavin.hu
  Cc: dev, honnappa.nagarahalli, nd, Ruifeng Wang

Couple of changes to IXGBE vector PMD on aarch64 platform.
An unnecessary memory barrier was identified and removed.
Also part of processing was replaced with NEON intrinsics.
Both of the changes will help to improve performance.

Ruifeng Wang (2):
  net/ixgbe: remove barrier in vPMD for aarch64
  net/ixgbe: use neon intrinsics to count packet for aarch64

 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32 ++++++++++++-------------
 1 file changed, 16 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v2 1/2] net/ixgbe: remove barrier in vPMD for aarch64
  2019-08-28  8:24 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
@ 2019-08-28  8:24   ` " Ruifeng Wang
  2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 2/2] net/ixgbe: use neon intrinsics to count packet " Ruifeng Wang
  2019-08-28 15:14   ` [dpdk-dev] [PATCH v2 0/2] IXGBE vPMD changes " Ye Xiaolong
  2 siblings, 0 replies; 11+ messages in thread
From: Ruifeng Wang @ 2019-08-28  8:24 UTC (permalink / raw)
  To: xiaolong.ye, ferruh.yigit, jerinj, gavin.hu
  Cc: dev, honnappa.nagarahalli, nd, Ruifeng Wang, stable

The memory barrier was intended for descriptor data integrity (see
comments in [1]). As later NEON loads were implemented and a whole
entry is loaded in one-run and atomic, that makes the ordering of
partial loading unnecessary. Remove it accordingly.

Corrected couple of code comments.

In terms of performance, observed slightly higher average throughput
in tests with 82599ES NIC.

[1] http://patches.dpdk.org/patch/18153/

Fixes: 989a84050542 ("net/ixgbe: fix received packets number for ARM NEON")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index edb138354..86fb3afdb 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -214,13 +214,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint32_t var = 0;
 		uint32_t stat;
 
-		/* B.1 load 1 mbuf point */
+		/* B.1 load 2 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
 
-		/* B.1 load 1 mbuf point */
+		/* B.1 load 2 mbuf point */
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
 
 		/* A. load 4 pkts descs */
@@ -228,7 +228,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
 		descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
 		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
-		rte_smp_rmb();
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [dpdk-dev] [PATCH v2 2/2] net/ixgbe: use neon intrinsics to count packet for aarch64
  2019-08-28  8:24 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
  2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
@ 2019-08-28  8:24   ` " Ruifeng Wang
  2019-08-28 15:14   ` [dpdk-dev] [PATCH v2 0/2] IXGBE vPMD changes " Ye Xiaolong
  2 siblings, 0 replies; 11+ messages in thread
From: Ruifeng Wang @ 2019-08-28  8:24 UTC (permalink / raw)
  To: xiaolong.ye, ferruh.yigit, jerinj, gavin.hu
  Cc: dev, honnappa.nagarahalli, nd, Ruifeng Wang

vPMD for aarch64 calculates the number of received packets using a loop.
Change to use NEON intrinsics for calculation. This saves CPU cycles
and has slightly better performance.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 27 +++++++++++++------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index 86fb3afdb..eeb825911 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -144,6 +144,7 @@ desc_to_olflags_v(uint8x16x2_t sterr_tmp1, uint8x16x2_t sterr_tmp2,
 
 #define IXGBE_VPMD_DESC_DD_MASK		0x01010101
 #define IXGBE_VPMD_DESC_EOP_MASK	0x02020202
+#define IXGBE_UINT8_BIT			(CHAR_BIT * sizeof(uint8_t))
 
 static inline uint16_t
 _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
@@ -211,7 +212,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint64x2_t mbp1, mbp2;
 		uint8x16_t staterr;
 		uint16x8_t tmp;
-		uint32_t var = 0;
 		uint32_t stat;
 
 		/* B.1 load 2 mbuf point */
@@ -256,7 +256,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 
 		/* C.2 get 4 pkts staterr value  */
 		staterr = vzipq_u8(sterr_tmp1.val[1], sterr_tmp2.val[1]).val[0];
-		stat = vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0);
 
 		/* set ol_flags with vlan packet type */
 		desc_to_olflags_v(sterr_tmp1, sterr_tmp2, staterr,
@@ -282,12 +281,20 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 
 		/* C* extract and record EOP bit */
 		if (split_packet) {
+			stat = vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0);
 			/* and with mask to extract bits, flipping 1-0 */
 			*(int *)split_packet = ~stat & IXGBE_VPMD_DESC_EOP_MASK;
 
 			split_packet += RTE_IXGBE_DESCS_PER_LOOP;
 		}
 
+		/* C.4 expand DD bit to saturate UINT8 */
+		staterr = vshlq_n_u8(staterr, IXGBE_UINT8_BIT - 1);
+		staterr = vreinterpretq_u8_s8
+				(vshrq_n_s8(vreinterpretq_s8_u8(staterr),
+					IXGBE_UINT8_BIT - 1));
+		stat = ~vgetq_lane_u32(vreinterpretq_u32_u8(staterr), 0);
+
 		rte_prefetch_non_temporal(rxdp + RTE_IXGBE_DESCS_PER_LOOP);
 
 		/* D.3 copy final 1,2 data to rx_pkts */
@@ -296,18 +303,12 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1,
 			 pkt_mb1);
 
-		stat &= IXGBE_VPMD_DESC_DD_MASK;
-
-		/* C.4 calc avaialbe number of desc */
-		if (likely(stat != IXGBE_VPMD_DESC_DD_MASK)) {
-			while (stat & 0x01) {
-				++var;
-				stat = stat >> 8;
-			}
-			nb_pkts_recd += var;
-			break;
-		} else {
+		/* C.5 calc available number of desc */
+		if (unlikely(stat == 0)) {
 			nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP;
+		} else {
+			nb_pkts_recd += __builtin_ctz(stat) / IXGBE_UINT8_BIT;
+			break;
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/2] IXGBE vPMD changes for aarch64
  2019-08-28  8:24 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
  2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
  2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 2/2] net/ixgbe: use neon intrinsics to count packet " Ruifeng Wang
@ 2019-08-28 15:14   ` " Ye Xiaolong
  2 siblings, 0 replies; 11+ messages in thread
From: Ye Xiaolong @ 2019-08-28 15:14 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: ferruh.yigit, jerinj, gavin.hu, dev, honnappa.nagarahalli, nd

On 08/28, Ruifeng Wang wrote:
>Couple of changes to IXGBE vector PMD on aarch64 platform.
>An unnecessary memory barrier was identified and removed.
>Also part of processing was replaced with NEON intrinsics.
>Both of the changes will help to improve performance.
>
>Ruifeng Wang (2):
>  net/ixgbe: remove barrier in vPMD for aarch64
>  net/ixgbe: use neon intrinsics to count packet for aarch64
>
> drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 32 ++++++++++++-------------
> 1 file changed, 16 insertions(+), 16 deletions(-)
>
>-- 
>2.17.1
>

Series applied to dpdk-next-net-intel/for-next-net.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-13 10:02 [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes for aarch64 Ruifeng Wang
2019-08-13 10:02 ` [dpdk-dev] [PATCH 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
2019-08-13 10:02 ` [dpdk-dev] [PATCH 2/2] net/ixgbe: use neon intrinsics to count packet " Ruifeng Wang
2019-08-25  1:33 ` [dpdk-dev] [PATCH 0/2] IXGBE vPMD changes " Ye Xiaolong
2019-08-26  2:52   ` Ruifeng Wang (Arm Technology China)
2019-08-26 10:39     ` Ferruh Yigit
2019-08-26 10:53       ` Ruifeng Wang (Arm Technology China)
2019-08-28  8:24 ` [dpdk-dev] [PATCH v2 " Ruifeng Wang
2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 1/2] net/ixgbe: remove barrier in vPMD " Ruifeng Wang
2019-08-28  8:24   ` [dpdk-dev] [PATCH v2 2/2] net/ixgbe: use neon intrinsics to count packet " Ruifeng Wang
2019-08-28 15:14   ` [dpdk-dev] [PATCH v2 0/2] IXGBE vPMD changes " Ye Xiaolong

DPDK patches and discussions

Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox