patches for DPDK stable branches
* [dpdk-stable] [PATCH 0/2] i40e Rx descriptor loads ordering
@ 2021-09-06  3:31 Ruifeng Wang
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-06  3:31 UTC (permalink / raw)
  To: dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd,
	Ruifeng Wang

On the Rx path, the NIC fills each Rx descriptor with data pertaining to
the received packet.

A single descriptor consists of multiple words. Word1 has the bit that
indicates readiness of the descriptor for software to use. So word1 should
be loaded before the other words.

On architectures with weaker memory ordering, a barrier is needed to ensure
the ordering of loads.

This patch set fixes the risk on both the scalar path and the aarch64
vector path.
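
As a rough illustration of the ordering requirement described above, the
following portable C11 sketch checks the ready bit first and reads the
payload only afterwards. The names demo_rx_desc, DEMO_DD_BIT and
demo_read_desc() are hypothetical, the two-word layout is simplified, and
C11 atomics stand in for the driver's volatile accesses and
rte_atomic_thread_fence(); it is not the driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16B descriptor; not the real i40e layout. */
struct demo_rx_desc {
	uint64_t qword0;          /* payload written by the device */
	_Atomic uint64_t qword1;  /* status word; bit 0 stands in for DD */
};

#define DEMO_DD_BIT UINT64_C(1)

/* Copy qword0 to *payload and return true only if DD is set. */
static bool demo_read_desc(struct demo_rx_desc *d, uint64_t *payload)
{
	uint64_t status;

	assert(d != NULL && payload != NULL);

	/* Load the word that carries the DD bit first. */
	status = atomic_load_explicit(&d->qword1, memory_order_relaxed);
	if (!(status & DEMO_DD_BIT))
		return false;

	/* Keep the payload load below from being ordered before the DD load. */
	atomic_thread_fence(memory_order_acquire);

	*payload = d->qword0;
	return true;
}
```

A single-threaded run can only show the control flow; the fence matters
when another agent (here, the NIC) writes the descriptor concurrently.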

Ruifeng Wang (2):
  net/i40e: fix risk in Rx descriptor read in NEON vector path
  net/i40e: fix risk in Rx descriptor read in scalar path

 drivers/net/i40e/i40e_rxtx.c          | 12 ++++++++++++
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  8 ++++++++
 2 files changed, 20 insertions(+)

-- 
2.25.1



* [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path
  2021-09-06  3:31 [dpdk-stable] [PATCH 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
@ 2021-09-06  3:32 ` Ruifeng Wang
  2021-09-14 18:33   ` Honnappa Nagarahalli
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
  2021-09-15  8:33 ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
  2 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-06  3:32 UTC (permalink / raw)
  To: dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd,
	Ruifeng Wang

Rx descriptor is 16B/32B in size and consists of multiple words.
The word that includes DD field should be read first. Read result
with DD bit set indicates the rest part in a descriptor is valid.

In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the word that includes DD field could be
reordered after read of other words. In this case, some words could
be invalid data.

Read barrier is added after read of qword1 that includes DD field.
And qword0 is reloaded to update vector register. This ensures
what fetched is correct descriptor data.
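
The load / fence / reload sequence described above can be modelled without
NEON intrinsics. In this sketch, which is only an approximation, demo_vec
stands in for a uint64x2_t register, demo_desc for a 16B descriptor, and
demo_load4() for the relevant part of the receive loop; all three names are
hypothetical, and C11 atomics replace vld1q_u64(), vld1q_lane_u64() and
rte_atomic_thread_fence().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16B descriptor: qw[0] is data, qw[1] carries the DD bit. */
struct demo_desc {
	_Atomic uint64_t qw[2];
};

/* Stand-in for a 128-bit vector register with two 64-bit lanes. */
struct demo_vec {
	uint64_t lane[2];
};

static void demo_load4(struct demo_desc *ring, struct demo_vec out[4])
{
	int i;

	assert(ring != NULL && out != NULL);

	/* First pass: both halves are loaded with no ordering between them. */
	for (i = 3; i >= 0; i--) {
		out[i].lane[0] = atomic_load_explicit(&ring[i].qw[0],
						      memory_order_relaxed);
		out[i].lane[1] = atomic_load_explicit(&ring[i].qw[1],
						      memory_order_relaxed);
	}

	/* Order the qword1 loads above before the reloads below. */
	atomic_thread_fence(memory_order_acquire);

	/* Second pass: refresh lane 0 so it is never older than lane 1. */
	for (i = 3; i >= 0; i--)
		out[i].lane[0] = atomic_load_explicit(&ring[i].qw[0],
						      memory_order_relaxed);
}
```

The point of the reload is that lane 1 (with DD) keeps its first-pass value
while lane 0 is replaced by a value read after the fence.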

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index b2683fda60..71191c7cc8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
 		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
 
+		/* Use acquire fence to order loads of descriptor qwords */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* A.2 reload qword0 to make it ordered after qword1 load */
+		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
-- 
2.25.1



* [dpdk-stable] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
  2021-09-06  3:31 [dpdk-stable] [PATCH 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
@ 2021-09-06  3:32 ` Ruifeng Wang
  2021-09-14 18:06   ` Honnappa Nagarahalli
  2021-09-15  8:33 ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
  2 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-06  3:32 UTC (permalink / raw)
  To: dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd,
	Ruifeng Wang

Rx descriptor is 16B/32B in size and consists of multiple words.
The word that includes DD field should be read first. Read result
with DD bit set indicates the rest part in a descriptor is valid.

In functions for simple Rx, the descriptor is not read atomically
in whole. On weaker ordered systems like aarch64, read of the word
that includes DD field could be reordered after read of other words.
In this case, some words could be invalid data.

Read barrier is inserted between read of the word with DD field
and read of other words. The barrier ensures what fetched is correct
descriptor data.

Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
The change should not impact performance on x86 as acquire fence is
ignored on x86.

 drivers/net/i40e/i40e_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 8329cbdd4e..c4cd6b6b60 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -746,6 +746,12 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			break;
 		}
 
+		/**
+		 * Use acquire fence to ensure that qword1 which includes DD
+		 * bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		rxd = *rxdp;
 		nb_hold++;
 		rxe = &sw_ring[rx_id];
@@ -862,6 +868,12 @@ i40e_recv_scattered_pkts(void *rx_queue,
 			break;
 		}
 
+		/**
+		 * Use acquire fence to ensure that qword1 which includes DD
+		 * bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		rxd = *rxdp;
 		nb_hold++;
 		rxe = &sw_ring[rx_id];
-- 
2.25.1



* Re: [dpdk-stable] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
@ 2021-09-14 18:06   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 13+ messages in thread
From: Honnappa Nagarahalli @ 2021-09-14 18:06 UTC (permalink / raw)
  To: Ruifeng Wang, dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, stable, nd, Ruifeng Wang,
	Honnappa Nagarahalli, nd

<snip>

> 
> Rx descriptor is 16B/32B in size and consists of multiple words.
> The word that includes DD field should be read first. Read result with DD bit
> set indicates the rest part in a descriptor is valid.
Suggest rewording as follows:
Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the rest of the descriptor words have valid values. Hence, the word containing DD bit must be read first before reading the rest of the descriptor words.

> 
> In functions for simple Rx, the descriptor is not read atomically in whole. On
> weaker ordered systems like aarch64, read of the word that includes DD field
> could be reordered after read of other words.
> In this case, some words could be invalid data.
Since the entire descriptor is not read atomically, on relaxed memory ordered systems like Aarch64, read of the word containing DD field could be reordered after read of other words.

> 
> Read barrier is inserted between read of the word with DD field and read of
> other words. The barrier ensures what fetched is correct descriptor data.
Suggest capturing the performance impact, so it is clearly documented.

> 
> Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
With the above comments,
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
> The change should not impact performance on x86 as acquire fence is ignored
> on x86.
> 
>  drivers/net/i40e/i40e_rxtx.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 8329cbdd4e..c4cd6b6b60 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -746,6 +746,12 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>  			break;
>  		}
> 
> +		/**
> +		 * Use acquire fence to ensure that qword1 which includes DD
> +		 * bit is loaded before loading of other descriptor words.
> +		 */
> +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
>  		rxd = *rxdp;
>  		nb_hold++;
>  		rxe = &sw_ring[rx_id];
> @@ -862,6 +868,12 @@ i40e_recv_scattered_pkts(void *rx_queue,
>  			break;
>  		}
> 
> +		/**
> +		 * Use acquire fence to ensure that qword1 which includes DD
> +		 * bit is loaded before loading of other descriptor words.
> +		 */
> +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +
>  		rxd = *rxdp;
>  		nb_hold++;
>  		rxe = &sw_ring[rx_id];
> --
> 2.25.1



* Re: [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
@ 2021-09-14 18:33   ` Honnappa Nagarahalli
  2021-09-15  8:42     ` Ruifeng Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Honnappa Nagarahalli @ 2021-09-14 18:33 UTC (permalink / raw)
  To: Ruifeng Wang, dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, stable, nd, Ruifeng Wang,
	Honnappa Nagarahalli, nd

<snip>
Similar comments to those I have on patch 2/2.

> 
> Rx descriptor is 16B/32B in size and consists of multiple words.
> The word that includes DD field should be read first. Read result with DD bit
> set indicates the rest part in a descriptor is valid.
Suggest rewording as follows:
Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the rest of the descriptor words have valid values. Hence, the word containing DD bit must be read first before reading the rest of the descriptor words.

> 
> In NEON vector PMD, vector load loads two contiguous 8B of descriptor data
> into vector register. Given vector load ensures no 16B atomicity, read of the
> word that includes DD field could be reordered after read of other words. In
> this case, some words could be invalid data.
"some words could contain invalid data"

> 
> Read barrier is added after read of qword1 that includes DD field.
> And qword0 is reloaded to update vector register. This ensures what fetched
> is correct descriptor data.
"This ensures that the fetched data is correct".

Suggest capturing the performance impact, so it is clearly documented.
> 
> Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
With the above comments,
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> index b2683fda60..71191c7cc8 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
>  		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
>  		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> 
> +		/* Use acquire fence to order loads of descriptor qwords */
> +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		/* A.2 reload qword0 to make it ordered after qword1 load */
> +		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
> +		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
> +		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
> +		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
> +
>  		/* B.1 load 4 mbuf point */
>  		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
>  		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> --
> 2.25.1



* [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering
  2021-09-06  3:31 [dpdk-stable] [PATCH 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
  2021-09-06  3:32 ` [dpdk-stable] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
@ 2021-09-15  8:33 ` Ruifeng Wang
  2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
                     ` (2 more replies)
  2 siblings, 3 replies; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-15  8:33 UTC (permalink / raw)
  To: dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd,
	Ruifeng Wang

On the Rx path, the NIC fills each Rx descriptor with data pertaining to
the received packet.

A single descriptor consists of multiple words. Word1 has the bit that
indicates readiness of the descriptor for software to use. So word1 should
be loaded before the other words.

On architectures with weaker memory ordering, a barrier is needed to ensure
the ordering of loads.

This patch set fixes the risk on both the scalar path and the aarch64
vector path.

v2:
Updated commit message. Performance impact added. (Honnappa)

Ruifeng Wang (2):
  net/i40e: fix risk in Rx descriptor read in NEON vector path
  net/i40e: fix risk in Rx descriptor read in scalar path

 drivers/net/i40e/i40e_rxtx.c          | 12 ++++++++++++
 drivers/net/i40e/i40e_rxtx_vec_neon.c |  8 ++++++++
 2 files changed, 20 insertions(+)

-- 
2.25.1



* [dpdk-stable] [PATCH v2 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path
  2021-09-15  8:33 ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
@ 2021-09-15  8:33   ` Ruifeng Wang
  2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
  2021-09-24 11:08   ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Zhang, Qi Z
  2 siblings, 0 replies; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-15  8:33 UTC (permalink / raw)
  To: dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd,
	Ruifeng Wang

Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates
that the rest of the descriptor words have valid values. Hence, the
word containing DD bit must be read first before reading the rest of
the descriptor words.

In the NEON vector PMD, a vector load reads two contiguous 8B words of
descriptor data into a vector register. Since a vector load does not
guarantee 16B atomicity, the read of the word that includes the DD field
could be reordered after the read of the other word. In this case, some
words could contain invalid data.

A read barrier is added after the read of qword1, which includes the DD
field, and qword0 is then reloaded to update the vector register. This
ensures that the fetched data is correct.

Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index b2683fda60..71191c7cc8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
 		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
 
+		/* Use acquire fence to order loads of descriptor qwords */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* A.2 reload qword0 to make it ordered after qword1 load */
+		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
-- 
2.25.1



* [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
  2021-09-15  8:33 ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
  2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
@ 2021-09-15  8:33   ` Ruifeng Wang
  2021-09-29 15:05     ` Ferruh Yigit
  2021-09-24 11:08   ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Zhang, Qi Z
  2 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-15  8:33 UTC (permalink / raw)
  To: dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd,
	Ruifeng Wang

Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates
that the rest of the descriptor words have valid values. Hence, the
word containing DD bit must be read first before reading the rest of
the descriptor words.

Since the entire descriptor is not read atomically, on relaxed memory
ordered systems like Aarch64, read of the word containing DD field
could be reordered after read of other words.

Read barrier is inserted between read of the word with DD field
and read of other words. The barrier ensures that the fetched data
is correct.

Testpmd single core test showed no performance drop on x86 or N1SDP.
On ThunderX2, 22% performance regression was observed.

Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 drivers/net/i40e/i40e_rxtx.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 8329cbdd4e..c4cd6b6b60 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -746,6 +746,12 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 			break;
 		}
 
+		/**
+		 * Use acquire fence to ensure that qword1 which includes DD
+		 * bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		rxd = *rxdp;
 		nb_hold++;
 		rxe = &sw_ring[rx_id];
@@ -862,6 +868,12 @@ i40e_recv_scattered_pkts(void *rx_queue,
 			break;
 		}
 
+		/**
+		 * Use acquire fence to ensure that qword1 which includes DD
+		 * bit is loaded before loading of other descriptor words.
+		 */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+
 		rxd = *rxdp;
 		nb_hold++;
 		rxe = &sw_ring[rx_id];
-- 
2.25.1



* Re: [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path
  2021-09-14 18:33   ` Honnappa Nagarahalli
@ 2021-09-15  8:42     ` Ruifeng Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Ruifeng Wang @ 2021-09-15  8:42 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, stable, nd, nd, nd

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, September 15, 2021 2:33 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; dev@dpdk.org
> Cc: beilei.xing@intel.com; qi.z.zhang@intel.com;
> bruce.richardson@intel.com; jerinj@marvell.com;
> hemant.agrawal@nxp.com; drc@linux.vnet.ibm.com; stable@dpdk.org; nd
> <nd@arm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON
> vector path
> 
> <snip>
> Similar comments that I have to patch 2/2
> 
> >
> > Rx descriptor is 16B/32B in size and consists of multiple words.
> > The word that includes DD field should be read first. Read result with
> > DD bit set indicates the rest part in a descriptor is valid.
> Suggest rewording as follows:
> Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the rest of
> the descriptor words have valid values. Hence, the word containing DD bit
> must be read first before reading the rest of the descriptor words.
> 
> >
> > In NEON vector PMD, vector load loads two contiguous 8B of descriptor
> > data into vector register. Given vector load ensures no 16B atomicity,
> > read of the word that includes DD field could be reordered after read
> > of other words. In this case, some words could be invalid data.
> "some words could contain invalid data"
> 
> >
> > Read barrier is added after read of qword1 that includes DD field.
> > And qword0 is reloaded to update vector register. This ensures what
> > fetched is correct descriptor data.
> "This ensures that the fetched data is correct".
> 
> Suggest capturing the performance impact, so it is clearly documented.

Added performance impact to commit message in v2.
> >
> > Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> With the above comments,
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 

Thanks for your review.
Comments are addressed in v2.
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > index b2683fda60..71191c7cc8 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
> >  		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
> >  		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> >
> > +		/* Use acquire fence to order loads of descriptor qwords */
> > +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > +		/* A.2 reload qword0 to make it ordered after qword1 load */
> > +		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
> > +		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
> > +		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
> > +		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
> > +
> >  		/* B.1 load 4 mbuf point */
> >  		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
> >  		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> > --
> > 2.25.1



* Re: [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering
  2021-09-15  8:33 ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
  2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
  2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
@ 2021-09-24 11:08   ` Zhang, Qi Z
  2 siblings, 0 replies; 13+ messages in thread
From: Zhang, Qi Z @ 2021-09-24 11:08 UTC (permalink / raw)
  To: Ruifeng Wang, dev
  Cc: Xing, Beilei, Richardson, Bruce, jerinj, hemant.agrawal, drc,
	honnappa.nagarahalli, stable, nd



> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Wednesday, September 15, 2021 4:34 PM
> To: dev@dpdk.org
> Cc: Xing, Beilei <beilei.xing@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; jerinj@marvell.com;
> hemant.agrawal@nxp.com; drc@linux.vnet.ibm.com;
> honnappa.nagarahalli@arm.com; stable@dpdk.org; nd@arm.com; Ruifeng
> Wang <ruifeng.wang@arm.com>
> Subject: [PATCH v2 0/2] i40e Rx descriptor loads ordering
> 
> On Rx path, NIC fills Rx descriptor with data pertains to received packet.
> 
> A single descriptor consists of multiple words. Word1 has the bit that indicates
> readiness of descriptor for software to use. So word1 should be loaded before
> other words.
> 
> On architectures with weaker memory ordering, barrier is needed to ensure
> the ordering of loads.
> 
> This patch set fixed the risk on both scalar path and aarch64 vector path.
> 
> v2:
> Updated commit message. Performance impact added. (Honnappa)
> 
> Ruifeng Wang (2):
>   net/i40e: fix risk in Rx descriptor read in NEON vector path
>   net/i40e: fix risk in Rx descriptor read in scalar path
> 
>  drivers/net/i40e/i40e_rxtx.c          | 12 ++++++++++++
>  drivers/net/i40e/i40e_rxtx_vec_neon.c |  8 ++++++++
>  2 files changed, 20 insertions(+)
> 
> --
> 2.25.1

Applied to dpdk-next-net-intel.

Thanks
Qi


* Re: [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
  2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
@ 2021-09-29 15:05     ` Ferruh Yigit
  2021-09-29 15:29       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 13+ messages in thread
From: Ferruh Yigit @ 2021-09-29 15:05 UTC (permalink / raw)
  To: Ruifeng Wang, dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, honnappa.nagarahalli, stable, nd

On 9/15/2021 9:33 AM, Ruifeng Wang wrote:
> Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates
> that the rest of the descriptor words have valid values. Hence, the
> word containing DD bit must be read first before reading the rest of
> the descriptor words.
> 
> Since the entire descriptor is not read atomically, on relaxed memory
> ordered systems like Aarch64, read of the word containing DD field
> could be reordered after read of other words.
> 
> Read barrier is inserted between read of the word with DD field
> and read of other words. The barrier ensures that the fetched data
> is correct.
> 
> Testpmd single core test showed no performance drop on x86 or N1SDP.
> On ThunderX2, 22% performance regression was observed.
> 

Is 22% performance drop value correct? That is a big drop, is it acceptable?

Is this performance drop valid for all Arm scalar datapath, or is it specific to
ThunderX2?

> Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>



* Re: [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
  2021-09-29 15:05     ` Ferruh Yigit
@ 2021-09-29 15:29       ` Honnappa Nagarahalli
  2021-10-11 16:26         ` Ferruh Yigit
  0 siblings, 1 reply; 13+ messages in thread
From: Honnappa Nagarahalli @ 2021-09-29 15:29 UTC (permalink / raw)
  To: Ferruh Yigit, Ruifeng Wang, dev
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, stable, nd, humin29, nd

<snip>
> 
> On 9/15/2021 9:33 AM, Ruifeng Wang wrote:
> > Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates
> > that the rest of the descriptor words have valid values. Hence, the
> > word containing DD bit must be read first before reading the rest of
> > the descriptor words.
> >
> > Since the entire descriptor is not read atomically, on relaxed memory
> > ordered systems like Aarch64, read of the word containing DD field
> > could be reordered after read of other words.
> >
> > Read barrier is inserted between read of the word with DD field and
> > read of other words. The barrier ensures that the fetched data is
> > correct.
> >
> > Testpmd single core test showed no performance drop on x86 or N1SDP.
> > On ThunderX2, 22% performance regression was observed.
> >
> 
> Is 22% performance drop value correct? That is a big drop, is it acceptable?
Agree, it is a big drop. Fixing it will require using the barrier less frequently. For ex: read 4 descriptors (4 words containing the DD bits) before using the barrier.
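
The batching idea above (read several DD words, then issue one barrier)
might look roughly like the sketch below. It is not part of the posted
patches; demo_rx_desc, DEMO_BATCH, DEMO_DD_BIT and demo_read_batch() are
hypothetical names, the descriptor layout is simplified, and C11 atomics
stand in for the driver's accessors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BATCH  4
#define DEMO_DD_BIT UINT64_C(1)

/* Hypothetical 16B descriptor; not the real i40e layout. */
struct demo_rx_desc {
	uint64_t qword0;          /* payload written by the device */
	_Atomic uint64_t qword1;  /* status word; bit 0 stands in for DD */
};

/*
 * Copy the payloads of the leading completed descriptors and return how
 * many were copied (0..DEMO_BATCH). Only one fence is issued per batch.
 */
static size_t demo_read_batch(struct demo_rx_desc *ring,
			      uint64_t payload[DEMO_BATCH])
{
	size_t done = 0;
	size_t i;

	assert(ring != NULL && payload != NULL);

	/* Step 1: read only the status words, stop at the first one not done. */
	while (done < DEMO_BATCH &&
	       (atomic_load_explicit(&ring[done].qword1,
				     memory_order_relaxed) & DEMO_DD_BIT))
		done++;

	if (done == 0)
		return 0;

	/* Step 2: a single acquire fence covers every DD load above. */
	atomic_thread_fence(memory_order_acquire);

	/* Step 3: the payload words of the completed descriptors are now safe. */
	for (i = 0; i < done; i++)
		payload[i] = ring[i].qword0;

	return done;
}
```

Whether this would recover the ThunderX2 regression is untested; the sketch
only shows how the barrier count per packet could be reduced.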

> 
> Is this performance drop valid for all Arm scalar datapath, or is it specific to
> ThunderX2?
This is specific to ThunderX2. N1 CPU does not see any impact. A72 is not tested. Considering that the ThunderXx line of CPUs is not in further development, and this is the scalar path, I would not suggest making further changes to the code.

It would be good to test this on Kunpeng servers and get some feedback.

> 
> > Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>



* Re: [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path
  2021-09-29 15:29       ` Honnappa Nagarahalli
@ 2021-10-11 16:26         ` Ferruh Yigit
  0 siblings, 0 replies; 13+ messages in thread
From: Ferruh Yigit @ 2021-10-11 16:26 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ruifeng Wang, dev, Min Hu (Connor),
	Yisen Zhuang, Lijun Ou
  Cc: beilei.xing, qi.z.zhang, bruce.richardson, jerinj,
	hemant.agrawal, drc, stable, nd, humin29

On 9/29/2021 4:29 PM, Honnappa Nagarahalli wrote:
> <snip>
>>
>> On 9/15/2021 9:33 AM, Ruifeng Wang wrote:
>>> Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates
>>> that the rest of the descriptor words have valid values. Hence, the
>>> word containing DD bit must be read first before reading the rest of
>>> the descriptor words.
>>>
>>> Since the entire descriptor is not read atomically, on relaxed memory
>>> ordered systems like Aarch64, read of the word containing DD field
>>> could be reordered after read of other words.
>>>
>>> Read barrier is inserted between read of the word with DD field and
>>> read of other words. The barrier ensures that the fetched data is
>>> correct.
>>>
>>> Testpmd single core test showed no performance drop on x86 or N1SDP.
>>> On ThunderX2, 22% performance regression was observed.
>>>
>>
>> Is 22% performance drop value correct? That is a big drop, is it acceptable?
> Agree, it is a big drop. Fixing it will require using the barrier less frequently. For ex: read 4 descriptors (4 words containing the DD bits) before using the barrier.
> 
>>
>> Is this performance drop valid for all Arm scalar datapath, or is it specific to
>> ThunderX2?
> This is specific to ThunderX2. N1 CPU does not see any impact. A72 is not tested. Considering that the ThunderXx line of CPUs are not in further development, and it is scalar path, I would not suggest to make further changes to the code.
> 
> It would be good to test this on Kunpeng servers and get some feedback.

Hi Connor, Yisen, Lijun,

Can you please check this patch? I don't know if you are using the i40e NIC
on your platform, but if you are, can you please test it?

Overall, this patch causes a big performance drop on Arm for i40e; I just
want to be sure this is not impacting any user negatively.

> 
>>
>>> Fixes: 7b0cf70135d1 ("net/i40e: support ARM platform")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 



Thread overview: 13+ messages
2021-09-06  3:31 [dpdk-stable] [PATCH 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
2021-09-06  3:32 ` [dpdk-stable] [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
2021-09-14 18:33   ` Honnappa Nagarahalli
2021-09-15  8:42     ` Ruifeng Wang
2021-09-06  3:32 ` [dpdk-stable] [PATCH 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
2021-09-14 18:06   ` Honnappa Nagarahalli
2021-09-15  8:33 ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Ruifeng Wang
2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
2021-09-15  8:33   ` [dpdk-stable] [PATCH v2 2/2] net/i40e: fix risk in Rx descriptor read in scalar path Ruifeng Wang
2021-09-29 15:05     ` Ferruh Yigit
2021-09-29 15:29       ` Honnappa Nagarahalli
2021-10-11 16:26         ` Ferruh Yigit
2021-09-24 11:08   ` [dpdk-stable] [PATCH v2 0/2] i40e Rx descriptor loads ordering Zhang, Qi Z
