DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH 0/6] remove incorrect code for loading 16B descriptors
@ 2024-01-23 11:40 Bruce Richardson
  2024-01-23 11:40 ` [PATCH 1/6] net/i40e: remove incorrect 16B descriptor read block Bruce Richardson
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

Inside the AVX2 code paths, there was special case code for loading two
16-byte descriptors simultaneously, if that build-time feature was
enabled. As well as not being enabled by default, these code blocks also
were incorrect as there is no guarantee of the two descriptors being
loaded either atomically or in a defined order. If they were loaded in
an unexpected order the driver logic would break. Therefore we remove
these blocks, and do come cleanup of the following code to remove
indentation.

NOTE: I've split out the removal and subsequent cleanup into separate
patches for ease of review. These can be merged into a single patch on
merge, if so desired.

Bruce Richardson (6):
  net/i40e: remove incorrect 16B descriptor read block
  net/i40e: reduce code indentation
  net/iavf: remove incorrect 16B descriptor read block
  net/ice: remove incorrect 16B descriptor read block
  net/ice: reduce code indent
  net/iavf: reduce code indent

 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 64 ++++++++-------------
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 80 ++++++++-------------------
 drivers/net/ice/ice_rxtx_vec_avx2.c   | 80 ++++++++-------------------
 3 files changed, 72 insertions(+), 152 deletions(-)

--
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/6] net/i40e: remove incorrect 16B descriptor read block
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
@ 2024-01-23 11:40 ` Bruce Richardson
  2024-01-23 11:40 ` [PATCH 2/6] net/i40e: reduce code indentation Bruce Richardson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, stable

By default, the driver works with 32B descriptors, but has a separate
descriptor read block for reading two descriptors at a time when using
16B descriptors. However, the 32B reads used are not guaranteed to be
atomic, which will cause issues if that is not the case on a system,
since the descriptors may be read in an undefined order.  Remove the
block, to avoid issues, and just use the regular descriptor reading path
for 16B descriptors, if that support is enabled at build time.

Fixes: dafadd73762e ("net/i40e: add AVX2 Rx function")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index f468c1fd90..ce87e185f0 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -277,19 +277,6 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
-#ifdef RTE_LIBRTE_I40E_16BYTE_RX_DESC
-		/* for AVX we need alignment otherwise loads are not atomic */
-		if (avx_aligned) {
-			/* load in descriptors, 2 at a time, in reverse order */
-			raw_desc6_7 = _mm256_load_si256((void *)(rxdp + 6));
-			rte_compiler_barrier();
-			raw_desc4_5 = _mm256_load_si256((void *)(rxdp + 4));
-			rte_compiler_barrier();
-			raw_desc2_3 = _mm256_load_si256((void *)(rxdp + 2));
-			rte_compiler_barrier();
-			raw_desc0_1 = _mm256_load_si256((void *)(rxdp + 0));
-		} else
-#endif
 		do {
 			const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
 			rte_compiler_barrier();
-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 2/6] net/i40e: reduce code indentation
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
  2024-01-23 11:40 ` [PATCH 1/6] net/i40e: remove incorrect 16B descriptor read block Bruce Richardson
@ 2024-01-23 11:40 ` Bruce Richardson
  2024-01-23 11:40 ` [PATCH 3/6] net/iavf: remove incorrect 16B descriptor read block Bruce Richardson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

With the removal of the #ifdef block for 16-byte descriptor loads, the
do { } while(0) around the descriptor load block becomes unnecessary.
Removing that do-while allows us to reduce indentation level of the code
by one tab, and makes the function that little cleaner and clearer to
read.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 51 +++++++++++++--------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index ce87e185f0..19cf0ac718 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -276,33 +276,30 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				_mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
 
-		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
-		do {
-			const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
-			rte_compiler_barrier();
-			const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
-			rte_compiler_barrier();
-			const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
-			rte_compiler_barrier();
-			const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
-			rte_compiler_barrier();
-			const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
-			rte_compiler_barrier();
-			const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
-			rte_compiler_barrier();
-			const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
-			rte_compiler_barrier();
-			const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
-
-			raw_desc6_7 = _mm256_inserti128_si256(
-					_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
-			raw_desc4_5 = _mm256_inserti128_si256(
-					_mm256_castsi128_si256(raw_desc4), raw_desc5, 1);
-			raw_desc2_3 = _mm256_inserti128_si256(
-					_mm256_castsi128_si256(raw_desc2), raw_desc3, 1);
-			raw_desc0_1 = _mm256_inserti128_si256(
-					_mm256_castsi128_si256(raw_desc0), raw_desc1, 1);
-		} while (0);
+		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		rte_compiler_barrier();
+		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		rte_compiler_barrier();
+		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		rte_compiler_barrier();
+		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		rte_compiler_barrier();
+		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		rte_compiler_barrier();
+		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		rte_compiler_barrier();
+		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		rte_compiler_barrier();
+		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+
+		const __m256i raw_desc6_7 = _mm256_inserti128_si256(
+				_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
+		const __m256i raw_desc4_5 = _mm256_inserti128_si256(
+				_mm256_castsi128_si256(raw_desc4), raw_desc5, 1);
+		const __m256i raw_desc2_3 = _mm256_inserti128_si256(
+				_mm256_castsi128_si256(raw_desc2), raw_desc3, 1);
+		const __m256i raw_desc0_1 = _mm256_inserti128_si256(
+				_mm256_castsi128_si256(raw_desc0), raw_desc1, 1);
 
 		if (split_packet) {
 			int j;
-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 3/6] net/iavf: remove incorrect 16B descriptor read block
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
  2024-01-23 11:40 ` [PATCH 1/6] net/i40e: remove incorrect 16B descriptor read block Bruce Richardson
  2024-01-23 11:40 ` [PATCH 2/6] net/i40e: reduce code indentation Bruce Richardson
@ 2024-01-23 11:40 ` Bruce Richardson
  2024-01-23 11:40 ` [PATCH 4/6] net/iavf: reduce code indent Bruce Richardson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, stable

By default, the driver works with 32B descriptors, but has a separate
descriptor read block for reading two descriptors at a time when using
16B descriptors. However, the 32B reads used are not guaranteed to be
atomic, which will cause issues if that is not the case on a system,
since the descriptors may be read in an undefined order.  Remove the
block, to avoid issues, and just use the regular descriptor reading path
for 16B descriptors, if that support is enabled at build time.

Fixes: af0c246a3800 ("net/iavf: enable AVX2 for iavf")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index 510b4d8f1c..3cec1eef9d 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -194,19 +194,6 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 #endif
 
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
-#ifdef RTE_LIBRTE_IAVF_16BYTE_RX_DESC
-		/* for AVX we need alignment otherwise loads are not atomic */
-		if (avx_aligned) {
-			/* load in descriptors, 2 at a time, in reverse order */
-			raw_desc6_7 = _mm256_load_si256((void *)(rxdp + 6));
-			rte_compiler_barrier();
-			raw_desc4_5 = _mm256_load_si256((void *)(rxdp + 4));
-			rte_compiler_barrier();
-			raw_desc2_3 = _mm256_load_si256((void *)(rxdp + 2));
-			rte_compiler_barrier();
-			raw_desc0_1 = _mm256_load_si256((void *)(rxdp + 0));
-		} else
-#endif
 		{
 			const __m128i raw_desc7 =
 				_mm_load_si128((void *)(rxdp + 7));
-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/6] net/iavf: reduce code indent
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
                   ` (2 preceding siblings ...)
  2024-01-23 11:40 ` [PATCH 3/6] net/iavf: remove incorrect 16B descriptor read block Bruce Richardson
@ 2024-01-23 11:40 ` Bruce Richardson
  2024-01-23 11:40 ` [PATCH 5/6] net/ice: remove incorrect 16B descriptor read block Bruce Richardson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

With the removal of the separate block for 16B-descriptors, we can
remove the superfluous braces and dedent the code a bit. This allows us
to reduce overall number of lines, since we can merge quite a number of
lines together.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 67 ++++++++++-----------------
 1 file changed, 24 insertions(+), 43 deletions(-)

diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index 3cec1eef9d..49d41af953 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -193,49 +193,30 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 			 _mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
 
-		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
-		{
-			const __m128i raw_desc7 =
-				_mm_load_si128((void *)(rxdp + 7));
-			rte_compiler_barrier();
-			const __m128i raw_desc6 =
-				_mm_load_si128((void *)(rxdp + 6));
-			rte_compiler_barrier();
-			const __m128i raw_desc5 =
-				_mm_load_si128((void *)(rxdp + 5));
-			rte_compiler_barrier();
-			const __m128i raw_desc4 =
-				_mm_load_si128((void *)(rxdp + 4));
-			rte_compiler_barrier();
-			const __m128i raw_desc3 =
-				_mm_load_si128((void *)(rxdp + 3));
-			rte_compiler_barrier();
-			const __m128i raw_desc2 =
-				_mm_load_si128((void *)(rxdp + 2));
-			rte_compiler_barrier();
-			const __m128i raw_desc1 =
-				_mm_load_si128((void *)(rxdp + 1));
-			rte_compiler_barrier();
-			const __m128i raw_desc0 =
-				_mm_load_si128((void *)(rxdp + 0));
-
-			raw_desc6_7 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc6),
-					 raw_desc7, 1);
-			raw_desc4_5 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc4),
-					 raw_desc5, 1);
-			raw_desc2_3 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc2),
-					 raw_desc3, 1);
-			raw_desc0_1 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc0),
-					 raw_desc1, 1);
-		}
+		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		rte_compiler_barrier();
+		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		rte_compiler_barrier();
+		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		rte_compiler_barrier();
+		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		rte_compiler_barrier();
+		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		rte_compiler_barrier();
+		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		rte_compiler_barrier();
+		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		rte_compiler_barrier();
+		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+
+		const __m256i raw_desc6_7 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
+		const __m256i raw_desc4_5 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc4), raw_desc5, 1);
+		const __m256i raw_desc2_3 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc2), raw_desc3, 1);
+		const __m256i raw_desc0_1 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc0), raw_desc1, 1);
 
 		if (split_packet) {
 			int j;
-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 5/6] net/ice: remove incorrect 16B descriptor read block
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
                   ` (3 preceding siblings ...)
  2024-01-23 11:40 ` [PATCH 4/6] net/iavf: reduce code indent Bruce Richardson
@ 2024-01-23 11:40 ` Bruce Richardson
  2024-01-23 11:40 ` [PATCH 6/6] net/ice: reduce code indent Bruce Richardson
  2024-02-22 14:57 ` [PATCH 0/6] remove incorrect code for loading 16B descriptors Burakov, Anatoly
  6 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, stable

By default, the driver works with 32B descriptors, but has a separate
descriptor read block for reading two descriptors at a time when using
16B descriptors. However, the 32B reads used are not guaranteed to be
atomic, which will cause issues if that is not the case on a system,
since the descriptors may be read in an undefined order.  Remove the
block, to avoid issues, and just use the regular descriptor reading path
for 16B descriptors, if that support is enabled at build time.

Fixes: ae60d3c9b227 ("net/ice: support Rx AVX2 vector")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_rxtx_vec_avx2.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index 6f6d790967..b93e9c109e 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -255,19 +255,6 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
-#ifdef RTE_LIBRTE_ICE_16BYTE_RX_DESC
-		/* for AVX we need alignment otherwise loads are not atomic */
-		if (avx_aligned) {
-			/* load in descriptors, 2 at a time, in reverse order */
-			raw_desc6_7 = _mm256_load_si256((void *)(rxdp + 6));
-			rte_compiler_barrier();
-			raw_desc4_5 = _mm256_load_si256((void *)(rxdp + 4));
-			rte_compiler_barrier();
-			raw_desc2_3 = _mm256_load_si256((void *)(rxdp + 2));
-			rte_compiler_barrier();
-			raw_desc0_1 = _mm256_load_si256((void *)(rxdp + 0));
-		} else
-#endif
 		{
 			const __m128i raw_desc7 =
 				_mm_load_si128((void *)(rxdp + 7));
-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 6/6] net/ice: reduce code indent
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
                   ` (4 preceding siblings ...)
  2024-01-23 11:40 ` [PATCH 5/6] net/ice: remove incorrect 16B descriptor read block Bruce Richardson
@ 2024-01-23 11:40 ` Bruce Richardson
  2024-02-22 14:57 ` [PATCH 0/6] remove incorrect code for loading 16B descriptors Burakov, Anatoly
  6 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-01-23 11:40 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson

With the removal of the separate block for 16B-descriptors, we can
remove the superfluous braces and dedent the code a bit. This allows us
to reduce overall number of lines, since we can merge quite a number of
lines together.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/ice/ice_rxtx_vec_avx2.c | 67 +++++++++++------------------
 1 file changed, 24 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b93e9c109e..d6e88dbb29 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -254,49 +254,30 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			 _mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
 
-		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
-		{
-			const __m128i raw_desc7 =
-				_mm_load_si128((void *)(rxdp + 7));
-			rte_compiler_barrier();
-			const __m128i raw_desc6 =
-				_mm_load_si128((void *)(rxdp + 6));
-			rte_compiler_barrier();
-			const __m128i raw_desc5 =
-				_mm_load_si128((void *)(rxdp + 5));
-			rte_compiler_barrier();
-			const __m128i raw_desc4 =
-				_mm_load_si128((void *)(rxdp + 4));
-			rte_compiler_barrier();
-			const __m128i raw_desc3 =
-				_mm_load_si128((void *)(rxdp + 3));
-			rte_compiler_barrier();
-			const __m128i raw_desc2 =
-				_mm_load_si128((void *)(rxdp + 2));
-			rte_compiler_barrier();
-			const __m128i raw_desc1 =
-				_mm_load_si128((void *)(rxdp + 1));
-			rte_compiler_barrier();
-			const __m128i raw_desc0 =
-				_mm_load_si128((void *)(rxdp + 0));
-
-			raw_desc6_7 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc6),
-					 raw_desc7, 1);
-			raw_desc4_5 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc4),
-					 raw_desc5, 1);
-			raw_desc2_3 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc2),
-					 raw_desc3, 1);
-			raw_desc0_1 =
-				_mm256_inserti128_si256
-					(_mm256_castsi128_si256(raw_desc0),
-					 raw_desc1, 1);
-		}
+		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		rte_compiler_barrier();
+		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		rte_compiler_barrier();
+		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		rte_compiler_barrier();
+		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		rte_compiler_barrier();
+		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		rte_compiler_barrier();
+		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		rte_compiler_barrier();
+		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		rte_compiler_barrier();
+		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+
+		const __m256i raw_desc6_7 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
+		const __m256i raw_desc4_5 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc4), raw_desc5, 1);
+		const __m256i raw_desc2_3 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc2), raw_desc3, 1);
+		const __m256i raw_desc0_1 =
+			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc0), raw_desc1, 1);
 
 		if (split_packet) {
 			int j;
-- 
2.40.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/6] remove incorrect code for loading 16B descriptors
  2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
                   ` (5 preceding siblings ...)
  2024-01-23 11:40 ` [PATCH 6/6] net/ice: reduce code indent Bruce Richardson
@ 2024-02-22 14:57 ` Burakov, Anatoly
  2024-02-29 16:06   ` Bruce Richardson
  6 siblings, 1 reply; 9+ messages in thread
From: Burakov, Anatoly @ 2024-02-22 14:57 UTC (permalink / raw)
  To: Bruce Richardson, dev

On 1/23/2024 12:40 PM, Bruce Richardson wrote:
> Inside the AVX2 code paths, there was special case code for loading two
> 16-byte descriptors simultaneously, if that build-time feature was
> enabled. As well as not being enabled by default, these code blocks also
> were incorrect as there is no guarantee of the two descriptors being
> loaded either atomically or in a defined order. If they were loaded in
> an unexpected order the driver logic would break. Therefore we remove
> these blocks, and do come cleanup of the following code to remove
> indentation.
> 
> NOTE: I've split out the removal and subsequent cleanup into separate
> patches for ease of review. These can be merged into a single patch on
> merge, if so desired.
> 
> Bruce Richardson (6):
>    net/i40e: remove incorrect 16B descriptor read block
>    net/i40e: reduce code indentation
>    net/iavf: remove incorrect 16B descriptor read block
>    net/ice: remove incorrect 16B descriptor read block
>    net/ice: reduce code indent
>    net/iavf: reduce code indent
> 
>   drivers/net/i40e/i40e_rxtx_vec_avx2.c | 64 ++++++++-------------
>   drivers/net/iavf/iavf_rxtx_vec_avx2.c | 80 ++++++++-------------------
>   drivers/net/ice/ice_rxtx_vec_avx2.c   | 80 ++++++++-------------------
>   3 files changed, 72 insertions(+), 152 deletions(-)
> 
> --
> 2.40.1
> 
Series-Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
-- 
Thanks,
Anatoly


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/6] remove incorrect code for loading 16B descriptors
  2024-02-22 14:57 ` [PATCH 0/6] remove incorrect code for loading 16B descriptors Burakov, Anatoly
@ 2024-02-29 16:06   ` Bruce Richardson
  0 siblings, 0 replies; 9+ messages in thread
From: Bruce Richardson @ 2024-02-29 16:06 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Thu, Feb 22, 2024 at 03:57:09PM +0100, Burakov, Anatoly wrote:
> On 1/23/2024 12:40 PM, Bruce Richardson wrote:
> > Inside the AVX2 code paths, there was special case code for loading two
> > 16-byte descriptors simultaneously, if that build-time feature was
> > enabled. As well as not being enabled by default, these code blocks also
> > were incorrect as there is no guarantee of the two descriptors being
> > loaded either atomically or in a defined order. If they were loaded in
> > an unexpected order the driver logic would break. Therefore we remove
> > these blocks, and do come cleanup of the following code to remove
> > indentation.
> > 
> > NOTE: I've split out the removal and subsequent cleanup into separate
> > patches for ease of review. These can be merged into a single patch on
> > merge, if so desired.
> > 
> > Bruce Richardson (6):
> >    net/i40e: remove incorrect 16B descriptor read block
> >    net/i40e: reduce code indentation
> >    net/iavf: remove incorrect 16B descriptor read block
> >    net/ice: remove incorrect 16B descriptor read block
> >    net/ice: reduce code indent
> >    net/iavf: reduce code indent
> > 
> >   drivers/net/i40e/i40e_rxtx_vec_avx2.c | 64 ++++++++-------------
> >   drivers/net/iavf/iavf_rxtx_vec_avx2.c | 80 ++++++++-------------------
> >   drivers/net/ice/ice_rxtx_vec_avx2.c   | 80 ++++++++-------------------
> >   3 files changed, 72 insertions(+), 152 deletions(-)
> > 
> > --
> > 2.40.1
> > 
> Series-Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Squashed the 6 patches down to 3, and applied to dpdk-next-net-intel

/Bruce

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-02-29 16:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-23 11:40 [PATCH 0/6] remove incorrect code for loading 16B descriptors Bruce Richardson
2024-01-23 11:40 ` [PATCH 1/6] net/i40e: remove incorrect 16B descriptor read block Bruce Richardson
2024-01-23 11:40 ` [PATCH 2/6] net/i40e: reduce code indentation Bruce Richardson
2024-01-23 11:40 ` [PATCH 3/6] net/iavf: remove incorrect 16B descriptor read block Bruce Richardson
2024-01-23 11:40 ` [PATCH 4/6] net/iavf: reduce code indent Bruce Richardson
2024-01-23 11:40 ` [PATCH 5/6] net/ice: remove incorrect 16B descriptor read block Bruce Richardson
2024-01-23 11:40 ` [PATCH 6/6] net/ice: reduce code indent Bruce Richardson
2024-02-22 14:57 ` [PATCH 0/6] remove incorrect code for loading 16B descriptors Burakov, Anatoly
2024-02-29 16:06   ` Bruce Richardson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).