* [dpdk-dev] [PATCH] net/ice: fix outer checksum on cvl unknown
@ 2020-10-14 8:34 murphy yang
2020-10-16 5:07 ` [dpdk-dev] [PATCH 1/1] " murphy yang
0 siblings, 1 reply; 14+ messages in thread
From: murphy yang @ 2020-10-14 8:34 UTC (permalink / raw)
To: dev; +Cc: qiming.yang, qi.z.zhang, murphy
From: murphy <murphyx.yang@intel.com>
When 'csum set outer-udp hw 0' is configured, check the
ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S status bit and mark the packet
with PKT_RX_OUTER_L4_CKSUM_BAD or PKT_RX_OUTER_L4_CKSUM_GOOD accordingly.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Signed-off-by: murphy <murphyx.yang@intel.com>
---
drivers/net/ice/ice_rxtx.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 93a0ac691..e74741732 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1424,6 +1424,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
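The scalar change above only touches ice_rxd_error_to_pkt_flags(), which maps
descriptor status bits to mbuf ol_flags. For reference, a minimal sketch of how
an application could consume the new bits, assuming the standard rte_mbuf flag
macros; the helper below is illustrative and not part of the patch:

#include <rte_mbuf.h>

/* Report whether the NIC flagged the outer (tunnel) UDP checksum as bad. */
static inline int
outer_l4_cksum_is_bad(const struct rte_mbuf *m)
{
	return (m->ol_flags & PKT_RX_OUTER_L4_CKSUM_MASK) ==
			PKT_RX_OUTER_L4_CKSUM_BAD;
}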
* [dpdk-dev] [PATCH 1/1] net/ice: fix outer checksum on cvl unknown
2020-10-14 8:34 [dpdk-dev] [PATCH] net/ice: fix outer checksum on cvl unknown murphy yang
@ 2020-10-16 5:07 ` murphy yang
2020-10-27 6:09 ` [dpdk-dev] [PATCH v3] " Murphy Yang
0 siblings, 1 reply; 14+ messages in thread
From: murphy yang @ 2020-10-16 5:07 UTC (permalink / raw)
To: dev; +Cc: qiming.yang, qi.z.zhang, murphy
From: murphy <murphyx.yang@intel.com>
When 'csum set outer-udp hw 0' is configured, check the
ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S status bit and mark the packet
with PKT_RX_OUTER_L4_CKSUM_BAD or PKT_RX_OUTER_L4_CKSUM_GOOD accordingly.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: murphy <murphyx.yang@intel.com>
v2:
- cover vector path
---
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 110 ++++++++++++++++++++--------
drivers/net/ice/ice_rxtx_vec_sse.c | 71 ++++++++++++------
3 files changed, 133 insertions(+), 53 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 93a0ac691..e74741732 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1424,6 +1424,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index 5969a3048..771734cf3 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,79 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* shift right 1 bit to make sure it not exceed 255 */
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +505,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l3_l4_outer_mask = _mm256_set1_epi32(0x3);
+ __m256i l3_l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l3_l4_outer_mask);
+ l3_l4_outer_flags = _mm256_slli_epi32(l3_l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x3);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l3_l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index c4c9a9126..12eaab83e 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,59 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* shift right 1 bit to make sure it not exceed 255 */
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +186,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l3_l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3);
+ __m128i l3_l4_outer_flags = _mm_and_si128(flags, l3_l4_outer_mask);
+ l3_l4_outer_flags = _mm_slli_epi32(l3_l4_outer_flags, 20);
+
+ __m128i l3_l4_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3);
+ flags = _mm_and_si128(flags, l3_l4_mask);
+ flags = _mm_or_si128(flags, l3_l4_outer_flags);
+
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +246,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
+ rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
+ rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
+ rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
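The vector paths cannot call ice_rxd_error_to_pkt_flags() per packet, so they
encode each possible checksum-flag combination as one byte of a shuffle table
that _mm256_shuffle_epi8/_mm_shuffle_epi8 then index per packet. Because
PKT_RX_OUTER_L4_CKSUM_* live at bits 21-22 of ol_flags, they are pre-shifted
right by 20 before the existing '>> 1' packing so the whole entry still fits in
8 bits. A scalar model of that packing, assuming only the checksum-related
flags are encoded (macro names from rte_mbuf_core.h; this is a sketch, not
driver code):

#include <stdint.h>
#include <rte_mbuf.h>

static inline uint8_t
pack_cksum_flags(uint64_t ol_flags)
{
	uint64_t outer = (ol_flags & PKT_RX_OUTER_L4_CKSUM_MASK) >> 20;
	uint64_t inner = ol_flags & (PKT_RX_IP_CKSUM_MASK |
				     PKT_RX_L4_CKSUM_MASK |
				     PKT_RX_EIP_CKSUM_BAD);

	/* outer is at most 0x6 and inner reaches at most bit 8, so the
	 * combined value shifted right by one always fits in a byte.
	 */
	return (uint8_t)((outer | inner) >> 1);
}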
* [dpdk-dev] [PATCH v3] net/ice: fix outer checksum on cvl unknown
2020-10-16 5:07 ` [dpdk-dev] [PATCH 1/1] " murphy yang
@ 2020-10-27 6:09 ` Murphy Yang
2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang
0 siblings, 1 reply; 14+ messages in thread
From: Murphy Yang @ 2020-10-27 6:09 UTC (permalink / raw)
To: dev; +Cc: qiming.yang, qi.z.zhang, murphy
From: murphy <murphyx.yang@intel.com>
When 'csum set outer-udp hw 0' is configured, check the
ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S status bit and mark the packet
with PKT_RX_OUTER_L4_CKSUM_BAD or PKT_RX_OUTER_L4_CKSUM_GOOD accordingly.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: murphy <murphyx.yang@intel.com>
v2:
- cover vector path
v3:
- add PKT_RX_OUTER_L4_CKSUM_GOOD in vector path.
- rename some variable name.
---
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 117 ++++++++++++++++++++--------
drivers/net/ice/ice_rxtx_vec_sse.c | 75 +++++++++++++-----
3 files changed, 144 insertions(+), 53 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 93a0ac691..e74741732 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1424,6 +1424,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index 5969a3048..edf681113 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x3);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x3);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index c4c9a9126..b1581b908 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* shift right 1 bit to make sure it not exceed 255 */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3);
+ __m128i l3_flags = _mm_and_si128(flags, l3_mask);
+ flags = _mm_or_si128(l3_flags, l4_outer_flags);
+
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
+ rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
+ rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
+ rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
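After the byte shuffle, the per-lane result is shifted left by 1 to undo the
'>> 1' packing, and the outer L4 status then has to be moved from the low bits
of each 32-bit lane back up to bits 21-22 of ol_flags. A scalar model of that
intent is sketched below; the mask and shift constants are assumptions for
illustration and do not necessarily match the exact constants in the posted
intrinsics:

#include <stdint.h>

static inline uint32_t
restore_cksum_flags(uint8_t packed)
{
	uint32_t flags = (uint32_t)packed << 1;	/* undo the ">> 1" packing */
	uint32_t outer = (flags & 0x6) << 20;	/* low status bits -> bits 21-22 */

	return (flags & ~(uint32_t)0x6) | outer;
}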
* [dpdk-dev] [PATCH v4] net/ice: fix outer checksum on cvl unknown
2020-10-27 6:09 ` [dpdk-dev] [PATCH v3] " Murphy Yang
@ 2020-11-03 6:43 ` Murphy Yang
2020-11-03 6:43 ` [dpdk-dev] [PATCH] " Murphy Yang
2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang
0 siblings, 2 replies; 14+ messages in thread
From: Murphy Yang @ 2020-11-03 6:43 UTC (permalink / raw)
To: dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang
Currently, the driver does not support parsing the UDP outer checksum
flag of tunneled packets. When 'csum set outer-udp hw 0' and
'csum parse-tunnel on 0' are executed to enable hardware outer UDP
checksum offload, the flag is not reported in the mbuf. This patch adds
support for parsing the UDP outer checksum flag of tunneled packets.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
---
v4:
- cover AVX512 vector path.
v3:
- add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path.
- rename some variable name.
v2:
- cover AVX2 and SSE vector path
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 117 +++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_avx512.c | 116 ++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_sse.c | 75 ++++++++++++-----
4 files changed, 228 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5fbd68eafc..24a7caeb98 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b72a9e7025..80a17c450a 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x3);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x3);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index e5e7cc1482..a4a40be233 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -211,43 +211,86 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* 2nd 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -432,6 +475,15 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x3);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x3);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 626364719b..8baac94806 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* shift right 1 bit to make sure it not exceed 255 */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3);
+ __m128i l3_flags = _mm_and_si128(flags, l3_mask);
+ flags = _mm_or_si128(l3_flags, l4_outer_flags);
+
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
+ rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
+ rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
+ rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
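For completeness, a possible testpmd sequence for exercising this path; the
port number and ordering are illustrative, and 'csum set' requires the port to
be stopped first:

testpmd> port stop 0
testpmd> csum set outer-udp hw 0
testpmd> csum parse-tunnel on 0
testpmd> port start 0
testpmd> set fwd csum
testpmd> start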
* [dpdk-dev] [PATCH] net/ice: fix outer checksum on cvl unknown
2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang
@ 2020-11-03 6:43 ` Murphy Yang
2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang
1 sibling, 0 replies; 14+ messages in thread
From: Murphy Yang @ 2020-11-03 6:43 UTC (permalink / raw)
To: dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang
Currently, the driver does not support parsing the UDP outer checksum
flag of tunneled packets. When 'csum set outer-udp hw 0' and
'csum parse-tunnel on 0' are executed to enable hardware outer UDP
checksum offload, the flag is not reported in the mbuf. This patch adds
support for parsing the UDP outer checksum flag of tunneled packets.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
---
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 117 +++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_avx512.c | 116 ++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_sse.c | 75 ++++++++++++-----
4 files changed, 228 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5fbd68eafc..24a7caeb98 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b72a9e7025..80a17c450a 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x3);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x3);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index e5e7cc1482..a4a40be233 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -211,43 +211,86 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* 2nd 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -432,6 +475,15 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x3);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x3);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 626364719b..8baac94806 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* shift right 1 bit to make sure it not exceed 255 */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3);
+ __m128i l3_flags = _mm_and_si128(flags, l3_mask);
+ flags = _mm_or_si128(l3_flags, l4_outer_flags);
+
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
+ rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
+ rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
+ rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
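A note on the vector-path handling in the patch above (my reading, assuming the DPDK 20.11 flag layout where PKT_RX_OUTER_L4_CKSUM_BAD is bit 21 and PKT_RX_OUTER_L4_CKSUM_GOOD is bit 22): the shuffle-table bytes carry the outer flag shifted right by 20 and then by 1 more, and the per-descriptor code shifts the looked-up byte left by 1 before isolating the outer bits, so after that shift the outer bits sit at mask 0x6 rather than the 0x3 used here; the v5 revision below changes the mask accordingly ("fix outer L4 checksum mask for vector path"). A minimal scalar sketch of the intended round trip:

/* Scalar round-trip check of the outer-L4 bit packing used by the vector
 * paths. Bit positions are assumptions based on DPDK 20.11 rte_mbuf_core.h,
 * not copied from the patch.
 */
#include <assert.h>
#include <stdint.h>

#define OUTER_L4_CKSUM_BAD   (1u << 21)  /* assumed PKT_RX_OUTER_L4_CKSUM_BAD */
#define OUTER_L4_CKSUM_GOOD  (1u << 22)  /* assumed PKT_RX_OUTER_L4_CKSUM_GOOD */

int main(void)
{
	uint32_t outer = OUTER_L4_CKSUM_GOOD;

	/* value placed in the 8-bit shuffle table: must fit in one byte */
	uint8_t table_byte = (uint8_t)((outer >> 20) >> 1);

	/* per-descriptor path: shift the looked-up byte left by 1 ... */
	uint32_t flags = (uint32_t)table_byte << 1;

	/* ... then move the two outer bits (mask 0x6 after the shift) to 21/22 */
	uint32_t restored = (flags & 0x6) << 20;

	assert(restored == outer);
	return 0;
}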
* [dpdk-dev] [PATCH v5] net/ice: fix outher chksum on cvl unknown
2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang
2020-11-03 6:43 ` [dpdk-dev] [PATCH] " Murphy Yang
@ 2020-11-04 9:26 ` Murphy Yang
2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang
1 sibling, 1 reply; 14+ messages in thread
From: Murphy Yang @ 2020-11-04 9:26 UTC (permalink / raw)
To: dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang
Currently, the driver does not support parsing the UDP outer checksum
flag of tunneled packets.
When the 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0'
commands are used to enable hardware UDP outer checksum, the flag
should be reported. This patch adds support for parsing the UDP outer
checksum flag of tunneled packets.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
---
v5:
- fix outer L4 checksum mask for vector path.
v4:
- cover AVX512 vector path.
v3:
- add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path.
- rename some variable names.
v2:
- cover AVX2 and SSE vector path
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 117 +++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_avx512.c | 116 ++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_sse.c | 75 ++++++++++++-----
4 files changed, 228 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5fbd68eafc..24a7caeb98 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
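For readers skimming the diff, a self-contained sketch of what the hunk above does; the two flag values and the status-bit position are stand-ins for the real definitions in rte_mbuf_core.h and the ice flex-descriptor header:

/* Minimal stand-alone model of ice_rxd_error_to_pkt_flags() for the new
 * outer-UDP bit only. Constants below are illustrative stand-ins.
 */
#include <stdint.h>

#define XSUM_EUDPE_S          7           /* stand-in for ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S */
#define OUTER_L4_CKSUM_BAD    (1u << 21)  /* stand-in for PKT_RX_OUTER_L4_CKSUM_BAD */
#define OUTER_L4_CKSUM_GOOD   (1u << 22)  /* stand-in for PKT_RX_OUTER_L4_CKSUM_GOOD */

static uint32_t outer_l4_flag(uint16_t stat_err0)
{
	/* EUDPE set -> outer UDP checksum failed; otherwise report it good */
	if (stat_err0 & (1u << XSUM_EUDPE_S))
		return OUTER_L4_CKSUM_BAD;
	return OUTER_L4_CKSUM_GOOD;
}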
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b72a9e7025..2802c4caae 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x6);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index e5e7cc1482..9cda91a2d8 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -211,43 +211,86 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* 2nd 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* second 128-bits */
+ /** shift right 20 bits to make sure it not exceed 255 and
+ * use the low two bits to indicate outer checksum status
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -432,6 +475,15 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_mask = _mm256_set1_epi32(~0x6);
+ __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask);
+ l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags);
+
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 626364719b..658148aa8a 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /* shift right 1 bit to make sure it not exceed 255 */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6);
+ __m128i l3_flags = _mm_and_si128(flags, l3_mask);
+ flags = _mm_or_si128(l3_flags, l4_outer_flags);
+
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
+ rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
+ rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
+ rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown
2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang
@ 2020-11-09 6:06 ` Murphy Yang
2020-11-13 3:16 ` Xie, WeiX
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Murphy Yang @ 2020-11-09 6:06 UTC (permalink / raw)
To: dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang
Currently, the driver does not support parsing the UDP outer checksum
flag of tunneled packets.
When the 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0'
commands are used to enable hardware UDP outer checksum, the flag
should be reported. This patch adds support for parsing the UDP outer
checksum flag of tunneled packets.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
---
v6:
- rename variable names.
- update comments.
v5:
- fix outer L4 checksum mask for vector path.
v4:
- cover AVX512 vector path.
v3:
- add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path.
- rename variable names.
v2:
- cover AVX2 and SSE vector path
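Before the diff, a note on why the shuffle tables in the vector paths below double in size: flags_mask now keeps four descriptor error bits (inner IP, inner L4, outer IP and the new outer UDP) instead of three, so the shifted status field selects one of 16 table entries per 128-bit lane. A scalar model of that index computation, with bit positions that are assumptions rather than values copied from the ice headers:

/* Scalar model of the vector table lookup: the 16-entry shuffle table is
 * indexed by status bits 4..7 (the 0xF << 4 of flags_mask).
 */
#include <stdint.h>

#define XSUM_IPE_S    4  /* inner IP checksum error */
#define XSUM_L4E_S    5  /* inner L4 checksum error */
#define XSUM_EIPE_S   6  /* outer IP checksum error */
#define XSUM_EUDPE_S  7  /* outer UDP checksum error (newly consumed) */

static uint8_t cksum_table_index(uint16_t stat_err0)
{
	/* keep bits 4..7, then shift down 4: a 0..15 index into the table */
	return (uint8_t)((stat_err0 & (0xF << 4)) >> 4);
}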
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++-----
4 files changed, 233 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5fbd68eafc..24a7caeb98 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b72a9e7025..7838e17787 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * second 128-bits
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index e5e7cc1482..d33ef2a042 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -211,43 +211,88 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* 2nd 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * second 128-bits
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -432,6 +477,14 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 626364719b..f8b3574e36 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6);
+ __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask);
+ flags = _mm_or_si128(l3_l4_flags, l4_outer_flags);
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
+ rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
+ rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
+ rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
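Regarding the rearm-blend change at the end of the SSE hunk above: with the outer-L4 flags now living in bits 21/22 of ol_flags, a 16-bit blend of word 4 (mask 0x10) would copy only the low 16 bits of the flag dword, so the patch switches to a 32-bit blend of dword 2 (mask 0x04). A scalar model of which bytes each blend takes from the flags vector, assuming ol_flags starts 8 bytes into the rearm block as in the 20.11 rte_mbuf layout (little-endian x86):

/* Scalar model of _mm_blend_epi16(..., 0x10) vs _mm_blend_epi32(..., 0x04):
 * which bytes of the 16-byte rearm block are taken from `flags`.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static void blend(uint8_t dst[16], const uint8_t src[16],
		  int lane_bytes, unsigned int mask)
{
	for (int lane = 0; lane < 16 / lane_bytes; lane++)
		if (mask & (1u << lane))
			memcpy(dst + lane * lane_bytes, src + lane * lane_bytes,
			       lane_bytes);
}

int main(void)
{
	uint8_t mbuf_init[16] = {0}, flags[16] = {0};
	uint8_t out16[16], out32[16];
	uint32_t ol = 1u << 22;                  /* outer L4 checksum good */

	memcpy(flags + 8, &ol, 4);               /* flag dword at byte 8 */
	memcpy(out16, mbuf_init, 16);
	memcpy(out32, mbuf_init, 16);

	blend(out16, flags, 2, 0x10);            /* epi16, word 4: bytes 8..9 */
	blend(out32, flags, 4, 0x04);            /* epi32, dword 2: bytes 8..11 */

	printf("epi16 keeps bit 22? %d\n", (out16[10] >> 6) & 1);  /* 0: lost */
	printf("epi32 keeps bit 22? %d\n", (out32[10] >> 6) & 1);  /* 1: kept */
	return 0;
}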
* Re: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown
2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang
@ 2020-11-13 3:16 ` Xie, WeiX
2020-11-13 5:33 ` Zhang, Qi Z
2020-11-13 11:51 ` Ferruh Yigit
2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang
2 siblings, 1 reply; 14+ messages in thread
From: Xie, WeiX @ 2020-11-13 3:16 UTC (permalink / raw)
To: Yang, MurphyX, dev
Cc: Yang, Qiming, Zhang, Qi Z, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo,
Yang, MurphyX
Tested-by: Xie, WeiX <weix.xie@intel.com>
Regards,
Xie Wei
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Murphy Yang
> Sent: Monday, November 9, 2020 2:07 PM
> To: dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi
> <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang,
> MurphyX <murphyx.yang@intel.com>
> Subject: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >>
> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + /**
> + * second 128-bits
> + * shift right 20 bits to use the low two bits to indicate
> + * outer checksum status
> + * shift right 1 bit to make sure it not exceed 255
> + */
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >>
> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >>
> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1);
> const __m256i cksum_mask =
> - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD |
> PKT_RX_IP_CKSUM_BAD |
> - PKT_RX_L4_CKSUM_GOOD |
> PKT_RX_L4_CKSUM_BAD |
> - PKT_RX_EIP_CKSUM_BAD);
> + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
> + PKT_RX_L4_CKSUM_MASK |
> + PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_OUTER_L4_CKSUM_MASK);
> /**
> * data to be shuffled by result of flag mask, shifted down 12.
> * If RSS(bit12)/VLAN(bit13) are set,
> @@ -432,6 +477,14 @@ _ice_recv_raw_pkts_vec_avx512(struct
> ice_rx_queue *rxq,
> __m256i l3_l4_flags =
> _mm256_shuffle_epi8(l3_l4_flags_shuf,
> _mm256_srli_epi32(flag_bits, 4));
> l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
> + __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
> + __m256i l4_outer_flags =
> + _mm256_and_si256(l3_l4_flags,
> l4_outer_mask);
> + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
> +
> + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
> + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
> + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
> l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
> /* set rss and vlan flags */
> const __m256i rss_vlan_flag_bits =
> diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c
> b/drivers/net/ice/ice_rxtx_vec_sse.c
> index 626364719b..f8b3574e36 100644
> --- a/drivers/net/ice/ice_rxtx_vec_sse.c
> +++ b/drivers/net/ice/ice_rxtx_vec_sse.c
> @@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq,
> __m128i descs[4],
> * bit12 for RSS indication.
> * bit13 for VLAN indication.
> */
> - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
> - 0x3070, 0x3070);
> -
> + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
> + 0x30f0, 0x30f0);
> const __m128i cksum_mask =
> _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
> PKT_RX_L4_CKSUM_MASK |
> +
> PKT_RX_OUTER_L4_CKSUM_MASK |
> PKT_RX_EIP_CKSUM_BAD,
> PKT_RX_IP_CKSUM_MASK |
> PKT_RX_L4_CKSUM_MASK |
> +
> PKT_RX_OUTER_L4_CKSUM_MASK |
> PKT_RX_EIP_CKSUM_BAD,
> PKT_RX_IP_CKSUM_MASK |
> PKT_RX_L4_CKSUM_MASK |
> +
> PKT_RX_OUTER_L4_CKSUM_MASK |
> PKT_RX_EIP_CKSUM_BAD,
> PKT_RX_IP_CKSUM_MASK |
> PKT_RX_L4_CKSUM_MASK |
> +
> PKT_RX_OUTER_L4_CKSUM_MASK |
> PKT_RX_EIP_CKSUM_BAD);
>
> /* map the checksum, rss and vlan fields to the checksum, rss
> * and vlan flag
> */
> - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
> - /* shift right 1 bit to make sure it not exceed 255 */
> - (PKT_RX_EIP_CKSUM_BAD |
> PKT_RX_L4_CKSUM_BAD |
> - PKT_RX_IP_CKSUM_BAD) >> 1,
> - (PKT_RX_EIP_CKSUM_BAD |
> PKT_RX_L4_CKSUM_BAD |
> - PKT_RX_IP_CKSUM_GOOD) >> 1,
> - (PKT_RX_EIP_CKSUM_BAD |
> PKT_RX_L4_CKSUM_GOOD |
> - PKT_RX_IP_CKSUM_BAD) >> 1,
> - (PKT_RX_EIP_CKSUM_BAD |
> PKT_RX_L4_CKSUM_GOOD |
> - PKT_RX_IP_CKSUM_GOOD) >> 1,
> - (PKT_RX_L4_CKSUM_BAD |
> PKT_RX_IP_CKSUM_BAD) >> 1,
> - (PKT_RX_L4_CKSUM_BAD |
> PKT_RX_IP_CKSUM_GOOD) >> 1,
> - (PKT_RX_L4_CKSUM_GOOD |
> PKT_RX_IP_CKSUM_BAD) >> 1,
> - (PKT_RX_L4_CKSUM_GOOD |
> PKT_RX_IP_CKSUM_GOOD) >> 1);
> + const __m128i cksum_flags =
> + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >>
> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + /**
> + * shift right 20 bits to use the low two bits to indicate
> + * outer checksum status
> + * shift right 1 bit to make sure it not exceed 255
> + */
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_EIP_CKSUM_BAD |
> + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >>
> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_BAD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_BAD) >> 1,
> + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
> PKT_RX_L4_CKSUM_GOOD |
> + PKT_RX_IP_CKSUM_GOOD) >> 1);
>
> const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
> 0, 0, 0, 0,
> @@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq,
> __m128i descs[4],
> flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
> /* then we shift left 1 bit */
> flags = _mm_slli_epi32(flags, 1);
> +
> + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6);
> + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
> + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
> +
> + __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6);
> + __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask);
> + flags = _mm_or_si128(l3_l4_flags, l4_outer_flags);
> /* we need to mask out the reduntant bits introduced by RSS or
> * VLAN fields.
> */
> @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq,
> __m128i descs[4],
> * appropriate flags means that we have to do a shift and blend for
> * each mbuf before we do the write.
> */
> - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8),
> 0x10);
> - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4),
> 0x10);
> - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
> - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4),
> 0x10);
> + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8),
> 0x04);
> + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4),
> 0x04);
> + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
> + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4),
> 0x04);
>
> /* write the rearm data and the olflags in one write */
> RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
> --
> 2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown
2020-11-13 3:16 ` Xie, WeiX
@ 2020-11-13 5:33 ` Zhang, Qi Z
0 siblings, 0 replies; 14+ messages in thread
From: Zhang, Qi Z @ 2020-11-13 5:33 UTC (permalink / raw)
To: Xie, WeiX, Yang, MurphyX, dev
Cc: Yang, Qiming, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo, Yang, MurphyX
> -----Original Message-----
> From: Xie, WeiX <weix.xie@intel.com>
> Sent: Friday, November 13, 2020 11:16 AM
> To: Yang, MurphyX <murphyx.yang@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi
> <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang,
> MurphyX <murphyx.yang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown
>
> Tested-by: Xie, WeiX <weix.xie@intel.com>
>
> Regards,
> Xie Wei
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Murphy Yang
> > Sent: Monday, November 9, 2020 2:07 PM
> > To: dev@dpdk.org
> > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong,
> > Leyi <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang,
> > MurphyX <murphyx.yang@intel.com>
> > Subject: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl
> > unknown
> >
> > Currently, the driver does not support parsing the UDP outer checksum
> > flag of tunneled packets.
> >
> > This patch adds that support, so that when the 'csum set outer-udp hw 0'
> > and 'csum parse-tunnel on 0' commands enable hardware outer UDP checksum,
> > the flag of tunneled packets is parsed correctly.
> >
> > Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
> > Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX
> > path")
> > Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE
> > path")
> >
> > Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Applied to dpdk-next-net-intel.
Thanks
Qi
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown
2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang
2020-11-13 3:16 ` Xie, WeiX
@ 2020-11-13 11:51 ` Ferruh Yigit
2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang
2 siblings, 0 replies; 14+ messages in thread
From: Ferruh Yigit @ 2020-11-13 11:51 UTC (permalink / raw)
To: Murphy Yang, dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu
On 11/9/2020 6:06 AM, Murphy Yang wrote:
> Currently, the driver does not support parsing the UDP outer checksum
> flag of tunneled packets.
>
> This patch adds that support, so that when the 'csum set outer-udp hw 0'
> and 'csum parse-tunnel on 0' commands enable hardware outer UDP checksum,
> the flag of tunneled packets is parsed correctly.
>
> Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
> Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
> Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
>
> Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
<...>
> @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
> * appropriate flags means that we have to do a shift and blend for
> * each mbuf before we do the write.
> */
> - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
> - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
> - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
> - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
> + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04);
> + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04);
> + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04);
> + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04);
Hi Murphy,
This change is in the 'ice_rxtx_vec_sse.c' file, but is the '_mm_blend_epi32'
intrinsic an SSE intrinsic?
Since it is causing a compile error with the default target, you can test with
the './devtools/test-meson-builds.sh' script.
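[Editor's note] A minimal sketch, assuming only SSE4.1 is available in this
file, of how the same 32-bit lane (dword 2, i.e. bytes 8..11, which the 0x04
immediate above selects) can be blended without the AVX2-only
_mm_blend_epi32 -- this is the route v7 of the patch takes with the 0x30
immediate:

#include <smmintrin.h> /* SSE4.1: _mm_blend_epi16 */

/*
 * Illustration only: take dword 2 of 'flags' and keep the rest of
 * 'mbuf_init'.  Words 4 and 5 (immediate 0x30) cover the same bytes
 * as dword 2 (immediate 0x04) does in _mm_blend_epi32.
 */
static inline __m128i
blend_flags_dword2(__m128i mbuf_init, __m128i flags)
{
	return _mm_blend_epi16(mbuf_init, flags, 0x30);
}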
^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v7] net/ice: fix outer checksum on cvl unknown
2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang
2020-11-13 3:16 ` Xie, WeiX
2020-11-13 11:51 ` Ferruh Yigit
@ 2020-11-23 8:34 ` Murphy Yang
2020-11-23 8:56 ` Xie, WeiX
2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang
2 siblings, 2 replies; 14+ messages in thread
From: Murphy Yang @ 2020-11-23 8:34 UTC (permalink / raw)
To: dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang
When tunneled packets are received, the testpmd output log always shows the
'ol_flags' value as 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', while the expected value
is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'.
Add 'PKT_RX_OUTER_L4_CKSUM_GOOD' and 'PKT_RX_OUTER_L4_CKSUM_BAD' to 'flags'
in the normal path, to 'l3_l4_flags_shuf' in the AVX2 and AVX512 vector paths
and to 'cksum_flags' in the SSE vector path, so that 'ol_flags' reports the
correct outer checksum status.
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
---
v7:
- fix compile error with default target on SSE vector path.
v6:
- rename variable name.
- update comments.
v5:
- fix outer L4 checksum mask for vector path.
v4:
- cover AVX512 vector path.
v3:
- add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path.
- rename variable name.
v2:
- cover AVX2 and SSE vector path
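[Editor's note] A minimal scalar sketch, not part of the patch, of the bit
packing used by the byte-wide shuffle tables below: the outer L4 status bits
are pre-shifted right by 20 so they fit next to the IP/L4/EIP flags, and each
table entry is shifted right by 1 so it stays below 256; the vector code
undoes both shifts after the shuffle.

#include <stdint.h>

/* Illustration only: expand one 8-bit table entry back into the
 * 32-bit ol_flags checksum bits.  Mask 0x6 matches the two outer L4
 * status bits, which are moved back up by 20 bits.
 */
static inline uint32_t
expand_cksum_byte(uint8_t entry)
{
	uint32_t v = (uint32_t)entry << 1;    /* undo the ">> 1" */
	uint32_t outer = (v & 0x6) << 20;     /* outer L4 good/bad bits */
	uint32_t inner = v & ~(uint32_t)0x6;  /* IP/L4/EIP flags */

	return inner | outer;
}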
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++-----
4 files changed, 233 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 5fbd68eafc..24a7caeb98 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b72a9e7025..7838e17787 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * second 128-bits
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index df5d2be1e6..fd5d724329 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -230,43 +230,88 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* 2nd 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * second 128-bits
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -451,6 +496,14 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 626364719b..87e0c3db2e 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6);
+ __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask);
+ flags = _mm_or_si128(l3_l4_flags, l4_outer_flags);
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x30);
+ rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x30);
+ rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x30);
+ rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x30);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v7] net/ice: fix outer checksum on cvl unknown
2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang
@ 2020-11-23 8:56 ` Xie, WeiX
2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang
1 sibling, 0 replies; 14+ messages in thread
From: Xie, WeiX @ 2020-11-23 8:56 UTC (permalink / raw)
To: Yang, MurphyX, dev
Cc: Yang, Qiming, Zhang, Qi Z, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo,
Yang, MurphyX
Tested-by: Xie, WeiX <weix.xie@intel.com>
Regards,
Xie Wei
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Murphy Yang
> Sent: Monday, November 23, 2020 4:35 PM
> To: dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi
> <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang,
> MurphyX <murphyx.yang@intel.com>
> Subject: [dpdk-dev] [PATCH v7] net/ice: fix outer checksum on cvl unknown
>
> When tunneled packets are received, the testpmd output log always shows the
> 'ol_flags' value as 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', while the expected
> value is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'.
>
> Add 'PKT_RX_OUTER_L4_CKSUM_GOOD' and 'PKT_RX_OUTER_L4_CKSUM_BAD' to
> 'flags' in the normal path, to 'l3_l4_flags_shuf' in the AVX2 and AVX512
> vector paths and to 'cksum_flags' in the SSE vector path, so that
> 'ol_flags' reports the correct outer checksum status.
>
> Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
> Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
> Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
>
> Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
> ---
> <...>
^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown
2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang
2020-11-23 8:56 ` Xie, WeiX
@ 2020-12-15 8:10 ` Murphy Yang
2020-12-15 12:11 ` Zhang, Qi Z
1 sibling, 1 reply; 14+ messages in thread
From: Murphy Yang @ 2020-12-15 8:10 UTC (permalink / raw)
To: dev
Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu,
ferruh.yigit, Murphy Yang
When receiving tunneled packets, the testpmd output log shows that the
'ol_flags' value is always 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', while the
expected value is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'.
Add 'PKT_RX_OUTER_L4_CKSUM_GOOD' and 'PKT_RX_OUTER_L4_CKSUM_BAD' to
'flags' for the normal path, to 'l3_l4_flags_shuf' for the AVX2 and AVX512
vector paths, and to 'cksum_flags' for the SSE vector path, so that
'ol_flags' reports the correct outer checksum status.
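For illustration only (not part of the patch), a minimal sketch of how an
application could consume the outer L4 checksum status once the driver
reports it; the helper name is an assumption, only the
PKT_RX_OUTER_L4_CKSUM_* flags and the ol_flags field come from rte_mbuf:
#include <rte_mbuf.h>
/* Hypothetical helper: report the outer L4 checksum status of one received
 * mbuf. Before this fix the ice driver left the field at
 * PKT_RX_OUTER_L4_CKSUM_UNKNOWN for tunneled packets; with the fix it
 * reports GOOD or BAD based on the flex descriptor error bit.
 */
static inline const char *
outer_l4_csum_status(const struct rte_mbuf *m)
{
	uint64_t st = m->ol_flags & PKT_RX_OUTER_L4_CKSUM_MASK;

	if (st == PKT_RX_OUTER_L4_CKSUM_GOOD)
		return "good";
	if (st == PKT_RX_OUTER_L4_CKSUM_BAD)
		return "bad";
	return "unknown";
}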
Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
---
v8:
- tune the commit title.
v7:
- fix compile error with default target on SSE vector path.
v6:
- rename variable name.
- update comments.
v5:
- fix outer L4 checksum mask for vector path.
v4:
- cover AVX512 vector path.
v3:
- add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path.
- rename variable name.
v2:
- cover AVX2 and SSE vector path
drivers/net/ice/ice_rxtx.c | 5 ++
drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++-------
drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++-----
4 files changed, 233 insertions(+), 85 deletions(-)
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 9769e216bf..d052bd0f1b 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0)
if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S)))
flags |= PKT_RX_EIP_CKSUM_BAD;
+ if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S)))
+ flags |= PKT_RX_OUTER_L4_CKSUM_BAD;
+ else
+ flags |= PKT_RX_OUTER_L4_CKSUM_GOOD;
+
return flags;
}
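The vector-path hunks below all rely on the same packing trick: each byte of
the checksum shuffle table carries the outer L4 status pre-shifted right by
20 bits, so it sits next to the inner l3/l4 flags, and the whole byte is
shifted right by 1 so it stays below 255. As a rough scalar model of what
each 32-bit lane effectively computes (the function name and 'packed'
parameter are illustrative, not from the patch):
#include <stdint.h>
#include <rte_mbuf.h>
/* Scalar sketch of the per-lane unpacking done by the AVX2/AVX512/SSE code:
 * 'packed' stands for one byte produced by the checksum shuffle table.
 */
static inline uint64_t
unpack_l3_l4_flags(uint8_t packed)
{
	uint64_t flags = (uint64_t)packed << 1;   /* undo the ">> 1" in the table */
	uint64_t outer = (flags & 0x6) << 20;     /* bits 1-2 -> bits 21-22 (outer L4) */

	flags &= ~UINT64_C(0x6);                  /* drop the packed outer bits */
	return (flags | outer) &
	       (PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK |
		PKT_RX_EIP_CKSUM_BAD | PKT_RX_OUTER_L4_CKSUM_MASK);
}
For example, a table entry built from (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 |
PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1 unpacks back to
PKT_RX_OUTER_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD,
which is exactly what the l4_outer_mask/l3_l4_mask steps in the diffs do
lane by lane.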
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index b72a9e7025..7838e17787 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* second 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * second 128-bits
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it does not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index df5d2be1e6..fd5d724329 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -230,43 +230,88 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
* bit13 is for VLAN indication.
*/
const __m256i flags_mask =
- _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13));
+ _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13));
/**
* data to be shuffled by the result of the flags mask shifted by 4
* bits. This gives use the l3_l4 flags.
*/
- const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- /* 2nd 128-bits */
- 0, 0, 0, 0, 0, 0, 0, 0,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m256i l3_l4_flags_shuf =
+ _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * second 128-bits
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it does not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m256i cksum_mask =
- _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD |
- PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_EIP_CKSUM_BAD);
+ _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK |
+ PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_OUTER_L4_CKSUM_MASK);
/**
* data to be shuffled by result of flag mask, shifted down 12.
* If RSS(bit12)/VLAN(bit13) are set,
@@ -451,6 +496,14 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
__m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf,
_mm256_srli_epi32(flag_bits, 4));
l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1);
+ __m256i l4_outer_mask = _mm256_set1_epi32(0x6);
+ __m256i l4_outer_flags =
+ _mm256_and_si256(l3_l4_flags, l4_outer_mask);
+ l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20);
+
+ __m256i l3_l4_mask = _mm256_set1_epi32(~0x6);
+ l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask);
+ l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags);
l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask);
/* set rss and vlan flags */
const __m256i rss_vlan_flag_bits =
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index 626364719b..87e0c3db2e 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* bit12 for RSS indication.
* bit13 for VLAN indication.
*/
- const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070,
- 0x3070, 0x3070);
-
+ const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0,
+ 0x30f0, 0x30f0);
const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD,
PKT_RX_IP_CKSUM_MASK |
PKT_RX_L4_CKSUM_MASK |
+ PKT_RX_OUTER_L4_CKSUM_MASK |
PKT_RX_EIP_CKSUM_BAD);
/* map the checksum, rss and vlan fields to the checksum, rss
* and vlan flag
*/
- const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0,
- /* shift right 1 bit to make sure it not exceed 255 */
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD |
- PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
- (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1);
+ const __m128i cksum_flags =
+ _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 |
+ PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ /**
+ * shift right 20 bits to use the low two bits to indicate
+ * outer checksum status
+ * shift right 1 bit to make sure it does not exceed 255
+ */
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD |
+ PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_BAD) >> 1,
+ (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD |
+ PKT_RX_IP_CKSUM_GOOD) >> 1);
const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0,
0, 0, 0, 0,
@@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
flags = _mm_shuffle_epi8(cksum_flags, tmp_desc);
/* then we shift left 1 bit */
flags = _mm_slli_epi32(flags, 1);
+
+ __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6);
+ __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask);
+ l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20);
+
+ __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6);
+ __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask);
+ flags = _mm_or_si128(l3_l4_flags, l4_outer_flags);
/* we need to mask out the reduntant bits introduced by RSS or
* VLAN fields.
*/
@@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4],
* appropriate flags means that we have to do a shift and blend for
* each mbuf before we do the write.
*/
- rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10);
- rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10);
- rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10);
- rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10);
+ rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x30);
+ rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x30);
+ rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x30);
+ rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x30);
/* write the rearm data and the olflags in one write */
RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) !=
--
2.17.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown
2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang
@ 2020-12-15 12:11 ` Zhang, Qi Z
0 siblings, 0 replies; 14+ messages in thread
From: Zhang, Qi Z @ 2020-12-15 12:11 UTC (permalink / raw)
To: Yang, MurphyX, dev
Cc: Yang, Qiming, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo, Yigit,
Ferruh, Yang, MurphyX
> -----Original Message-----
> From: Murphy Yang <murphyx.yang@intel.com>
> Sent: Tuesday, December 15, 2020 4:11 PM
> To: dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi
> <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Yang, MurphyX <murphyx.yang@intel.com>
> Subject: [PATCH v8] net/ice: fix outer checksum unknown
>
> When receiving tunneled packets, the testpmd output log shows that the
> 'ol_flags' value is always 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', while the
> expected value is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'.
>
> Add 'PKT_RX_OUTER_L4_CKSUM_GOOD' and 'PKT_RX_OUTER_L4_CKSUM_BAD' to
> 'flags' for the normal path, to 'l3_l4_flags_shuf' for the AVX2 and AVX512
> vector paths, and to 'cksum_flags' for the SSE vector path, so that
> 'ol_flags' reports the correct outer checksum status.
>
> Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor")
> Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path")
> Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path")
>
> Signed-off-by: Murphy Yang <murphyx.yang@intel.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>
Applied to dpdk-next-net-intel.
Thanks
Qi
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread
Thread overview: 14+ messages
2020-10-14 8:34 [dpdk-dev] [PATCH] net/ice: fix outher chksum on cvl unknown murphy yang
2020-10-16 5:07 ` [dpdk-dev] [PATCH 1/1] " murphy yang
2020-10-27 6:09 ` [dpdk-dev] [PATCH v3] " Murphy Yang
2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang
2020-11-03 6:43 ` [dpdk-dev] [PATCH] " Murphy Yang
2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang
2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang
2020-11-13 3:16 ` Xie, WeiX
2020-11-13 5:33 ` Zhang, Qi Z
2020-11-13 11:51 ` Ferruh Yigit
2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang
2020-11-23 8:56 ` Xie, WeiX
2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang
2020-12-15 12:11 ` Zhang, Qi Z