* [dpdk-dev] [PATCH] net/ice: fix outher chksum on cvl unknown @ 2020-10-14 8:34 murphy yang 2020-10-16 5:07 ` [dpdk-dev] [PATCH 1/1] " murphy yang 0 siblings, 1 reply; 14+ messages in thread From: murphy yang @ 2020-10-14 8:34 UTC (permalink / raw) To: dev; +Cc: qiming.yang, qi.z.zhang, murphy From: murphy <murphyx.yang@intel.com> When set 'csum set outer-udp hw 0' ,support for ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S,mark the packet PKT_RX_OUTER_L4_CKSUM_BAD or PKT_RX_OUTER_L4_CKSUM_GOOD Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Signed-off-by: murphy <murphyx.yang@intel.com> --- drivers/net/ice/ice_rxtx.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 93a0ac691..e74741732 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1424,6 +1424,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH 1/1] net/ice: fix outher chksum on cvl unknown 2020-10-14 8:34 [dpdk-dev] [PATCH] net/ice: fix outher chksum on cvl unknown murphy yang @ 2020-10-16 5:07 ` murphy yang 2020-10-27 6:09 ` [dpdk-dev] [PATCH v3] " Murphy Yang 0 siblings, 1 reply; 14+ messages in thread From: murphy yang @ 2020-10-16 5:07 UTC (permalink / raw) To: dev; +Cc: qiming.yang, qi.z.zhang, murphy From: murphy <murphyx.yang@intel.com> When set 'csum set outer-udp hw 0' ,support for ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S,mark the packet PKT_RX_OUTER_L4_CKSUM_BAD or PKT_RX_OUTER_L4_CKSUM_GOOD. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: murphy <murphyx.yang@intel.com> v2: - cover vector path --- drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 110 ++++++++++++++++++++-------- drivers/net/ice/ice_rxtx_vec_sse.c | 71 ++++++++++++------ 3 files changed, 133 insertions(+), 53 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 93a0ac691..e74741732 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1424,6 +1424,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index 5969a3048..771734cf3 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,79 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* shift right 1 bit to make sure it not exceed 255 */ + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +505,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l3_l4_outer_mask = _mm256_set1_epi32(0x3); + __m256i l3_l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l3_l4_outer_mask); + l3_l4_outer_flags = _mm256_slli_epi32(l3_l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x3); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l3_l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c4c9a9126..12eaab83e 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,59 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* shift right 1 bit to make sure it not exceed 255 */ + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +186,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l3_l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3); + __m128i l3_l4_outer_flags = _mm_and_si128(flags, l3_l4_outer_mask); + l3_l4_outer_flags = _mm_slli_epi32(l3_l4_outer_flags, 20); + + __m128i l3_l4_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3); + flags = _mm_and_si128(flags, l3_l4_mask); + flags = _mm_or_si128(flags, l3_l4_outer_flags); + /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +246,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v3] net/ice: fix outher chksum on cvl unknown 2020-10-16 5:07 ` [dpdk-dev] [PATCH 1/1] " murphy yang @ 2020-10-27 6:09 ` Murphy Yang 2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang 0 siblings, 1 reply; 14+ messages in thread From: Murphy Yang @ 2020-10-27 6:09 UTC (permalink / raw) To: dev; +Cc: qiming.yang, qi.z.zhang, murphy From: murphy <murphyx.yang@intel.com> When set 'csum set outer-udp hw 0' ,support for ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S,mark the packet PKT_RX_OUTER_L4_CKSUM_BAD or PKT_RX_OUTER_L4_CKSUM_GOOD. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: murphy <murphyx.yang@intel.com> v2: - cover vector path v3: - add PKT_RX_OUTER_L4_CKSUM_GOOD in vector path. - rename some variable name. --- drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 117 ++++++++++++++++++++-------- drivers/net/ice/ice_rxtx_vec_sse.c | 75 +++++++++++++----- 3 files changed, 144 insertions(+), 53 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 93a0ac691..e74741732 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1424,6 +1424,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index 5969a3048..edf681113 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x3); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x3); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c4c9a9126..b1581b908 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* shift right 1 bit to make sure it not exceed 255 */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3); + __m128i l3_flags = _mm_and_si128(flags, l3_mask); + flags = _mm_or_si128(l3_flags, l4_outer_flags); + /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v4] net/ice: fix outher chksum on cvl unknown 2020-10-27 6:09 ` [dpdk-dev] [PATCH v3] " Murphy Yang @ 2020-11-03 6:43 ` Murphy Yang 2020-11-03 6:43 ` [dpdk-dev] [PATCH] " Murphy Yang 2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang 0 siblings, 2 replies; 14+ messages in thread From: Murphy Yang @ 2020-11-03 6:43 UTC (permalink / raw) To: dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang Currently, driver does not supports parse UDP outer checksum flag of tunneled packets. When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' commands to enable hardware UDP outer checksum. This patch supports parse UDP outer checksum flag of tunneled packets. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: Murphy Yang <murphyx.yang@intel.com> --- v4: - cover AVX512 vector path. v3: - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. - rename some variable name. v2: - cover AVX2 and SSE vector path drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 117 +++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_avx512.c | 116 ++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_sse.c | 75 ++++++++++++----- 4 files changed, 228 insertions(+), 85 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 5fbd68eafc..24a7caeb98 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index b72a9e7025..80a17c450a 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x3); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x3); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index e5e7cc1482..a4a40be233 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -211,43 +211,86 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* 2nd 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -432,6 +475,15 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + __m256i l4_outer_mask = _mm256_set1_epi32(0x3); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x3); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 626364719b..8baac94806 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* shift right 1 bit to make sure it not exceed 255 */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3); + __m128i l3_flags = _mm_and_si128(flags, l3_mask); + flags = _mm_or_si128(l3_flags, l4_outer_flags); + /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH] net/ice: fix outher chksum on cvl unknown 2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang @ 2020-11-03 6:43 ` Murphy Yang 2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang 1 sibling, 0 replies; 14+ messages in thread From: Murphy Yang @ 2020-11-03 6:43 UTC (permalink / raw) To: dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang Currently, driver does not supports parse UDP outer checksum flag of tunneled packets. When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' commands to enable hardware UDP outer checksum. This patch supports parse UDP outer checksum flag of tunneled packets. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: Murphy Yang <murphyx.yang@intel.com> --- drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 117 +++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_avx512.c | 116 ++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_sse.c | 75 ++++++++++++----- 4 files changed, 228 insertions(+), 85 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 5fbd68eafc..24a7caeb98 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index b72a9e7025..80a17c450a 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x3); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x3); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index e5e7cc1482..a4a40be233 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -211,43 +211,86 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* 2nd 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -432,6 +475,15 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + __m256i l4_outer_mask = _mm256_set1_epi32(0x3); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x3); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 626364719b..8baac94806 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* shift right 1 bit to make sure it not exceed 255 */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x3, 0x3, 0x3, 0x3); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_mask = _mm_set_epi32(~0x3, ~0x3, ~0x3, ~0x3); + __m128i l3_flags = _mm_and_si128(flags, l3_mask); + flags = _mm_or_si128(l3_flags, l4_outer_flags); + /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v5] net/ice: fix outher chksum on cvl unknown 2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang 2020-11-03 6:43 ` [dpdk-dev] [PATCH] " Murphy Yang @ 2020-11-04 9:26 ` Murphy Yang 2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang 1 sibling, 1 reply; 14+ messages in thread From: Murphy Yang @ 2020-11-04 9:26 UTC (permalink / raw) To: dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang Currently, driver does not supports parse UDP outer checksum flag of tunneled packets. When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' commands to enable hardware UDP outer checksum. This patch supports parse UDP outer checksum flag of tunneled packets. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: Murphy Yang <murphyx.yang@intel.com> --- v5: - fix outer L4 checksum mask for vector path. v4: - cover AVX512 vector path. v3: - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. - rename some variable name. v2: - cover AVX2 and SSE vector path drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 117 +++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_avx512.c | 116 ++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_sse.c | 75 ++++++++++++----- 4 files changed, 228 insertions(+), 85 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 5fbd68eafc..24a7caeb98 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index b72a9e7025..2802c4caae 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,86 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +512,16 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x6); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index e5e7cc1482..9cda91a2d8 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -211,43 +211,86 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* 2nd 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* second 128-bits */ + /** shift right 20 bits to make sure it not exceed 255 and + * use the low two bits to indicate outer checksum status + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -432,6 +475,15 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_mask = _mm256_set1_epi32(~0x6); + __m256i l3_flags = _mm256_and_si256(l3_l4_flags, l3_mask); + l3_l4_flags = _mm256_or_si256(l3_flags, l4_outer_flags); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 626364719b..658148aa8a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,63 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /* shift right 1 bit to make sure it not exceed 255 */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +190,15 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6); + __m128i l3_flags = _mm_and_si128(flags, l3_mask); + flags = _mm_or_si128(l3_flags, l4_outer_flags); + /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +250,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown 2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang @ 2020-11-09 6:06 ` Murphy Yang 2020-11-13 3:16 ` Xie, WeiX ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Murphy Yang @ 2020-11-09 6:06 UTC (permalink / raw) To: dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang Currently, driver does not support parse UDP outer checksum flag of tunneled packets. When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' commands to enable hardware UDP outer checksum. This patch supports parse UDP outer checksum flag of tunneled packets. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: Murphy Yang <murphyx.yang@intel.com> --- v6: - rename variable name. - update comments. v5: - fix outer L4 checksum mask for vector path. v4: - cover AVX512 vector path. v3: - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. - rename variable name. v2: - cover AVX2 and SSE vector path drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++----- 4 files changed, 233 insertions(+), 85 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 5fbd68eafc..24a7caeb98 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index b72a9e7025..7838e17787 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * second 128-bits + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index e5e7cc1482..d33ef2a042 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -211,43 +211,88 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* 2nd 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * second 128-bits + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -432,6 +477,14 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 626364719b..f8b3574e36 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6); + __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask); + flags = _mm_or_si128(l3_l4_flags, l4_outer_flags); /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown 2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang @ 2020-11-13 3:16 ` Xie, WeiX 2020-11-13 5:33 ` Zhang, Qi Z 2020-11-13 11:51 ` Ferruh Yigit 2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang 2 siblings, 1 reply; 14+ messages in thread From: Xie, WeiX @ 2020-11-13 3:16 UTC (permalink / raw) To: Yang, MurphyX, dev Cc: Yang, Qiming, Zhang, Qi Z, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo, Yang, MurphyX Tested-by: Xie,WeiX < weix.xie@intel.com> Regards, Xie Wei > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Murphy Yang > Sent: Monday, November 9, 2020 2:07 PM > To: dev@dpdk.org > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z > <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi > <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, > MurphyX <murphyx.yang@intel.com> > Subject: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown > > Currently, driver does not support parse UDP outer checksum flag of > tunneled packets. > > When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' > commands to enable hardware UDP outer checksum. This patch supports > parse UDP outer checksum flag of tunneled packets. > > Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") > Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") > Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") > > Signed-off-by: Murphy Yang <murphyx.yang@intel.com> > --- > v6: > - rename variable name. > - update comments. > v5: > - fix outer L4 checksum mask for vector path. > v4: > - cover AVX512 vector path. > v3: > - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. > - rename variable name. > v2: > - cover AVX2 and SSE vector path > > drivers/net/ice/ice_rxtx.c | 5 ++ > drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++------- > drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++------- > drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++----- > 4 files changed, 233 insertions(+), 85 deletions(-) > > diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index > 5fbd68eafc..24a7caeb98 100644 > --- a/drivers/net/ice/ice_rxtx.c > +++ b/drivers/net/ice/ice_rxtx.c > @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) > if (unlikely(stat_err0 & (1 << > ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) > flags |= PKT_RX_EIP_CKSUM_BAD; > > + if (unlikely(stat_err0 & (1 << > ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) > + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; > + else > + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; > + > return flags; > } > > diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c > b/drivers/net/ice/ice_rxtx_vec_avx2.c > index b72a9e7025..7838e17787 100644 > --- a/drivers/net/ice/ice_rxtx_vec_avx2.c > +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c > @@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct > ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, > * bit13 is for VLAN indication. > */ > const __m256i flags_mask = > - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); > + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); > /** > * data to be shuffled by the result of the flags mask shifted by 4 > * bits. This gives use the l3_l4 flags. > */ > - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, > 0, > - /* shift right 1 bit to make sure it not exceed 255 */ > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - /* second 128-bits */ > - 0, 0, 0, 0, 0, 0, 0, 0, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1); > + const __m256i l3_l4_flags_shuf = > + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 > | > + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + /** > + * second 128-bits > + * shift right 20 bits to use the low two bits to indicate > + * outer checksum status > + * shift right 1 bit to make sure it not exceed 255 > + */ > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1); > const __m256i cksum_mask = > - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD | > - PKT_RX_L4_CKSUM_GOOD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_EIP_CKSUM_BAD); > + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | > + PKT_RX_L4_CKSUM_MASK | > + PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_OUTER_L4_CKSUM_MASK); > /** > * data to be shuffled by result of flag mask, shifted down 12. > * If RSS(bit12)/VLAN(bit13) are set, > @@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue > *rxq, struct rte_mbuf **rx_pkts, > __m256i l3_l4_flags = > _mm256_shuffle_epi8(l3_l4_flags_shuf, > _mm256_srli_epi32(flag_bits, 4)); > l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); > + > + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); > + __m256i l4_outer_flags = > + _mm256_and_si256(l3_l4_flags, > l4_outer_mask); > + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); > + > + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); > + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); > + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); > l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); > /* set rss and vlan flags */ > const __m256i rss_vlan_flag_bits = > diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c > b/drivers/net/ice/ice_rxtx_vec_avx512.c > index e5e7cc1482..d33ef2a042 100644 > --- a/drivers/net/ice/ice_rxtx_vec_avx512.c > +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c > @@ -211,43 +211,88 @@ _ice_recv_raw_pkts_vec_avx512(struct > ice_rx_queue *rxq, > * bit13 is for VLAN indication. > */ > const __m256i flags_mask = > - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); > + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); > /** > * data to be shuffled by the result of the flags mask shifted by 4 > * bits. This gives use the l3_l4 flags. > */ > - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, > 0, > - /* shift right 1 bit to make sure it not exceed 255 */ > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - /* 2nd 128-bits */ > - 0, 0, 0, 0, 0, 0, 0, 0, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1); > + const __m256i l3_l4_flags_shuf = > + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 > | > + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + /** > + * second 128-bits > + * shift right 20 bits to use the low two bits to indicate > + * outer checksum status > + * shift right 1 bit to make sure it not exceed 255 > + */ > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1); > const __m256i cksum_mask = > - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD | > - PKT_RX_L4_CKSUM_GOOD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_EIP_CKSUM_BAD); > + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | > + PKT_RX_L4_CKSUM_MASK | > + PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_OUTER_L4_CKSUM_MASK); > /** > * data to be shuffled by result of flag mask, shifted down 12. > * If RSS(bit12)/VLAN(bit13) are set, > @@ -432,6 +477,14 @@ _ice_recv_raw_pkts_vec_avx512(struct > ice_rx_queue *rxq, > __m256i l3_l4_flags = > _mm256_shuffle_epi8(l3_l4_flags_shuf, > _mm256_srli_epi32(flag_bits, 4)); > l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); > + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); > + __m256i l4_outer_flags = > + _mm256_and_si256(l3_l4_flags, > l4_outer_mask); > + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); > + > + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); > + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); > + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); > l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); > /* set rss and vlan flags */ > const __m256i rss_vlan_flag_bits = > diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c > b/drivers/net/ice/ice_rxtx_vec_sse.c > index 626364719b..f8b3574e36 100644 > --- a/drivers/net/ice/ice_rxtx_vec_sse.c > +++ b/drivers/net/ice/ice_rxtx_vec_sse.c > @@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, > __m128i descs[4], > * bit12 for RSS indication. > * bit13 for VLAN indication. > */ > - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, > - 0x3070, 0x3070); > - > + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, > + 0x30f0, 0x30f0); > const __m128i cksum_mask = > _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD, > PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD, > PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD, > PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD); > > /* map the checksum, rss and vlan fields to the checksum, rss > * and vlan flag > */ > - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, > - /* shift right 1 bit to make sure it not exceed 255 */ > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1); > + const __m128i cksum_flags = > + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + /** > + * shift right 20 bits to use the low two bits to indicate > + * outer checksum status > + * shift right 1 bit to make sure it not exceed 255 > + */ > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1); > > const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, > 0, 0, 0, 0, > @@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, > __m128i descs[4], > flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); > /* then we shift left 1 bit */ > flags = _mm_slli_epi32(flags, 1); > + > + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6); > + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); > + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); > + > + __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6); > + __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask); > + flags = _mm_or_si128(l3_l4_flags, l4_outer_flags); > /* we need to mask out the reduntant bits introduced by RSS or > * VLAN fields. > */ > @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, > __m128i descs[4], > * appropriate flags means that we have to do a shift and blend for > * each mbuf before we do the write. > */ > - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), > 0x10); > - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), > 0x10); > - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); > - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), > 0x10); > + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), > 0x04); > + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), > 0x04); > + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); > + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), > 0x04); > > /* write the rearm data and the olflags in one write */ > RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != > -- > 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown 2020-11-13 3:16 ` Xie, WeiX @ 2020-11-13 5:33 ` Zhang, Qi Z 0 siblings, 0 replies; 14+ messages in thread From: Zhang, Qi Z @ 2020-11-13 5:33 UTC (permalink / raw) To: Xie, WeiX, Yang, MurphyX, dev Cc: Yang, Qiming, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo, Yang, MurphyX > -----Original Message----- > From: Xie, WeiX <weix.xie@intel.com> > Sent: Friday, November 13, 2020 11:16 AM > To: Yang, MurphyX <murphyx.yang@intel.com>; dev@dpdk.org > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z > <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi > <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, > MurphyX <murphyx.yang@intel.com> > Subject: RE: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown > > Tested-by: Xie,WeiX < weix.xie@intel.com> > > Regards, > Xie Wei > > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Murphy Yang > > Sent: Monday, November 9, 2020 2:07 PM > > To: dev@dpdk.org > > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z > > <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, > > Leyi <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, > > MurphyX <murphyx.yang@intel.com> > > Subject: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl > > unknown > > > > Currently, driver does not support parse UDP outer checksum flag of > > tunneled packets. > > > > When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' > > commands to enable hardware UDP outer checksum. This patch supports > > parse UDP outer checksum flag of tunneled packets. > > > > Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") > > Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX > > path") > > Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE > > path") > > > > Signed-off-by: Murphy Yang <murphyx.yang@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com> Applied to dpdk-next-net-intel. Thanks Qi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v6] net/ice: fix outer checksum on cvl unknown 2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang 2020-11-13 3:16 ` Xie, WeiX @ 2020-11-13 11:51 ` Ferruh Yigit 2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang 2 siblings, 0 replies; 14+ messages in thread From: Ferruh Yigit @ 2020-11-13 11:51 UTC (permalink / raw) To: Murphy Yang, dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu On 11/9/2020 6:06 AM, Murphy Yang wrote: > Currently, driver does not support parse UDP outer checksum flag of > tunneled packets. > > When execute 'csum set outer-udp hw 0' and 'csum parse-tunnel on 0' > commands to enable hardware UDP outer checksum. This patch supports > parse UDP outer checksum flag of tunneled packets. > > Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") > Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") > Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") > > Signed-off-by: Murphy Yang <murphyx.yang@intel.com> <...> > @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], > * appropriate flags means that we have to do a shift and blend for > * each mbuf before we do the write. > */ > - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); > - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); > - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); > - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); > + rearm0 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 8), 0x04); > + rearm1 = _mm_blend_epi32(mbuf_init, _mm_slli_si128(flags, 4), 0x04); > + rearm2 = _mm_blend_epi32(mbuf_init, flags, 0x04); > + rearm3 = _mm_blend_epi32(mbuf_init, _mm_srli_si128(flags, 4), 0x04); Hi Murphy, This change is in the 'ice_rxtx_vec_sse.c' file, but is the '_mm_blend_epi32' intrinsic, an SSE intrinsic? Since it is causing a compile error with default target, you can test with './devtools/test-meson-builds.sh' script. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v7] net/ice: fix outer checksum on cvl unknown 2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang 2020-11-13 3:16 ` Xie, WeiX 2020-11-13 11:51 ` Ferruh Yigit @ 2020-11-23 8:34 ` Murphy Yang 2020-11-23 8:56 ` Xie, WeiX 2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang 2 siblings, 2 replies; 14+ messages in thread From: Murphy Yang @ 2020-11-23 8:34 UTC (permalink / raw) To: dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, Murphy Yang When received tunneled packets, the testpmd output log shows 'ol_flags' value always is 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', but expected value is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'. Add the 'PKT_RX_OUTER_L4_CKSUM_GOOD' and 'PKT_RX_OUTER_L4_CKSUM_BAD' to 'flags' for normal path, 'l3_l4_flags_shuf' for AVX2 and AVX512 vector path and 'cksum_flags' for SSE vector path to ensure that the 'ol_flags' can match correct flags. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: Murphy Yang <murphyx.yang@intel.com> --- v7: - fix compile error with default target on SSE vector path. v6: - rename variable name. - update comments. v5: - fix outer L4 checksum mask for vector path. v4: - cover AVX512 vector path. v3: - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. - rename variable name. v2: - cover AVX2 and SSE vector path drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++----- 4 files changed, 233 insertions(+), 85 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 5fbd68eafc..24a7caeb98 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index b72a9e7025..7838e17787 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * second 128-bits + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index df5d2be1e6..fd5d724329 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -230,43 +230,88 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* 2nd 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * second 128-bits + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -451,6 +496,14 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 626364719b..87e0c3db2e 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6); + __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask); + flags = _mm_or_si128(l3_l4_flags, l4_outer_flags); /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x30); + rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x30); + rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x30); + rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x30); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v7] net/ice: fix outer checksum on cvl unknown 2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang @ 2020-11-23 8:56 ` Xie, WeiX 2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang 1 sibling, 0 replies; 14+ messages in thread From: Xie, WeiX @ 2020-11-23 8:56 UTC (permalink / raw) To: Yang, MurphyX, dev Cc: Yang, Qiming, Zhang, Qi Z, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo, Yang, MurphyX Tested-by: Xie,WeiX < weix.xie@intel.com> Regards, Xie Wei > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Murphy Yang > Sent: Monday, November 23, 2020 4:35 PM > To: dev@dpdk.org > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z > <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi > <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, > MurphyX <murphyx.yang@intel.com> > Subject: [dpdk-dev] [PATCH v7] net/ice: fix outer checksum on cvl unknown > > When received tunneled packets, the testpmd output log shows 'ol_flags' > value always is 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', but expected > value is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or > 'PKT_RX_OUTER_L4_CKSUM_BAD'. > > Add the 'PKT_RX_OUTER_L4_CKSUM_GOOD' and > 'PKT_RX_OUTER_L4_CKSUM_BAD' to 'flags' for normal path, > 'l3_l4_flags_shuf' for AVX2 and AVX512 vector path and 'cksum_flags' for SSE > vector path to ensure that the 'ol_flags' > can match correct flags. > > Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") > Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") > Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") > > Signed-off-by: Murphy Yang <murphyx.yang@intel.com> > --- > v7: > - fix compile error with default target on SSE vector path. > v6: > - rename variable name. > - update comments. > v5: > - fix outer L4 checksum mask for vector path. > v4: > - cover AVX512 vector path. > v3: > - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. > - rename variable name. > v2: > - cover AVX2 and SSE vector path > drivers/net/ice/ice_rxtx.c | 5 ++ > drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++------- > drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++------- > drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++----- > 4 files changed, 233 insertions(+), 85 deletions(-) > > diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index > 5fbd68eafc..24a7caeb98 100644 > --- a/drivers/net/ice/ice_rxtx.c > +++ b/drivers/net/ice/ice_rxtx.c > @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) > if (unlikely(stat_err0 & (1 << > ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) > flags |= PKT_RX_EIP_CKSUM_BAD; > > + if (unlikely(stat_err0 & (1 << > ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) > + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; > + else > + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; > + > return flags; > } > > diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c > b/drivers/net/ice/ice_rxtx_vec_avx2.c > index b72a9e7025..7838e17787 100644 > --- a/drivers/net/ice/ice_rxtx_vec_avx2.c > +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c > @@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct > ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, > * bit13 is for VLAN indication. > */ > const __m256i flags_mask = > - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); > + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); > /** > * data to be shuffled by the result of the flags mask shifted by 4 > * bits. This gives use the l3_l4 flags. > */ > - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, > 0, > - /* shift right 1 bit to make sure it not exceed 255 */ > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - /* second 128-bits */ > - 0, 0, 0, 0, 0, 0, 0, 0, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1); > + const __m256i l3_l4_flags_shuf = > + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 > | > + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + /** > + * second 128-bits > + * shift right 20 bits to use the low two bits to indicate > + * outer checksum status > + * shift right 1 bit to make sure it not exceed 255 > + */ > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1); > const __m256i cksum_mask = > - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD | > - PKT_RX_L4_CKSUM_GOOD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_EIP_CKSUM_BAD); > + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | > + PKT_RX_L4_CKSUM_MASK | > + PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_OUTER_L4_CKSUM_MASK); > /** > * data to be shuffled by result of flag mask, shifted down 12. > * If RSS(bit12)/VLAN(bit13) are set, > @@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue > *rxq, struct rte_mbuf **rx_pkts, > __m256i l3_l4_flags = > _mm256_shuffle_epi8(l3_l4_flags_shuf, > _mm256_srli_epi32(flag_bits, 4)); > l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); > + > + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); > + __m256i l4_outer_flags = > + _mm256_and_si256(l3_l4_flags, > l4_outer_mask); > + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); > + > + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); > + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); > + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); > l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); > /* set rss and vlan flags */ > const __m256i rss_vlan_flag_bits = > diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c > b/drivers/net/ice/ice_rxtx_vec_avx512.c > index df5d2be1e6..fd5d724329 100644 > --- a/drivers/net/ice/ice_rxtx_vec_avx512.c > +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c > @@ -230,43 +230,88 @@ _ice_recv_raw_pkts_vec_avx512(struct > ice_rx_queue *rxq, > * bit13 is for VLAN indication. > */ > const __m256i flags_mask = > - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); > + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); > /** > * data to be shuffled by the result of the flags mask shifted by 4 > * bits. This gives use the l3_l4 flags. > */ > - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, > 0, > - /* shift right 1 bit to make sure it not exceed 255 */ > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - /* 2nd 128-bits */ > - 0, 0, 0, 0, 0, 0, 0, 0, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1); > + const __m256i l3_l4_flags_shuf = > + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 > | > + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + /** > + * second 128-bits > + * shift right 20 bits to use the low two bits to indicate > + * outer checksum status > + * shift right 1 bit to make sure it not exceed 255 > + */ > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1); > const __m256i cksum_mask = > - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD | > - PKT_RX_L4_CKSUM_GOOD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_EIP_CKSUM_BAD); > + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | > + PKT_RX_L4_CKSUM_MASK | > + PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_OUTER_L4_CKSUM_MASK); > /** > * data to be shuffled by result of flag mask, shifted down 12. > * If RSS(bit12)/VLAN(bit13) are set, > @@ -451,6 +496,14 @@ _ice_recv_raw_pkts_vec_avx512(struct > ice_rx_queue *rxq, > __m256i l3_l4_flags = > _mm256_shuffle_epi8(l3_l4_flags_shuf, > _mm256_srli_epi32(flag_bits, 4)); > l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); > + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); > + __m256i l4_outer_flags = > + _mm256_and_si256(l3_l4_flags, > l4_outer_mask); > + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); > + > + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); > + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); > + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); > l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); > /* set rss and vlan flags */ > const __m256i rss_vlan_flag_bits = > diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c > b/drivers/net/ice/ice_rxtx_vec_sse.c > index 626364719b..87e0c3db2e 100644 > --- a/drivers/net/ice/ice_rxtx_vec_sse.c > +++ b/drivers/net/ice/ice_rxtx_vec_sse.c > @@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, > __m128i descs[4], > * bit12 for RSS indication. > * bit13 for VLAN indication. > */ > - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, > - 0x3070, 0x3070); > - > + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, > + 0x30f0, 0x30f0); > const __m128i cksum_mask = > _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD, > PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD, > PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD, > PKT_RX_IP_CKSUM_MASK | > PKT_RX_L4_CKSUM_MASK | > + > PKT_RX_OUTER_L4_CKSUM_MASK | > PKT_RX_EIP_CKSUM_BAD); > > /* map the checksum, rss and vlan fields to the checksum, rss > * and vlan flag > */ > - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, > - /* shift right 1 bit to make sure it not exceed 255 */ > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_BAD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_EIP_CKSUM_BAD | > PKT_RX_L4_CKSUM_GOOD | > - PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_BAD | > PKT_RX_IP_CKSUM_GOOD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_BAD) >> 1, > - (PKT_RX_L4_CKSUM_GOOD | > PKT_RX_IP_CKSUM_GOOD) >> 1); > + const __m128i cksum_flags = > + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + /** > + * shift right 20 bits to use the low two bits to indicate > + * outer checksum status > + * shift right 1 bit to make sure it not exceed 255 > + */ > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_EIP_CKSUM_BAD | > + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> > 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_BAD | > + PKT_RX_IP_CKSUM_GOOD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_BAD) >> 1, > + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | > PKT_RX_L4_CKSUM_GOOD | > + PKT_RX_IP_CKSUM_GOOD) >> 1); > > const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, > 0, 0, 0, 0, > @@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, > __m128i descs[4], > flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); > /* then we shift left 1 bit */ > flags = _mm_slli_epi32(flags, 1); > + > + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6); > + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); > + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); > + > + __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6); > + __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask); > + flags = _mm_or_si128(l3_l4_flags, l4_outer_flags); > /* we need to mask out the reduntant bits introduced by RSS or > * VLAN fields. > */ > @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, > __m128i descs[4], > * appropriate flags means that we have to do a shift and blend for > * each mbuf before we do the write. > */ > - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), > 0x10); > - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), > 0x10); > - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); > - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), > 0x10); > + rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), > 0x30); > + rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), > 0x30); > + rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x30); > + rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), > 0x30); > > /* write the rearm data and the olflags in one write */ > RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != > -- > 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown 2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang 2020-11-23 8:56 ` Xie, WeiX @ 2020-12-15 8:10 ` Murphy Yang 2020-12-15 12:11 ` Zhang, Qi Z 1 sibling, 1 reply; 14+ messages in thread From: Murphy Yang @ 2020-12-15 8:10 UTC (permalink / raw) To: dev Cc: qiming.yang, qi.z.zhang, stevex.yang, leyi.rong, wenzhuo.lu, ferruh.yigit, Murphy Yang When received tunneled packets, the testpmd output log shows 'ol_flags' value always is 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', but expected value is 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'. Add the 'PKT_RX_OUTER_L4_CKSUM_GOOD' and 'PKT_RX_OUTER_L4_CKSUM_BAD' to 'flags' for normal path, 'l3_l4_flags_shuf' for AVX2 and AVX512 vector path and 'cksum_flags' for SSE vector path to ensure that the 'ol_flags' can match correct flags. Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") Signed-off-by: Murphy Yang <murphyx.yang@intel.com> --- v8: - tune the commit title. v7: - fix compile error with default target on SSE vector path. v6: - rename variable name. - update comments. v5: - fix outer L4 checksum mask for vector path. v4: - cover AVX512 vector path. v3: - add PKT_RX_OUTER_L4_CKSUM_GOOD in AVX2 and SSE vector path. - rename variable name. v2: - cover AVX2 and SSE vector path drivers/net/ice/ice_rxtx.c | 5 ++ drivers/net/ice/ice_rxtx_vec_avx2.c | 118 +++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_avx512.c | 117 ++++++++++++++++++------- drivers/net/ice/ice_rxtx_vec_sse.c | 78 ++++++++++++----- 4 files changed, 233 insertions(+), 85 deletions(-) diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 9769e216bf..d052bd0f1b 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -1451,6 +1451,11 @@ ice_rxd_error_to_pkt_flags(uint16_t stat_err0) if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))) flags |= PKT_RX_EIP_CKSUM_BAD; + if (unlikely(stat_err0 & (1 << ICE_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_S))) + flags |= PKT_RX_OUTER_L4_CKSUM_BAD; + else + flags |= PKT_RX_OUTER_L4_CKSUM_GOOD; + return flags; } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index b72a9e7025..7838e17787 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -251,43 +251,88 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* second 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * second 128-bits + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -469,6 +514,15 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index df5d2be1e6..fd5d724329 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -230,43 +230,88 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, * bit13 is for VLAN indication. */ const __m256i flags_mask = - _mm256_set1_epi32((7 << 4) | (1 << 12) | (1 << 13)); + _mm256_set1_epi32((0xF << 4) | (1 << 12) | (1 << 13)); /** * data to be shuffled by the result of the flags mask shifted by 4 * bits. This gives use the l3_l4 flags. */ - const __m256i l3_l4_flags_shuf = _mm256_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, - /* 2nd 128-bits */ - 0, 0, 0, 0, 0, 0, 0, 0, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m256i l3_l4_flags_shuf = + _mm256_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * second 128-bits + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m256i cksum_mask = - _mm256_set1_epi32(PKT_RX_IP_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD | - PKT_RX_L4_CKSUM_GOOD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_EIP_CKSUM_BAD); + _mm256_set1_epi32(PKT_RX_IP_CKSUM_MASK | + PKT_RX_L4_CKSUM_MASK | + PKT_RX_EIP_CKSUM_BAD | + PKT_RX_OUTER_L4_CKSUM_MASK); /** * data to be shuffled by result of flag mask, shifted down 12. * If RSS(bit12)/VLAN(bit13) are set, @@ -451,6 +496,14 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i l3_l4_flags = _mm256_shuffle_epi8(l3_l4_flags_shuf, _mm256_srli_epi32(flag_bits, 4)); l3_l4_flags = _mm256_slli_epi32(l3_l4_flags, 1); + __m256i l4_outer_mask = _mm256_set1_epi32(0x6); + __m256i l4_outer_flags = + _mm256_and_si256(l3_l4_flags, l4_outer_mask); + l4_outer_flags = _mm256_slli_epi32(l4_outer_flags, 20); + + __m256i l3_l4_mask = _mm256_set1_epi32(~0x6); + l3_l4_flags = _mm256_and_si256(l3_l4_flags, l3_l4_mask); + l3_l4_flags = _mm256_or_si256(l3_l4_flags, l4_outer_flags); l3_l4_flags = _mm256_and_si256(l3_l4_flags, cksum_mask); /* set rss and vlan flags */ const __m256i rss_vlan_flag_bits = diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index 626364719b..87e0c3db2e 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -114,39 +114,67 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * bit12 for RSS indication. * bit13 for VLAN indication. */ - const __m128i desc_mask = _mm_set_epi32(0x3070, 0x3070, - 0x3070, 0x3070); - + const __m128i desc_mask = _mm_set_epi32(0x30f0, 0x30f0, + 0x30f0, 0x30f0); const __m128i cksum_mask = _mm_set_epi32(PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD, PKT_RX_IP_CKSUM_MASK | PKT_RX_L4_CKSUM_MASK | + PKT_RX_OUTER_L4_CKSUM_MASK | PKT_RX_EIP_CKSUM_BAD); /* map the checksum, rss and vlan fields to the checksum, rss * and vlan flag */ - const __m128i cksum_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, - /* shift right 1 bit to make sure it not exceed 255 */ - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_GOOD | - PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, - (PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1); + const __m128i cksum_flags = + _mm_set_epi8((PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | + PKT_RX_EIP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_BAD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + /** + * shift right 20 bits to use the low two bits to indicate + * outer checksum status + * shift right 1 bit to make sure it not exceed 255 + */ + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_BAD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_EIP_CKSUM_BAD | + PKT_RX_L4_CKSUM_GOOD | PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_BAD | + PKT_RX_IP_CKSUM_GOOD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_BAD) >> 1, + (PKT_RX_OUTER_L4_CKSUM_GOOD >> 20 | PKT_RX_L4_CKSUM_GOOD | + PKT_RX_IP_CKSUM_GOOD) >> 1); const __m128i rss_vlan_flags = _mm_set_epi8(0, 0, 0, 0, 0, 0, 0, 0, @@ -166,6 +194,14 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], flags = _mm_shuffle_epi8(cksum_flags, tmp_desc); /* then we shift left 1 bit */ flags = _mm_slli_epi32(flags, 1); + + __m128i l4_outer_mask = _mm_set_epi32(0x6, 0x6, 0x6, 0x6); + __m128i l4_outer_flags = _mm_and_si128(flags, l4_outer_mask); + l4_outer_flags = _mm_slli_epi32(l4_outer_flags, 20); + + __m128i l3_l4_mask = _mm_set_epi32(~0x6, ~0x6, ~0x6, ~0x6); + __m128i l3_l4_flags = _mm_and_si128(flags, l3_l4_mask); + flags = _mm_or_si128(l3_l4_flags, l4_outer_flags); /* we need to mask out the reduntant bits introduced by RSS or * VLAN fields. */ @@ -217,10 +253,10 @@ ice_rx_desc_to_olflags_v(struct ice_rx_queue *rxq, __m128i descs[4], * appropriate flags means that we have to do a shift and blend for * each mbuf before we do the write. */ - rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x10); - rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x10); - rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x10); - rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x10); + rearm0 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 8), 0x30); + rearm1 = _mm_blend_epi16(mbuf_init, _mm_slli_si128(flags, 4), 0x30); + rearm2 = _mm_blend_epi16(mbuf_init, flags, 0x30); + rearm3 = _mm_blend_epi16(mbuf_init, _mm_srli_si128(flags, 4), 0x30); /* write the rearm data and the olflags in one write */ RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown 2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang @ 2020-12-15 12:11 ` Zhang, Qi Z 0 siblings, 0 replies; 14+ messages in thread From: Zhang, Qi Z @ 2020-12-15 12:11 UTC (permalink / raw) To: Yang, MurphyX, dev Cc: Yang, Qiming, Yang, SteveX, Rong, Leyi, Lu, Wenzhuo, Yigit, Ferruh, Yang, MurphyX > -----Original Message----- > From: Murphy Yang <murphyx.yang@intel.com> > Sent: Tuesday, December 15, 2020 4:11 PM > To: dev@dpdk.org > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhang, Qi Z > <qi.z.zhang@intel.com>; Yang, SteveX <stevex.yang@intel.com>; Rong, Leyi > <leyi.rong@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yigit, Ferruh > <ferruh.yigit@intel.com>; Yang, MurphyX <murphyx.yang@intel.com> > Subject: [PATCH v8] net/ice: fix outer checksum unknown > > When received tunneled packets, the testpmd output log shows 'ol_flags' > value always is 'PKT_RX_OUTER_L4_CKSUM_UNKNOWN', but expected value is > 'PKT_RX_OUTER_L4_CKSUM_GOOD' or 'PKT_RX_OUTER_L4_CKSUM_BAD'. > > Add the 'PKT_RX_OUTER_L4_CKSUM_GOOD' and > 'PKT_RX_OUTER_L4_CKSUM_BAD' to 'flags' for normal path, 'l3_l4_flags_shuf' > for AVX2 and AVX512 vector path and 'cksum_flags' for SSE vector path to > ensure that the 'ol_flags' > can match correct flags. > > Fixes: dbf3c0e77a22 ("net/ice: handle Rx flex descriptor") > Fixes: 4ab7dbb0a0f6 ("net/ice: switch to Rx flexible descriptor in AVX path") > Fixes: ece1f8a8f1c8 ("net/ice: switch to flexible descriptor in SSE path") > > Signed-off-by: Murphy Yang <murphyx.yang@intel.com> Acked-by: Qi Zhang <qi.z.zhang@intel.com> Applied to dpdk-next-net-intel. Thanks Qi ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-12-15 12:11 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-10-14 8:34 [dpdk-dev] [PATCH] net/ice: fix outher chksum on cvl unknown murphy yang 2020-10-16 5:07 ` [dpdk-dev] [PATCH 1/1] " murphy yang 2020-10-27 6:09 ` [dpdk-dev] [PATCH v3] " Murphy Yang 2020-11-03 6:43 ` [dpdk-dev] [PATCH v4] " Murphy Yang 2020-11-03 6:43 ` [dpdk-dev] [PATCH] " Murphy Yang 2020-11-04 9:26 ` [dpdk-dev] [PATCH v5] " Murphy Yang 2020-11-09 6:06 ` [dpdk-dev] [PATCH v6] net/ice: fix outer checksum " Murphy Yang 2020-11-13 3:16 ` Xie, WeiX 2020-11-13 5:33 ` Zhang, Qi Z 2020-11-13 11:51 ` Ferruh Yigit 2020-11-23 8:34 ` [dpdk-dev] [PATCH v7] " Murphy Yang 2020-11-23 8:56 ` Xie, WeiX 2020-12-15 8:10 ` [dpdk-dev] [PATCH v8] net/ice: fix outer checksum unknown Murphy Yang 2020-12-15 12:11 ` Zhang, Qi Z
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).