* [PATCH] net/mlx5: fix error packets drop in the regular Rx
From: Viacheslav Ovsiienko @ 2024-03-11 18:14 UTC
  To: stable; +Cc: bluca, ktraynor, christian.ehrhardt, xuemingl
[ upstream commit ef296e8f6140ea469b50c7bfe73501b1c9ef86e1 ]
When a packet is received with an error, the error is reported
in the CQE structure, and the PMD analyzes the error syndrome and
takes one of two actions: either it resets the entire queue for
critical errors, or it just ignores the packet.

The non-vectorized rx_burst did not ignore non-critical error
packets, and when the packet length exceeded the mbuf data buffer
length, it took the next element in the queue's WQE ring, causing
the CQE/WQE consumer indices to lose synchronization.
Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx queue")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 5bf1a679b2..cc087348a4 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -613,7 +613,8 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec,
  * @param mprq
  *   Indication if it is called from MPRQ.
  * @return
- *   0 in case of empty CQE, MLX5_REGULAR_ERROR_CQE_RET in case of error CQE,
+ *   0 in case of empty CQE,
+ *   MLX5_REGULAR_ERROR_CQE_RET in case of error CQE,
  *   MLX5_CRITICAL_ERROR_CQE_RET in case of error CQE lead to Rx queue reset,
  *   otherwise the packet size in regular RxQ,
  *   and striding byte count format in mprq case.
@@ -697,6 +698,11 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
 					if (ret == MLX5_RECOVERY_ERROR_RET ||
 						ret == MLX5_RECOVERY_COMPLETED_RET)
 						return MLX5_CRITICAL_ERROR_CQE_RET;
+					if (!mprq && ret == MLX5_RECOVERY_IGNORE_RET) {
+						*skip_cnt = 1;
+						++rxq->cq_ci;
+						return MLX5_ERROR_CQE_MASK;
+					}
 				} else {
 					return 0;
 				}
@@ -971,19 +977,18 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
 			len = mlx5_rx_poll_len(rxq, cqe, cqe_n, cqe_mask, &mcqe, &skip_cnt, false);
 			if (unlikely(len & MLX5_ERROR_CQE_MASK)) {
+				/* We drop packets with non-critical errors */
+				rte_mbuf_raw_free(rep);
 				if (len == MLX5_CRITICAL_ERROR_CQE_RET) {
-					rte_mbuf_raw_free(rep);
 					rq_ci = rxq->rq_ci << sges_n;
 					break;
 				}
+				/* Skip specified amount of error CQEs packets */
 				rq_ci >>= sges_n;
 				rq_ci += skip_cnt;
 				rq_ci <<= sges_n;
-				idx = rq_ci & wqe_mask;
-				wqe = &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[idx];
-				seg = (*rxq->elts)[idx];
-				cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
-				len = len & ~MLX5_ERROR_CQE_MASK;
+				MLX5_ASSERT(!pkt);
+				continue;
 			}
 			if (len == 0) {
 				rte_mbuf_raw_free(rep);
-- 
2.34.1
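The return contract documented in the first hunk packs packet lengths and error codes into one integer: a flag bit marks an error, and two specific codes separate a critical error from a regular one. The new ignore path returns the bare MLX5_ERROR_CQE_MASK, which matches neither specific code but still has the flag bit set, so the caller's `len & MLX5_ERROR_CQE_MASK` test catches it. Below is a minimal, hypothetical model of how the patched mlx5_rx_burst dispatches on that value. ERROR_CQE_MASK, REGULAR_ERROR_CQE_RET, CRITICAL_ERROR_CQE_RET, poll_action and classify() are illustrative stand-ins, and their numeric values are assumptions, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for MLX5_ERROR_CQE_MASK and the two error
 * return codes. The values are assumptions: any flag bit works as long
 * as it cannot appear in a valid length and every error code carries it.
 */
#define ERROR_CQE_MASK         0x40000000u
#define REGULAR_ERROR_CQE_RET  (0x5555u | ERROR_CQE_MASK)
#define CRITICAL_ERROR_CQE_RET (0x3333u | ERROR_CQE_MASK)

static_assert((REGULAR_ERROR_CQE_RET & ERROR_CQE_MASK) != 0,
	      "regular error code must carry the error flag");
static_assert((CRITICAL_ERROR_CQE_RET & ERROR_CQE_MASK) != 0,
	      "critical error code must carry the error flag");

enum poll_action {
	POLL_EMPTY,  /* no completion yet: stop the burst */
	POLL_PACKET, /* value is a plain packet length */
	POLL_SKIP,   /* non-critical error: drop the packet and continue */
	POLL_RESET,  /* critical error: queue reset, stop the burst */
};

/* Dispatch on a poll result in the same order as the patched loop. */
static enum poll_action classify(uint32_t len)
{
	if (len & ERROR_CQE_MASK) {
		/* In the patch, rep is freed at this point for every error variant. */
		if (len == CRITICAL_ERROR_CQE_RET)
			return POLL_RESET;
		/* Regular error codes and the bare mask from the ignore path. */
		return POLL_SKIP;
	}
	if (len == 0)
		return POLL_EMPTY;
	return POLL_PACKET;
}
```

For example, classify(1500) yields POLL_PACKET, while classify(ERROR_CQE_MASK), the value the new ignore path returns, yields POLL_SKIP. Testing the flag bit before comparing against the specific codes is what lets one branch catch every error variant.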
* Re: [PATCH] net/mlx5: fix error packets drop in the regular Rx
From: Kevin Traynor @ 2024-04-18 12:16 UTC
  To: Viacheslav Ovsiienko, stable; +Cc: bluca, christian.ehrhardt, xuemingl
On 11/03/2024 18:14, Viacheslav Ovsiienko wrote:
> [ upstream commit ef296e8f6140ea469b50c7bfe73501b1c9ef86e1 ]
> 
> When a packet is received with an error, the error is reported
> in the CQE structure, and the PMD analyzes the error syndrome and
> takes one of two actions: either it resets the entire queue for
> critical errors, or it just ignores the packet.
> 
> The non-vectorized rx_burst did not ignore non-critical error
> packets, and when the packet length exceeded the mbuf data buffer
> length, it took the next element in the queue's WQE ring, causing
> the CQE/WQE consumer indices to lose synchronization.
> 
> Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx queue")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
>  drivers/net/mlx5/mlx5_rx.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
FYI, for the 21.11 branch I had already rebased and applied this. It
also appears to be on the 22.11 and 23.11 branches (or queued for them).
https://git.dpdk.org/dpdk-stable/commit/?h=21.11&id=c52e6e0ecda72ad163fc7757abe825105d7a16c8
* RE: [PATCH] net/mlx5: fix error packets drop in the regular Rx
From: Raslan Darawsheh @ 2024-02-27 16:16 UTC
  To: Slava Ovsiienko, dev; +Cc: Matan Azrad, Ori Kam, Dariusz Sosnowski, stable
Hi,
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, February 20, 2024 1:45 PM
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh
> <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
> <dsosnowski@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] net/mlx5: fix error packets drop in the regular Rx
> 
> When a packet is received with an error, the error is reported in the CQE
> structure, and the PMD analyzes the error syndrome and takes one of two
> actions: either it resets the entire queue for critical errors, or it just
> ignores the packet.
> 
> The non-vectorized rx_burst did not ignore non-critical error packets, and
> when the packet length exceeded the mbuf data buffer length, it took the next
> element in the queue's WQE ring, causing the CQE/WQE consumer indices to lose
> synchronization.
> 
> Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx queue")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
* RE: [PATCH] net/mlx5: fix error packets drop in the regular Rx
From: Dariusz Sosnowski @ 2024-02-20 14:04 UTC
  To: Slava Ovsiienko, dev; +Cc: Matan Azrad, Raslan Darawsheh, Ori Kam, stable
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Tuesday, February 20, 2024 12:45
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@nvidia.com>; Raslan Darawsheh
> <rasland@nvidia.com>; Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
> <dsosnowski@nvidia.com>; stable@dpdk.org
> Subject: [PATCH] net/mlx5: fix error packets drop in the regular Rx
> 
> When a packet is received with an error, the error is reported in the CQE
> structure, and the PMD analyzes the error syndrome and takes one of two
> actions: either it resets the entire queue for critical errors, or it just
> ignores the packet.
> 
> The non-vectorized rx_burst did not ignore non-critical error packets, and
> when the packet length exceeded the mbuf data buffer length, it took the next
> element in the queue's WQE ring, causing the CQE/WQE consumer indices to lose
> synchronization.
> 
> Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx queue")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Best regards,
Dariusz Sosnowski
* [PATCH] net/mlx5: fix error packets drop in the regular Rx
From: Viacheslav Ovsiienko @ 2024-02-20 11:45 UTC
  To: dev; +Cc: matan, rasland, orika, dsosnowski, stable
When a packet is received with an error, the error is reported
in the CQE structure, and the PMD analyzes the error syndrome and
takes one of two actions: either it resets the entire queue for
critical errors, or it just ignores the packet.

The non-vectorized rx_burst did not ignore non-critical error
packets, and when the packet length exceeded the mbuf data buffer
length, it took the next element in the queue's WQE ring, causing
the CQE/WQE consumer indices to lose synchronization.
Fixes: aa67ed308458 ("net/mlx5: ignore non-critical syndromes for Rx queue")
Cc: stable@dpdk.org
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_rx.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 5bf1a679b2..cc087348a4 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -613,7 +613,8 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec,
  * @param mprq
  *   Indication if it is called from MPRQ.
  * @return
- *   0 in case of empty CQE, MLX5_REGULAR_ERROR_CQE_RET in case of error CQE,
+ *   0 in case of empty CQE,
+ *   MLX5_REGULAR_ERROR_CQE_RET in case of error CQE,
  *   MLX5_CRITICAL_ERROR_CQE_RET in case of error CQE lead to Rx queue reset,
  *   otherwise the packet size in regular RxQ,
  *   and striding byte count format in mprq case.
@@ -697,6 +698,11 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
 					if (ret == MLX5_RECOVERY_ERROR_RET ||
 						ret == MLX5_RECOVERY_COMPLETED_RET)
 						return MLX5_CRITICAL_ERROR_CQE_RET;
+					if (!mprq && ret == MLX5_RECOVERY_IGNORE_RET) {
+						*skip_cnt = 1;
+						++rxq->cq_ci;
+						return MLX5_ERROR_CQE_MASK;
+					}
 				} else {
 					return 0;
 				}
@@ -971,19 +977,18 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
 			cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
 			len = mlx5_rx_poll_len(rxq, cqe, cqe_n, cqe_mask, &mcqe, &skip_cnt, false);
 			if (unlikely(len & MLX5_ERROR_CQE_MASK)) {
+				/* We drop packets with non-critical errors */
+				rte_mbuf_raw_free(rep);
 				if (len == MLX5_CRITICAL_ERROR_CQE_RET) {
-					rte_mbuf_raw_free(rep);
 					rq_ci = rxq->rq_ci << sges_n;
 					break;
 				}
+				/* Skip specified amount of error CQEs packets */
 				rq_ci >>= sges_n;
 				rq_ci += skip_cnt;
 				rq_ci <<= sges_n;
-				idx = rq_ci & wqe_mask;
-				wqe = &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[idx];
-				seg = (*rxq->elts)[idx];
-				cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask];
-				len = len & ~MLX5_ERROR_CQE_MASK;
+				MLX5_ASSERT(!pkt);
+				continue;
 			}
 			if (len == 0) {
 				rte_mbuf_raw_free(rep);
-- 
2.18.1
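The index synchronization problem described in the commit message can be illustrated with a simplified ring model. This is a hypothetical sketch, not driver code. It assumes one WQE (one mbuf of BUF_LEN bytes) per packet, no scatter (sges_n = 0) and no CQE compression, so a healthy queue keeps its CQ and WQ consumer indices equal. consume_buggy() mirrors the old behavior: it treats an error CQE as a packet and chains extra WQEs when the reported length exceeds the buffer. consume_fixed() mirrors the patch: it drops the packet and advances both indices by one (skip_cnt = 1). All names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model (not mlx5 code): one WQE holding one
 * BUF_LEN-byte mbuf per packet, no scatter, no CQE compression.
 */
#define BUF_LEN 2048u

struct cqe_model {
	uint32_t byte_cnt;  /* length reported by the completion */
	bool error;         /* non-critical error syndrome present */
};

struct rxq_model {
	uint32_t cq_ci;     /* completion queue consumer index */
	uint32_t rq_ci;     /* receive (WQE) ring consumer index */
	uint32_t delivered; /* packets handed to the application */
};

/* Old behavior: the error CQE is processed like a good packet. */
static void consume_buggy(struct rxq_model *q, const struct cqe_model *c)
{
	uint32_t remaining = c->byte_cnt;

	/* An oversized length makes the loop take the next WQE(s). */
	do {
		q->rq_ci++;
		remaining = remaining > BUF_LEN ? remaining - BUF_LEN : 0;
	} while (remaining > 0);
	q->cq_ci++;
	q->delivered++;
}

/* Patched behavior: drop the errored packet, skip one CQE and one WQE. */
static void consume_fixed(struct rxq_model *q, const struct cqe_model *c)
{
	q->cq_ci++;
	q->rq_ci++;
	if (c->error)
		return; /* dropped: nothing delivered */
	assert(c->byte_cnt <= BUF_LEN); /* model assumption: good packets fit */
	q->delivered++;
}

/* In this one-CQE-per-WQE model the indices must stay equal. */
static bool in_sync(const struct rxq_model *q)
{
	return q->cq_ci == q->rq_ci;
}

/* Replay a completion trace through either consumer. */
static struct rxq_model run(const struct cqe_model *trace, unsigned int n,
			    bool fixed)
{
	struct rxq_model q = { 0, 0, 0 };

	for (unsigned int i = 0; i < n; i++) {
		if (fixed)
			consume_fixed(&q, &trace[i]);
		else
			consume_buggy(&q, &trace[i]);
	}
	return q;
}
```

With a trace of a 64-byte packet, a 9000-byte error completion and a 128-byte packet, the old model ends with cq_ci = 3 but rq_ci = 7 (the error completion alone takes five WQEs). The fixed model ends with both indices at 3 and two packets delivered.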