From: Kevin Traynor
To: Viacheslav Ovsiienko
Cc: dpdk stable
Subject: patch 'net/mlx5: fix out-of-order completions in ordinary Rx burst' has been queued to stable release 24.11.3
Date: Fri, 18 Jul 2025 20:31:31 +0100
Message-ID: <20250718193247.1008129-157-ktraynor@redhat.com>
In-Reply-To: <20250718193247.1008129-1-ktraynor@redhat.com>
References: <20250718193247.1008129-1-ktraynor@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="US-ASCII"
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to stable release 24.11.3

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 07/23/25. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch.
If there were code changes for rebasing (i.e. not only metadata diffs),
please double check that the rebase was correctly done.

Queued patches are on a temporary branch at:
https://github.com/kevintraynor/dpdk-stable

This queued commit can be viewed at:
https://github.com/kevintraynor/dpdk-stable/commit/f22dca1f87a4cc856e9221aae5de4df58b19a7b3

Thanks.

Kevin

---
>From f22dca1f87a4cc856e9221aae5de4df58b19a7b3 Mon Sep 17 00:00:00 2001
From: Viacheslav Ovsiienko
Date: Tue, 8 Jul 2025 13:46:41 +0300
Subject: [PATCH] net/mlx5: fix out-of-order completions in ordinary Rx burst

[ upstream commit 5f9223611f3570c974b9c8e6c0b62db605fb3076 ]

The existing Rx burst routines assume that completions arrive in the CQ
in order and address the WQEs of the receive queue in order. That is not
true for shared RQs: CQEs can arrive out of order, and to address the
appropriate WQE we should fetch its index from the CQE wqe_counter field.

Also, we can advance the RQ CI if and only if all the WQEs in the covered
range have been handled. This requires a sliding window to track handled
WQEs. The supported out-of-order window size is up to the full queue size.

Fixes: 09c2555303be ("net/mlx5: support shared Rx queue")

Signed-off-by: Viacheslav Ovsiienko
---
 drivers/net/mlx5/linux/mlx5_verbs.c |   8 +-
 drivers/net/mlx5/mlx5_devx.c        |   7 +-
 drivers/net/mlx5/mlx5_ethdev.c      |   8 +-
 drivers/net/mlx5/mlx5_rx.c          | 284 +++++++++++++++++++++++++++-
 drivers/net/mlx5/mlx5_rx.h          |  28 ++-
 drivers/net/mlx5/mlx5_rxq.c         |  11 +-
 6 files changed, 334 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/linux/mlx5_verbs.c b/drivers/net/mlx5/linux/mlx5_verbs.c
index 454bd7c77e..9011319a3e 100644
--- a/drivers/net/mlx5/linux/mlx5_verbs.c
+++ b/drivers/net/mlx5/linux/mlx5_verbs.c
@@ -398,5 +398,11 @@ mlx5_rxq_ibv_obj_new(struct mlx5_rxq_priv *rxq)
 	rxq_data->rq_db = rwq.dbrec;
 	rxq_data->cq_arm_sn = 0;
-	mlx5_rxq_initialize(rxq_data);
+	ret = mlx5_rxq_initialize(rxq_data);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u Rx queue %u RQ initialization failure.",
+			priv->dev_data->port_id, rxq->idx);
+		rte_errno = ENOMEM;
+		goto error;
+	}
 	rxq_data->cq_ci = 0;
 	priv->dev_data->rx_queue_state[idx] = RTE_ETH_QUEUE_STATE_STARTED;
diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index b9d29ca7d5..f9081b0e30 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -710,5 +710,10 @@ mlx5_rxq_devx_obj_new(struct mlx5_rxq_priv *rxq)
 	}
 	if (!rxq_ctrl->started) {
-		mlx5_rxq_initialize(rxq_data);
+		if (mlx5_rxq_initialize(rxq_data)) {
+			DRV_LOG(ERR, "Port %u Rx queue %u RQ initialization failure.",
+				priv->dev_data->port_id, rxq->idx);
+			rte_errno = ENOMEM;
+			goto error;
+		}
 		rxq_ctrl->wqn = rxq->devx_rq.rq->id;
 	}
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index f2ae75a8e1..ddfe968a99 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -618,4 +618,5 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev, size_t *no_of_elements)
 	if (dev->rx_pkt_burst == mlx5_rx_burst ||
+	    dev->rx_pkt_burst == mlx5_rx_burst_out_of_order ||
 	    dev->rx_pkt_burst == mlx5_rx_burst_mprq ||
 	    dev->rx_pkt_burst == mlx5_rx_burst_vec ||
@@ -688,5 +689,10 @@ mlx5_select_rx_function(struct rte_eth_dev *dev)
 	MLX5_ASSERT(dev != NULL);
-	if (mlx5_check_vec_rx_support(dev) > 0) {
+	if (mlx5_shared_rq_enabled(dev)) {
+		rx_pkt_burst = mlx5_rx_burst_out_of_order;
+		DRV_LOG(DEBUG, "port %u forced to use SPRQ"
+			" Rx function with Out-of-Order completions",
+			dev->data->port_id);
+	} else if
(mlx5_check_vec_rx_support(dev) > 0) { if (mlx5_mprq_enabled(dev)) { rx_pkt_burst = mlx5_rx_burst_mprq_vec; diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c index 5e58eb8bc9..0f2152fdb0 100644 --- a/drivers/net/mlx5/mlx5_rx.c +++ b/drivers/net/mlx5/mlx5_rx.c @@ -42,5 +42,5 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, uint16_t cqe_n, uint16_t cqe_mask, volatile struct mlx5_mini_cqe8 **mcqe, - uint16_t *skip_cnt, bool mprq); + uint16_t *skip_cnt, bool mprq, uint32_t *widx); static __rte_always_inline uint32_t @@ -221,4 +221,6 @@ mlx5_rx_burst_mode_get(struct rte_eth_dev *dev, if (pkt_burst == mlx5_rx_burst) { snprintf(mode->info, sizeof(mode->info), "%s", "Scalar"); + } else if (pkt_burst == mlx5_rx_burst_out_of_order) { + snprintf(mode->info, sizeof(mode->info), "%s", "Scalar Out-of-Order"); } else if (pkt_burst == mlx5_rx_burst_mprq) { snprintf(mode->info, sizeof(mode->info), "%s", "Multi-Packet RQ"); @@ -359,4 +361,75 @@ rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, } +static inline void mlx5_rq_win_reset(struct mlx5_rxq_data *rxq) +{ + static_assert(MLX5_WINOOO_BITS == (sizeof(*rxq->rq_win_data) * CHAR_BIT), + "Invalid out-of-order window bitwidth"); + rxq->rq_win_idx = 0; + rxq->rq_win_cnt = 0; + if (rxq->rq_win_data != NULL && rxq->rq_win_idx_mask != 0) + memset(rxq->rq_win_data, 0, (rxq->rq_win_idx_mask + 1) * sizeof(*rxq->rq_win_data)); +} + +static inline int mlx5_rq_win_init(struct mlx5_rxq_data *rxq) +{ + struct mlx5_rxq_ctrl *ctrl = container_of(rxq, struct mlx5_rxq_ctrl, rxq); + uint32_t win_size, win_mask; + + /* Set queue size as window size */ + win_size = 1u << rxq->elts_n; + win_size = RTE_MAX(win_size, MLX5_WINOOO_BITS); + win_size = win_size / MLX5_WINOOO_BITS; + win_mask = win_size - 1; + if (win_mask != rxq->rq_win_idx_mask || rxq->rq_win_data == NULL) { + mlx5_free(rxq->rq_win_data); + rxq->rq_win_idx_mask = 0; + rxq->rq_win_data = mlx5_malloc(MLX5_MEM_RTE, + win_size * sizeof(*rxq->rq_win_data), + RTE_CACHE_LINE_SIZE, ctrl->socket); + if (rxq->rq_win_data == NULL) + return -ENOMEM; + rxq->rq_win_idx_mask = (uint16_t)win_mask; + } + mlx5_rq_win_reset(rxq); + return 0; +} + +static inline bool mlx5_rq_win_test(struct mlx5_rxq_data *rxq) +{ + return !!rxq->rq_win_cnt; +} + +static inline void mlx5_rq_win_update(struct mlx5_rxq_data *rxq, uint32_t delta) +{ + uint32_t idx; + + idx = (delta / MLX5_WINOOO_BITS) + rxq->rq_win_idx; + idx &= rxq->rq_win_idx_mask; + rxq->rq_win_cnt = 1; + rxq->rq_win_data[idx] |= 1u << (delta % MLX5_WINOOO_BITS); +} + +static inline uint32_t mlx5_rq_win_advance(struct mlx5_rxq_data *rxq, uint32_t delta) +{ + uint32_t idx; + + idx = (delta / MLX5_WINOOO_BITS) + rxq->rq_win_idx; + idx &= rxq->rq_win_idx_mask; + rxq->rq_win_data[idx] |= 1u << (delta % MLX5_WINOOO_BITS); + ++rxq->rq_win_cnt; + if (delta >= MLX5_WINOOO_BITS) + return 0; + delta = 0; + while (~rxq->rq_win_data[idx] == 0) { + rxq->rq_win_data[idx] = 0; + MLX5_ASSERT(rxq->rq_win_cnt >= MLX5_WINOOO_BITS); + rxq->rq_win_cnt -= MLX5_WINOOO_BITS; + idx = (idx + 1) & rxq->rq_win_idx_mask; + rxq->rq_win_idx = idx; + delta += MLX5_WINOOO_BITS; + } + return delta; +} + /** * Initialize Rx WQ and indexes. @@ -365,5 +438,5 @@ rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, * Pointer to RX queue structure. */ -void +int mlx5_rxq_initialize(struct mlx5_rxq_data *rxq) { @@ -414,6 +487,10 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq) /* Update doorbell counter. 
*/ rxq->rq_ci = wqe_n >> rxq->sges_n; + rxq->rq_ci_ooo = rxq->rq_ci; + if (mlx5_rq_win_init(rxq)) + return -ENOMEM; rte_io_wmb(); *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci); + return 0; } @@ -524,4 +601,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, rxq_ctrl->dump_file_n++; } + /* Try to find the actual cq_ci in hardware for shared queue. */ + if (rxq->shared) + rxq_sync_cq(rxq); rxq->err_state = MLX5_RXQ_ERR_STATE_NEED_READY; /* Fall-through */ @@ -583,5 +663,6 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, &rxq->fake_mbuf; } - mlx5_rxq_initialize(rxq); + if (mlx5_rxq_initialize(rxq)) + return MLX5_RECOVERY_ERROR_RET; rxq->err_state = MLX5_RXQ_ERR_STATE_NO_ERROR; return MLX5_RECOVERY_COMPLETED_RET; @@ -613,4 +694,8 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, * @param mprq * Indication if it is called from MPRQ. + * @param[out] widx + * Store WQE index from CQE to support out of order completions. NULL + * can be specified if index is not needed + * * @return * 0 in case of empty CQE, @@ -624,5 +709,5 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, uint16_t cqe_n, uint16_t cqe_mask, volatile struct mlx5_mini_cqe8 **mcqe, - uint16_t *skip_cnt, bool mprq) + uint16_t *skip_cnt, bool mprq, uint32_t *widx) { struct rxq_zip *zip = &rxq->zip; @@ -640,4 +725,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, len = rte_be_to_cpu_32((*mc)[zip->ai & 7].byte_cnt & rxq->byte_mask); + if (widx != NULL) + *widx = zip->wqe_idx + zip->ai; *mcqe = &(*mc)[zip->ai & 7]; if (rxq->cqe_comp_layout) { @@ -693,4 +780,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, if (unlikely(ret == MLX5_CQE_STATUS_ERR || rxq->err_state)) { + /* We should try to track out-pf-order WQE */ + if (widx != NULL) + *widx = rte_be_to_cpu_16(cqe->wqe_counter); ret = mlx5_rx_err_handle(rxq, 0, 1, skip_cnt); if (ret == MLX5_CQE_STATUS_HW_OWN) @@ -737,4 +827,8 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, zip->ca = cq_ci; zip->na = zip->ca + 7; + if (widx != NULL) { + zip->wqe_idx = rte_be_to_cpu_16(cqe->wqe_counter); + *widx = zip->wqe_idx; + } /* Compute the next non compressed CQE. */ zip->cq_ci = rxq->cq_ci + zip->cqe_cnt; @@ -761,4 +855,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, ++rxq->cq_ci; len = rte_be_to_cpu_32(cqe->byte_cnt); + if (widx != NULL) + *widx = rte_be_to_cpu_16(cqe->wqe_counter); if (rxq->cqe_comp_layout) { volatile struct mlx5_cqe *next; @@ -976,5 +1072,6 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) if (!pkt) { cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask]; - len = mlx5_rx_poll_len(rxq, cqe, cqe_n, cqe_mask, &mcqe, &skip_cnt, false); + len = mlx5_rx_poll_len(rxq, cqe, cqe_n, cqe_mask, + &mcqe, &skip_cnt, false, NULL); if (unlikely(len & MLX5_ERROR_CQE_MASK)) { /* We drop packets with non-critical errors */ @@ -1062,4 +1159,179 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) } +/** + * DPDK callback for RX with Out-of-Order completions support. + * + * @param dpdk_rxq + * Generic pointer to RX queue structure. + * @param[out] pkts + * Array to store received packets. + * @param pkts_n + * Maximum number of packets in array. + * + * @return + * Number of packets successfully received (<= pkts_n). 
+ */ +uint16_t +mlx5_rx_burst_out_of_order(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) +{ + struct mlx5_rxq_data *rxq = dpdk_rxq; + const uint32_t wqe_n = 1 << rxq->elts_n; + const uint32_t wqe_mask = wqe_n - 1; + const uint32_t cqe_n = 1 << rxq->cqe_n; + const uint32_t cqe_mask = cqe_n - 1; + const unsigned int sges_n = rxq->sges_n; + const uint32_t pkt_mask = wqe_mask >> sges_n; + struct rte_mbuf *pkt = NULL; + struct rte_mbuf *seg = NULL; + volatile struct mlx5_cqe *cqe = + &(*rxq->cqes)[rxq->cq_ci & cqe_mask]; + unsigned int i = 0; + int len = 0; /* keep its value across iterations. */ + const uint32_t rq_ci = rxq->rq_ci; + uint32_t idx = 0; + + do { + volatile struct mlx5_wqe_data_seg *wqe; + struct rte_mbuf *rep = NULL; + volatile struct mlx5_mini_cqe8 *mcqe = NULL; + uint32_t delta; + uint16_t skip_cnt; + + if (!pkt) { + cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask]; + rte_prefetch0(cqe); + /* Allocate from the first packet mbuf pool */ + rep = (*rxq->elts)[0]; + /* We must allocate before CQE consuming to allow retry */ + rep = rte_mbuf_raw_alloc(rep->pool); + if (unlikely(rep == NULL)) { + ++rxq->stats.rx_nombuf; + break; + } + len = mlx5_rx_poll_len(rxq, cqe, cqe_n, cqe_mask, + &mcqe, &skip_cnt, false, &idx); + if (unlikely(len == MLX5_CRITICAL_ERROR_CQE_RET)) { + rte_mbuf_raw_free(rep); + mlx5_rq_win_reset(rxq); + break; + } + if (len == 0) { + rte_mbuf_raw_free(rep); + break; + } + idx &= pkt_mask; + delta = (idx - rxq->rq_ci) & pkt_mask; + MLX5_ASSERT(delta < ((rxq->rq_win_idx_mask + 1) * MLX5_WINOOO_BITS)); + if (likely(!mlx5_rq_win_test(rxq))) { + /* No out of order completions in sliding window */ + if (likely(delta == 0)) + rxq->rq_ci++; + else + mlx5_rq_win_update(rxq, delta); + } else { + /* We have out of order completions */ + rxq->rq_ci += mlx5_rq_win_advance(rxq, delta); + } + if (rxq->zip.ai == 0) + rxq->rq_ci_ooo = rxq->rq_ci; + idx <<= sges_n; + /* We drop packets with non-critical errors */ + if (unlikely(len & MLX5_ERROR_CQE_MASK)) { + rte_mbuf_raw_free(rep); + continue; + } + } + wqe = &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[idx]; + if (unlikely(pkt)) + NEXT(seg) = (*rxq->elts)[idx]; + seg = (*rxq->elts)[idx]; + rte_prefetch0(seg); + rte_prefetch0(wqe); + /* Allocate the buf from the same pool. */ + if (unlikely(rep == NULL)) { + rep = rte_mbuf_raw_alloc(seg->pool); + if (unlikely(rep == NULL)) { + ++rxq->stats.rx_nombuf; + if (!pkt) { + /* + * no buffers before we even started, + * bail out silently. + */ + break; + } + while (pkt != seg) { + MLX5_ASSERT(pkt != (*rxq->elts)[idx]); + rep = NEXT(pkt); + NEXT(pkt) = NULL; + NB_SEGS(pkt) = 1; + rte_mbuf_raw_free(pkt); + pkt = rep; + } + break; + } + } + if (!pkt) { + pkt = seg; + MLX5_ASSERT(len >= (rxq->crc_present << 2)); + pkt->ol_flags &= RTE_MBUF_F_EXTERNAL; + if (rxq->cqe_comp_layout && mcqe) + cqe = &rxq->title_cqe; + rxq_cq_to_mbuf(rxq, pkt, cqe, mcqe); + if (rxq->crc_present) + len -= RTE_ETHER_CRC_LEN; + PKT_LEN(pkt) = len; + if (cqe->lro_num_seg > 1) { + mlx5_lro_update_hdr + (rte_pktmbuf_mtod(pkt, uint8_t *), cqe, + mcqe, rxq, len); + pkt->ol_flags |= RTE_MBUF_F_RX_LRO; + pkt->tso_segsz = len / cqe->lro_num_seg; + } + } + DATA_LEN(rep) = DATA_LEN(seg); + PKT_LEN(rep) = PKT_LEN(seg); + SET_DATA_OFF(rep, DATA_OFF(seg)); + PORT(rep) = PORT(seg); + (*rxq->elts)[idx] = rep; + /* + * Fill NIC descriptor with the new buffer. The lkey and size + * of the buffers are already known, only the buffer address + * changes. 
+ */ + wqe->addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(rep, uintptr_t)); + /* If there's only one MR, no need to replace LKey in WQE. */ + if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1)) + wqe->lkey = mlx5_rx_mb2mr(rxq, rep); + if (len > DATA_LEN(seg)) { + len -= DATA_LEN(seg); + ++NB_SEGS(pkt); + ++idx; + idx &= wqe_mask; + continue; + } + DATA_LEN(seg) = len; +#ifdef MLX5_PMD_SOFT_COUNTERS + /* Increment bytes counter. */ + rxq->stats.ibytes += PKT_LEN(pkt); +#endif + /* Return packet. */ + *(pkts++) = pkt; + pkt = NULL; + ++i; + } while (i < pkts_n); + if (unlikely(i == 0 && rq_ci == rxq->rq_ci_ooo)) + return 0; + /* Update the consumer index. */ + rte_io_wmb(); + *rxq->cq_db = rte_cpu_to_be_32(rxq->cq_ci); + rte_io_wmb(); + *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci_ooo); +#ifdef MLX5_PMD_SOFT_COUNTERS + /* Increment packets counter. */ + rxq->stats.ipackets += i; +#endif + return i; +} + /** * Update LRO packet TCP header. @@ -1220,5 +1492,5 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) } cqe = &(*rxq->cqes)[rxq->cq_ci & cq_mask]; - ret = mlx5_rx_poll_len(rxq, cqe, cqe_n, cq_mask, &mcqe, &skip_cnt, true); + ret = mlx5_rx_poll_len(rxq, cqe, cqe_n, cq_mask, &mcqe, &skip_cnt, true, NULL); if (unlikely(ret & MLX5_ERROR_CQE_MASK)) { if (ret == MLX5_CRITICAL_ERROR_CQE_RET) { diff --git a/drivers/net/mlx5/mlx5_rx.h b/drivers/net/mlx5/mlx5_rx.h index 6c48a37be7..6ec5f82022 100644 --- a/drivers/net/mlx5/mlx5_rx.h +++ b/drivers/net/mlx5/mlx5_rx.h @@ -23,4 +23,5 @@ /* Support tunnel matching. */ #define MLX5_FLOW_TUNNEL 10 +#define MLX5_WINOOO_BITS (sizeof(uint32_t) * CHAR_BIT) #define RXQ_PORT(rxq_ctrl) LIST_FIRST(&(rxq_ctrl)->owners)->priv @@ -47,4 +48,5 @@ struct rxq_zip { uint32_t na; /* Next array index. */ uint32_t cq_ci; /* The next CQE. */ + uint16_t wqe_idx; /* WQE index */ }; @@ -107,4 +109,5 @@ struct __rte_cache_aligned mlx5_rxq_data { uint32_t elts_ci; uint32_t rq_ci; + uint32_t rq_ci_ooo; uint16_t consumed_strd; /* Number of consumed strides in WQE. */ uint32_t rq_pi; @@ -147,4 +150,8 @@ struct __rte_cache_aligned mlx5_rxq_data { struct mlx5_eth_rxseg rxseg[MLX5_MAX_RXQ_NSEG]; /* Buffer split segment descriptions - sizes, offsets, pools. */ + uint16_t rq_win_cnt; /* Number of packets in the sliding window data. */ + uint16_t rq_win_idx_mask; /* Sliding window index wrapping mask. */ + uint16_t rq_win_idx; /* Index of the first element in sliding window. */ + uint32_t *rq_win_data; /* Out-of-Order completions sliding window. */ }; @@ -286,5 +293,6 @@ int mlx5_hrxq_modify(struct rte_eth_dev *dev, uint32_t hxrq_idx, uint16_t mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n); -void mlx5_rxq_initialize(struct mlx5_rxq_data *rxq); +uint16_t mlx5_rx_burst_out_of_order(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n); +int mlx5_rxq_initialize(struct mlx5_rxq_data *rxq); __rte_noinline int mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, uint16_t err_n, uint16_t *skip_cnt); @@ -312,4 +320,5 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t mlx5_rx_burst_mprq_vec(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n); +void rxq_sync_cq(struct mlx5_rxq_data *rxq); static int mlx5_rxq_mprq_enabled(struct mlx5_rxq_data *rxq); @@ -642,4 +651,21 @@ mlx5_mprq_enabled(struct rte_eth_dev *dev) } +/** + * Check whether Shared RQ is enabled for the device. + * + * @param dev + * Pointer to Ethernet device. + * + * @return + * 0 if disabled, otherwise enabled. 
+ */ +static __rte_always_inline int +mlx5_shared_rq_enabled(struct rte_eth_dev *dev) +{ + struct mlx5_priv *priv = dev->data->dev_private; + + return !LIST_EMPTY(&priv->sh->shared_rxqs); +} + /** * Check whether given RxQ is external. diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c index 6047529535..75733339e4 100644 --- a/drivers/net/mlx5/mlx5_rxq.c +++ b/drivers/net/mlx5/mlx5_rxq.c @@ -421,5 +421,5 @@ mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx) /* Fetches and drops all SW-owned and error CQEs to synchronize CQ. */ -static void +void rxq_sync_cq(struct mlx5_rxq_data *rxq) { @@ -593,5 +593,11 @@ mlx5_rx_queue_start_primary(struct rte_eth_dev *dev, uint16_t idx) } /* Reinitialize RQ - set WQEs. */ - mlx5_rxq_initialize(rxq_data); + ret = mlx5_rxq_initialize(rxq_data); + if (ret) { + DRV_LOG(ERR, "Port %u Rx queue %u RQ initialization failure.", + priv->dev_data->port_id, rxq->idx); + rte_errno = ENOMEM; + return ret; + } rxq_data->err_state = MLX5_RXQ_ERR_STATE_NO_ERROR; /* Set actual queue state. */ @@ -2306,4 +2312,5 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx) LIST_REMOVE(rxq_ctrl, share_entry); LIST_REMOVE(rxq_ctrl, next); + mlx5_free(rxq_ctrl->rxq.rq_win_data); mlx5_free(rxq_ctrl); } -- 2.50.0 --- Diff of the applied patch vs upstream commit (please double-check if non-empty: --- --- - 2025-07-18 20:29:16.483936765 +0100 +++ 0157-net-mlx5-fix-out-of-order-completions-in-ordinary-Rx.patch 2025-07-18 20:29:11.154908017 +0100 @@ -1 +1 @@ -From 5f9223611f3570c974b9c8e6c0b62db605fb3076 Mon Sep 17 00:00:00 2001 +From f22dca1f87a4cc856e9221aae5de4df58b19a7b3 Mon Sep 17 00:00:00 2001 @@ -5,0 +6,2 @@ +[ upstream commit 5f9223611f3570c974b9c8e6c0b62db605fb3076 ] + @@ -18 +19,0 @@ -Cc: stable@dpdk.org @@ -48 +49 @@ -index 0ee16ba4f0..10bd93c29a 100644 +index b9d29ca7d5..f9081b0e30 100644 @@ -51 +52 @@ -@@ -684,5 +684,10 @@ mlx5_rxq_devx_obj_new(struct mlx5_rxq_priv *rxq) +@@ -710,5 +710,10 @@ mlx5_rxq_devx_obj_new(struct mlx5_rxq_priv *rxq) @@ -64 +65 @@ -index b7df39ace9..68d1c1bfa7 100644 +index f2ae75a8e1..ddfe968a99 100644 @@ -67 +68 @@ -@@ -649,4 +649,5 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev, size_t *no_of_elements) +@@ -618,4 +618,5 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev, size_t *no_of_elements) @@ -73 +74 @@ -@@ -719,5 +720,10 @@ mlx5_select_rx_function(struct rte_eth_dev *dev) +@@ -688,5 +689,10 @@ mlx5_select_rx_function(struct rte_eth_dev *dev) @@ -86 +87 @@ -index 5f4a93fe8c..5e8c312d00 100644 +index 5e58eb8bc9..0f2152fdb0 100644 @@ -89 +90 @@ -@@ -43,5 +43,5 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -42,5 +42,5 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -96 +97 @@ -@@ -222,4 +222,6 @@ mlx5_rx_burst_mode_get(struct rte_eth_dev *dev, +@@ -221,4 +221,6 @@ mlx5_rx_burst_mode_get(struct rte_eth_dev *dev, @@ -103 +104 @@ -@@ -360,4 +362,75 @@ rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -359,4 +361,75 @@ rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -179 +180 @@ -@@ -366,5 +439,5 @@ rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -365,5 +438,5 @@ rxq_cq_to_pkt_type(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -186 +187 @@ -@@ -415,6 +488,10 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq) +@@ -414,6 +487,10 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq) @@ -197 +198 @@ -@@ -525,4 +602,7 @@ 
mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, +@@ -524,4 +601,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, @@ -205 +206 @@ -@@ -584,5 +664,6 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, +@@ -583,5 +663,6 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, @@ -213 +214 @@ -@@ -614,4 +695,8 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, +@@ -613,4 +694,8 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec, @@ -222 +223 @@ -@@ -625,5 +710,5 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -624,5 +709,5 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -229 +230 @@ -@@ -641,4 +726,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -640,4 +725,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -236 +237 @@ -@@ -694,4 +781,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -693,4 +780,7 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -244 +245 @@ -@@ -738,4 +828,8 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -737,4 +827,8 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -253 +254 @@ -@@ -762,4 +856,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, +@@ -761,4 +855,6 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe, @@ -260 +261 @@ -@@ -977,5 +1073,6 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) +@@ -976,5 +1072,6 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) @@ -268 +269 @@ -@@ -1063,4 +1160,179 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) +@@ -1062,4 +1159,179 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) @@ -448 +449 @@ -@@ -1221,5 +1493,5 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) +@@ -1220,5 +1492,5 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n) @@ -456 +457 @@ -index 6380895502..4f3d73e3c4 100644 +index 6c48a37be7..6ec5f82022 100644 @@ -465 +466 @@ -@@ -65,4 +66,5 @@ struct rxq_zip { +@@ -47,4 +48,5 @@ struct rxq_zip { @@ -471 +472 @@ -@@ -125,4 +127,5 @@ struct __rte_cache_aligned mlx5_rxq_data { +@@ -107,4 +109,5 @@ struct __rte_cache_aligned mlx5_rxq_data { @@ -477 +478 @@ -@@ -165,4 +168,8 @@ struct __rte_cache_aligned mlx5_rxq_data { +@@ -147,4 +150,8 @@ struct __rte_cache_aligned mlx5_rxq_data { @@ -486 +487 @@ -@@ -306,5 +313,6 @@ int mlx5_hrxq_modify(struct rte_eth_dev *dev, uint32_t hxrq_idx, +@@ -286,5 +293,6 @@ int mlx5_hrxq_modify(struct rte_eth_dev *dev, uint32_t hxrq_idx, @@ -494 +495 @@ -@@ -332,4 +340,5 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_rxq, struct rte_mbuf **pkts, +@@ -312,4 +320,5 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_rxq, struct rte_mbuf **pkts, @@ -500 +501 @@ -@@ -662,4 +671,21 @@ mlx5_mprq_enabled(struct rte_eth_dev *dev) +@@ -642,4 +651,21 @@ mlx5_mprq_enabled(struct rte_eth_dev *dev) @@ -523 +524 @@ -index 2e9bcbea4d..77c5848c37 100644 +index 6047529535..75733339e4 100644 @@ -526 +527 @@ -@@ -422,5 +422,5 @@ mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx) +@@ -421,5 +421,5 @@ mlx5_rxq_releasable(struct rte_eth_dev *dev, uint16_t idx) @@ -533 +534 @@ -@@ -594,5 +594,11 @@ mlx5_rx_queue_start_primary(struct rte_eth_dev *dev, uint16_t idx) +@@ -593,5 +593,11 @@ 
mlx5_rx_queue_start_primary(struct rte_eth_dev *dev, uint16_t idx) @@ -546 +547 @@ -@@ -2361,4 +2367,5 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx) +@@ -2306,4 +2312,5 @@ mlx5_rxq_release(struct rte_eth_dev *dev, uint16_t idx)