DPDK patches and discussions
 help / color / mirror / Atom feed
From: Shai Brandes <shaibran@amazon.com>
To: <stephen@networkplumber.org>
Cc: <dev@dpdk.org>, Shai Brandes <shaibran@amazon.com>
Subject: [PATCH 02/21] net/ena/base: rework admin timeout handling
Date: Wed, 15 Oct 2025 10:06:48 +0300	[thread overview]
Message-ID: <20251015070707.340-3-shaibran@amazon.com> (raw)
In-Reply-To: <20251015070707.340-1-shaibran@amazon.com>

The admin queue (AQ) consists of a submission queue (SQ)
and a completion queue (CQ), sharing a command context
object per request.
Each command is identified by a unique ID, used by both the
driver and the device to track completion status.

In polling mode, the submitting thread polls the CQ until the
command completes or times out.
In interrupt mode, the thread sleeps until an IRQ signals
completion or timeout occurs.

If a timeout happens in interrupt mode, the thread begins
polling the CQ itself which might lead to:
- stale or NULL pointer access if resources are freed prematurely
- corrupted queue head counters due to concurrent updates

This patch introduces a manual spin lock using atomics to serialize
access to shared resources.
The lock avoids blocking in interrupt context and ensures safe command
completion handling.

The patch also defers updating the command status until completion
processing is done, preventing premature access to partially updated data.

The atomic operation were reworked as the DPDK has deprecated the legacy
rte_atomic32_xxx apis in favor of c11-compliant atomic operations.

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>
Reviewed-by: Yosef Raisman <yraisman@amazon.com>
---
 drivers/net/ena/base/ena_com.c       | 74 ++++++++++++++++++----------
 drivers/net/ena/base/ena_com.h       |  1 +
 drivers/net/ena/base/ena_plat_dpdk.h | 22 +++++++--
 3 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ena/base/ena_com.c b/drivers/net/ena/base/ena_com.c
index 1a93d22b71..75145a0b3f 100644
--- a/drivers/net/ena/base/ena_com.c
+++ b/drivers/net/ena/base/ena_com.c
@@ -474,26 +474,38 @@ static void ena_com_handle_single_admin_completion(struct ena_com_admin_queue *a
 		return;
 	}
 
-	if (!comp_ctx->occupied)
-		return;
+	if (comp_ctx->user_cqe)
+		memcpy(comp_ctx->user_cqe, (void *)cqe, comp_ctx->comp_size);
 
-	comp_ctx->status = ENA_CMD_COMPLETED;
 	comp_ctx->comp_status = cqe->acq_common_descriptor.status;
 
-	if (comp_ctx->user_cqe)
-		memcpy(comp_ctx->user_cqe, (void *)cqe, comp_ctx->comp_size);
+	/* Make sure that the response is filled in before reporting completion */
+	smp_wmb();
+	comp_ctx->status = ENA_CMD_COMPLETED;
+	/* Ensure status is written before waking waiting thread */
+	smp_wmb();
 
 	if (!admin_queue->polling)
 		ENA_WAIT_EVENT_SIGNAL(comp_ctx->wait_event);
 }
 
-static void ena_com_handle_admin_completion(struct ena_com_admin_queue *admin_queue)
+static void ena_com_handle_admin_completion(struct ena_com_admin_queue *admin_queue,
+					    bool busy_poll_ownership)
 {
 	struct ena_admin_acq_entry *cqe = NULL;
 	u16 comp_num = 0;
 	u16 head_masked;
 	u8 phase;
 
+	/* Only try to acquire ownership once if busy_poll_ownership is false. This
+	 * is to prevent two threads fighting over ownership concurrently. The boolean
+	 * allows to distinguish the thread with the higher priority
+	 */
+	while (!ATOMIC32_CMP_EXCHANGE(&admin_queue->polling_for_completions, 0, 1)) {
+		if (!busy_poll_ownership)
+			return;
+	}
+
 	head_masked = admin_queue->cq.head & (admin_queue->q_depth - 1);
 	phase = admin_queue->cq.phase;
 
@@ -522,6 +534,8 @@ static void ena_com_handle_admin_completion(struct ena_com_admin_queue *admin_qu
 	admin_queue->cq.phase = phase;
 	admin_queue->sq.head += comp_num;
 	admin_queue->stats.completed_cmd += comp_num;
+
+	ATOMIC32_SET_RELEASE(&admin_queue->polling_for_completions, 0);
 }
 
 static int ena_com_comp_status_to_errno(struct ena_com_admin_queue *admin_queue,
@@ -570,7 +584,7 @@ static int ena_com_wait_and_process_admin_cq_polling(struct ena_comp_ctx *comp_c
 
 	while (1) {
 		ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
-		ena_com_handle_admin_completion(admin_queue);
+		ena_com_handle_admin_completion(admin_queue, true);
 		ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
 
 		if (comp_ctx->status != ENA_CMD_SUBMITTED)
@@ -793,32 +807,33 @@ static int ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
 	 */
 	if (unlikely(comp_ctx->status == ENA_CMD_SUBMITTED)) {
 		ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
-		ena_com_handle_admin_completion(admin_queue);
+		ena_com_handle_admin_completion(admin_queue, true);
 		admin_queue->stats.no_completion++;
 		ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
 
-		if (comp_ctx->status == ENA_CMD_COMPLETED) {
-			admin_queue->is_missing_admin_interrupt = true;
-			ena_trc_err(admin_queue->ena_dev,
-				    "The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd %d), autopolling mode is %s\n",
-				    comp_ctx->cmd_opcode, admin_queue->auto_polling ? "ON" : "OFF");
-			/* Check if fallback to polling is enabled */
-			if (admin_queue->auto_polling)
-				admin_queue->polling = true;
-		} else {
+		ret = ENA_COM_TIMER_EXPIRED;
+		/* Now that the admin queue has been polled, check whether the
+		 * request was fulfilled by the device
+		 */
+		if (comp_ctx->status != ENA_CMD_COMPLETED) {
 			ena_trc_err(admin_queue->ena_dev,
 				    "The ena device didn't send a completion for the admin cmd %d status %d\n",
 				    comp_ctx->cmd_opcode, comp_ctx->status);
+			goto close_aq;
 		}
-		/* Check if shifted to polling mode.
-		 * This will happen if there is a completion without an interrupt
-		 * and autopolling mode is enabled. Continuing normal execution in such case
+
+		admin_queue->is_missing_admin_interrupt = true;
+
+		ena_trc_err(admin_queue->ena_dev,
+			    "The ena device sent a completion but the driver didn't receive a MSI-X interrupt (cmd %d), autopolling mode is %s\n",
+			    comp_ctx->cmd_opcode, admin_queue->auto_polling ? "ON" : "OFF");
+		/* If fallback to polling is not enabled, missing an interrupt
+		 * is considered an error
 		 */
-		if (!admin_queue->polling) {
-			admin_queue->running_state = false;
-			ret = ENA_COM_TIMER_EXPIRED;
-			goto err;
-		}
+		if (!admin_queue->auto_polling)
+			goto close_aq;
+
+		ena_com_set_admin_polling_mode(admin_queue->ena_dev, true);
 	} else if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
 		ena_trc_err(admin_queue->ena_dev, "Command was aborted\n");
 		ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
@@ -829,8 +844,14 @@ static int ena_com_wait_and_process_admin_cq_interrupts(struct ena_comp_ctx *com
 	}
 
 	ret = ena_com_comp_status_to_errno(admin_queue, comp_ctx->comp_status);
+	comp_ctxt_release(admin_queue, comp_ctx);
+
+	return ret;
+close_aq:
+	admin_queue->running_state = false;
 err:
 	comp_ctxt_release(admin_queue, comp_ctx);
+
 	return ret;
 }
 
@@ -2107,6 +2128,7 @@ int ena_com_admin_init(struct ena_com_dev *ena_dev,
 	admin_queue->curr_cmd_id = 0;
 
 	ATOMIC32_SET(&admin_queue->outstanding_cmds, 0);
+	ATOMIC32_SET(&admin_queue->polling_for_completions, 0);
 
 	ENA_SPINLOCK_INIT(admin_queue->q_lock);
 
@@ -2394,7 +2416,7 @@ int ena_com_get_dev_attr_feat(struct ena_com_dev *ena_dev,
 
 void ena_com_admin_q_comp_intr_handler(struct ena_com_dev *ena_dev)
 {
-	ena_com_handle_admin_completion(&ena_dev->admin_queue);
+	ena_com_handle_admin_completion(&ena_dev->admin_queue, false);
 }
 
 /* ena_handle_specific_aenq_event:
diff --git a/drivers/net/ena/base/ena_com.h b/drivers/net/ena/base/ena_com.h
index 1c752327dc..6b9a780755 100644
--- a/drivers/net/ena/base/ena_com.h
+++ b/drivers/net/ena/base/ena_com.h
@@ -226,6 +226,7 @@ struct ena_com_admin_queue {
 
 	/* Indicate if the admin queue should poll for completion */
 	bool polling;
+	ena_atomic32_t polling_for_completions;
 
 	/* Define if fallback to polling mode should occur */
 	bool auto_polling;
diff --git a/drivers/net/ena/base/ena_plat_dpdk.h b/drivers/net/ena/base/ena_plat_dpdk.h
index 3b574f888a..18c6837566 100644
--- a/drivers/net/ena/base/ena_plat_dpdk.h
+++ b/drivers/net/ena/base/ena_plat_dpdk.h
@@ -74,6 +74,10 @@ typedef uint64_t dma_addr_t;
 #define mmiowb rte_io_wmb
 #define __iomem
 
+#define smp_wmb rte_smp_wmb
+#define smp_rmb rte_smp_rmb
+#define smp_mb rte_smp_mb
+
 #ifndef READ_ONCE
 #define READ_ONCE(var) (*((volatile typeof(var) *)(&(var))))
 #endif
@@ -267,10 +271,20 @@ ena_mem_alloc_coherent(struct rte_eth_dev_data *data, size_t size,
 #define ENA_REG_READ32(bus, reg)					       \
 	__extension__ ({ (void)(bus); rte_read32_relaxed((reg)); })
 
-#define ATOMIC32_INC(i32_ptr) rte_atomic32_inc(i32_ptr)
-#define ATOMIC32_DEC(i32_ptr) rte_atomic32_dec(i32_ptr)
-#define ATOMIC32_SET(i32_ptr, val) rte_atomic32_set(i32_ptr, val)
-#define ATOMIC32_READ(i32_ptr) rte_atomic32_read(i32_ptr)
+#define ATOMIC32_INC(i32_ptr)									\
+	rte_atomic_fetch_add_explicit(&(i32_ptr)->cnt, 1, rte_memory_order_seq_cst)
+#define ATOMIC32_DEC(i32_ptr)									\
+	rte_atomic_fetch_sub_explicit(&(i32_ptr)->cnt, 1, rte_memory_order_seq_cst)
+#define ATOMIC32_SET(i32_ptr, val)								\
+	rte_atomic_store_explicit(&(i32_ptr)->cnt, val, rte_memory_order_seq_cst)
+#define ATOMIC32_SET_RELEASE(i32_ptr, val)							\
+	do {											\
+		rte_atomic_thread_fence(rte_memory_order_release);				\
+		rte_atomic_store_explicit(&(i32_ptr)->cnt, val, rte_memory_order_seq_cst);	\
+	} while (0)
+#define ATOMIC32_READ(i32_ptr) rte_atomic_load_explicit(&(i32_ptr)->cnt, rte_memory_order_seq_cst)
+#define ATOMIC32_CMP_EXCHANGE(I32_PTR, OLD, NEW) \
+	rte_atomic32_cmpset((volatile uint32_t *)(I32_PTR), OLD, NEW)
 
 #define msleep(x) rte_delay_us(x * 1000)
 #define udelay(x) rte_delay_us(x)
-- 
2.17.1


  parent reply	other threads:[~2025-10-15  7:07 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-15  7:06 [PATCH 00/21] net/ena: Release 2.14.0 Shai Brandes
2025-10-15  7:06 ` [PATCH 01/21] net/ena/base: optimize Tx desc fields setting Shai Brandes
2025-10-15  7:06 ` Shai Brandes [this message]
2025-10-15  7:06 ` [PATCH 03/21] net/ena/base: add extended Tx cdesc support Shai Brandes
2025-10-15  7:06 ` [PATCH 04/21] net/ena/base: add IO ring helper functions Shai Brandes
2025-10-15  8:50 ` [PATCH 00/21] net/ena: Release 2.14.0 Brandes, Shai
2025-10-15 18:12 ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251015070707.340-3-shaibran@amazon.com \
    --to=shaibran@amazon.com \
    --cc=dev@dpdk.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).