[PATCH v1 09/12] net/ice/base: improve global config lock behavior

DPDK patches and discussions
 help / color / mirror / Atom feed

From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org, Bruce Richardson <bruce.richardson@intel.com>
Subject: [PATCH v1 09/12] net/ice/base: improve global config lock behavior
Date: Tue,  2 Sep 2025 18:26:59 +0100	[thread overview]
Message-ID: <890cfe97d9f716a7a65c028578bd1fc90ff04c4b.1756833701.git.anatoly.burakov@intel.com> (raw)
In-Reply-To: <cover.1756833701.git.anatoly.burakov@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

The ice_cfg_tx_topo function attempts to apply Tx scheduler topology
configuration based on NVM parameters, selecting either a 5 or 9 layer
topology.

As part of this flow, the driver acquires the "Global Configuration
Lock", which is a hardware resource associated with programming the DDP
package to the device. This "lock" is implemented by firmware as a way to
guarantee that only one PF can program the DDP for a device. Unlike a
traditional lock, once a PF has acquired this lock, no other PF will be
able to acquire it again (including that PF) until a core reset of the
device. Future requests to acquire the lock report that global
configuration has already completed.

The following flow is used to program the Tx topology:

 * Read the DDP package for scheduler configuration data
 * Acquire the global configuration lock
 * Program Tx scheduler topology according to DDP package data
 * Trigger a core reset which clears the global configuration lock

This is followed by the flow for programming the DDP package:

 * Acquire the global configuration lock (again)
 * Download the DDP package to the device
 * Release the global configuration lock.

However, if configuration of the Tx topology fails, (i.e.
ice_get_set_tx_topo() returns an error code), the driver exits
ice_cfg_tx_topo() immediately, and fails to trigger core reset.

While the global configuration lock is held, the firmware rejects most
AdminQ commands, as it is waiting for the DDP package download (or Tx
scheduler topology programming) to occur.

The current driver flows assume that the global configuration lock has
been reset after programming the Tx topology. Thus, the same PF attempts
to acquire the global lock again, and fails. This results in the driver
reporting "an unknown error occurred when loading the DDP package". It
then attempts to enter safe mode, but ultimately fails to finish
ice_probe() since nearly all AdminQ command report error codes, and the
driver stops loading the device at some point during its initialization.

We cannot simply release the global lock after a failed call to
ice_get_set_tx_topo(). Releasing the lock indicates to firmware that
global configuration (downloading of the DDP) has completed. Future
attempts by this or other PFs to load the DDP will fail with a report
that the DDP package has already been downloaded. Then, PFs will enter
safe mode as they realize that the package on the device does not meet
the minimum version requirement to load. The reported error messages are
confusing, as they indicate the version of the default "safe mode"
package in the NVM, rather than the version of the DDP package loaded
from the filesystem.

Instead, we need to trigger core reset to clear global configuration.
This is the lowest level of hardware reset which clears the global
configuration lock and related state. It also clears any already
downloaded DDP. Crucially, it does *not* clear the Tx scheduler topology
configuration.

Refactor ice_cfg_tx_topo() to always trigger a core reset after acquiring
the global lock, regardless of success or failure of the topology
configuration.

We need to re-initialize the HW structure when we trigger the core reset.
Previously, this was the responsibility of the core driver to cleanup
after the core reset. Instead, make it the responsibility of this
function. This avoids needless re-initialization for the cases where no
reset occurred.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 drivers/net/intel/ice/base/ice_ddp.c | 34 ++++++++++++++++++----------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/net/intel/ice/base/ice_ddp.c b/drivers/net/intel/ice/base/ice_ddp.c
index 850c722a3f..68e75be4d2 100644
--- a/drivers/net/intel/ice/base/ice_ddp.c
+++ b/drivers/net/intel/ice/base/ice_ddp.c
@@ -2370,7 +2370,7 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 	struct ice_buf_hdr *section;
 	struct ice_pkg_hdr *pkg_hdr;
 	enum ice_ddp_state state;
-	u16 i, size = 0, offset;
+	u16 size = 0, offset;
 	u32 reg = 0;
 	int status;
 	u8 flags;
@@ -2457,25 +2457,35 @@ int ice_cfg_tx_topo(struct ice_hw *hw, u8 *buf, u32 len)
 	/* check reset was triggered already or not */
 	reg = rd32(hw, GLGEN_RSTAT);
 	if (reg & GLGEN_RSTAT_DEVSTATE_M) {
-		/* Reset is in progress, re-init the hw again */
 		ice_debug(hw, ICE_DBG_INIT, "Reset is in progress. layer topology might be applied already\n");
 		ice_check_reset(hw);
-		return 0;
+		/* Reset is in progress, re-init the hw again */
+		goto reinit_hw;
 	}

 	/* set new topology */
 	status = ice_get_set_tx_topo(hw, new_topo, size, NULL, NULL, true);
 	if (status) {
-		ice_debug(hw, ICE_DBG_INIT, "Set tx topology is failed\n");
-		return status;
+		ice_debug(hw, ICE_DBG_INIT, "Failed setting Tx topology, status %d\n",
+			  status);
+		status = ICE_ERR_CFG;
 	}

-	/* new topology is updated, delay 1 second before issuing the CORRER */
-	for (i = 0; i < 10; i++)
-		ice_msec_delay(100, true);
+	/* Even if Tx topology config failed, we need to CORE reset here to
+	 * clear the global configuration lock. Delay 1 second to allow
+	 * hardware to settle then issue a CORER
+	 */
+	ice_msec_delay(1000, true);
 	ice_reset(hw, ICE_RESET_CORER);
-	/* CORER will clear the global lock, so no explicit call
-	 * required for release
-	 */
-	return 0;
+	ice_check_reset(hw);
+
+reinit_hw:
+	/* Since we triggered a CORER, re-initialize hardware */
+	ice_deinit_hw(hw);
+	if (ice_init_hw(hw)) {
+		ice_debug(hw, ICE_DBG_INIT, "Failed to re-init hardware after setting Tx topology\n");
+		return ICE_ERR_RESET_FAILED;
+	}
+
+	return status;
 }
-- 
2.47.3

next prev parent reply	other threads:[~2025-09-02 17:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-02 17:26 [PATCH v1 00/12] net/ice: update to latest version Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 01/12] net/ice/base: add direction metadata Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 02/12] net/ice/base: fix adding special words Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 03/12] net/ice/base: fix memory leak in HW profile handling Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 04/12] net/ice/base: fix memory leak in recipe handling Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 05/12] net/ice/base: clean up RSS LUT selection Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 06/12] net/ice/base: add 40G speed Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 07/12] net/ice/base: allow overriding recipe ID Anatoly Burakov
2025-09-02 17:26 ` [PATCH v1 08/12] net/ice/base: add missing health status defines Anatoly Burakov
2025-09-02 17:26 ` Anatoly Burakov [this message]
2025-09-02 17:27 ` [PATCH v1 10/12] net/ice: count drop-all filter stats in Rx stats Anatoly Burakov
2025-09-02 17:27 ` [PATCH v1 11/12] net/ice/base: add E835 device ID's Anatoly Burakov
2025-09-02 17:27 ` [PATCH v1 12/12] net/ice: update README Anatoly Burakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=890cfe97d9f716a7a65c028578bd1fc90ff04c4b.1756833701.git.anatoly.burakov@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).