From: <shaibran@amazon.com>
To: <ferruh.yigit@amd.com>
Cc: <dev@dpdk.org>, Shai Brandes <shaibran@amazon.com>
Subject: [PATCH 32/33] net/ena: control path pure polling mode
Date: Mon, 4 Mar 2024 11:01:35 +0200
Message-ID: <20240304090136.861-33-shaibran@amazon.com>
In-Reply-To: <20240304090136.861-1-shaibran@amazon.com>

From: Shai Brandes <shaibran@amazon.com>

This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in
the control path. This mode is not activated by default and can be
toggled using the "control_path_poll_interval" devarg. When operating
in this mode, periodic alarms are used to monitor the control queues.

A non-zero value for this devarg is mandatory for control path
functionality when binding ports to the uio_pci_generic kernel module,
which lacks interrupt support.

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>
---
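Note for reviewers: the devarg handling described above (a decimal value
in milliseconds, accepted in the range 0..1000, stored in microseconds,
with 0 meaning interrupt mode) can be sketched in isolation as below.
This is a minimal illustration using only libc, not the driver code; the
function name and the SKETCH_* macros are hypothetical and exist only
for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names for illustration; they mirror the patch's limits only. */
#define SKETCH_MAX_POLL_INTERVAL_MSEC 1000ULL
#define SKETCH_USEC_PER_MSEC 1000ULL

/*
 * Parse a decimal value given in milliseconds and store the polling
 * interval in microseconds. A stored value of 0 means interrupt mode.
 * Returns 0 on success, -EINVAL on a non-numeric or too-large value.
 */
static int sketch_parse_poll_interval(const char *value, uint64_t *interval_us)
{
	char *end;
	unsigned long long msec = strtoull(value, &end, 10);

	if (end == value)
		return -EINVAL;	/* no digits were consumed */
	if (msec > SKETCH_MAX_POLL_INTERVAL_MSEC)
		return -EINVAL;	/* above the 1000 ms ceiling */

	*interval_us = msec * SKETCH_USEC_PER_MSEC;
	return 0;
}
```

The conversion to microseconds matters because rte_eal_alarm_set() takes
its delay in microseconds, so the stored value can be passed to it
unchanged when the alarm is armed or re-armed.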
 doc/guides/nics/ena.rst                |  49 ++++++++---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c           | 108 ++++++++++++++++++++-----
 drivers/net/ena/ena_ethdev.h           |   5 ++
 4 files changed, 130 insertions(+), 34 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 53c9341859..a94397f9d3 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -109,12 +109,16 @@ Runtime Configuration
 
    * **llq_policy** (default 1)
 
-     Controls whether use device recommended header policy or override it.
+     Controls whether to use the device-recommended header policy or override it:
+
      0 - Disable LLQ.
-         **Use with extreme caution as it leads to a huge performance
-         degradation on AWS instances from 6th generation onwards.**
+     **Use with extreme caution as it leads to a huge performance
+     degradation on AWS instances from 6th generation onwards.**
+
      1 - Accept device recommended LLQ policy (Default).
+
      2 - Enforce normal LLQ policy.
+
      3 - Enforce large LLQ policy.
 
    * **miss_txc_to** (default 5)
@@ -126,6 +130,18 @@ Runtime Configuration
      timer service. Setting this parameter to 0 disables this feature. Maximum
      allowed value is 60 seconds.
 
+   * **control_path_poll_interval** (default 0)
+
+     Enable polling-based functionality of the admin queues, eliminating the
+     need for interrupts in the control-path:
+
+     0 - Disable (Admin queue will work in interrupt mode).
+
+     [1..1000] - Number of milliseconds to wait between periodic inspections of the admin queues.
+
+     **A non-zero value for this devarg is mandatory for control path functionality
+     when binding ports to the uio_pci_generic kernel module, which lacks interrupt support.**
+
 ENA Configuration Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -164,23 +180,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite.  This includes environment
    variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.
 
    (*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
    reduces the latency of the packets by pushing the header directly through
    the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
    In DPDK ``igb_uio`` it must be enabled by loading module with
    ``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
    If vfio-pci is used user should follow `AWS ENA PMD documentation
    <https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk/README.md>`_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``.
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``.
    Please make sure that ``IOMMU`` is enabled in your system,
    or use ``vfio`` driver in ``noiommu`` mode::
 
@@ -189,7 +205,14 @@ Prerequisites
    To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
    ``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
+
+   Note that when launching the application, the ``control_path_poll_interval`` devarg must be used with a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control-path (admin queues) of the ENA requires poll-mode
+   to process command completions and asynchronous notifications from the device.
+
+#. Bind the intended ENA device to ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.
 
 At this point the system should be ready to run DPDK applications. Once the
 application runs to completion, the ENA can be detached from attached module if
@@ -198,7 +221,7 @@ necessary.
 **Rx interrupts support**
 
 ENA PMD supports Rx interrupts, which can be used to wake up lcores waiting for
-input. Please note that it won't work with ``igb_uio``, so to use this feature,
+input. Please note that it won't work with ``igb_uio`` or ``uio_pci_generic``, so to use this feature,
 the ``vfio-pci`` should be used.
 
 ENA handles admin interrupts and AENQ notifications on separate interrupt.
@@ -209,7 +232,7 @@ will fail.
 **Note about usage on \*.metal instances**
 
 On AWS, the metal instances are supporting IOMMU for both arm64 and x86_64
-hosts.
+hosts. Note that ``uio_pci_generic`` lacks IOMMU support and cannot be used for metal instances.
 
 * x86_64 (e.g. c5.metal, i3.metal):
    IOMMU should be disabled by default. In that situation, the ``igb_uio`` can
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 9823616eeb..d01236097a 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -109,6 +109,8 @@ New Features
   * Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg `llq_policy`.
   * Added support for LLQ header size recommendation from the device.
   * Allowed large LLQ with 1024 entries when the device supports enlarged memory BAR.
+  * Added `control_path_poll_interval` devarg that configures control-path to work in poll-mode.
+  * Added support for binding ports to `uio_pci_generic` kernel module.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 43693ee2ee..af1f6d6d05 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3,6 +3,7 @@
  * All rights reserved.
  */
 
+#include <rte_alarm.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_version.h>
@@ -36,6 +37,8 @@
 
 #define ENA_MIN_RING_DESC	128
 
+#define USEC_PER_MSEC		1000UL
+
 #define BITS_PER_BYTE 8
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
@@ -95,6 +98,14 @@ struct ena_stats {
  * considered as a missing.
  */
 #define ENA_DEVARG_MISS_TXC_TO "miss_txc_to"
+/*
+ * Controls the period of time (in milliseconds) between two consecutive inspections of
+ * the control queues when the driver is in poll mode and not using interrupts.
+ * By default, this value is zero, indicating that the driver will not be in poll mode and will
+ * use interrupts. A non-zero value for this argument is mandatory when using uio_pci_generic
+ * driver.
+ */
+#define ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "control_path_poll_interval"
 
 /*
  * Each rte_memzone should have unique name.
@@ -271,7 +282,8 @@ static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
 static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
 static int ena_infos_get(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_info *dev_info);
-static void ena_interrupt_handler_rte(void *cb_arg);
+static void ena_control_path_handler(void *cb_arg);
+static void ena_control_path_poll_handler(void *cb_arg);
 static void ena_timer_wd_callback(struct rte_timer *timer, void *arg);
 static void ena_destroy_device(struct rte_eth_dev *eth_dev);
 static int eth_ena_dev_init(struct rte_eth_dev *eth_dev);
@@ -882,10 +894,14 @@ static int ena_close(struct rte_eth_dev *dev)
 		ret = ena_stop(dev);
 	adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
-	rte_intr_disable(intr_handle);
-	rc = rte_intr_callback_unregister_sync(intr_handle, ena_interrupt_handler_rte, dev);
-	if (unlikely(rc != 0))
-		PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	if (!adapter->control_path_poll_interval) {
+		rte_intr_disable(intr_handle);
+		rc = rte_intr_callback_unregister_sync(intr_handle, ena_control_path_handler, dev);
+		if (unlikely(rc != 0))
+			PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	} else {
+		rte_eal_alarm_cancel(ena_control_path_poll_handler, dev);
+	}
 
 	ena_rx_queue_release_all(dev);
 	ena_tx_queue_release_all(dev);
@@ -1889,15 +1905,33 @@ static int ena_device_init(struct ena_adapter *adapter,
 	return rc;
 }
 
-static void ena_interrupt_handler_rte(void *cb_arg)
+static void ena_control_path_handler(void *cb_arg)
 {
 	struct rte_eth_dev *dev = cb_arg;
 	struct ena_adapter *adapter = dev->data->dev_private;
 	struct ena_com_dev *ena_dev = &adapter->ena_dev;
 
-	ena_com_admin_q_comp_intr_handler(ena_dev);
-	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED))
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_com_admin_q_comp_intr_handler(ena_dev);
 		ena_com_aenq_intr_handler(ena_dev, dev);
+	}
+}
+
+static void ena_control_path_poll_handler(void *cb_arg)
+{
+	struct rte_eth_dev *dev = cb_arg;
+	struct ena_adapter *adapter = dev->data->dev_private;
+	int rc;
+
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_control_path_handler(cb_arg);
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, cb_arg);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to retrigger control path alarm\n");
+			ena_trigger_reset(adapter, ENA_REGS_RESET_GENERIC);
+		}
+	}
 }
 
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
@@ -2362,20 +2396,28 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_spinlock_init(&adapter->admin_lock);
 
-	rte_intr_callback_register(intr_handle,
-				   ena_interrupt_handler_rte,
-				   eth_dev);
+	if (!adapter->control_path_poll_interval) {
+		/* Control path interrupt mode */
+		rte_intr_callback_register(intr_handle, ena_control_path_handler, eth_dev);
 	rte_intr_enable(intr_handle);
 	ena_com_set_admin_polling_mode(ena_dev, false);
 	ena_com_admin_aenq_enable(ena_dev);
-
+	} else {  /* Control path polling mode */
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, eth_dev);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to set control path alarm\n");
+			goto err_control_path_destroy;
+		}
+	}
 	rte_timer_init(&adapter->timer_wd);
 
 	adapters_found++;
 	adapter->state = ENA_ADAPTER_STATE_INIT;
 
 	return 0;
-
+err_control_path_destroy:
+	rte_free(adapter->drv_stats);
 err_rss_destroy:
 	ena_com_rss_destroy(ena_dev);
 err_delete_debug_area:
@@ -3656,9 +3698,9 @@ static int ena_process_uint_devarg(const char *key,
 {
 	struct ena_adapter *adapter = opaque;
 	char *str_end;
-	uint64_t uint_value;
+	uint64_t uint64_value;
 
-	uint_value = strtoull(value, &str_end, DECIMAL_BASE);
+	uint64_value = strtoull(value, &str_end, DECIMAL_BASE);
 	if (value == str_end) {
 		PMD_INIT_LOG(ERR,
 			"Invalid value for key '%s'. Only uint values are accepted.\n",
@@ -3667,12 +3709,12 @@ static int ena_process_uint_devarg(const char *key,
 	}
 
 	if (strcmp(key, ENA_DEVARG_MISS_TXC_TO) == 0) {
-		if (uint_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
+		if (uint64_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
 			PMD_INIT_LOG(ERR,
 				"Tx timeout too high: %" PRIu64 " sec. Maximum allowed: %d sec.\n",
-				uint_value, ENA_MAX_TX_TIMEOUT_SECONDS);
+				uint64_value, ENA_MAX_TX_TIMEOUT_SECONDS);
 			return -EINVAL;
-		} else if (uint_value == 0) {
+		} else if (uint64_value == 0) {
 			PMD_INIT_LOG(INFO,
 				"Check for missing Tx completions has been disabled.\n");
 			adapter->missing_tx_completion_to =
@@ -3680,9 +3722,27 @@ static int ena_process_uint_devarg(const char *key,
 		} else {
 			PMD_INIT_LOG(INFO,
 				"Tx packet completion timeout set to %" PRIu64 " seconds.\n",
-				uint_value);
+				uint64_value);
 			adapter->missing_tx_completion_to =
-				uint_value * rte_get_timer_hz();
+				uint64_value * rte_get_timer_hz();
+		}
+	} else if (strcmp(key, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL) == 0) {
+		if (uint64_value > ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC) {
+			PMD_INIT_LOG(ERR,
+				"Control path polling interval is too long: %" PRIu64 " msecs. "
+				"Maximum allowed: %d msecs.\n",
+				uint64_value, ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC);
+			return -EINVAL;
+		} else if (uint64_value == 0) {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to zero. Operating in "
+				"interrupt mode.\n");
+			adapter->control_path_poll_interval = 0;
+		} else {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to %" PRIu64 " msecs.\n",
+				uint64_value);
+			adapter->control_path_poll_interval = uint64_value * USEC_PER_MSEC;
 		}
 	}
 
@@ -3712,6 +3772,7 @@ static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *de
 	static const char * const allowed_args[] = {
 		ENA_DEVARG_LLQ_POLICY,
 		ENA_DEVARG_MISS_TXC_TO,
+		ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
@@ -3734,6 +3795,10 @@ static int ena_parse_devargs(struct ena_adapter *adapter, struct rte_devargs *de
 		ena_process_uint_devarg, adapter);
 	if (rc != 0)
 		goto exit;
+	rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
+		ena_process_uint_devarg, adapter);
+	if (rc != 0)
+		goto exit;
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -3954,7 +4019,8 @@ RTE_PMD_REGISTER_PCI_TABLE(net_ena, pci_id_ena_map);
 RTE_PMD_REGISTER_KMOD_DEP(net_ena, "* igb_uio | uio_pci_generic | vfio-pci");
 RTE_PMD_REGISTER_PARAM_STRING(net_ena,
 	ENA_DEVARG_LLQ_POLICY "=<0|1|2|3> "
-	ENA_DEVARG_MISS_TXC_TO "=<uint>");
+	ENA_DEVARG_MISS_TXC_TO "=<uint> "
+	ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_driver, driver, NOTICE);
 #ifdef RTE_ETHDEV_DEBUG_RX
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 6716f01ba5..85e816ae72 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -44,6 +44,8 @@
 #define ENA_MONITORED_TX_QUEUES		3
 #define ENA_DEFAULT_MISSING_COMP	256U
 
+#define ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC 1000
+
 /* While processing submitted and completed descriptors (rx and tx path
  * respectively) in a loop it is desired to:
  *  - perform batch submissions while populating submission queue
@@ -346,6 +348,9 @@ struct ena_adapter {
 
 	uint64_t memzone_cnt;
 
+	/* Time (in microseconds) of the control path queues monitoring interval */
+	uint64_t control_path_poll_interval;
+
 	/*
 	 * Helper variables for holding the information about the supported
 	 * metrics.
-- 
2.17.1