From: <shaibran@amazon.com>
To: <ferruh.yigit@amd.com>
Cc: <dev@dpdk.org>, Shai Brandes <shaibran@amazon.com>
Subject: [PATCH v4 30/31] net/ena: control path pure polling mode
Date: Tue, 12 Mar 2024 20:07:15 +0200
Message-ID: <20240312180716.8515-31-shaibran@amazon.com>
In-Reply-To: <20240312180716.8515-1-shaibran@amazon.com>

From: Shai Brandes <shaibran@amazon.com>

This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in
the control path. This mode is disabled by default and is enabled by
setting the "control_path_poll_interval" devarg to a non-zero value.
When operating in this mode, periodic alarms are used to monitor the
control queues.

A non-zero value for this devarg is mandatory for control path
functionality when binding ports to the uio_pci_generic kernel module,
which lacks interrupt support.

Signed-off-by: Shai Brandes <shaibran@amazon.com>
Reviewed-by: Amit Bernstein <amitbern@amazon.com>
---
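Note (not part of the commit): the periodic-alarm scheme described above can
be sketched without DPDK as a one-shot alarm whose callback re-arms itself
until the adapter is closed. Everything named mock_* below is a hypothetical
stand-in invented for illustration; in the patch itself the alarm is set with
rte_eal_alarm_set() and the callback is ena_control_path_poll_handler().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a one-shot alarm API (not the DPDK API). */
typedef void (*mock_alarm_cb)(void *arg);

struct mock_alarm {
	mock_alarm_cb cb;	/* pending callback, NULL when nothing is armed */
	void *arg;
	uint64_t delay_us;
};

static struct mock_alarm pending;

/* Arm the single one-shot alarm; returns 0 on success, -1 on bad input. */
static int mock_alarm_set(uint64_t us, mock_alarm_cb cb, void *arg)
{
	if (us == 0 || cb == NULL)
		return -1;
	pending.cb = cb;
	pending.arg = arg;
	pending.delay_us = us;
	return 0;
}

/* Fire the pending alarm once, as an alarm thread would after the delay.
 * Returns 1 if a callback ran, 0 if nothing was armed.
 */
static int mock_alarm_fire(void)
{
	mock_alarm_cb cb = pending.cb;
	void *arg = pending.arg;

	if (cb == NULL)
		return 0;
	pending.cb = NULL;	/* one-shot: disarm before invoking */
	cb(arg);
	return 1;
}

/* Hypothetical, reduced adapter state. */
struct mock_adapter {
	int closed;			/* non-zero once the port is closed */
	uint64_t poll_interval_us;	/* non-zero selects polling mode */
	unsigned int polls;		/* counts control queue inspections */
	int rearm_failed;		/* the patch triggers a reset here */
};

/* Self-re-arming poll callback: inspect the queues, then schedule again. */
static void mock_control_path_poll(void *arg)
{
	struct mock_adapter *ad = arg;

	assert(ad != NULL);
	if (ad->closed)
		return;		/* closed: do not inspect, do not re-arm */
	ad->polls++;		/* stands in for admin completion + AENQ handling */
	if (mock_alarm_set(ad->poll_interval_us, mock_control_path_poll, ad) != 0)
		ad->rearm_failed = 1;
}
```

Arming the alarm once and firing it repeatedly increments polls each time;
after closed is set, the next firing returns early and leaves nothing armed.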
 doc/guides/nics/ena.rst                |  43 ++++++---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c           | 115 ++++++++++++++++++++-----
 drivers/net/ena/ena_ethdev.h           |   5 ++
 4 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 725215b36d..f1dc6996ca 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -135,6 +135,19 @@ Runtime Configuration
      huge performance degradation. In general disabling LLQ is highly not
      recommended!**
 
+   * **control_path_poll_interval** (default 0)
+
+     Enable polling-based functionality of the admin queues, eliminating the
+     need for interrupts in the control-path:
+
+     0 - Disable (Admin queue will work in interrupt mode).
+
+     [1..1000] - Number of milliseconds to wait between periodic inspections of the admin queues.
+
+     **A non-zero value for this devarg is mandatory for control path functionality
+     when binding ports to the uio_pci_generic kernel module, which lacks interrupt support.**
+
+
 ENA Configuration Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -173,23 +186,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite.  This includes environment
    variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.
 
    (*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
    reduces the latency of the packets by pushing the header directly through
    the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
    In DPDK ``igb_uio`` it must be enabled by loading module with
    ``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
    If vfio-pci is used user should follow `AWS ENA PMD documentation
    <https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk/README.md>`_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert the ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``.
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert the ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``.
    Please make sure that ``IOMMU`` is enabled in your system,
    or use ``vfio`` driver in ``noiommu`` mode::
 
@@ -198,7 +211,17 @@ Prerequisites
    To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
    ``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
+   Make sure that the IOMMU is disabled or is in passthrough mode.
+   For example, boot the kernel with ``intel_iommu=off`` or ``iommu=pt`` on its command line.
+
+   Note that when launching the application, the ``control_path_poll_interval`` devarg must be set to a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control path (admin queues) of the ENA requires poll mode
+   to process command completions and asynchronous notifications from the device.
+   For example: ``dpdk-app -a "00:06.0,control_path_poll_interval=1000"``.
+
+#. Bind the intended ENA device to ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.
 
 At this point the system should be ready to run DPDK applications. Once the
 application runs to completion, the ENA can be detached from attached module if
@@ -207,7 +230,7 @@ necessary.
 **Rx interrupts support**
 
 ENA PMD supports Rx interrupts, which can be used to wake up lcores waiting for
-input. Please note that it won't work with ``igb_uio``, so to use this feature,
+input. Please note that it won't work with ``igb_uio`` or ``uio_pci_generic``, so to use this feature,
 the ``vfio-pci`` should be used.
 
 ENA handles admin interrupts and AENQ notifications on separate interrupt.
@@ -218,7 +241,7 @@ will fail.
 **Note about usage on \*.metal instances**
 
 On AWS, the metal instances are supporting IOMMU for both arm64 and x86_64
-hosts.
+hosts. Note that ``uio_pci_generic`` lacks IOMMU support and cannot be used for metal instances.
 
 * x86_64 (e.g. c5.metal, i3.metal):
    IOMMU should be disabled by default. In that situation, the ``igb_uio`` can
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index bee2429ba0..33d094a645 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -111,6 +111,8 @@ New Features
   * Added `normal_llq_hdr` devarg that enforce normal llq header policy.
   * Added support for LLQ header size recommendation from the device.
   * Allowed large LLQ with 1024 entries when the device supports enlarged memory BAR.
+  * Added `control_path_poll_interval` devarg that configures the control path to work in poll mode.
+  * Added support for binding ports to `uio_pci_generic` kernel module.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index c7c2eef92f..1707d5f2c2 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3,6 +3,7 @@
  * All rights reserved.
  */
 
+#include <rte_alarm.h>
 #include <rte_string_fns.h>
 #include <rte_errno.h>
 #include <rte_version.h>
@@ -36,6 +37,8 @@
 
 #define ENA_MIN_RING_DESC	128
 
+#define USEC_PER_MSEC		1000UL
+
 #define BITS_PER_BYTE 8
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
@@ -90,6 +93,14 @@ struct ena_stats {
  * huge performance degradation on 6th generation AWS instances.
  */
 #define ENA_DEVARG_ENABLE_LLQ "enable_llq"
+/*
+ * Controls the period of time (in milliseconds) between two consecutive inspections of
+ * the control queues when the driver is in poll mode and not using interrupts.
+ * By default, this value is zero, indicating that the driver will not be in poll mode and will
+ * use interrupts. A non-zero value for this argument is mandatory when using uio_pci_generic
+ * driver.
+ */
+#define ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "control_path_poll_interval"
 
 /*
  * Each rte_memzone should have unique name.
@@ -266,7 +277,8 @@ static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
 static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
 static int ena_infos_get(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_info *dev_info);
-static void ena_interrupt_handler_rte(void *cb_arg);
+static void ena_control_path_handler(void *cb_arg);
+static void ena_control_path_poll_handler(void *cb_arg);
 static void ena_timer_wd_callback(struct rte_timer *timer, void *arg);
 static void ena_destroy_device(struct rte_eth_dev *eth_dev);
 static int eth_ena_dev_init(struct rte_eth_dev *eth_dev);
@@ -878,10 +890,14 @@ static int ena_close(struct rte_eth_dev *dev)
 		ret = ena_stop(dev);
 	adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
-	rte_intr_disable(intr_handle);
-	rc = rte_intr_callback_unregister_sync(intr_handle, ena_interrupt_handler_rte, dev);
-	if (unlikely(rc != 0))
-		PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	if (!adapter->control_path_poll_interval) {
+		rte_intr_disable(intr_handle);
+		rc = rte_intr_callback_unregister_sync(intr_handle, ena_control_path_handler, dev);
+		if (unlikely(rc != 0))
+			PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	} else {
+		rte_eal_alarm_cancel(ena_control_path_poll_handler, dev);
+	}
 
 	ena_rx_queue_release_all(dev);
 	ena_tx_queue_release_all(dev);
@@ -1885,15 +1901,33 @@ static int ena_device_init(struct ena_adapter *adapter,
 	return rc;
 }
 
-static void ena_interrupt_handler_rte(void *cb_arg)
+static void ena_control_path_handler(void *cb_arg)
 {
 	struct rte_eth_dev *dev = cb_arg;
 	struct ena_adapter *adapter = dev->data->dev_private;
 	struct ena_com_dev *ena_dev = &adapter->ena_dev;
 
-	ena_com_admin_q_comp_intr_handler(ena_dev);
-	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED))
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_com_admin_q_comp_intr_handler(ena_dev);
 		ena_com_aenq_intr_handler(ena_dev, dev);
+	}
+}
+
+static void ena_control_path_poll_handler(void *cb_arg)
+{
+	struct rte_eth_dev *dev = cb_arg;
+	struct ena_adapter *adapter = dev->data->dev_private;
+	int rc;
+
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_control_path_handler(cb_arg);
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, cb_arg);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to retrigger control path alarm\n");
+			ena_trigger_reset(adapter, ENA_REGS_RESET_GENERIC);
+		}
+	}
 }
 
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
@@ -2363,20 +2397,29 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_spinlock_init(&adapter->admin_lock);
 
-	rte_intr_callback_register(intr_handle,
-				   ena_interrupt_handler_rte,
-				   eth_dev);
-	rte_intr_enable(intr_handle);
-	ena_com_set_admin_polling_mode(ena_dev, false);
+	if (!adapter->control_path_poll_interval) {
+		/* Control path interrupt mode */
+		rte_intr_callback_register(intr_handle, ena_control_path_handler, eth_dev);
+		rte_intr_enable(intr_handle);
+		ena_com_set_admin_polling_mode(ena_dev, false);
+	} else {
+		/* Control path polling mode */
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				       ena_control_path_poll_handler, eth_dev);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to set control path alarm\n");
+			goto err_control_path_destroy;
+		}
+	}
 	ena_com_admin_aenq_enable(ena_dev);
-
 	rte_timer_init(&adapter->timer_wd);
 
 	adapters_found++;
 	adapter->state = ENA_ADAPTER_STATE_INIT;
 
 	return 0;
-
+err_control_path_destroy:
+	rte_free(adapter->drv_stats);
 err_rss_destroy:
 	ena_com_rss_destroy(ena_dev);
 err_delete_debug_area:
@@ -3657,9 +3700,9 @@ static int ena_process_uint_devarg(const char *key,
 {
 	struct ena_adapter *adapter = opaque;
 	char *str_end;
-	uint64_t uint_value;
+	uint64_t uint64_value;
 
-	uint_value = strtoull(value, &str_end, DECIMAL_BASE);
+	uint64_value = strtoull(value, &str_end, DECIMAL_BASE);
 	if (value == str_end) {
 		PMD_INIT_LOG(ERR,
 			"Invalid value for key '%s'. Only uint values are accepted.\n",
@@ -3668,12 +3711,12 @@ static int ena_process_uint_devarg(const char *key,
 	}
 
 	if (strcmp(key, ENA_DEVARG_MISS_TXC_TO) == 0) {
-		if (uint_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
+		if (uint64_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
 			PMD_INIT_LOG(ERR,
 				"Tx timeout too high: %" PRIu64 " sec. Maximum allowed: %d sec.\n",
-				uint_value, ENA_MAX_TX_TIMEOUT_SECONDS);
+				uint64_value, ENA_MAX_TX_TIMEOUT_SECONDS);
 			return -EINVAL;
-		} else if (uint_value == 0) {
+		} else if (uint64_value == 0) {
 			PMD_INIT_LOG(INFO,
 				"Check for missing Tx completions has been disabled.\n");
 			adapter->missing_tx_completion_to =
@@ -3681,9 +3724,27 @@ static int ena_process_uint_devarg(const char *key,
 		} else {
 			PMD_INIT_LOG(INFO,
 				"Tx packet completion timeout set to %" PRIu64 " seconds.\n",
-				uint_value);
+				uint64_value);
 			adapter->missing_tx_completion_to =
-				uint_value * rte_get_timer_hz();
+				uint64_value * rte_get_timer_hz();
+		}
+	} else if (strcmp(key, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL) == 0) {
+		if (uint64_value > ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC) {
+			PMD_INIT_LOG(ERR,
+				"Control path polling interval is too long: %" PRIu64 " msecs. "
+				"Maximum allowed: %d msecs.\n",
+				uint64_value, ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC);
+			return -EINVAL;
+		} else if (uint64_value == 0) {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to zero. Operating in "
+				"interrupt mode.\n");
+			adapter->control_path_poll_interval = 0;
+		} else {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to %" PRIu64 " msecs.\n",
+				uint64_value);
+			adapter->control_path_poll_interval = uint64_value * USEC_PER_MSEC;
 		}
 	}
 
@@ -3728,6 +3789,7 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
 		ENA_DEVARG_NORMAL_LLQ_HDR,
 		ENA_DEVARG_MISS_TXC_TO,
 		ENA_DEVARG_ENABLE_LLQ,
+		ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
@@ -3757,6 +3819,12 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
 		goto exit;
 	rc = rte_kvargs_process(kvlist, ENA_DEVARG_ENABLE_LLQ,
 		ena_process_bool_devarg, adapter);
+	if (rc != 0)
+		goto exit;
+	rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
+		ena_process_uint_devarg, adapter);
+	if (rc != 0)
+		goto exit;
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -3979,7 +4047,8 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ena,
 	ENA_DEVARG_LARGE_LLQ_HDR "=<0|1> "
 	ENA_DEVARG_NORMAL_LLQ_HDR "=<0|1> "
 	ENA_DEVARG_ENABLE_LLQ "=<0|1> "
-	ENA_DEVARG_MISS_TXC_TO "=<uint>");
+	ENA_DEVARG_MISS_TXC_TO "=<uint> "
+	ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_driver, driver, NOTICE);
 #ifdef RTE_ETHDEV_DEBUG_RX
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 7358f28caf..7513a3f6d5 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -44,6 +44,8 @@
 #define ENA_MONITORED_TX_QUEUES		3
 #define ENA_DEFAULT_MISSING_COMP	256U
 
+#define ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC 1000
+
 /* While processing submitted and completed descriptors (rx and tx path
  * respectively) in a loop it is desired to:
  *  - perform batch submissions while populating submission queue
@@ -348,6 +350,9 @@ struct ena_adapter {
 
 	uint64_t memzone_cnt;
 
+	/* Time (in microseconds) of the control path queues monitoring interval */
+	uint64_t control_path_poll_interval;
+
 	/*
 	 * Helper variables for holding the information about the supported
 	 * metrics.
-- 
2.17.1
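
Note (not part of the patch): the devarg handling above takes a value in
milliseconds, rejects anything above 1000, and stores the interval in
microseconds. A minimal standalone sketch of that validation follows. The
function and macro names are hypothetical, and unlike the patch, which only
checks that strtoull() consumed at least one character, this sketch also
rejects trailing characters and out-of-range input.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names mirroring the limits used in the patch. */
#define MAX_POLL_INTERVAL_MSEC	1000ULL
#define USEC_PER_MSEC		1000ULL

/*
 * Parse a poll interval given in milliseconds and store it in microseconds.
 * Returns 0 on success or -EINVAL for malformed or too large input.
 * A stored value of 0 means "interrupt mode" (polling disabled).
 */
static int parse_poll_interval(const char *value, uint64_t *interval_us)
{
	char *end;
	unsigned long long msec;

	assert(value != NULL && interval_us != NULL);

	errno = 0;
	msec = strtoull(value, &end, 10);
	/* Stricter than the patch: the whole string must be a number. */
	if (end == value || *end != '\0' || errno == ERANGE)
		return -EINVAL;
	if (msec > MAX_POLL_INTERVAL_MSEC)
		return -EINVAL;

	*interval_us = msec * USEC_PER_MSEC;
	return 0;
}
```

With this sketch, "1000" is stored as 1000000 microseconds, "0" selects
interrupt mode, and "1001" or "10ms" are rejected without modifying the
output value.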


Thread overview: 41+ messages
2024-03-12 18:06 [PATCH v4 00/31] net/ena: v2.9.0 driver release shaibran
2024-03-12 18:06 ` [PATCH v4 01/31] net/ena: rework the metrics multi-process functions shaibran
2024-03-12 18:06 ` [PATCH v4 02/31] net/ena: report new supported link speed capabilities shaibran
2024-03-12 18:06 ` [PATCH v4 03/31] net/ena: update imissed stat with Rx overruns shaibran
2024-03-13 15:59   ` Ferruh Yigit
2024-03-12 18:06 ` [PATCH v4 04/31] net/ena: sub-optimal configuration notifications support shaibran
2024-03-12 18:06 ` [PATCH v4 05/31] net/ena: fix fast mbuf free shaibran
2024-03-13 15:58   ` Ferruh Yigit
2024-03-12 18:06 ` [PATCH v4 06/31] net/ena: restructure the llq policy setting process shaibran
2024-03-12 18:06 ` [PATCH v4 07/31] net/ena/base: limit exponential backoff exp shaibran
2024-03-13 15:58   ` Ferruh Yigit
2024-03-12 18:06 ` [PATCH v4 08/31] net/ena/base: add a new csum offload bit shaibran
2024-03-13 15:58   ` Ferruh Yigit
2024-03-12 18:06 ` [PATCH v4 09/31] net/ena/base: optimize Rx ring submission queue shaibran
2024-03-12 18:06 ` [PATCH v4 10/31] net/ena/base: rename fields in completion descriptors shaibran
2024-03-12 18:06 ` [PATCH v4 11/31] net/ena/base: use correct read once on u8 field shaibran
2024-03-12 18:06 ` [PATCH v4 12/31] net/ena/base: add completion descriptor corruption check shaibran
2024-03-12 18:06 ` [PATCH v4 13/31] net/ena/base: malformed Tx descriptor error reason shaibran
2024-03-13 15:58   ` Ferruh Yigit
2024-03-12 18:06 ` [PATCH v4 14/31] net/ena/base: phc feature modifications shaibran
2024-03-12 18:07 ` [PATCH v4 15/31] net/ena/base: restructure interrupt handling shaibran
2024-03-13 15:58   ` Ferruh Yigit
2024-03-12 18:07 ` [PATCH v4 16/31] net/ena/base: add unlikely to error checks shaibran
2024-03-12 18:07 ` [PATCH v4 17/31] net/ena/base: missing admin interrupt reset reason shaibran
2024-03-12 18:07 ` [PATCH v4 18/31] net/ena/base: check for existing keep alive notification shaibran
2024-03-12 18:07 ` [PATCH v4 19/31] net/ena/base: modify memory barrier comment shaibran
2024-03-12 18:07 ` [PATCH v4 20/31] net/ena/base: rework Rx ring submission queue shaibran
2024-03-12 18:07 ` [PATCH v4 21/31] net/ena/base: remove operating system type enum shaibran
2024-03-12 18:07 ` [PATCH v4 22/31] net/ena/base: handle command abort shaibran
2024-03-12 18:07 ` [PATCH v4 23/31] net/ena/base: add support for device reset request shaibran
2024-03-12 18:07 ` [PATCH v4 24/31] net/ena: cosmetic changes shaibran
2024-03-13 15:58   ` Ferruh Yigit
2024-03-12 18:07 ` [PATCH v4 25/31] net/ena/base: modify customer metrics memory management shaibran
2024-03-12 18:07 ` [PATCH v4 26/31] net/ena/base: modify logs to use unsigned format specifier shaibran
2024-03-12 18:07 ` [PATCH v4 27/31] net/ena: update device-preferred size of rings shaibran
2024-03-12 18:07 ` [PATCH v4 28/31] net/ena: exhaust interrupt callbacks in device close shaibran
2024-03-12 18:07 ` [PATCH v4 29/31] net/ena: support max large llq depth from the device shaibran
2024-03-12 18:07 ` shaibran [this message]
2024-03-12 18:07 ` [PATCH v4 31/31] net/ena: upgrade driver version to 2.9.0 shaibran
2024-03-13 16:00 ` [PATCH v4 00/31] net/ena: v2.9.0 driver release Ferruh Yigit
2024-03-13 17:07   ` Brandes, Shai
