CC: Shai Brandes
Subject: [PATCH v4 30/31] net/ena: control path pure polling mode
Date: Tue, 12 Mar 2024 20:07:15 +0200
Message-ID: <20240312180716.8515-31-shaibran@amazon.com>
In-Reply-To: <20240312180716.8515-1-shaibran@amazon.com>
References: <20240312180716.8515-1-shaibran@amazon.com>

From: Shai Brandes

This commit implements a new operation mode that enables purely
polling-based functionality, eliminating the need for interrupts in the
control path. This mode is not activated by default and can be toggled
using the "control_path_poll_interval" devarg. When operating in this
mode, periodic alarms are used to monitor the control queues.
A non-zero value for this devarg is mandatory for control path
functionality when binding ports to the uio_pci_generic kernel module,
which lacks interrupt support.

Signed-off-by: Shai Brandes
Reviewed-by: Amit Bernstein
---
 doc/guides/nics/ena.rst                |  43 ++++++---
 doc/guides/rel_notes/release_24_03.rst |   2 +
 drivers/net/ena/ena_ethdev.c           | 115 ++++++++++++++++++++-----
 drivers/net/ena/ena_ethdev.h           |   5 ++
 4 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/doc/guides/nics/ena.rst b/doc/guides/nics/ena.rst
index 725215b36d..f1dc6996ca 100644
--- a/doc/guides/nics/ena.rst
+++ b/doc/guides/nics/ena.rst
@@ -135,6 +135,19 @@ Runtime Configuration
      huge performance degradation.
      In general disabling LLQ is highly not recommended!**
 
+   * **control_path_poll_interval** (default 0)
+
+     Enable polling-based functionality of the admin queues, eliminating the
+     need for interrupts in the control-path:
+
+     0 - Disable (Admin queue will work in interrupt mode).
+
+     [1..1000] - Number of milliseconds to wait between periodic inspection of the admin queues.
+
+     **A non-zero value for this devarg is mandatory for control path functionality
+     when binding ports to uio_pci_generic kernel module which lacks interrupt support.**
+
+
 
 ENA Configuration Parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -173,23 +186,23 @@ Prerequisites
 #. Prepare the system as recommended by DPDK suite. This includes environment
    variables, hugepages configuration, tool-chains and configuration.
 
-#. ENA PMD can operate with ``vfio-pci``(*) or ``igb_uio`` driver.
+#. ENA PMD can operate with ``vfio-pci`` (*), ``igb_uio``, or ``uio_pci_generic`` driver.
 
    (*) ENAv2 hardware supports Low Latency Queue v2 (LLQv2). This feature
    reduces the latency of the packets by pushing the header directly through
    the PCI to the device, before the DMA is even triggered. For proper work
-   kernel PCI driver must support write combining (WC).
+   kernel PCI driver must support write-combining (WC).
    In DPDK ``igb_uio`` it must be enabled by loading module with
    ``wc_activate=1`` flag (example below). However, mainline's vfio-pci
-   driver in kernel doesn't have WC support yet (planed to be added).
+   driver in kernel doesn't have WC support yet (planned to be added).
    If vfio-pci is used user should follow `AWS ENA PMD documentation
    `_.
 
-#. Insert ``vfio-pci`` or ``igb_uio`` kernel module using the command
-   ``modprobe vfio-pci`` or ``modprobe uio; insmod igb_uio.ko wc_activate=1``
-   respectively.
+#. For ``igb_uio``:
+   Insert ``igb_uio`` kernel module using the command ``modprobe uio; insmod igb_uio.ko wc_activate=1``
 
-#. For ``vfio-pci`` users only:
+#. For ``vfio-pci``:
+   Insert ``vfio-pci`` kernel module using the command ``modprobe vfio-pci``
    Please make sure that ``IOMMU`` is enabled in your system,
    or use ``vfio`` driver in ``noiommu`` mode::
 
@@ -198,7 +211,17 @@ Prerequisites
    To use ``noiommu`` mode, the ``vfio-pci`` must be built with flag
    ``CONFIG_VFIO_NOIOMMU``.
 
-#. Bind the intended ENA device to ``vfio-pci`` or ``igb_uio`` module.
+#. For ``uio_pci_generic``:
+   Insert ``uio_pci_generic`` kernel module using the command ``modprobe uio_pci_generic``.
+   Make sure that the IOMMU is disabled or is in passthrough mode.
+   For example: ``modprobe uio_pci_generic intel_iommu=off``.
+
+   Note that when launching the application, the ``control_path_poll_interval`` devarg must be used with a non-zero value (1000 is recommended)
+   as ``uio_pci_generic`` lacks interrupt support. The control-path (admin queues) of the ENA require poll-mode
+   to process command completion and asynchronous notification from the device.
+   For example: ``dpdk-app -a "00:06.0,control_path_poll_interval=1000"``.
+
+#. Bind the intended ENA device to ``vfio-pci``, ``igb_uio``, or ``uio_pci_generic`` module.
 
 At this point the system should be ready to run DPDK applications. Once the
 application runs to completion, the ENA can be detached from attached module if
@@ -207,7 +230,7 @@ necessary.
 **Rx interrupts support**
 
 ENA PMD supports Rx interrupts, which can be used to wake up lcores waiting for
-input. Please note that it won't work with ``igb_uio``, so to use this feature,
+input. Please note that it won't work with ``igb_uio`` and ``uio_pci_generic`` so to use this feature,
 the ``vfio-pci`` should be used.
 
 ENA handles admin interrupts and AENQ notifications on separate interrupt.
@@ -218,7 +241,7 @@ will fail.
 **Note about usage on \*.metal instances**
 
 On AWS, the metal instances are supporting IOMMU for both arm64 and x86_64
-hosts.
+hosts. Note that ``uio_pci_generic`` lacks IOMMU support and cannot be used for metal instances.
 
 * x86_64 (e.g. c5.metal, i3.metal):
    IOMMU should be disabled by default. In that situation, the ``igb_uio`` can
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index bee2429ba0..33d094a645 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -111,6 +111,8 @@ New Features
   * Added `normal_llq_hdr` devarg that enforce normal llq header policy.
   * Added support for LLQ header size recommendation from the device.
   * Allowed large LLQ with 1024 entries when the device supports enlarged memory BAR.
+  * Added `control_path_poll_interval` devarg that configure control-path to work in poll-mode.
+  * Added support for binding ports to `uio_pci_generic` kernel module.
 
 * **Updated Atomic Rules' Arkville driver.**
 
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index c7c2eef92f..1707d5f2c2 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -3,6 +3,7 @@
  * All rights reserved.
  */
 
+#include
 #include
 #include
 #include
@@ -36,6 +37,8 @@
 
 #define ENA_MIN_RING_DESC	128
 
+#define USEC_PER_MSEC		1000UL
+
 #define BITS_PER_BYTE 8
 
 #define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE)
@@ -90,6 +93,14 @@ struct ena_stats {
  * huge performance degradation on 6th generation AWS instances.
  */
 #define ENA_DEVARG_ENABLE_LLQ "enable_llq"
+/*
+ * Controls the period of time (in milliseconds) between two consecutive inspections of
+ * the control queues when the driver is in poll mode and not using interrupts.
+ * By default, this value is zero, indicating that the driver will not be in poll mode and will
+ * use interrupts. A non-zero value for this argument is mandatory when using uio_pci_generic
+ * driver.
+ */
+#define ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "control_path_poll_interval"
 
 /*
  * Each rte_memzone should have unique name.
@@ -266,7 +277,8 @@ static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
 static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
 static int ena_infos_get(struct rte_eth_dev *dev,
 			 struct rte_eth_dev_info *dev_info);
-static void ena_interrupt_handler_rte(void *cb_arg);
+static void ena_control_path_handler(void *cb_arg);
+static void ena_control_path_poll_handler(void *cb_arg);
 static void ena_timer_wd_callback(struct rte_timer *timer, void *arg);
 static void ena_destroy_device(struct rte_eth_dev *eth_dev);
 static int eth_ena_dev_init(struct rte_eth_dev *eth_dev);
@@ -878,10 +890,14 @@ static int ena_close(struct rte_eth_dev *dev)
 		ret = ena_stop(dev);
 	adapter->state = ENA_ADAPTER_STATE_CLOSED;
 
-	rte_intr_disable(intr_handle);
-	rc = rte_intr_callback_unregister_sync(intr_handle, ena_interrupt_handler_rte, dev);
-	if (unlikely(rc != 0))
-		PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	if (!adapter->control_path_poll_interval) {
+		rte_intr_disable(intr_handle);
+		rc = rte_intr_callback_unregister_sync(intr_handle, ena_control_path_handler, dev);
+		if (unlikely(rc != 0))
+			PMD_INIT_LOG(ERR, "Failed to unregister interrupt handler\n");
+	} else {
+		rte_eal_alarm_cancel(ena_control_path_poll_handler, dev);
+	}
 
 	ena_rx_queue_release_all(dev);
 	ena_tx_queue_release_all(dev);
@@ -1885,15 +1901,33 @@ static int ena_device_init(struct ena_adapter *adapter,
 	return rc;
 }
 
-static void ena_interrupt_handler_rte(void *cb_arg)
+static void ena_control_path_handler(void *cb_arg)
 {
 	struct rte_eth_dev *dev = cb_arg;
 	struct ena_adapter *adapter = dev->data->dev_private;
 	struct ena_com_dev *ena_dev = &adapter->ena_dev;
 
-	ena_com_admin_q_comp_intr_handler(ena_dev);
-	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED))
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_com_admin_q_comp_intr_handler(ena_dev);
 		ena_com_aenq_intr_handler(ena_dev, dev);
+	}
+}
+
+static void ena_control_path_poll_handler(void *cb_arg)
+{
+	struct rte_eth_dev *dev = cb_arg;
+	struct ena_adapter *adapter = dev->data->dev_private;
+	int rc;
+
+	if (likely(adapter->state != ENA_ADAPTER_STATE_CLOSED)) {
+		ena_control_path_handler(cb_arg);
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				ena_control_path_poll_handler, cb_arg);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to retrigger control path alarm\n");
+			ena_trigger_reset(adapter, ENA_REGS_RESET_GENERIC);
+		}
+	}
 }
 
 static void check_for_missing_keep_alive(struct ena_adapter *adapter)
@@ -2363,20 +2397,29 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
 
 	rte_spinlock_init(&adapter->admin_lock);
 
-	rte_intr_callback_register(intr_handle,
-				   ena_interrupt_handler_rte,
-				   eth_dev);
-	rte_intr_enable(intr_handle);
-	ena_com_set_admin_polling_mode(ena_dev, false);
+	if (!adapter->control_path_poll_interval) {
+		/* Control path interrupt mode */
+		rte_intr_callback_register(intr_handle, ena_control_path_handler, eth_dev);
+		rte_intr_enable(intr_handle);
+		ena_com_set_admin_polling_mode(ena_dev, false);
+	} else {
+		/* Control path polling mode */
+		rc = rte_eal_alarm_set(adapter->control_path_poll_interval,
+				ena_control_path_poll_handler, eth_dev);
+		if (unlikely(rc != 0)) {
+			PMD_DRV_LOG(ERR, "Failed to set control path alarm\n");
+			goto err_control_path_destroy;
+		}
+	}
 	ena_com_admin_aenq_enable(ena_dev);
-
 	rte_timer_init(&adapter->timer_wd);
 
 	adapters_found++;
 	adapter->state = ENA_ADAPTER_STATE_INIT;
 
 	return 0;
-
+err_control_path_destroy:
+	rte_free(adapter->drv_stats);
 err_rss_destroy:
 	ena_com_rss_destroy(ena_dev);
 err_delete_debug_area:
@@ -3657,9 +3700,9 @@ static int ena_process_uint_devarg(const char *key,
 {
 	struct ena_adapter *adapter = opaque;
 	char *str_end;
-	uint64_t uint_value;
+	uint64_t uint64_value;
 
-	uint_value = strtoull(value, &str_end, DECIMAL_BASE);
+	uint64_value = strtoull(value, &str_end, DECIMAL_BASE);
 	if (value == str_end) {
 		PMD_INIT_LOG(ERR,
 			"Invalid value for key '%s'. Only uint values are accepted.\n",
@@ -3668,12 +3711,12 @@ static int ena_process_uint_devarg(const char *key,
 	}
 
 	if (strcmp(key, ENA_DEVARG_MISS_TXC_TO) == 0) {
-		if (uint_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
+		if (uint64_value > ENA_MAX_TX_TIMEOUT_SECONDS) {
 			PMD_INIT_LOG(ERR,
 				"Tx timeout too high: %" PRIu64 " sec. Maximum allowed: %d sec.\n",
-				uint_value, ENA_MAX_TX_TIMEOUT_SECONDS);
+				uint64_value, ENA_MAX_TX_TIMEOUT_SECONDS);
 			return -EINVAL;
-		} else if (uint_value == 0) {
+		} else if (uint64_value == 0) {
 			PMD_INIT_LOG(INFO,
 				"Check for missing Tx completions has been disabled.\n");
 			adapter->missing_tx_completion_to =
@@ -3681,9 +3724,27 @@ static int ena_process_uint_devarg(const char *key,
 		} else {
 			PMD_INIT_LOG(INFO,
 				"Tx packet completion timeout set to %" PRIu64 " seconds.\n",
-				uint_value);
+				uint64_value);
 			adapter->missing_tx_completion_to =
-				uint_value * rte_get_timer_hz();
+				uint64_value * rte_get_timer_hz();
+		}
+	} else if (strcmp(key, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL) == 0) {
+		if (uint64_value > ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC) {
+			PMD_INIT_LOG(ERR,
+				"Control path polling interval is too long: %" PRIu64 " msecs. "
+				"Maximum allowed: %d msecs.\n",
+				uint64_value, ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC);
+			return -EINVAL;
+		} else if (uint64_value == 0) {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to zero. Operating in "
+				"interrupt mode.\n");
+			adapter->control_path_poll_interval = 0;
+		} else {
+			PMD_INIT_LOG(INFO,
+				"Control path polling interval is set to %" PRIu64 " msecs.\n",
+				uint64_value);
+			adapter->control_path_poll_interval = uint64_value * USEC_PER_MSEC;
 		}
 	}
 
@@ -3728,6 +3789,7 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
 		ENA_DEVARG_NORMAL_LLQ_HDR,
 		ENA_DEVARG_MISS_TXC_TO,
 		ENA_DEVARG_ENABLE_LLQ,
+		ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
 		NULL,
 	};
 	struct rte_kvargs *kvlist;
@@ -3757,6 +3819,12 @@ static int ena_parse_devargs(struct ena_adapter *adapter,
 		goto exit;
 	rc = rte_kvargs_process(kvlist, ENA_DEVARG_ENABLE_LLQ,
 		ena_process_bool_devarg, adapter);
+	if (rc != 0)
+		goto exit;
+	rc = rte_kvargs_process(kvlist, ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL,
+		ena_process_uint_devarg, adapter);
+	if (rc != 0)
+		goto exit;
 
 exit:
 	rte_kvargs_free(kvlist);
@@ -3979,7 +4047,8 @@ RTE_PMD_REGISTER_PARAM_STRING(net_ena,
 	ENA_DEVARG_LARGE_LLQ_HDR "=<0|1> "
 	ENA_DEVARG_NORMAL_LLQ_HDR "=<0|1> "
 	ENA_DEVARG_ENABLE_LLQ "=<0|1> "
-	ENA_DEVARG_MISS_TXC_TO "=");
+	ENA_DEVARG_MISS_TXC_TO "="
+	ENA_DEVARG_CONTROL_PATH_POLL_INTERVAL "=<0-1000>");
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_init, init, NOTICE);
 RTE_LOG_REGISTER_SUFFIX(ena_logtype_driver, driver, NOTICE);
 #ifdef RTE_ETHDEV_DEBUG_RX
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 7358f28caf..7513a3f6d5 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -44,6 +44,8 @@
 #define ENA_MONITORED_TX_QUEUES	3
 #define ENA_DEFAULT_MISSING_COMP	256U
 
+#define ENA_MAX_CONTROL_PATH_POLL_INTERVAL_MSEC 1000
+
 /* While processing submitted and completed descriptors (rx and tx path
  * respectively) in a loop it is desired to:
  *  - perform batch submissions while populating submission queue
@@ -348,6 +350,9 @@ struct ena_adapter {
 
 	uint64_t memzone_cnt;
 
+	/* Time (in microseconds) of the control path queues monitoring interval */
+	uint64_t control_path_poll_interval;
+
 	/*
 	 * Helper variables for holding the information about the supported
 	 * metrics.
-- 
2.17.1
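
For readers following the devarg handling above, here is a minimal standalone sketch of the validation and unit conversion that the patch applies to the polling interval. It is not the driver code: the helper name parse_poll_interval and the two macros are hypothetical stand-ins, and only the 1000 ms upper bound and the millisecond-to-microsecond conversion are taken from the patch. Like the patch, it rejects only values with no leading digits.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Values mirrored from the patch; the names here are illustrative only. */
#define MAX_POLL_INTERVAL_MSEC 1000ULL
#define USEC_PER_MSEC_SKETCH   1000ULL

/*
 * Hypothetical helper: parse a devarg string given in milliseconds and
 * store the interval in microseconds.
 * Returns 0 on success (a stored 0 means "interrupt mode"),
 * -EINVAL if no digits were parsed or the value exceeds the maximum.
 */
static int parse_poll_interval(const char *value, uint64_t *interval_usec)
{
	char *end;
	unsigned long long msec;

	msec = strtoull(value, &end, 10);
	if (end == value)		/* no digits consumed */
		return -EINVAL;
	if (msec > MAX_POLL_INTERVAL_MSEC)
		return -EINVAL;

	*interval_usec = msec * USEC_PER_MSEC_SKETCH;
	return 0;
}
```

The conversion to microseconds matters because rte_eal_alarm_set() takes its delay in microseconds, so the stored value can be handed to it unchanged each time the alarm is re-armed.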