From: Ciara Loftus
To: dev@dpdk.org, xiaolong.ye@intel.com, kevin.laatz@intel.com, bruce.richardson@intel.com, ciara.loftus@intel.com
Date: Thu, 19 Sep 2019 14:15:19 +0000
Message-Id: <20190919141520.4227-3-ciara.loftus@intel.com>
In-Reply-To: <20190919141520.4227-1-ciara.loftus@intel.com>
References: <20190919141520.4227-1-ciara.loftus@intel.com>
Subject: [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs

Network devices using the AF_XDP PMD will trigger interrupts
on reception of packets. The new PMD argument 'queue_irq'
allows the user to specify a core on which to pin interrupts
for a given queue. Multiple queue_irq arguments can be specified.
For example:

  --vdev=net_af_xdp1,iface=eth0,queue_count=2,
           queue_irq=0:2,queue_irq=1:5

..will pin queue 0 interrupts to core 2 and queue 1 interrupts to core 5.

The queue argument refers to the ethdev queue as opposed to the
netdev queue. These values are the same unless a value greater than 0
is specified in a start_queue argument.

The drivers supported for this feature are those with support for
AF_XDP zero copy in the kernel, namely ixgbe, i40e and mlx5_core.

Signed-off-by: Ciara Loftus
Signed-off-by: Bruce Richardson
---
 doc/guides/nics/af_xdp.rst             |  15 ++
 doc/guides/rel_notes/release_19_11.rst |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 357 ++++++++++++++++++++++++-
 3 files changed, 374 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index ec46f08f0..a255ba4e7 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -36,6 +36,11 @@ The following options can be provided to set up an af_xdp port in DPDK.
 *   ``start_queue`` - starting netdev queue id (optional, default 0);
 *   ``queue_count`` - total netdev queue number (optional, default 1);
 *   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
+*   ``queue_irq`` - pin queue irqs to specified core (optional,
+    default no pinning). The queue argument refers to the ethdev queue as
+    opposed to the netdev queue. These values are the same unless a value
+    greater than 0 is specified for start_queue. ixgbe, i40e and mlx5 drivers
+    supported;

 Prerequisites
 -------------
@@ -57,3 +62,13 @@ The following example will set up an af_xdp interface in DPDK:
 .. code-block:: console

     --vdev net_af_xdp,iface=ens786f1
+
+Pin queue IRQs
+--------------
+The following example will pin queue 0 interrupts to core 2 and queue 1
+interrupts to core 5.
+
+.. code-block:: console
+
+      --vdev=net_af_xdp1,iface=eth0,queue_count=2,
+               queue_irq=0:2,queue_irq=1:5
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 27cfbd9e3..06bf57c42 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================

+* **Updated the AF_XDP PMD.**
+
+  Updated the AF_XDP PMD. The new features include:
+
+  * Support for pinning netdev queue IRQs to cores specified by the user.
+    Available for ixgbe, i40e and mlx5 drivers.
+

 Removed Items
 -------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index e496e9aaa..a00eb6460 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -3,6 +3,7 @@
  */
 #include
 #include
+#include
 #include
 #include
 #include
@@ -10,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -17,6 +19,8 @@
 #include
 #include "af_xdp_deps.h"
 #include
+#include
+#include

 #include
 #include
@@ -116,6 +120,7 @@ struct pmd_internals {
 	int queue_cnt;
 	int max_queue_cnt;
 	int combined_queue_cnt;
+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];

 	int pmd_zc;
 	struct rte_ether_addr eth_addr;
@@ -128,12 +133,14 @@ struct pmd_internals {
 #define ETH_AF_XDP_START_QUEUE_ARG	"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG	"queue_count"
 #define ETH_AF_XDP_PMD_ZC_ARG	"pmd_zero_copy"
+#define ETH_AF_XDP_QUEUE_IRQ_ARG	"queue_irq"

 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
 	ETH_AF_XDP_PMD_ZC_ARG,
+	ETH_AF_XDP_QUEUE_IRQ_ARG,
 	NULL
 };

@@ -144,6 +151,21 @@ static const struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };

+/* drivers supported for the queue_irq option */
+enum {I40E_DRIVER, IXGBE_DRIVER, MLX5_DRIVER, NUM_DRIVERS};
+char driver_array[NUM_DRIVERS][NAME_MAX] = {"i40e", "ixgbe", "mlx5_core"};
+
+/*
+ * function pointer template to be implemented for each driver in 'driver_array'
+ * to generate the appropriate regular expression to search for in
+ * /proc/interrupts in order to identify the IRQ number for the netdev_qid of
+ * the given interface.
+ */
+typedef
+int (*generate_driver_regex_func)(char *iface_regex_str,
+				  struct pmd_internals *internals,
+				  uint16_t netdev_qid);
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
 {
@@ -660,6 +682,283 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	return ret;
 }

+/** get interface's driver name to determine /proc/interrupts entry format */
+static int
+get_driver_name(struct pmd_internals *internals, char *driver)
+{
+	char driver_path[PATH_MAX];
+	struct stat s;
+	char link[PATH_MAX];
+	int len;
+
+	snprintf(driver_path, sizeof(driver_path),
+			"/sys/class/net/%s/device/driver", internals->if_name);
+	if (lstat(driver_path, &s)) {
+		AF_XDP_LOG(ERR, "Error reading %s: %s\n",
+				driver_path, strerror(errno));
+		return -errno;
+	}
+
+	/* driver_path should link to /sys/bus/pci/drivers/ */
+	len = readlink(driver_path, link, PATH_MAX - 1);
+	if (len == -1) {
+		AF_XDP_LOG(ERR, "Error reading symbolic link %s: %s\n",
+				driver_path, strerror(errno));
+		return -errno;
+	}
+
+	link[len] = '\0';
+	strlcpy(driver, basename(link), NAME_MAX);
+	if (!strncmp(driver, ".", strlen(driver))) {
+		AF_XDP_LOG(ERR, "Error getting driver name from %s: %s\n",
+				link, strerror(errno));
+		return -errno;
+	}
+
+	return 0;
+}
+
+static int
+generate_ixgbe_i40e_regex(char *iface_regex_str,
+			  struct pmd_internals *internals, uint16_t netdev_qid)
+{
+	if (snprintf(iface_regex_str, 128,
+			"-%s.*-%d", internals->if_name, netdev_qid) >= 128) {
+		AF_XDP_LOG(INFO, "Cannot get interrupt for %s q %i\n",
+				internals->if_name, netdev_qid);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+generate_mlx5_regex(char *iface_regex_str, struct pmd_internals *internals,
+		    uint16_t netdev_qid)
+{
+	char pci_path[PATH_MAX];
+	char *pci;
+	int ret = -1;
+	struct stat s;
+	char *link;
+	int len;
+
+	snprintf(pci_path, sizeof(pci_path),
+			"/sys/class/net/%s/device", internals->if_name);
+	if (lstat(pci_path, &s)) {
+		AF_XDP_LOG(ERR, "Error reading %s: %s\n",
+				pci_path, strerror(errno));
+		return -errno;
+	}
+
+	/* pci_path should link to a directory whose name is the pci addr */
+	link = malloc(s.st_size + 1);
+	len = readlink(pci_path, link, PATH_MAX - 1);
+	if (len == -1) {
+		AF_XDP_LOG(ERR, "Error reading symbolic link %s: %s\n",
+				pci_path, strerror(errno));
+		ret = -errno;
+		goto out;
+	}
+
+	link[len] = '\0';
+	pci = basename(link);
+	if (!strncmp(pci, ".", strlen(pci))) {
+		AF_XDP_LOG(ERR, "Error getting pci from %s\n", link);
+		goto out;
+	}
+
+	if (snprintf(iface_regex_str, 128, ".*p%i@pci:%s", netdev_qid, pci) >=
+			128) {
+		AF_XDP_LOG(INFO, "Cannot get interrupt for %s q %i\n",
+				internals->if_name, netdev_qid);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	if (link)
+		free(link);
+
+	return ret;
+}
+
+/*
+ * array of handlers for different drivers for generating appropriate regex
+ * format for searching /proc/interrupts
+ */
+generate_driver_regex_func driver_handlers[NUM_DRIVERS] = {
+					generate_ixgbe_i40e_regex,
+					generate_ixgbe_i40e_regex,
+					generate_mlx5_regex};
+
+/*
+ * function for getting the index into driver_handlers array that corresponds
+ * to 'driver'
+ */
+static int
+get_driver_idx(char *driver)
+{
+	for (int i = 0; i < NUM_DRIVERS; i++) {
+		if (strncmp(driver, driver_array[i], strlen(driver_array[i])))
+			continue;
+		return i;
+	}
+
+	return -1;
+}
+
+/** generate /proc/interrupts search regex based on driver type */
+static int
+generate_search_regex(const char *driver, struct pmd_internals *internals,
+		      uint16_t netdev_qid, regex_t *r)
+{
+	char iface_regex_str[128];
+	int ret = -1;
+	char *driver_dup = strdup(driver);
+	int idx = get_driver_idx(driver_dup);
+
+	if (idx == -1) {
+		AF_XDP_LOG(ERR, "Error getting driver index for %s\n",
+				internals->if_name);
+		goto out;
+	}
+
+	if (driver_handlers[idx](iface_regex_str, internals, netdev_qid)) {
+		AF_XDP_LOG(ERR, "Error getting regex string for %s\n",
+				internals->if_name);
+		goto out;
+	}
+
+	if (regcomp(r, iface_regex_str, 0)) {
+		AF_XDP_LOG(ERR, "Error computing regex %s\n", iface_regex_str);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	free(driver_dup);
+	return ret;
+}
+
+/** get interrupt number associated with the given interface qid */
+static int
+get_interrupt_number(regex_t *r, int *interrupt,
+		     struct pmd_internals *internals)
+{
+	FILE *f_int_proc;
+	int found = 0;
+	char line[4096];
+	int ret = 0;
+
+	f_int_proc = fopen("/proc/interrupts", "r");
+	if (f_int_proc == NULL) {
+		AF_XDP_LOG(ERR, "Failed to open /proc/interrupts.\n");
+		return -1;
+	}
+
+	while (!feof(f_int_proc) && !found) {
+		/* Make sure to read a full line at a time */
+		if (fgets(line, sizeof(line), f_int_proc) == NULL ||
+				line[strlen(line) - 1] != '\n') {
+			AF_XDP_LOG(ERR, "Error reading from interrupts file\n");
+			ret = -1;
+			break;
+		}
+
+		/* Extract interrupt number from line */
+		if (regexec(r, line, 0, NULL, 0) == 0) {
+			*interrupt = atoi(line);
+			found = true;
+			AF_XDP_LOG(INFO, "Got interrupt %d for %s\n",
+					*interrupt, internals->if_name);
+		}
+	}
+
+	fclose(f_int_proc);
+
+	return ret;
+}
+
+/** affinitise interrupts for the given qid to the given coreid */
+static int
+set_irq_affinity(int coreid, struct pmd_internals *internals,
+		 uint16_t rx_queue_id, uint16_t netdev_qid, int interrupt)
+{
+	char bitmask[128];
+	char smp_affinity_filename[NAME_MAX];
+	FILE *f_int_smp_affinity;
+	int i;
+
+	/* Create affinity bitmask. Every 32 bits are separated by a comma */
+	snprintf(bitmask, sizeof(bitmask), "%x", 1 << (coreid % 32));
+	for (i = 0; i < coreid / 32; i++)
+		strlcat(bitmask, ",00000000", sizeof(bitmask));
+
+	/* Write the new affinity bitmask */
+	snprintf(smp_affinity_filename, sizeof(smp_affinity_filename),
+			"/proc/irq/%d/smp_affinity", interrupt);
+	f_int_smp_affinity = fopen(smp_affinity_filename, "w");
+	if (f_int_smp_affinity == NULL) {
+		AF_XDP_LOG(ERR, "Error opening %s\n", smp_affinity_filename);
+		return -1;
+	}
+	fwrite(bitmask, strlen(bitmask), 1, f_int_smp_affinity);
+	fclose(f_int_smp_affinity);
+	AF_XDP_LOG(INFO, "IRQs for %s ethdev queue %i (netdev queue %i)"
+			" affinitised to core %i\n",
+			internals->if_name, rx_queue_id,
+			netdev_qid, coreid);
+
+	return 0;
+}
+
+static void
+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
+{
+	int coreid = internals->queue_irqs[rx_queue_id];
+	char driver[NAME_MAX];
+	uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
+	regex_t r;
+	int interrupt;
+
+	if (coreid < 0)
+		return;
+
+	if (coreid > (get_nprocs() - 1)) {
+		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
+				coreid);
+		return;
+	}
+
+	if (get_driver_name(internals, driver)) {
+		AF_XDP_LOG(ERR, "Error retrieving driver name for %s\n",
+				internals->if_name);
+		return;
+	}
+
+	if (generate_search_regex(driver, internals, netdev_qid, &r)) {
+		AF_XDP_LOG(ERR, "Error generating search regex for %s\n",
+				internals->if_name);
+		return;
+	}
+
+	if (get_interrupt_number(&r, &interrupt, internals)) {
+		AF_XDP_LOG(ERR, "Error getting interrupt number for %s\n",
+				internals->if_name);
+		return;
+	}
+
+	if (set_irq_affinity(coreid, internals, rx_queue_id, netdev_qid,
+				interrupt)) {
+		AF_XDP_LOG(ERR, "Error setting interrupt affinity for %s\n",
+				internals->if_name);
+		return;
+	}
+}
+
 static int
 eth_rx_queue_setup(struct rte_eth_dev *dev,
 		   uint16_t rx_queue_id,
@@ -697,6 +996,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 		goto err;
 	}

+	configure_irqs(internals, rx_queue_id);
+
 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
 	rxq->fds[0].events = POLLIN;

@@ -834,6 +1135,39 @@ parse_name_arg(const char *key __rte_unused,
 	return 0;
 }

+/** parse queue irq argument */
+static int
+parse_queue_irq_arg(const char *key __rte_unused,
+		    const char *value, void *extra_args)
+{
+	int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT] = extra_args;
+	char *parse_str = strdup(value);
+	char delimiter[] = ":";
+	char *queue_str;
+
+	queue_str = strtok(parse_str, delimiter);
+	if (queue_str != NULL && strncmp(queue_str, value, strlen(value))) {
+		char *end;
+		long queue = strtol(queue_str, &end, 10);
+
+		if (*end == '\0' && queue >= 0 &&
+				queue < RTE_MAX_QUEUES_PER_PORT) {
+			char *core_str = strtok(NULL, delimiter);
+			long core = strtol(core_str, &end, 10);
+
+			if (*end == '\0' && core >= 0 && core < get_nprocs()) {
+				(*queue_irqs)[queue] = core;
+				free(parse_str);
+				return 0;
+			}
+		}
+	}
+
+	AF_XDP_LOG(ERR, "Invalid queue_irq argument.\n");
+	free(parse_str);
+	return -1;
+}
+
 static int
 xdp_get_channels_info(const char *if_name, int *max_queues,
 				int *combined_queues)
@@ -877,7 +1211,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,

 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt, int *pmd_zc)
+			int *queue_cnt, int *pmd_zc,
+			int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT])
 {
 	int ret;

@@ -903,6 +1238,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;

+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IRQ_ARG,
+				 &parse_queue_irq_arg, queue_irqs);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -940,7 +1280,8 @@ get_iface_info(const char *if_name,

 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt, int pmd_zc)
+			int start_queue_idx, int queue_cnt, int pmd_zc,
+			int queue_irqs[RTE_MAX_QUEUES_PER_PORT])
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -957,6 +1298,8 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->queue_cnt = queue_cnt;
 	internals->pmd_zc = pmd_zc;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+	memcpy(internals->queue_irqs, queue_irqs,
+		sizeof(int) * RTE_MAX_QUEUES_PER_PORT);

 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
@@ -1035,6 +1378,9 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
 	int pmd_zc = 0;
+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
+
+	memset(queue_irqs, -1, sizeof(int) * RTE_MAX_QUEUES_PER_PORT);

 	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
 		rte_vdev_device_name(dev));
@@ -1062,7 +1408,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();

 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt, &pmd_zc) < 0) {
+			     &xsk_queue_cnt, &pmd_zc, &queue_irqs) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1073,7 +1419,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}

 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt, pmd_zc);
+					xsk_queue_cnt, pmd_zc, queue_irqs);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -1117,7 +1463,8 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "iface= "
 			      "start_queue= "
 			      "queue_count= "
-			      "pmd_zero_copy=<0|1>");
+			      "pmd_zero_copy=<0|1> "
+			      "queue_irq=:");

 RTE_INIT(af_xdp_init_log)
 {
--
2.17.1
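
Illustrative note, not part of the patch: the sketch below restates two steps the patch describes, splitting a queue_irq value of the form "queue:core" and building the string written to /proc/irq/N/smp_affinity (one hex word, followed by ",00000000" for every 32 cores below the target core). The helper names parse_queue_irq and format_affinity_mask are hypothetical, as is the max_queues parameter standing in for RTE_MAX_QUEUES_PER_PORT. Unlike the patch, the sketch uses an unsigned shift and strtol end pointers instead of strtok, and it does not check the core against the number of online CPUs. It is a minimal standalone approximation under those assumptions, not the driver code.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "queue:core" into two non-negative integers.
 * max_queues is an assumed stand-in for RTE_MAX_QUEUES_PER_PORT.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int
parse_queue_irq(const char *value, long max_queues, long *queue, long *core)
{
	const char *core_str;
	char *end;
	long q, c;

	errno = 0;
	q = strtol(value, &end, 10);
	/* the queue part must be non-empty and followed by ':' */
	if (errno != 0 || end == value || *end != ':')
		return -1;

	core_str = end + 1;
	c = strtol(core_str, &end, 10);
	/* the core part must be non-empty and consume the rest of the string */
	if (errno != 0 || end == core_str || *end != '\0')
		return -1;

	if (q < 0 || q >= max_queues || c < 0)
		return -1;

	*queue = q;
	*core = c;
	return 0;
}

/*
 * Hypothetical helper: format the single-core affinity mask as hex words
 * of 32 bits separated by commas, most significant word first.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
format_affinity_mask(unsigned int coreid, char *buf, size_t len)
{
	static const char zero_word[] = ",00000000";
	unsigned int i;
	int n;

	/* unsigned shift avoids overflow when coreid % 32 == 31 */
	n = snprintf(buf, len, "%x", 1u << (coreid % 32));
	if (n < 0 || (size_t)n >= len)
		return -1;

	/* append one all-zero word for each full group of 32 lower cores */
	for (i = 0; i < coreid / 32; i++) {
		size_t used = strlen(buf);

		if (used + sizeof(zero_word) > len)
			return -1;
		memcpy(buf + used, zero_word, sizeof(zero_word));
	}

	return 0;
}
```

With the commit message example, queue_irq=1:5 parses to queue 1 and core 5, and the mask for core 5 is "20"; a core in the second group of 32, such as core 33, gives "2,00000000".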