DPDK patches and discussions
* [dpdk-dev] [PATCH 0/3] AF_XDP tx halt fix, IRQ pinning and unaligned chunks
@ 2019-09-19 14:15 Ciara Loftus
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 1/3] net/af_xdp: fix Tx halt when no recv packets Ciara Loftus
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ciara Loftus @ 2019-09-19 14:15 UTC (permalink / raw)
  To: dev, xiaolong.ye, kevin.laatz, bruce.richardson, ciara.loftus

This series contains three patches for the AF_XDP PMD.
They were originally submitted as independent patches but are now rolled
into a single series.

Patch 1: fix Tx halt when no recv packets (Xiaolong Ye)
Previous (v1): http://patches.dpdk.org/patch/59044/

Patch 2: support pinning of IRQs
Previous (v1): http://patches.dpdk.org/patch/58571/

v2:
* Update docs with example usage
* Split configure_irqs into multiple smaller functions
* Use /sys/class/net/<iface>/device/driver to get driver string
* Implement lookup mechanism for /proc/interrupts string formatting
* Use /sys/class/net/<iface> symlink to get pci addr for mlx5

Patch 3: enable support for unaligned umem chunks
Previous (v2): http://patches.dpdk.org/patch/58624/

v3:
* Fix inconsistent reserve/submit for fill and tx queue addrs
* Improve readability by extracting out functions
* Coding style fixes

The performance of v3 was measured to be within 5% of the previous zero
copy implementation for test cases using a single PMD. For test cases using
multiple PMDs, the following performance improvements were measured:

ports     nics      pinned    Δ old zc
2         2         0         95.21%
2         2         1         211.23%
2         1         0         88.36%
2         1         1         122.86%

Ciara Loftus (2):
  net/af_xdp: support pinning of IRQs
  net/af_xdp: enable support for unaligned umem chunks

Xiaolong Ye (1):
  net/af_xdp: fix Tx halt when no recv packets

 doc/guides/nics/af_xdp.rst             |  17 +-
 doc/guides/rel_notes/release_19_11.rst |  11 +
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 744 ++++++++++++++++++++++---
 3 files changed, 685 insertions(+), 87 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [dpdk-dev] [PATCH 1/3] net/af_xdp: fix Tx halt when no recv packets
  2019-09-19 14:15 [dpdk-dev] [PATCH 0/3] AF_XDP tx halt fix, IRQ pinning and unaligned chunks Ciara Loftus
@ 2019-09-19 14:15 ` Ciara Loftus
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs Ciara Loftus
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 3/3] net/af_xdp: enable support for unaligned umem chunks Ciara Loftus
  2 siblings, 0 replies; 8+ messages in thread
From: Ciara Loftus @ 2019-09-19 14:15 UTC (permalink / raw)
  To: dev, xiaolong.ye, kevin.laatz, bruce.richardson, ciara.loftus

From: Xiaolong Ye <xiaolong.ye@intel.com>

The kernel only consumes Tx packets if we have some Rx traffic on the
specified queue or we have called send(). So we need to issue a send() even
when the allocation fails so that the kernel will start to consume packets
again.

Commit 45bba02c95b0 ("net/af_xdp: support need wakeup feature") breaks the
above rule by making the send conditional. This patch fixes it while still
keeping the need_wakeup feature for Tx.

Fixes: 45bba02c95b0 ("net/af_xdp: support need wakeup feature")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Tested-by: Ciara Loftus <ciara.loftus@intel.com>
---
 drivers/net/af_xdp/rte_eth_af_xdp.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 41ed5b2af..e496e9aaa 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -286,19 +286,16 @@ kick_tx(struct pkt_tx_queue *txq)
 {
 	struct xsk_umem_info *umem = txq->pair->umem;
 
-#if defined(XDP_USE_NEED_WAKEUP)
-	if (xsk_ring_prod__needs_wakeup(&txq->tx))
-#endif
-		while (send(xsk_socket__fd(txq->pair->xsk), NULL,
-			    0, MSG_DONTWAIT) < 0) {
-			/* some thing unexpected */
-			if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
-				break;
-
-			/* pull from completion queue to leave more space */
-			if (errno == EAGAIN)
-				pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
-		}
+	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+		    0, MSG_DONTWAIT) < 0) {
+		/* something unexpected */
+		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+			break;
+
+		/* pull from completion queue to leave more space */
+		if (errno == EAGAIN)
+			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+	}
 	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
 }
 
@@ -367,7 +364,10 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
 
-	kick_tx(txq);
+#if defined(XDP_USE_NEED_WAKEUP)
+	if (xsk_ring_prod__needs_wakeup(&txq->tx))
+#endif
+		kick_tx(txq);
 
 	txq->stats.tx_pkts += nb_pkts;
 	txq->stats.tx_bytes += tx_bytes;
-- 
2.17.1



* [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs
  2019-09-19 14:15 [dpdk-dev] [PATCH 0/3] AF_XDP tx halt fix, IRQ pinning and unaligned chunks Ciara Loftus
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 1/3] net/af_xdp: fix Tx halt when no recv packets Ciara Loftus
@ 2019-09-19 14:15 ` Ciara Loftus
  2019-09-24 14:12   ` Ye Xiaolong
  2019-09-24 16:42   ` Stephen Hemminger
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 3/3] net/af_xdp: enable support for unaligned umem chunks Ciara Loftus
  2 siblings, 2 replies; 8+ messages in thread
From: Ciara Loftus @ 2019-09-19 14:15 UTC (permalink / raw)
  To: dev, xiaolong.ye, kevin.laatz, bruce.richardson, ciara.loftus

Network devices using the AF_XDP PMD will trigger interrupts
on reception of packets. The new PMD argument 'queue_irq'
allows the user to specify a core on which to pin interrupts
for a given queue. Multiple queue_irq arguments can be specified.
For example:

  --vdev=net_af_xdp1,iface=eth0,queue_count=2,
           queue_irq=0:2,queue_irq=1:5

will pin queue 0 interrupts to core 2 and queue 1 interrupts
to core 5.

The queue argument refers to the ethdev queue as opposed to the
netdev queue. These values are the same unless a value greater
than 0 is specified in a start_queue argument.

The drivers supported for this feature are those with support for
AF_XDP zero copy in the kernel, namely ixgbe, i40e and mlx5_core.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/af_xdp.rst             |  15 ++
 doc/guides/rel_notes/release_19_11.rst |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 357 ++++++++++++++++++++++++-
 3 files changed, 374 insertions(+), 5 deletions(-)

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index ec46f08f0..a255ba4e7 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -36,6 +36,11 @@ The following options can be provided to set up an af_xdp port in DPDK.
 *   ``start_queue`` - starting netdev queue id (optional, default 0);
 *   ``queue_count`` - total netdev queue number (optional, default 1);
 *   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
+*   ``queue_irq`` - pin queue irqs to specified core <queue:core> (optional,
+    default no pinning). The queue argument refers to the ethdev queue as
+    opposed to the netdev queue. These values are the same unless a value
+    greater than 0 is specified for start_queue. ixgbe, i40e and mlx5 drivers
+    supported;
 
 Prerequisites
 -------------
@@ -57,3 +62,13 @@ The following example will set up an af_xdp interface in DPDK:
 .. code-block:: console
 
     --vdev net_af_xdp,iface=ens786f1
+
+Pin queue IRQs
+--------------
+The following example will pin queue 0 interrupts to core 2 and queue 1
+interrupts to core 5.
+
+.. code-block:: console
+
+      --vdev=net_af_xdp1,iface=eth0,queue_count=2,
+               queue_irq=0:2,queue_irq=1:5
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 27cfbd9e3..06bf57c42 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Updated the AF_XDP PMD.**
+
+  Updated the AF_XDP PMD. The new features include:
+
+  * Support for pinning netdev queue IRQs to cores specified by the user.
+    Available for ixgbe, i40e and mlx5 drivers.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index e496e9aaa..a00eb6460 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -3,6 +3,7 @@
  */
 #include <unistd.h>
 #include <errno.h>
+#include <regex.h>
 #include <stdlib.h>
 #include <string.h>
 #include <poll.h>
@@ -10,6 +11,7 @@
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/sysinfo.h>
 #include <linux/if_ether.h>
 #include <linux/if_xdp.h>
 #include <linux/if_link.h>
@@ -17,6 +19,8 @@
 #include <linux/sockios.h>
 #include "af_xdp_deps.h"
 #include <bpf/xsk.h>
+#include <sys/stat.h>
+#include <libgen.h>
 
 #include <rte_ethdev.h>
 #include <rte_ethdev_driver.h>
@@ -116,6 +120,7 @@ struct pmd_internals {
 	int queue_cnt;
 	int max_queue_cnt;
 	int combined_queue_cnt;
+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
 
 	int pmd_zc;
 	struct rte_ether_addr eth_addr;
@@ -128,12 +133,14 @@ struct pmd_internals {
 #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
 #define ETH_AF_XDP_PMD_ZC_ARG			"pmd_zero_copy"
+#define ETH_AF_XDP_QUEUE_IRQ_ARG		"queue_irq"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
 	ETH_AF_XDP_PMD_ZC_ARG,
+	ETH_AF_XDP_QUEUE_IRQ_ARG,
 	NULL
 };
 
@@ -144,6 +151,21 @@ static const struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+/* drivers supported for the queue_irq option */
+enum {I40E_DRIVER, IXGBE_DRIVER, MLX5_DRIVER, NUM_DRIVERS};
+char driver_array[NUM_DRIVERS][NAME_MAX] = {"i40e", "ixgbe", "mlx5_core"};
+
+/*
+ * function pointer template to be implemented for each driver in 'driver_array'
+ * to generate the appropriate regular expression to search for in
+ * /proc/interrupts in order to identify the IRQ number for the netdev_qid of
+ * the given interface.
+ */
+typedef
+int (*generate_driver_regex_func)(char *iface_regex_str,
+				  struct pmd_internals *internals,
+				  uint16_t netdev_qid);
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
 {
@@ -660,6 +682,283 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	return ret;
 }
 
+/** get interface's driver name to determine /proc/interrupts entry format */
+static int
+get_driver_name(struct pmd_internals *internals, char *driver)
+{
+	char driver_path[PATH_MAX];
+	struct stat s;
+	char link[PATH_MAX];
+	int len;
+
+	snprintf(driver_path, sizeof(driver_path),
+			"/sys/class/net/%s/device/driver", internals->if_name);
+	if (lstat(driver_path, &s)) {
+		AF_XDP_LOG(ERR, "Error reading %s: %s\n",
+					driver_path, strerror(errno));
+		return -errno;
+	}
+
+	/* driver_path should link to /sys/bus/pci/drivers/<driver_name> */
+	len = readlink(driver_path, link, PATH_MAX - 1);
+	if (len == -1) {
+		AF_XDP_LOG(ERR, "Error reading symbolic link %s: %s\n",
+					driver_path, strerror(errno));
+		return -errno;
+	}
+
+	link[len] = '\0';
+	strlcpy(driver, basename(link), NAME_MAX);
+	if (!strncmp(driver, ".", strlen(driver))) {
+		AF_XDP_LOG(ERR, "Error getting driver name from %s: %s\n",
+					link, strerror(errno));
+		return -errno;
+	}
+
+	return 0;
+}
+
+static int
+generate_ixgbe_i40e_regex(char *iface_regex_str,
+			  struct pmd_internals *internals, uint16_t netdev_qid)
+{
+	if (snprintf(iface_regex_str, 128,
+			"-%s.*-%d", internals->if_name, netdev_qid) >= 128) {
+		AF_XDP_LOG(INFO, "Cannot get interrupt for %s q %i\n",
+					internals->if_name, netdev_qid);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+generate_mlx5_regex(char *iface_regex_str, struct pmd_internals *internals,
+		    uint16_t netdev_qid)
+{
+	char pci_path[PATH_MAX];
+	char *pci;
+	int ret = -1;
+	struct stat s;
+	char *link;
+	int len;
+
+	snprintf(pci_path, sizeof(pci_path),
+			"/sys/class/net/%s/device", internals->if_name);
+	if (lstat(pci_path, &s)) {
+		AF_XDP_LOG(ERR, "Error reading %s: %s\n",
+					pci_path, strerror(errno));
+		return -errno;
+	}
+
+	/* pci_path should link to a directory whose name is the pci addr */
+	link = malloc(s.st_size + 1);
+	len = readlink(pci_path, link, PATH_MAX - 1);
+	if (len == -1) {
+		AF_XDP_LOG(ERR, "Error reading symbolic link %s: %s\n",
+					pci_path, strerror(errno));
+		ret = -errno;
+		goto out;
+	}
+
+	link[len] = '\0';
+	pci = basename(link);
+	if (!strncmp(pci, ".", strlen(pci))) {
+		AF_XDP_LOG(ERR, "Error getting pci from %s\n", link);
+		goto out;
+	}
+
+	if (snprintf(iface_regex_str, 128, ".*p%i@pci:%s", netdev_qid, pci) >=
+			128) {
+		AF_XDP_LOG(INFO, "Cannot get interrupt for %s q %i\n",
+					internals->if_name, netdev_qid);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	if (link)
+		free(link);
+
+	return ret;
+}
+
+/*
+ * array of handlers for different drivers for generating appropriate regex
+ * format for searching /proc/interrupts
+ */
+generate_driver_regex_func driver_handlers[NUM_DRIVERS] = {
+					generate_ixgbe_i40e_regex,
+					generate_ixgbe_i40e_regex,
+					generate_mlx5_regex};
+
+/*
+ * function for getting the index into driver_handlers array that corresponds
+ * to 'driver'
+ */
+static int
+get_driver_idx(char *driver)
+{
+	for (int i = 0; i < NUM_DRIVERS; i++) {
+		if (strncmp(driver, driver_array[i], strlen(driver_array[i])))
+			continue;
+		return i;
+	}
+
+	return -1;
+}
+
+/** generate /proc/interrupts search regex based on driver type */
+static int
+generate_search_regex(const char *driver, struct pmd_internals *internals,
+		      uint16_t netdev_qid, regex_t *r)
+{
+	char iface_regex_str[128];
+	int ret = -1;
+	char *driver_dup = strdup(driver);
+	int idx = get_driver_idx(driver_dup);
+
+	if (idx == -1) {
+		AF_XDP_LOG(ERR, "Error getting driver index for %s\n",
+					internals->if_name);
+		goto out;
+	}
+
+	if (driver_handlers[idx](iface_regex_str, internals, netdev_qid)) {
+		AF_XDP_LOG(ERR, "Error getting regex string for %s\n",
+					internals->if_name);
+		goto out;
+	}
+
+	if (regcomp(r, iface_regex_str, 0)) {
+		AF_XDP_LOG(ERR, "Error computing regex %s\n", iface_regex_str);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	free(driver_dup);
+	return ret;
+}
+
+/** get interrupt number associated with the given interface qid */
+static int
+get_interrupt_number(regex_t *r, int *interrupt,
+		     struct pmd_internals *internals)
+{
+	FILE *f_int_proc;
+	int found = 0;
+	char line[4096];
+	int ret = 0;
+
+	f_int_proc = fopen("/proc/interrupts", "r");
+	if (f_int_proc == NULL) {
+		AF_XDP_LOG(ERR, "Failed to open /proc/interrupts.\n");
+		return -1;
+	}
+
+	while (!feof(f_int_proc) && !found) {
+		/* Make sure to read a full line at a time */
+		if (fgets(line, sizeof(line), f_int_proc) == NULL ||
+				line[strlen(line) - 1] != '\n') {
+			AF_XDP_LOG(ERR, "Error reading from interrupts file\n");
+			ret = -1;
+			break;
+		}
+
+		/* Extract interrupt number from line */
+		if (regexec(r, line, 0, NULL, 0) == 0) {
+			*interrupt = atoi(line);
+			found = true;
+			AF_XDP_LOG(INFO, "Got interrupt %d for %s\n",
+						*interrupt, internals->if_name);
+		}
+	}
+
+	fclose(f_int_proc);
+
+	return ret;
+}
+
+/** affinitise interrupts for the given qid to the given coreid */
+static int
+set_irq_affinity(int coreid, struct pmd_internals *internals,
+		 uint16_t rx_queue_id, uint16_t netdev_qid, int interrupt)
+{
+	char bitmask[128];
+	char smp_affinity_filename[NAME_MAX];
+	FILE *f_int_smp_affinity;
+	int i;
+
+	/* Create affinity bitmask. Every 32 bits are separated by a comma */
+	snprintf(bitmask, sizeof(bitmask), "%x", 1 << (coreid % 32));
+	for (i = 0; i < coreid / 32; i++)
+		strlcat(bitmask, ",00000000", sizeof(bitmask));
+
+	/* Write the new affinity bitmask */
+	snprintf(smp_affinity_filename, sizeof(smp_affinity_filename),
+			"/proc/irq/%d/smp_affinity", interrupt);
+	f_int_smp_affinity = fopen(smp_affinity_filename, "w");
+	if (f_int_smp_affinity == NULL) {
+		AF_XDP_LOG(ERR, "Error opening %s\n", smp_affinity_filename);
+		return -1;
+	}
+	fwrite(bitmask, strlen(bitmask), 1, f_int_smp_affinity);
+	fclose(f_int_smp_affinity);
+	AF_XDP_LOG(INFO, "IRQs for %s ethdev queue %i (netdev queue %i)"
+				" affinitised to core %i\n",
+				internals->if_name, rx_queue_id,
+				netdev_qid, coreid);
+
+	return 0;
+}
+
+static void
+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
+{
+	int coreid = internals->queue_irqs[rx_queue_id];
+	char driver[NAME_MAX];
+	uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
+	regex_t r;
+	int interrupt;
+
+	if (coreid < 0)
+		return;
+
+	if (coreid > (get_nprocs() - 1)) {
+		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
+					coreid);
+		return;
+	}
+
+	if (get_driver_name(internals, driver)) {
+		AF_XDP_LOG(ERR, "Error retrieving driver name for %s\n",
+					internals->if_name);
+		return;
+	}
+
+	if (generate_search_regex(driver, internals, netdev_qid, &r)) {
+		AF_XDP_LOG(ERR, "Error generating search regex for %s\n",
+					internals->if_name);
+		return;
+	}
+
+	if (get_interrupt_number(&r, &interrupt, internals)) {
+		AF_XDP_LOG(ERR, "Error getting interrupt number for %s\n",
+					internals->if_name);
+		return;
+	}
+
+	if (set_irq_affinity(coreid, internals, rx_queue_id, netdev_qid,
+				interrupt)) {
+		AF_XDP_LOG(ERR, "Error setting interrupt affinity for %s\n",
+					internals->if_name);
+		return;
+	}
+}
+
 static int
 eth_rx_queue_setup(struct rte_eth_dev *dev,
 		   uint16_t rx_queue_id,
@@ -697,6 +996,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 		goto err;
 	}
 
+	configure_irqs(internals, rx_queue_id);
+
 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
 	rxq->fds[0].events = POLLIN;
 
@@ -834,6 +1135,39 @@ parse_name_arg(const char *key __rte_unused,
 	return 0;
 }
 
+/** parse queue irq argument */
+static int
+parse_queue_irq_arg(const char *key __rte_unused,
+		   const char *value, void *extra_args)
+{
+	int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT] = extra_args;
+	char *parse_str = strdup(value);
+	char delimiter[] = ":";
+	char *queue_str;
+
+	queue_str = strtok(parse_str, delimiter);
+	if (queue_str != NULL && strncmp(queue_str, value, strlen(value))) {
+		char *end;
+		long queue = strtol(queue_str, &end, 10);
+
+		if (*end == '\0' && queue >= 0 &&
+				queue < RTE_MAX_QUEUES_PER_PORT) {
+			char *core_str = strtok(NULL, delimiter);
+			long core = strtol(core_str, &end, 10);
+
+			if (*end == '\0' && core >= 0 && core < get_nprocs()) {
+				(*queue_irqs)[queue] = core;
+				free(parse_str);
+				return 0;
+			}
+		}
+	}
+
+	AF_XDP_LOG(ERR, "Invalid queue_irq argument.\n");
+	free(parse_str);
+	return -1;
+}
+
 static int
 xdp_get_channels_info(const char *if_name, int *max_queues,
 				int *combined_queues)
@@ -877,7 +1211,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
 
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt, int *pmd_zc)
+			int *queue_cnt, int *pmd_zc,
+			int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT])
 {
 	int ret;
 
@@ -903,6 +1238,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IRQ_ARG,
+				 &parse_queue_irq_arg, queue_irqs);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -940,7 +1280,8 @@ get_iface_info(const char *if_name,
 
 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt, int pmd_zc)
+			int start_queue_idx, int queue_cnt, int pmd_zc,
+			int queue_irqs[RTE_MAX_QUEUES_PER_PORT])
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -957,6 +1298,8 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->queue_cnt = queue_cnt;
 	internals->pmd_zc = pmd_zc;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+	memcpy(internals->queue_irqs, queue_irqs,
+		sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
@@ -1035,6 +1378,9 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
 	int pmd_zc = 0;
+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
+
+	memset(queue_irqs, -1, sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
 
 	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
 		rte_vdev_device_name(dev));
@@ -1062,7 +1408,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt, &pmd_zc) < 0) {
+			     &xsk_queue_cnt, &pmd_zc, &queue_irqs) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1073,7 +1419,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt, pmd_zc);
+					xsk_queue_cnt, pmd_zc, queue_irqs);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -1117,7 +1463,8 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "iface=<string> "
 			      "start_queue=<int> "
 			      "queue_count=<int> "
-			      "pmd_zero_copy=<0|1>");
+			      "pmd_zero_copy=<0|1> "
+			      "queue_irq=<int>:<int>");
 
 RTE_INIT(af_xdp_init_log)
 {
-- 
2.17.1



* [dpdk-dev] [PATCH 3/3] net/af_xdp: enable support for unaligned umem chunks
  2019-09-19 14:15 [dpdk-dev] [PATCH 0/3] AF_XDP tx halt fix, IRQ pinning and unaligned chunks Ciara Loftus
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 1/3] net/af_xdp: fix Tx halt when no recv packets Ciara Loftus
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs Ciara Loftus
@ 2019-09-19 14:15 ` Ciara Loftus
  2 siblings, 0 replies; 8+ messages in thread
From: Ciara Loftus @ 2019-09-19 14:15 UTC (permalink / raw)
  To: dev, xiaolong.ye, kevin.laatz, bruce.richardson, ciara.loftus

This patch enables the unaligned chunks feature for AF_XDP which allows
chunks to be placed at arbitrary places in the umem, as opposed to them
being required to be aligned to 2k. This allows for DPDK application
mempools to be mapped directly into the umem and in turn enable zero copy
transfer between umem and the PMD.

This patch replaces the zero copy via external mbuf mechanism introduced
in commit e9ff8bb71943 ("net/af_xdp: enable zero copy by external mbuf").
The pmd_zero_copy vdev argument is also removed: the PMD now auto-detects
the unaligned chunks feature, enabling zero copy when it is present and
falling back to copy mode otherwise.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 doc/guides/nics/af_xdp.rst             |   2 +-
 doc/guides/rel_notes/release_19_11.rst |   4 +
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 375 ++++++++++++++++++++-----
 3 files changed, 305 insertions(+), 76 deletions(-)

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index a255ba4e7..40a35a822 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -35,7 +35,6 @@ The following options can be provided to set up an af_xdp port in DPDK.
 *   ``iface`` - name of the Kernel interface to attach to (required);
 *   ``start_queue`` - starting netdev queue id (optional, default 0);
 *   ``queue_count`` - total netdev queue number (optional, default 1);
-*   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
 *   ``queue_irq`` - pin queue irqs to specified core <queue:core> (optional,
     default no pinning). The queue argument refers to the ethdev queue as
     opposed to the netdev queue. These values are the same unless a value
@@ -53,6 +52,7 @@ This is a Linux-specific PMD, thus the following prerequisites apply:
    <kernel src tree>/tools/lib/bpf;
 *  A Kernel bound interface to attach to;
 *  For need_wakeup feature, it requires kernel version later than v5.3-rc1;
+*  For PMD zero copy, it requires kernel version later than v5.4-rc1;
 
 Set up an af_xdp interface
 -----------------------------
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 06bf57c42..22369107c 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -62,6 +62,8 @@ New Features
 
   * Support for pinning netdev queue IRQs to cores specified by the user.
     Available for ixgbe, i40e and mlx5 drivers.
+  * Enabled zero copy between application mempools and UMEM by enabling the
+    XDP_UMEM_UNALIGNED_CHUNKS UMEM flag.
 
 
 Removed Items
@@ -85,6 +87,8 @@ Removed Items
      "port config <port_id> rx_offload crc_strip|scatter|ipv4_cksum|udp_cksum|tcp_cksum|
      timestamp|vlan_strip|vlan_filter|vlan_extend on|off"
 
+   * Removed the AF_XDP pmd_zero_copy vdev argument. Support is now auto-detected.
+
 
 API Changes
 -----------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index a00eb6460..9abb9d9ae 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -62,7 +62,13 @@ static int af_xdp_logtype;
 
 #define ETH_AF_XDP_FRAME_SIZE		2048
 #define ETH_AF_XDP_NUM_BUFFERS		4096
+#ifdef XDP_UMEM_UNALIGNED_CHUNK_FLAG
+#define ETH_AF_XDP_MBUF_OVERHEAD	128 /* sizeof(struct rte_mbuf) */
+#define ETH_AF_XDP_DATA_HEADROOM \
+	(ETH_AF_XDP_MBUF_OVERHEAD + RTE_PKTMBUF_HEADROOM)
+#else
 #define ETH_AF_XDP_DATA_HEADROOM	0
+#endif
 #define ETH_AF_XDP_DFLT_NUM_DESCS	XSK_RING_CONS__DEFAULT_NUM_DESCS
 #define ETH_AF_XDP_DFLT_START_QUEUE_IDX	0
 #define ETH_AF_XDP_DFLT_QUEUE_COUNT	1
@@ -77,7 +83,8 @@ struct xsk_umem_info {
 	struct xsk_umem *umem;
 	struct rte_ring *buf_ring;
 	const struct rte_memzone *mz;
-	int pmd_zc;
+	struct rte_mempool *mb_pool;
+	void *buffer;
 };
 
 struct rx_stats {
@@ -102,10 +109,12 @@ struct pkt_rx_queue {
 struct tx_stats {
 	uint64_t tx_pkts;
 	uint64_t tx_bytes;
+	uint64_t tx_dropped;
 };
 
 struct pkt_tx_queue {
 	struct xsk_ring_prod tx;
+	struct xsk_umem_info *umem;
 
 	struct tx_stats stats;
 
@@ -122,7 +131,6 @@ struct pmd_internals {
 	int combined_queue_cnt;
 	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
 
-	int pmd_zc;
 	struct rte_ether_addr eth_addr;
 
 	struct pkt_rx_queue *rx_queues;
@@ -132,14 +140,12 @@ struct pmd_internals {
 #define ETH_AF_XDP_IFACE_ARG			"iface"
 #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
-#define ETH_AF_XDP_PMD_ZC_ARG			"pmd_zero_copy"
 #define ETH_AF_XDP_QUEUE_IRQ_ARG		"queue_irq"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
-	ETH_AF_XDP_PMD_ZC_ARG,
 	ETH_AF_XDP_QUEUE_IRQ_ARG,
 	NULL
 };
@@ -166,8 +172,43 @@ int (*generate_driver_regex_func)(char *iface_regex_str,
 				  struct pmd_internals *internals,
 				  uint16_t netdev_qid);
 
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
 static inline int
-reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
+reserve_fill_queue_zc(struct xsk_umem_info *umem, uint16_t reserve_size)
+{
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx;
+	uint16_t i;
+	struct rte_mbuf *bufs[reserve_size];
+
+	if (rte_pktmbuf_alloc_bulk(umem->mb_pool, bufs, reserve_size)) {
+		AF_XDP_LOG(DEBUG, "Failed to get enough buffers for fq.\n");
+		return -1;
+	}
+
+	if (unlikely(!xsk_ring_prod__reserve(fq, reserve_size, &idx))) {
+		for (i = 0; i < reserve_size; i++)
+			rte_pktmbuf_free(bufs[i]);
+		AF_XDP_LOG(DEBUG, "Failed to reserve enough fq descs.\n");
+		return -1;
+	}
+
+	for (i = 0; i < reserve_size; i++) {
+		__u64 *fq_addr;
+		uint64_t addr;
+
+		fq_addr = xsk_ring_prod__fill_addr(fq, idx++);
+		addr = (uint64_t)bufs[i] - (uint64_t)umem->buffer;
+		*fq_addr = addr;
+	}
+
+	xsk_ring_prod__submit(fq, reserve_size);
+
+	return 0;
+}
+#else
+static inline int
+reserve_fill_queue_cp(struct xsk_umem_info *umem, uint16_t reserve_size)
 {
 	struct xsk_ring_prod *fq = &umem->fq;
 	void *addrs[reserve_size];
@@ -198,30 +239,87 @@ reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
 
 	return 0;
 }
+#endif
 
-static void
-umem_buf_release_to_fq(void *addr, void *opaque)
+static inline int
+reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
 {
-	struct xsk_umem_info *umem = (struct xsk_umem_info *)opaque;
-	uint64_t umem_addr = (uint64_t)addr - umem->mz->addr_64;
-
-	rte_ring_enqueue(umem->buf_ring, (void *)umem_addr);
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+	return reserve_fill_queue_zc(umem, reserve_size);
+#else
+	return reserve_fill_queue_cp(umem, reserve_size);
+#endif
 }
 
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
 static uint16_t
-eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
 	struct pkt_rx_queue *rxq = queue;
 	struct xsk_ring_cons *rx = &rxq->rx;
 	struct xsk_umem_info *umem = rxq->umem;
 	struct xsk_ring_prod *fq = &umem->fq;
 	uint32_t idx_rx = 0;
+	unsigned long rx_bytes = 0;
+	int rcvd, i;
 	uint32_t free_thresh = fq->size >> 1;
-	int pmd_zc = umem->pmd_zc;
-	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
-	unsigned long dropped = 0;
+
+	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
+	if (rcvd == 0) {
+#if defined(XDP_USE_NEED_WAKEUP)
+		if (xsk_ring_prod__needs_wakeup(fq))
+			(void)poll(rxq->fds, 1, 1000);
+#endif
+
+		return rcvd;
+	}
+
+	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
+
+	for (i = 0; i < rcvd; i++) {
+		const struct xdp_desc *desc;
+		uint64_t addr;
+		uint32_t len;
+		uint64_t offset;
+
+		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
+		addr = desc->addr;
+		len = desc->len;
+
+		offset = xsk_umem__extract_offset(addr);
+		addr = xsk_umem__extract_addr(addr);
+
+		bufs[i] = (struct rte_mbuf *)
+				xsk_umem__get_data(umem->buffer, addr);
+		bufs[i]->data_off = offset - sizeof(struct rte_mbuf);
+
+		rte_pktmbuf_pkt_len(bufs[i]) = len;
+		rte_pktmbuf_data_len(bufs[i]) = len;
+		rx_bytes += len;
+	}
+
+	xsk_ring_cons__release(rx, rcvd);
+
+	/* statistics */
+	rxq->stats.rx_pkts += rcvd;
+	rxq->stats.rx_bytes += rx_bytes;
+
+	return rcvd;
+}
+#else
+static uint16_t
+af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_rx_queue *rxq = queue;
+	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_umem_info *umem = rxq->umem;
+	struct xsk_ring_prod *fq = &umem->fq;
+	uint32_t idx_rx = 0;
 	unsigned long rx_bytes = 0;
 	int rcvd, i;
+	uint32_t free_thresh = fq->size >> 1;
+	struct rte_mbuf *mbufs[ETH_AF_XDP_RX_BATCH_SIZE];
 
 	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_RX_BATCH_SIZE);
 
@@ -246,25 +344,14 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint64_t addr;
 		uint32_t len;
 		void *pkt;
-		uint16_t buf_len = ETH_AF_XDP_FRAME_SIZE;
-		struct rte_mbuf_ext_shared_info *shinfo;
 
 		desc = xsk_ring_cons__rx_desc(rx, idx_rx++);
 		addr = desc->addr;
 		len = desc->len;
 		pkt = xsk_umem__get_data(rxq->umem->mz->addr, addr);
 
-		if (pmd_zc) {
-			shinfo = rte_pktmbuf_ext_shinfo_init_helper(pkt,
-					&buf_len, umem_buf_release_to_fq, umem);
-
-			rte_pktmbuf_attach_extbuf(mbufs[i], pkt, 0, buf_len,
-						  shinfo);
-		} else {
-			rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *),
-							pkt, len);
-			rte_ring_enqueue(umem->buf_ring, (void *)addr);
-		}
+		rte_memcpy(rte_pktmbuf_mtod(mbufs[i], void *), pkt, len);
+		rte_ring_enqueue(umem->buf_ring, (void *)addr);
 		rte_pktmbuf_pkt_len(mbufs[i]) = len;
 		rte_pktmbuf_data_len(mbufs[i]) = len;
 		rx_bytes += len;
@@ -274,7 +361,7 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	xsk_ring_cons__release(rx, rcvd);
 
 	/* statistics */
-	rxq->stats.rx_pkts += (rcvd - dropped);
+	rxq->stats.rx_pkts += rcvd;
 	rxq->stats.rx_bytes += rx_bytes;
 
 out:
@@ -284,6 +371,17 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	return rcvd;
 }
+#endif
+
+static uint16_t
+eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+	return af_xdp_rx_zc(queue, bufs, nb_pkts);
+#else
+	return af_xdp_rx_cp(queue, bufs, nb_pkts);
+#endif
+}
 
 static void
 pull_umem_cq(struct xsk_umem_info *umem, int size)
@@ -297,7 +395,13 @@ pull_umem_cq(struct xsk_umem_info *umem, int size)
 	for (i = 0; i < n; i++) {
 		uint64_t addr;
 		addr = *xsk_ring_cons__comp_addr(cq, idx_cq++);
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+		addr = xsk_umem__extract_addr(addr);
+		rte_pktmbuf_free((struct rte_mbuf *)
+					xsk_umem__get_data(umem->buffer, addr));
+#else
 		rte_ring_enqueue(umem->buf_ring, (void *)addr);
+#endif
 	}
 
 	xsk_ring_cons__release(cq, n);
@@ -306,7 +410,7 @@ pull_umem_cq(struct xsk_umem_info *umem, int size)
 static void
 kick_tx(struct pkt_tx_queue *txq)
 {
-	struct xsk_umem_info *umem = txq->pair->umem;
+	struct xsk_umem_info *umem = txq->umem;
 
 	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
 		    0, MSG_DONTWAIT) < 0) {
@@ -318,24 +422,97 @@ kick_tx(struct pkt_tx_queue *txq)
 		if (errno == EAGAIN)
 			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
 	}
+#ifndef XDP_UMEM_UNALIGNED_CHUNK_FLAG
 	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+#endif
 }
 
-static inline bool
-in_umem_range(struct xsk_umem_info *umem, uint64_t addr)
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+static uint16_t
+af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
-	uint64_t mz_base_addr = umem->mz->addr_64;
+	struct pkt_tx_queue *txq = queue;
+	struct xsk_umem_info *umem = txq->umem;
+	struct rte_mbuf *mbuf;
+	unsigned long tx_bytes = 0;
+	int i;
+	uint32_t idx_tx;
+	uint16_t count = 0;
+	struct xdp_desc *desc;
+	uint64_t addr, offset;
 
-	return addr >= mz_base_addr && addr < mz_base_addr + umem->mz->len;
-}
+	pull_umem_cq(umem, nb_pkts);
+
+	for (i = 0; i < nb_pkts; i++) {
+		mbuf = bufs[i];
+
+		if (mbuf->pool == umem->mb_pool) {
+			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
+				kick_tx(txq);
+				goto out;
+			}
+			desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx);
+			desc->len = mbuf->pkt_len;
+			addr = (uint64_t)mbuf - (uint64_t)umem->buffer;
+			offset = rte_pktmbuf_mtod(mbuf, uint64_t) -
+					(uint64_t)mbuf;
+			offset = offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+			desc->addr = addr | offset;
+			count++;
+		} else {
+			struct rte_mbuf *local_mbuf =
+					rte_pktmbuf_alloc(umem->mb_pool);
+			void *pkt;
+
+			if (local_mbuf == NULL) {
+				rte_pktmbuf_free(mbuf);
+				goto out;
+			}
 
+			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
+				kick_tx(txq);
+				goto out;
+			}
+
+			desc = xsk_ring_prod__tx_desc(&txq->tx, idx_tx);
+			desc->len = mbuf->pkt_len;
+
+			addr = (uint64_t)local_mbuf - (uint64_t)umem->buffer;
+			offset = rte_pktmbuf_mtod(local_mbuf, uint64_t) -
+					(uint64_t)local_mbuf;
+			pkt = xsk_umem__get_data(umem->buffer, addr + offset);
+			offset = offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT;
+			desc->addr = addr | offset;
+			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
+					desc->len);
+			rte_pktmbuf_free(mbuf);
+			count++;
+		}
+
+		tx_bytes += desc->len;
+	}
+
+#if defined(XDP_USE_NEED_WAKEUP)
+	if (xsk_ring_prod__needs_wakeup(&txq->tx))
+#endif
+		kick_tx(txq);
+
+out:
+	xsk_ring_prod__submit(&txq->tx, count);
+
+	txq->stats.tx_pkts += count;
+	txq->stats.tx_bytes += tx_bytes;
+	txq->stats.tx_dropped += nb_pkts - count;
+
+	return count;
+}
+#else
 static uint16_t
-eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
 	struct pkt_tx_queue *txq = queue;
-	struct xsk_umem_info *umem = txq->pair->umem;
+	struct xsk_umem_info *umem = txq->umem;
 	struct rte_mbuf *mbuf;
-	int pmd_zc = umem->pmd_zc;
 	void *addrs[ETH_AF_XDP_TX_BATCH_SIZE];
 	unsigned long tx_bytes = 0;
 	int i;
@@ -364,24 +541,12 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		mbuf = bufs[i];
 		desc->len = mbuf->pkt_len;
 
-		/*
-		 * We need to make sure the external mbuf address is within
-		 * current port's umem memzone range
-		 */
-		if (pmd_zc && RTE_MBUF_HAS_EXTBUF(mbuf) &&
-				in_umem_range(umem, (uint64_t)mbuf->buf_addr)) {
-			desc->addr = (uint64_t)mbuf->buf_addr -
-				umem->mz->addr_64;
-			mbuf->buf_addr = xsk_umem__get_data(umem->mz->addr,
-					(uint64_t)addrs[i]);
-		} else {
-			desc->addr = (uint64_t)addrs[i];
-			pkt = xsk_umem__get_data(umem->mz->addr,
-					desc->addr);
-			rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *),
-					desc->len);
-		}
+		desc->addr = (uint64_t)addrs[i];
+		pkt = xsk_umem__get_data(umem->mz->addr,
+					 desc->addr);
+		rte_memcpy(pkt, rte_pktmbuf_mtod(mbuf, void *), desc->len);
 		tx_bytes += mbuf->pkt_len;
+		rte_pktmbuf_free(mbuf);
 	}
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
@@ -394,11 +559,19 @@ eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	txq->stats.tx_pkts += nb_pkts;
 	txq->stats.tx_bytes += tx_bytes;
 
-	for (i = 0; i < nb_pkts; i++)
-		rte_pktmbuf_free(bufs[i]);
-
 	return nb_pkts;
 }
+#endif
+
+static uint16_t
+eth_af_xdp_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+	return af_xdp_tx_zc(queue, bufs, nb_pkts);
+#else
+	return af_xdp_tx_cp(queue, bufs, nb_pkts);
+#endif
+}
 
 static int
 eth_dev_start(struct rte_eth_dev *dev)
@@ -468,6 +641,7 @@ eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
 		stats->ipackets += stats->q_ipackets[i];
 		stats->ibytes += stats->q_ibytes[i];
 		stats->imissed += rxq->stats.rx_dropped;
+		stats->oerrors += txq->stats.tx_dropped;
 		ret = getsockopt(xsk_socket__fd(rxq->xsk), SOL_XDP,
 				XDP_STATISTICS, &xdp_stats, &optlen);
 		if (ret != 0) {
@@ -514,11 +688,16 @@ remove_xdp_program(struct pmd_internals *internals)
 static void
 xdp_umem_destroy(struct xsk_umem_info *umem)
 {
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+	rte_mempool_free(umem->mb_pool);
+	umem->mb_pool = NULL;
+#else
 	rte_memzone_free(umem->mz);
 	umem->mz = NULL;
 
 	rte_ring_free(umem->buf_ring);
 	umem->buf_ring = NULL;
+#endif
 
 	rte_free(umem);
 	umem = NULL;
@@ -568,6 +747,55 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
 	return 0;
 }
 
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+static inline uint64_t get_base_addr(struct rte_mempool *mp)
+{
+	struct rte_mempool_memhdr *memhdr;
+
+	memhdr = STAILQ_FIRST(&mp->mem_list);
+	return (uint64_t)memhdr->addr & ~(getpagesize() - 1);
+}
+
+static struct
+xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals __rte_unused,
+				  struct pkt_rx_queue *rxq)
+{
+	struct xsk_umem_info *umem;
+	int ret;
+	struct xsk_umem_config usr_config = {
+		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
+		.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG};
+	void *base_addr = NULL;
+	struct rte_mempool *mb_pool = rxq->mb_pool;
+
+	usr_config.frame_size = rte_pktmbuf_data_room_size(mb_pool) +
+					ETH_AF_XDP_MBUF_OVERHEAD +
+					mb_pool->private_data_size;
+	usr_config.frame_headroom = ETH_AF_XDP_DATA_HEADROOM +
+					mb_pool->private_data_size;
+
+	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
+	if (umem == NULL) {
+		AF_XDP_LOG(ERR, "Failed to allocate umem info");
+		return NULL;
+	}
+
+	umem->mb_pool = mb_pool;
+	base_addr = (void *)get_base_addr(mb_pool);
+
+	ret = xsk_umem__create(&umem->umem, base_addr,
+			       mb_pool->populated_size * usr_config.frame_size,
+			       &umem->fq, &umem->cq,
+			       &usr_config);
+
+	if (ret) {
+		AF_XDP_LOG(ERR, "Failed to create umem");
+		goto err;
+	}
+	umem->buffer = base_addr;
+
+#else
 static struct
 xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 				  struct pkt_rx_queue *rxq)
@@ -628,6 +856,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 	}
 	umem->mz = mz;
 
+#endif
 	return umem;
 
 err:
@@ -647,6 +876,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	rxq->umem = xdp_umem_configure(internals, rxq);
 	if (rxq->umem == NULL)
 		return -ENOMEM;
+	txq->umem = rxq->umem;
 
 	cfg.rx_size = ring_size;
 	cfg.tx_size = ring_size;
@@ -968,7 +1198,6 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 		   struct rte_mempool *mb_pool)
 {
 	struct pmd_internals *internals = dev->data->dev_private;
-	uint32_t buf_size, data_size;
 	struct pkt_rx_queue *rxq;
 	int ret;
 
@@ -976,6 +1205,10 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 
 	AF_XDP_LOG(INFO, "Set up rx queue, rx queue id: %d, xsk queue id: %d\n",
 		   rx_queue_id, rxq->xsk_queue_idx);
+
+#ifndef XDP_UMEM_UNALIGNED_CHUNK_FLAG
+	uint32_t buf_size, data_size;
+
 	/* Now get the space available for data in the mbuf */
 	buf_size = rte_pktmbuf_data_room_size(mb_pool) -
 		RTE_PKTMBUF_HEADROOM;
@@ -987,6 +1220,7 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 		ret = -ENOMEM;
 		goto err;
 	}
+#endif
 
 	rxq->mb_pool = mb_pool;
 
@@ -1001,8 +1235,6 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
 	rxq->fds[0].events = POLLIN;
 
-	rxq->umem->pmd_zc = internals->pmd_zc;
-
 	dev->data->rx_queues[rx_queue_id] = rxq;
 	return 0;
 
@@ -1211,7 +1443,7 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
 
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt, int *pmd_zc,
+			int *queue_cnt,
 			int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT])
 {
 	int ret;
@@ -1233,11 +1465,6 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		goto free_kvlist;
 	}
 
-	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_PMD_ZC_ARG,
-				 &parse_integer_arg, pmd_zc);
-	if (ret < 0)
-		goto free_kvlist;
-
 	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IRQ_ARG,
 				 &parse_queue_irq_arg, queue_irqs);
 	if (ret < 0)
@@ -1280,7 +1507,7 @@ get_iface_info(const char *if_name,
 
 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt, int pmd_zc,
+			int start_queue_idx, int queue_cnt,
 			int queue_irqs[RTE_MAX_QUEUES_PER_PORT])
 {
 	const char *name = rte_vdev_device_name(dev);
@@ -1296,7 +1523,6 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 
 	internals->start_queue_idx = start_queue_idx;
 	internals->queue_cnt = queue_cnt;
-	internals->pmd_zc = pmd_zc;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
 	memcpy(internals->queue_irqs, queue_irqs,
 		sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
@@ -1354,8 +1580,9 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	/* Let rte_eth_dev_close() release the port resources. */
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_CLOSE_REMOVE;
 
-	if (internals->pmd_zc)
-		AF_XDP_LOG(INFO, "Zero copy between umem and mbuf enabled.\n");
+#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+	AF_XDP_LOG(INFO, "Zero copy between umem and mbuf enabled.\n");
+#endif
 
 	return eth_dev;
 
@@ -1377,7 +1604,6 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	int xsk_queue_cnt = ETH_AF_XDP_DFLT_QUEUE_COUNT;
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
-	int pmd_zc = 0;
 	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
 
 	memset(queue_irqs, -1, sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
@@ -1408,7 +1634,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt, &pmd_zc, &queue_irqs) < 0) {
+			     &xsk_queue_cnt, &queue_irqs) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1419,7 +1645,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt, pmd_zc, queue_irqs);
+					xsk_queue_cnt, queue_irqs);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -1463,7 +1689,6 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "iface=<string> "
 			      "start_queue=<int> "
 			      "queue_count=<int> "
-			      "pmd_zero_copy=<0|1> "
 			      "queue_irq=<int>:<int>");
 
 RTE_INIT(af_xdp_init_log)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs Ciara Loftus
@ 2019-09-24 14:12   ` Ye Xiaolong
  2019-09-27 13:21     ` Loftus, Ciara
  2019-09-24 16:42   ` Stephen Hemminger
  1 sibling, 1 reply; 8+ messages in thread
From: Ye Xiaolong @ 2019-09-24 14:12 UTC (permalink / raw)
  To: Ciara Loftus; +Cc: dev, kevin.laatz, bruce.richardson

On 09/19, Ciara Loftus wrote:

[snip]

>+/* drivers supported for the queue_irq option */
>+enum {I40E_DRIVER, IXGBE_DRIVER, MLX5_DRIVER, NUM_DRIVERS};

Minor nit: how about using the format below for readability, to align with other
enum type definitions in DPDK?

enum supported_driver {
	I40E_DRIVER,
	IXGBE_DRIVER,
	MLX5_DRIVER,
	NUM_DRIVERS
};

>+char driver_array[NUM_DRIVERS][NAME_MAX] = {"i40e", "ixgbe", "mlx5_core"};

[snip]

>+ * function for getting the index into driver_handlers array that corresponds
>+ * to 'driver'
>+ */
>+static int
>+get_driver_idx(char *driver)

const char *driver

>+{
>+	for (int i = 0; i < NUM_DRIVERS; i++) {
>+		if (strncmp(driver, driver_array[i], strlen(driver_array[i])))
>+			continue;
>+		return i;
>+	}
>+
>+	return -1;
>+}
>+
>+/** generate /proc/interrupts search regex based on driver type */
>+static int
>+generate_search_regex(const char *driver, struct pmd_internals *internals,
>+		      uint16_t netdev_qid, regex_t *r)

I'd prefer to put *internals as the first parameter.

>+{
>+	char iface_regex_str[128];
>+	int ret = -1;
>+	char *driver_dup = strdup(driver);
>+	int idx = get_driver_idx(driver_dup);

Why not use driver directly?

>+
>+	if (idx == -1) {
>+		AF_XDP_LOG(ERR, "Error getting driver index for %s\n",
>+					internals->if_name);
>+		goto out;
>+	}
>+
>+	if (driver_handlers[idx](iface_regex_str, internals, netdev_qid)) {
>+		AF_XDP_LOG(ERR, "Error getting regex string for %s\n",
>+					internals->if_name);
>+		goto out;
>+	}

Need to check whether driver_handlers[idx] exists.

>+
>+	if (regcomp(r, iface_regex_str, 0)) {
>+		AF_XDP_LOG(ERR, "Error computing regex %s\n", iface_regex_str);
>+		goto out;
>+	}
>+
>+	ret = 0;
>+
>+out:
>+	free(driver_dup);
>+	return ret;
>+}
>+
>+/** get interrupt number associated with the given interface qid */
>+static int
>+get_interrupt_number(regex_t *r, int *interrupt,
>+		     struct pmd_internals *internals)

Better to put the inputs before the outputs in the parameter list; I'd prefer:

get_interrupt_number(struct pmd_internals *internals,
				regex_t *r, int *interrupt)

>+{
>+	FILE *f_int_proc;
>+	int found = 0;
>+	char line[4096];
>+	int ret = 0;
>+
>+	f_int_proc = fopen("/proc/interrupts", "r");
>+	if (f_int_proc == NULL) {
>+		AF_XDP_LOG(ERR, "Failed to open /proc/interrupts.\n");
>+		return -1;
>+	}
>+
>+	while (!feof(f_int_proc) && !found) {
>+		/* Make sure to read a full line at a time */
>+		if (fgets(line, sizeof(line), f_int_proc) == NULL ||
>+				line[strlen(line) - 1] != '\n') {
>+			AF_XDP_LOG(ERR, "Error reading from interrupts file\n");
>+			ret = -1;
>+			break;
>+		}
>+
>+		/* Extract interrupt number from line */
>+		if (regexec(r, line, 0, NULL, 0) == 0) {
>+			*interrupt = atoi(line);
>+			found = true;
>+			AF_XDP_LOG(INFO, "Got interrupt %d for %s\n",
>+						*interrupt, internals->if_name);
>+		}
>+	}
>+
>+	fclose(f_int_proc);
>+
>+	return ret;
>+}
>+
>+/** affinitise interrupts for the given qid to the given coreid */
>+static int
>+set_irq_affinity(int coreid, struct pmd_internals *internals,
>+		 uint16_t rx_queue_id, uint16_t netdev_qid, int interrupt)

I'd prefer to put *internals at the beginning.

>+{
>+	char bitmask[128];
>+	char smp_affinity_filename[NAME_MAX];
>+	FILE *f_int_smp_affinity;
>+	int i;
>+
>+	/* Create affinity bitmask. Every 32 bits are separated by a comma */
>+	snprintf(bitmask, sizeof(bitmask), "%x", 1 << (coreid % 32));
>+	for (i = 0; i < coreid / 32; i++)
>+		strlcat(bitmask, ",00000000", sizeof(bitmask));
>+
>+	/* Write the new affinity bitmask */
>+	snprintf(smp_affinity_filename, sizeof(smp_affinity_filename),
>+			"/proc/irq/%d/smp_affinity", interrupt);
>+	f_int_smp_affinity = fopen(smp_affinity_filename, "w");
>+	if (f_int_smp_affinity == NULL) {
>+		AF_XDP_LOG(ERR, "Error opening %s\n", smp_affinity_filename);
>+		return -1;
>+	}
>+	fwrite(bitmask, strlen(bitmask), 1, f_int_smp_affinity);

Need to check the return value of fwrite, otherwise Coverity may complain.

>+	fclose(f_int_smp_affinity);
>+	AF_XDP_LOG(INFO, "IRQs for %s ethdev queue %i (netdev queue %i)"
>+				" affinitised to core %i\n",
>+				internals->if_name, rx_queue_id,
>+				netdev_qid, coreid);
>+
>+	return 0;
>+}
>+
>+static void
>+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
>+{
>+	int coreid = internals->queue_irqs[rx_queue_id];
>+	char driver[NAME_MAX];
>+	uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
>+	regex_t r;
>+	int interrupt;
>+
>+	if (coreid < 0)
>+		return;
>+
>+	if (coreid > (get_nprocs() - 1)) {
>+		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
>+					coreid);
>+		return;
>+	}

I think we can combine the above 2 sanity checks.

>+
>+	if (get_driver_name(internals, driver)) {
>+		AF_XDP_LOG(ERR, "Error retrieving driver name for %s\n",
>+					internals->if_name);
>+		return;
>+	}
>+
>+	if (generate_search_regex(driver, internals, netdev_qid, &r)) {
>+		AF_XDP_LOG(ERR, "Error generating search regex for %s\n",
>+					internals->if_name);
>+		return;
>+	}
>+
>+	if (get_interrupt_number(&r, &interrupt, internals)) {
>+		AF_XDP_LOG(ERR, "Error getting interrupt number for %s\n",
>+					internals->if_name);
>+		return;
>+	}
>+
>+	if (set_irq_affinity(coreid, internals, rx_queue_id, netdev_qid,
>+				interrupt)) {
>+		AF_XDP_LOG(ERR, "Error setting interrupt affinity for %s\n",
>+					internals->if_name);
>+		return;
>+	}
>+}
>+
> static int
> eth_rx_queue_setup(struct rte_eth_dev *dev,
> 		   uint16_t rx_queue_id,
>@@ -697,6 +996,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> 		goto err;
> 	}
> 
>+	configure_irqs(internals, rx_queue_id);
>+
> 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
> 	rxq->fds[0].events = POLLIN;
> 
>@@ -834,6 +1135,39 @@ parse_name_arg(const char *key __rte_unused,
> 	return 0;
> }
> 
>+/** parse queue irq argument */
>+static int
>+parse_queue_irq_arg(const char *key __rte_unused,
>+		   const char *value, void *extra_args)
>+{
>+	int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT] = extra_args;
>+	char *parse_str = strdup(value);
>+	char delimiter[] = ":";
>+	char *queue_str;
>+
>+	queue_str = strtok(parse_str, delimiter);
>+	if (queue_str != NULL && strncmp(queue_str, value, strlen(value))) {
>+		char *end;
>+		long queue = strtol(queue_str, &end, 10);
>+
>+		if (*end == '\0' && queue >= 0 &&
>+				queue < RTE_MAX_QUEUES_PER_PORT) {
>+			char *core_str = strtok(NULL, delimiter);
>+			long core = strtol(core_str, &end, 10);
>+
>+			if (*end == '\0' && core >= 0 && core < get_nprocs()) {
>+				(*queue_irqs)[queue] = core;
>+				free(parse_str);
>+				return 0;
>+			}
>+		}
>+	}
>+
>+	AF_XDP_LOG(ERR, "Invalid queue_irq argument.\n");
>+	free(parse_str);
>+	return -1;
>+}
>+
> static int
> xdp_get_channels_info(const char *if_name, int *max_queues,
> 				int *combined_queues)
>@@ -877,7 +1211,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
> 
> static int
> parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
>-			int *queue_cnt, int *pmd_zc)
>+			int *queue_cnt, int *pmd_zc,
>+			int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT])
> {
> 	int ret;
> 
>@@ -903,6 +1238,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
> 	if (ret < 0)
> 		goto free_kvlist;
> 
>+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IRQ_ARG,
>+				 &parse_queue_irq_arg, queue_irqs);
>+	if (ret < 0)
>+		goto free_kvlist;
>+
> free_kvlist:
> 	rte_kvargs_free(kvlist);
> 	return ret;
>@@ -940,7 +1280,8 @@ get_iface_info(const char *if_name,
> 
> static struct rte_eth_dev *
> init_internals(struct rte_vdev_device *dev, const char *if_name,
>-			int start_queue_idx, int queue_cnt, int pmd_zc)
>+			int start_queue_idx, int queue_cnt, int pmd_zc,
>+			int queue_irqs[RTE_MAX_QUEUES_PER_PORT])
> {
> 	const char *name = rte_vdev_device_name(dev);
> 	const unsigned int numa_node = dev->device.numa_node;
>@@ -957,6 +1298,8 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
> 	internals->queue_cnt = queue_cnt;
> 	internals->pmd_zc = pmd_zc;
> 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
>+	memcpy(internals->queue_irqs, queue_irqs,

Use rte_memcpy instead.

Thanks,
Xiaolong
>+		sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
> 
> 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
> 				  &internals->combined_queue_cnt)) {
>@@ -1035,6 +1378,9 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> 	struct rte_eth_dev *eth_dev = NULL;
> 	const char *name;
> 	int pmd_zc = 0;
>+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
>+
>+	memset(queue_irqs, -1, sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
> 
> 	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
> 		rte_vdev_device_name(dev));
>@@ -1062,7 +1408,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> 		dev->device.numa_node = rte_socket_id();
> 
> 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
>-			     &xsk_queue_cnt, &pmd_zc) < 0) {
>+			     &xsk_queue_cnt, &pmd_zc, &queue_irqs) < 0) {
> 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
> 		return -EINVAL;
> 	}
>@@ -1073,7 +1419,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
> 	}
> 
> 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
>-					xsk_queue_cnt, pmd_zc);
>+					xsk_queue_cnt, pmd_zc, queue_irqs);
> 	if (eth_dev == NULL) {
> 		AF_XDP_LOG(ERR, "Failed to init internals\n");
> 		return -1;
>@@ -1117,7 +1463,8 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
> 			      "iface=<string> "
> 			      "start_queue=<int> "
> 			      "queue_count=<int> "
>-			      "pmd_zero_copy=<0|1>");
>+			      "pmd_zero_copy=<0|1> "
>+			      "queue_irq=<int>:<int>");
> 
> RTE_INIT(af_xdp_init_log)
> {
>-- 
>2.17.1
>


* Re: [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs
  2019-09-19 14:15 ` [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs Ciara Loftus
  2019-09-24 14:12   ` Ye Xiaolong
@ 2019-09-24 16:42   ` Stephen Hemminger
  1 sibling, 0 replies; 8+ messages in thread
From: Stephen Hemminger @ 2019-09-24 16:42 UTC (permalink / raw)
  To: Ciara Loftus; +Cc: dev, xiaolong.ye, kevin.laatz, bruce.richardson

On Thu, 19 Sep 2019 14:15:19 +0000
Ciara Loftus <ciara.loftus@intel.com> wrote:

> +char driver_array[NUM_DRIVERS][NAME_MAX] = {"i40e", "ixgbe", "mlx5_core"};

This only makes sense in this file.  Should be static.


* Re: [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs
  2019-09-24 14:12   ` Ye Xiaolong
@ 2019-09-27 13:21     ` Loftus, Ciara
  2019-09-27 14:06       ` Ye Xiaolong
  0 siblings, 1 reply; 8+ messages in thread
From: Loftus, Ciara @ 2019-09-27 13:21 UTC (permalink / raw)
  To: Ye, Xiaolong; +Cc: dev, Laatz, Kevin, Richardson, Bruce

[snip]

> >+
> >+static void
> >+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
> >+{
> >+	int coreid = internals->queue_irqs[rx_queue_id];
> >+	char driver[NAME_MAX];
> >+	uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
> >+	regex_t r;
> >+	int interrupt;
> >+
> >+	if (coreid < 0)
> >+		return;
> >+
> >+	if (coreid > (get_nprocs() - 1)) {
> >+		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
> >+					coreid);
> >+		return;
> >+	}
> 
> I think we can combine above 2 sanity checks together.
> 

Hi Xiaolong,

Thanks for your review. I agree with all of your feedback except this one.

configure_irqs() is called for every queue. Queues with no affinity have a coreid initialized to -1, so coreid < 0 is a valid value and we should return without an error. However, coreid > nprocs is an actual error and we should report it with a log.
What do you think?

Thanks,
Ciara

[snip]

> >@@ -697,6 +996,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
> > 		goto err;
> > 	}
> >
> >+	configure_irqs(internals, rx_queue_id);
> >+
> > 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
> > 	rxq->fds[0].events = POLLIN;
> >
> >@@ -834,6 +1135,39 @@ parse_name_arg(const char *key __rte_unused,
> > 	return 0;
> > }
> >



* Re: [dpdk-dev] [PATCH 2/3] net/af_xdp: support pinning of IRQs
  2019-09-27 13:21     ` Loftus, Ciara
@ 2019-09-27 14:06       ` Ye Xiaolong
  0 siblings, 0 replies; 8+ messages in thread
From: Ye Xiaolong @ 2019-09-27 14:06 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev, Laatz, Kevin, Richardson, Bruce

On 09/27, Loftus, Ciara wrote:
>[snip]
>
>> >+
>> >+static void
>> >+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
>> >+{
>> >+	int coreid = internals->queue_irqs[rx_queue_id];
>> >+	char driver[NAME_MAX];
>> >+	uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
>> >+	regex_t r;
>> >+	int interrupt;
>> >+
>> >+	if (coreid < 0)
>> >+		return;
>> >+
>> >+	if (coreid > (get_nprocs() - 1)) {
>> >+		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
>> >+					coreid);
>> >+		return;
>> >+	}
>> 
>> I think we can combine above 2 sanity checks together.
>> 
>
>Hi Xiaolong,
>
>Thanks for your review. I agree with all of your feedback except this one.
>
>configure_irqs() is called for every queue. The queues with no affinity have a coreid initialized to -1. So coreid < 0 is a valid value and we should return with no error. However for the case where coreid > nprocs, this is an actual error and we should report that with a log.
>What do you think?

It makes sense, thanks for the explanation.

Thanks,
Xiaolong
>
>Thanks,
>Ciara
>
>[snip]
>
>> >@@ -697,6 +996,8 @@ eth_rx_queue_setup(struct rte_eth_dev *dev,
>> > 		goto err;
>> > 	}
>> >
>> >+	configure_irqs(internals, rx_queue_id);
>> >+
>> > 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
>> > 	rxq->fds[0].events = POLLIN;
>> >
>> >@@ -834,6 +1135,39 @@ parse_name_arg(const char *key __rte_unused,
>> > 	return 0;
>> > }
>> >
>

