DPDK patches and discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: dev@dpdk.org
Cc: Stephen Hemminger <stephen@networkplumber.org>
Subject: [PATCH v3 1/9] net/ioring: introduce new driver
Date: Tue, 11 Mar 2025 16:51:19 -0700
Message-ID: <20250311235424.172440-2-stephen@networkplumber.org>
In-Reply-To: <20250311235424.172440-1-stephen@networkplumber.org>

Add basic driver initialization, device creation,
and basic documentation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
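Note for reviewers (not part of the commit log): the iface= value has to
fit in IFNAMSIZ bytes including the terminating NUL. Below is a minimal,
standalone sketch of that check, assuming glibc's <net/if.h>. The helper
name iface_name_valid() is hypothetical and does not appear in the patch.

```c
#define _GNU_SOURCE     /* make sure strnlen() and IFNAMSIZ are visible */
#include <net/if.h>     /* IFNAMSIZ */
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper for illustration only, not part of the driver. */
static bool
iface_name_valid(const char *value)
{
	/* reject a missing or empty name */
	if (value == NULL || value[0] == '\0')
		return false;

	/* strnlen() returns IFNAMSIZ when no NUL is found within the limit */
	return strnlen(value, IFNAMSIZ) < IFNAMSIZ;
}
```

With IFNAMSIZ being 16 on Linux, a 15-character name is accepted and a
16-character name is rejected.
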
 doc/guides/nics/features/ioring.ini |   9 +
 doc/guides/nics/index.rst           |   1 +
 doc/guides/nics/ioring.rst          |  66 +++++++
 drivers/net/ioring/meson.build      |  15 ++
 drivers/net/ioring/rte_eth_ioring.c | 262 ++++++++++++++++++++++++++++
 drivers/net/meson.build             |   1 +
 6 files changed, 354 insertions(+)
 create mode 100644 doc/guides/nics/features/ioring.ini
 create mode 100644 doc/guides/nics/ioring.rst
 create mode 100644 drivers/net/ioring/meson.build
 create mode 100644 drivers/net/ioring/rte_eth_ioring.c
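
Note for reviewers: tap_open() requires IFF_TAP, IFF_MULTI_QUEUE and
IFF_NO_PI from TUNGETFEATURES. The bits that are required but not
reported by the kernel can be computed as in the sketch below. The
helper name missing_tun_features() is hypothetical and only serves to
illustrate the mask arithmetic.

```c
#include <linux/if_tun.h>	/* IFF_TAP, IFF_MULTI_QUEUE, IFF_NO_PI */

/* Hypothetical helper for illustration only, not part of the driver. */
static unsigned int
missing_tun_features(unsigned int features, unsigned int required)
{
	/* keep only the required bits that the kernel did not report */
	return required & ~features;
}
```

For example, a kernel reporting only IFF_TAP | IFF_NO_PI would yield
IFF_MULTI_QUEUE as the missing bit, and a kernel reporting all three
would yield zero.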

diff --git a/doc/guides/nics/features/ioring.ini b/doc/guides/nics/features/ioring.ini
new file mode 100644
index 0000000000..c4c57caaa4
--- /dev/null
+++ b/doc/guides/nics/features/ioring.ini
@@ -0,0 +1,9 @@
+;
+; Supported features of the 'ioring' driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Linux                = Y
+x86-64               = Y
+Usage doc            = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 10a2eca3b0..afb6bf289b 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -41,6 +41,7 @@ Network Interface Controller Drivers
     igc
     intel_vf
     ionic
+    ioring
     ipn3ke
     ixgbe
     mana
diff --git a/doc/guides/nics/ioring.rst b/doc/guides/nics/ioring.rst
new file mode 100644
index 0000000000..7d37a6bb37
--- /dev/null
+++ b/doc/guides/nics/ioring.rst
@@ -0,0 +1,66 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+
+IORING Poll Mode Driver
+=======================
+
+The IORING Poll Mode Driver (PMD) is a simplified and improved version of the TAP PMD.
+It is a virtual device that uses Linux io_uring to inject packets into the Linux kernel.
+It is useful when writing DPDK applications that need to interact
+with the Linux TCP/IP stack for control plane or tunneling.
+
+The IORING PMD creates a kernel network device that can be
+managed by standard tools such as ``ip`` and ``ethtool`` commands.
+
+From a DPDK application, the IORING device looks like a DPDK ethdev.
+It supports the standard DPDK APIs to query for information, statistics,
+and send/receive packets.
+
+Requirements
+------------
+
+The IORING PMD requires the io_uring library (liburing), which provides the helper
+functions to manage io_uring with the kernel.
+
+For more info on io_uring, please see:
+
+https://kernel.dk/io_uring.pdf
+
+
+Arguments
+---------
+
+IORING devices are created with the ``--vdev=net_ioring0`` command line option.
+This option may be repeated, each time with a different ``net_ioringX`` device name.
+
+By default, the Linux interfaces are named ``itap0``, ``itap1``, etc.
+The interface name can be specified with the ``iface`` argument, for example::
+
+   --vdev=net_ioring0,iface=io0 --vdev=net_ioring1,iface=io1, ...
+
+The PMD inherits the MAC address assigned by the kernel, which will be
+a locally assigned random Ethernet address.
+
+Normally, when the DPDK application exits, the IORING device is removed.
+This behavior can be overridden with the ``persist`` flag, for example::
+
+  --vdev=net_ioring0,iface=io0,persist ...
+
+
+Multi-process sharing
+---------------------
+
+The IORING device does not support secondary processes (yet).
+
+
+Limitations
+-----------
+
+- The IORING PMD requires io_uring support, which was added in Linux kernel version 5.1.
+  Also, io_uring may be disabled in some environments or by security policies.
+
+- Since the IORING device uses a file descriptor to talk to the kernel,
+  the same number of queues must be specified for receive and transmit.
+
+- No flow support. Receive queue selection for incoming packets is determined
+  by the Linux kernel. See kernel documentation for more info:
+  https://www.kernel.org/doc/html/latest/networking/scaling.html
diff --git a/drivers/net/ioring/meson.build b/drivers/net/ioring/meson.build
new file mode 100644
index 0000000000..264554d069
--- /dev/null
+++ b/drivers/net/ioring/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2024 Stephen Hemminger
+
+if not is_linux
+    build = false
+    reason = 'only supported on Linux'
+    subdir_done()
+endif
+dep = dependency('liburing', required:false)
+reason = 'missing dependency, "liburing"'
+build = dep.found()
+ext_deps += dep
+
+sources = files('rte_eth_ioring.c')
+require_iova_in_mbuf = false
diff --git a/drivers/net/ioring/rte_eth_ioring.c b/drivers/net/ioring/rte_eth_ioring.c
new file mode 100644
index 0000000000..4d5a5174db
--- /dev/null
+++ b/drivers/net/ioring/rte_eth_ioring.c
@@ -0,0 +1,262 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) Stephen Hemminger
+ */
+
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <net/if.h>
+#include <linux/if.h>
+#include <linux/if_tun.h>
+
+#include <bus_vdev_driver.h>
+#include <ethdev_driver.h>
+#include <ethdev_vdev.h>
+#include <rte_common.h>
+#include <rte_dev.h>
+#include <rte_eal.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_kvargs.h>
+#include <rte_log.h>
+
+#define IORING_DEFAULT_IFNAME	"itap%d"
+
+RTE_LOG_REGISTER_DEFAULT(ioring_logtype, NOTICE);
+#define RTE_LOGTYPE_IORING ioring_logtype
+#define PMD_LOG(level, ...) RTE_LOG_LINE_PREFIX(level, IORING, "%s(): ", __func__, __VA_ARGS__)
+
+#define IORING_IFACE_ARG	"iface"
+#define IORING_PERSIST_ARG	"persist"
+
+static const char * const valid_arguments[] = {
+	IORING_IFACE_ARG,
+	IORING_PERSIST_ARG,
+	NULL
+};
+
+struct pmd_internals {
+	int keep_fd;			/* keep alive file descriptor */
+	char ifname[IFNAMSIZ];		/* name assigned by kernel */
+	struct rte_ether_addr eth_addr; /* address assigned by kernel */
+};
+
+/* Creates a new tap device, name returned in ifr */
+static int
+tap_open(const char *name, struct ifreq *ifr, uint8_t persist)
+{
+	static const char tun_dev[] = "/dev/net/tun";
+	int tap_fd;
+
+	tap_fd = open(tun_dev, O_RDWR | O_CLOEXEC | O_NONBLOCK);
+	if (tap_fd < 0) {
+		PMD_LOG(ERR, "Open %s failed: %s", tun_dev, strerror(errno));
+		return -1;
+	}
+
+	int features = 0;
+	if (ioctl(tap_fd, TUNGETFEATURES, &features) < 0) {
+		PMD_LOG(ERR, "ioctl(TUNGETFEATURES) %s", strerror(errno));
+		goto error;
+	}
+
+	int flags = IFF_TAP | IFF_MULTI_QUEUE | IFF_NO_PI;
+	if ((features & flags) != flags) {
+		PMD_LOG(ERR, "TUN features %#x missing support for %#x",
+			features, flags & ~features);
+		goto error;
+	}
+
+#ifdef IFF_NAPI
+	/* If kernel supports using NAPI enable it */
+	if (features & IFF_NAPI)
+		flags |= IFF_NAPI;
+#endif
+	/*
+	 * Sets the device name and packet format.
+	 * Do not want the protocol information (PI)
+	 */
+	strlcpy(ifr->ifr_name, name, IFNAMSIZ);
+	ifr->ifr_flags = flags;
+	if (ioctl(tap_fd, TUNSETIFF, ifr) < 0) {
+		PMD_LOG(ERR, "ioctl(TUNSETIFF) %s: %s",
+			ifr->ifr_name, strerror(errno));
+		goto error;
+	}
+
+	/* (Optional) keep the device after application exit */
+	if (persist && ioctl(tap_fd, TUNSETPERSIST, 1) < 0) {
+		PMD_LOG(ERR, "ioctl(TUNSETPERSIST) %s: %s",
+			ifr->ifr_name, strerror(errno));
+		goto error;
+	}
+
+	return tap_fd;
+error:
+	close(tap_fd);
+	return -1;
+}
+
+static int
+eth_dev_close(struct rte_eth_dev *dev)
+{
+	struct pmd_internals *pmd = dev->data->dev_private;
+
+	PMD_LOG(INFO, "Closing %s", pmd->ifname);
+
+	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+		return 0;
+
+	/* mac_addrs must not be freed alone because part of dev_private */
+	dev->data->mac_addrs = NULL;
+
+	if (pmd->keep_fd != -1) {
+		close(pmd->keep_fd);
+		pmd->keep_fd = -1;
+	}
+
+	return 0;
+}
+
+static const struct eth_dev_ops ops = {
+	.dev_close		= eth_dev_close,
+};
+
+static int
+ioring_create(struct rte_eth_dev *dev, const char *tap_name, uint8_t persist)
+{
+	struct rte_eth_dev_data *data = dev->data;
+	struct pmd_internals *pmd = data->dev_private;
+
+	pmd->keep_fd = -1;
+
+	data->dev_flags = RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
+	dev->dev_ops = &ops;
+
+	/* Get the initial fd used to keep the tap device around */
+	struct ifreq ifr = { };
+	pmd->keep_fd = tap_open(tap_name, &ifr, persist);
+	if (pmd->keep_fd < 0)
+		goto error;
+
+	strlcpy(pmd->ifname, ifr.ifr_name, IFNAMSIZ);
+
+	/* Read the MAC address assigned by the kernel */
+	if (ioctl(pmd->keep_fd, SIOCGIFHWADDR, &ifr) < 0) {
+		PMD_LOG(ERR, "Unable to get MAC address for %s: %s",
+			ifr.ifr_name, strerror(errno));
+		goto error;
+	}
+	memcpy(&pmd->eth_addr, &ifr.ifr_hwaddr.sa_data, RTE_ETHER_ADDR_LEN);
+	data->mac_addrs = &pmd->eth_addr;
+
+	/* Detach this instance, not used for traffic */
+	ifr.ifr_flags = IFF_DETACH_QUEUE;
+	if (ioctl(pmd->keep_fd, TUNSETQUEUE, &ifr) < 0) {
+		PMD_LOG(ERR, "Unable to detach keep-alive queue for %s: %s",
+			ifr.ifr_name, strerror(errno));
+		goto error;
+	}
+
+	PMD_LOG(DEBUG, "%s setup", ifr.ifr_name);
+	return 0;
+
+error:
+	if (pmd->keep_fd != -1)
+		close(pmd->keep_fd);
+	return -1;
+}
+
+static int
+parse_iface_arg(const char *key __rte_unused, const char *value, void *extra_args)
+{
+	char *name = extra_args;
+
+	/* value must not be empty and must fit in IFNAMSIZ */
+	if (value == NULL || value[0] == '\0' ||
+	    strnlen(value, IFNAMSIZ) == IFNAMSIZ)
+		return -EINVAL;
+
+	strlcpy(name, value, IFNAMSIZ);
+	return 0;
+}
+
+static int
+ioring_probe(struct rte_vdev_device *vdev)
+{
+	const char *name = rte_vdev_device_name(vdev);
+	const char *params = rte_vdev_device_args(vdev);
+	struct rte_kvargs *kvlist = NULL;
+	struct rte_eth_dev *eth_dev = NULL;
+	char tap_name[IFNAMSIZ] = IORING_DEFAULT_IFNAME;
+	uint8_t persist = 0;
+	int ret;
+
+	PMD_LOG(INFO, "Initializing %s", name);
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
+		return -1; /* TODO */
+
+	if (params != NULL) {
+		kvlist = rte_kvargs_parse(params, valid_arguments);
+		if (kvlist == NULL)
+			return -1;
+
+		if (rte_kvargs_count(kvlist, IORING_IFACE_ARG) == 1) {
+			ret = rte_kvargs_process_opt(kvlist, IORING_IFACE_ARG,
+						     &parse_iface_arg, tap_name);
+			if (ret < 0)
+				goto error;
+		}
+
+		if (rte_kvargs_count(kvlist, IORING_PERSIST_ARG) == 1)
+			persist = 1;
+	}
+
+	eth_dev = rte_eth_vdev_allocate(vdev, sizeof(struct pmd_internals));
+	if (eth_dev == NULL) {
+		PMD_LOG(ERR, "%s Unable to allocate device struct", tap_name);
+		goto error;
+	}
+
+	if (ioring_create(eth_dev, tap_name, persist) < 0)
+		goto error;
+
+	rte_kvargs_free(kvlist);
+	rte_eth_dev_probing_finish(eth_dev);
+	return 0;
+error:
+	if (eth_dev != NULL)
+		rte_eth_dev_release_port(eth_dev);
+	rte_kvargs_free(kvlist);
+	return -1;
+}
+
+static int
+ioring_remove(struct rte_vdev_device *dev)
+{
+	struct rte_eth_dev *eth_dev;
+
+	eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(dev));
+	if (eth_dev == NULL)
+		return 0;
+
+	eth_dev_close(eth_dev);
+	rte_eth_dev_release_port(eth_dev);
+	return 0;
+}
+
+static struct rte_vdev_driver pmd_ioring_drv = {
+	.probe = ioring_probe,
+	.remove = ioring_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_ioring, pmd_ioring_drv);
+RTE_PMD_REGISTER_ALIAS(net_ioring, eth_ioring);
+RTE_PMD_REGISTER_PARAM_STRING(net_ioring, IORING_IFACE_ARG "=<string> " IORING_PERSIST_ARG);
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 460eb69e5b..2e39136a5b 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -34,6 +34,7 @@ drivers = [
         'intel/ixgbe',
         'intel/cpfl',  # depends on idpf, so must come after it
         'ionic',
+        'ioring',
         'mana',
         'memif',
         'mlx4',
-- 
2.47.2

