* [PATCH dpdk 0/4] net/tap: add network namespace support
@ 2025-10-27 15:37 Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 1/4] net/tap: add netlink helpers Robin Jarry
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 15:37 UTC (permalink / raw)
To: dev
The TAP driver currently uses ioctl operations which are name-based and
namespace-unaware. When an interface is moved to another namespace, the
driver loses control and cannot track the device.
This series migrates to netlink-based interface control using ifindex
instead of names, making operations namespace-safe. When an interface
moves to another namespace, the driver detects RTM_DELLINK, queries the
new namespace using TUNGETDEVNETNS, and recreates netlink sockets in
that namespace to maintain control.
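A related detail, sketched here under the assumption that /proc is mounted: two namespace file descriptors, such as the one returned by TUNGETDEVNETNS and one opened on /proc/self/ns/net, refer to the same network namespace exactly when fstat() reports the same st_dev and st_ino. The helper name same_netns() is hypothetical and is not part of this series.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if both fds refer to the same network
 * namespace, 0 if they differ, -1 if either fd cannot be stat'ed.
 * A namespace is identified by the (st_dev, st_ino) pair of its nsfs inode.
 */
static int
same_netns(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A driver could, for example, compare the TUNGETDEVNETNS result with the current namespace before deciding that a setns() round trip is needed.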
The implementation falls back to ioctl when netlink is unavailable,
preserving compatibility with older kernels.
Tested by moving TAP interfaces between namespaces while running
testpmd. All link operations continue to work transparently after
namespace changes.
Robin Jarry (4):
net/tap: add netlink helpers
net/tap: rename internal ioctl wrapper
net/tap: use netlink if possible
net/tap: detect namespace change
drivers/net/tap/rte_eth_tap.c | 316 ++++++++++++++++++++++++++++------
drivers/net/tap/rte_eth_tap.h | 2 +-
drivers/net/tap/tap_netlink.c | 291 +++++++++++++++++++++++++++++++
drivers/net/tap/tap_netlink.h | 10 +-
4 files changed, 565 insertions(+), 54 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH dpdk 1/4] net/tap: add netlink helpers
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
@ 2025-10-27 15:37 ` Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 2/4] net/tap: rename internal ioctl wrapper Robin Jarry
` (4 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 15:37 UTC
To: dev, Stephen Hemminger
Add functions to get/set link flags, MAC address, and MTU using netlink
RTM_GETLINK/RTM_SETLINK messages instead of ioctl.
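The attribute encoding used by the SETLINK helpers can be exercised without a kernel. The sketch below mirrors the layout of tap_nl_set_link_mtu() in the diff, but struct setlink_req and build_setlink_mtu() are hypothetical names, and the buffer is only filled, never sent.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: header, ifinfomsg, room for attributes. */
struct setlink_req {
	struct nlmsghdr nh;
	struct ifinfomsg ifi;
	char buf[64];
};

/* Build an RTM_SETLINK request that sets the MTU of interface ifindex. */
static void
build_setlink_mtu(struct setlink_req *req, int ifindex, unsigned int mtu)
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nh.nlmsg_type = RTM_SETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->ifi.ifi_family = AF_UNSPEC;
	req->ifi.ifi_index = ifindex; /* the target is an index, not a name */

	/* Append IFLA_MTU right after the aligned end of the message. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
	rta->rta_type = IFLA_MTU;
	rta->rta_len = RTA_LENGTH(sizeof(mtu));
	memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```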
These will be used in the next commits for a more robust solution that
does not rely on interface names.
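On the read side, tap_nl_link_cb() walks the attributes of an RTM_NEWLINK reply to find IFLA_ADDRESS. A standalone sketch of that walk follows; find_ifla_address() and MAC_LEN are hypothetical names, and MAC_LEN is assumed to equal RTE_ETHER_ADDR_LEN (6).

```c
#include <assert.h>
#include <string.h>
#include <linux/rtnetlink.h>

#define MAC_LEN 6 /* Ethernet address length, same value as RTE_ETHER_ADDR_LEN */

/*
 * Hypothetical helper: copy the IFLA_ADDRESS attribute of an RTM_NEWLINK
 * message into mac. Returns 0 on success, -1 if the message is not
 * RTM_NEWLINK or carries no usable IFLA_ADDRESS.
 */
static int
find_ifla_address(struct nlmsghdr *nh, unsigned char mac[MAC_LEN])
{
	struct ifinfomsg *ifi;
	struct rtattr *rta;
	int len;

	if (nh->nlmsg_type != RTM_NEWLINK)
		return -1;
	ifi = NLMSG_DATA(nh);
	len = IFLA_PAYLOAD(nh);
	/* Attributes start right after the (aligned) ifinfomsg header. */
	for (rta = IFLA_RTA(ifi); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type != IFLA_ADDRESS)
			continue;
		if (RTA_PAYLOAD(rta) < MAC_LEN)
			return -1; /* truncated or non-Ethernet address */
		memcpy(mac, RTA_DATA(rta), MAC_LEN);
		return 0;
	}
	return -1;
}
```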
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/tap_netlink.c | 291 ++++++++++++++++++++++++++++++++++
drivers/net/tap/tap_netlink.h | 10 +-
2 files changed, 299 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tap/tap_netlink.c b/drivers/net/tap/tap_netlink.c
index 5ff60f41d426..20bdbe5f7df8 100644
--- a/drivers/net/tap/tap_netlink.c
+++ b/drivers/net/tap/tap_netlink.c
@@ -6,6 +6,7 @@
#include <errno.h>
#include <inttypes.h>
#include <linux/netlink.h>
+#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
@@ -411,3 +412,293 @@ tap_nlattr_nested_finish(struct tap_nlmsg *msg)
rte_free(tail);
}
+
+/**
+ * Helper structure to pass data between netlink request and callback
+ */
+struct link_info_ctx {
+ struct ifinfomsg *info;
+ struct rte_ether_addr *mac;
+ unsigned int *flags;
+ unsigned int ifindex;
+ int found;
+};
+
+/**
+ * Callback to extract link information from RTM_GETLINK response
+ */
+static int
+tap_nl_link_cb(struct nlmsghdr *nh, void *arg)
+{
+ struct link_info_ctx *ctx = arg;
+ struct ifinfomsg *ifi = NLMSG_DATA(nh);
+ struct rtattr *rta;
+ int rta_len;
+
+ if (nh->nlmsg_type != RTM_NEWLINK)
+ return 0;
+
+ /* Check if this is the interface we're looking for */
+ if (ifi->ifi_index != (int)ctx->ifindex)
+ return 0;
+
+ ctx->found = 1;
+
+ /* Copy basic info if requested */
+ if (ctx->info)
+ *ctx->info = *ifi;
+
+ /* Extract flags if requested */
+ if (ctx->flags)
+ *ctx->flags = ifi->ifi_flags;
+
+ /* Parse attributes for MAC address if requested */
+ if (ctx->mac) {
+ rta = IFLA_RTA(ifi);
+ rta_len = IFLA_PAYLOAD(nh);
+
+ for (; RTA_OK(rta, rta_len); rta = RTA_NEXT(rta, rta_len)) {
+ if (rta->rta_type == IFLA_ADDRESS) {
+ if (RTA_PAYLOAD(rta) >= RTE_ETHER_ADDR_LEN)
+ memcpy(ctx->mac, RTA_DATA(rta),
+ RTE_ETHER_ADDR_LEN);
+ break;
+ }
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * Get interface flags by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param flags
+ * Pointer to store interface flags
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_get_link_flags(int nlsk_fd, unsigned int ifindex, unsigned int *flags)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_GETLINK,
+ .nlmsg_flags = NLM_F_REQUEST,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct link_info_ctx ctx = {
+ .flags = flags,
+ .ifindex = ifindex,
+ .found = 0,
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ if (tap_nl_recv(nlsk_fd, tap_nl_link_cb, &ctx) < 0)
+ return -1;
+
+ if (!ctx.found) {
+ errno = ENODEV;
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Set interface flags by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param flags
+ * Flags to set/unset
+ * @param set
+ * 1 to set flags, 0 to unset them
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_link_flags(int nlsk_fd, unsigned int ifindex, unsigned int flags, int set)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ .ifi_flags = set ? flags : 0,
+ .ifi_change = flags, /* mask of flags to change */
+ },
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
+
+/**
+ * Set interface MTU by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mtu
+ * New MTU value
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_link_mtu(int nlsk_fd, unsigned int ifindex, unsigned int mtu)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ char buf[64];
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct rtattr *rta;
+
+ /* Add MTU attribute */
+ rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
+ rta->rta_type = IFLA_MTU;
+ rta->rta_len = RTA_LENGTH(sizeof(mtu));
+ memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
+ req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
+
+/**
+ * Get interface MAC address by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mac
+ * Pointer to store MAC address
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_get_link_mac(int nlsk_fd, unsigned int ifindex, struct rte_ether_addr *mac)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_GETLINK,
+ .nlmsg_flags = NLM_F_REQUEST,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct link_info_ctx ctx = {
+ .mac = mac,
+ .ifindex = ifindex,
+ .found = 0,
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ if (tap_nl_recv(nlsk_fd, tap_nl_link_cb, &ctx) < 0)
+ return -1;
+
+ if (!ctx.found) {
+ errno = ENODEV;
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Set interface MAC address by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mac
+ * New MAC address
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_link_mac(int nlsk_fd, unsigned int ifindex, const struct rte_ether_addr *mac)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ char buf[64];
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct rtattr *rta;
+
+ /* Add MAC address attribute */
+ rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
+ rta->rta_type = IFLA_ADDRESS;
+ rta->rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN);
+ memcpy(RTA_DATA(rta), mac, RTE_ETHER_ADDR_LEN);
+ req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
diff --git a/drivers/net/tap/tap_netlink.h b/drivers/net/tap/tap_netlink.h
index 5eff6edbb1cd..e9c9e5dce553 100644
--- a/drivers/net/tap/tap_netlink.h
+++ b/drivers/net/tap/tap_netlink.h
@@ -6,12 +6,11 @@
#ifndef _TAP_NETLINK_H_
#define _TAP_NETLINK_H_
-#include <ctype.h>
#include <inttypes.h>
#include <linux/rtnetlink.h>
#include <linux/netlink.h>
-#include <stdio.h>
+#include <rte_ether.h>
#include <rte_log.h>
#define NLMSG_BUF 512
@@ -39,4 +38,11 @@ void tap_nlattr_add32(struct tap_nlmsg *msg, unsigned short type, uint32_t data)
int tap_nlattr_nested_start(struct tap_nlmsg *msg, uint16_t type);
void tap_nlattr_nested_finish(struct tap_nlmsg *msg);
+/* Link management functions using netlink */
+int tap_nl_get_link_flags(int nlsk_fd, unsigned int ifindex, unsigned int *flags);
+int tap_nl_set_link_flags(int nlsk_fd, unsigned int ifindex, unsigned int flags, int set);
+int tap_nl_set_link_mtu(int nlsk_fd, unsigned int ifindex, unsigned int mtu);
+int tap_nl_set_link_mac(int nlsk_fd, unsigned int ifindex, const struct rte_ether_addr *mac);
+int tap_nl_get_link_mac(int nlsk_fd, unsigned int ifindex, struct rte_ether_addr *mac);
+
#endif /* _TAP_NETLINK_H_ */
--
2.51.0
* [PATCH dpdk 2/4] net/tap: rename internal ioctl wrapper
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 1/4] net/tap: add netlink helpers Robin Jarry
@ 2025-10-27 15:37 ` Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 3/4] net/tap: use netlink if possible Robin Jarry
` (3 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 15:37 UTC
To: dev, Stephen Hemminger
Prepare to replace ioctl with netlink: rename enum ioctl_mode to
ctrl_mode and tap_ioctl_req2str() to tap_ctrl_req2str(), and route all
callers through a new tap_ctrl() wrapper that currently just calls
tap_ioctl().
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/rte_eth_tap.c | 59 ++++++++++++++++++++---------------
1 file changed, 33 insertions(+), 26 deletions(-)
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 1bc8ae51cf6b..5b98e381b424 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -117,7 +117,7 @@ tap_trigger_cb(int sig __rte_unused)
}
/* Specifies on what netdevices the ioctl should be applied */
-enum ioctl_mode {
+enum ctrl_mode {
LOCAL_AND_REMOTE,
LOCAL_ONLY,
REMOTE_ONLY,
@@ -757,7 +757,7 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
}
static const char *
-tap_ioctl_req2str(unsigned long request)
+tap_ctrl_req2str(unsigned long request)
{
switch (request) {
case SIOCSIFFLAGS:
@@ -776,7 +776,7 @@ tap_ioctl_req2str(unsigned long request)
static int
tap_ioctl(struct pmd_internals *pmd, unsigned long request,
- struct ifreq *ifr, int set, enum ioctl_mode mode)
+ struct ifreq *ifr, int set, enum ctrl_mode mode)
{
short req_flags = ifr->ifr_flags;
int remote = pmd->remote_if_index &&
@@ -821,10 +821,17 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request,
error:
TAP_LOG(DEBUG, "%s(%s) failed: %s(%d)", ifr->ifr_name,
- tap_ioctl_req2str(request), strerror(errno), errno);
+ tap_ctrl_req2str(request), strerror(errno), errno);
return -errno;
}
+static int
+tap_ctrl(struct pmd_internals *pmd, unsigned long request,
+ struct ifreq *ifr, int set, enum ctrl_mode mode)
+{
+ return tap_ioctl(pmd, request, ifr, set, mode);
+}
+
static int
tap_link_set_down(struct rte_eth_dev *dev)
{
@@ -832,7 +839,7 @@ tap_link_set_down(struct rte_eth_dev *dev)
struct ifreq ifr = { .ifr_flags = IFF_UP };
dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
- return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_ONLY);
+ return tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_ONLY);
}
static int
@@ -842,7 +849,7 @@ tap_link_set_up(struct rte_eth_dev *dev)
struct ifreq ifr = { .ifr_flags = IFF_UP };
dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
- return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ return tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
}
static int
@@ -1234,14 +1241,14 @@ tap_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
struct ifreq ifr = { .ifr_flags = 0 };
if (pmd->remote_if_index) {
- tap_ioctl(pmd, SIOCGIFFLAGS, &ifr, 0, REMOTE_ONLY);
+ tap_ctrl(pmd, SIOCGIFFLAGS, &ifr, 0, REMOTE_ONLY);
if (!(ifr.ifr_flags & IFF_UP) ||
!(ifr.ifr_flags & IFF_RUNNING)) {
dev_link->link_status = RTE_ETH_LINK_DOWN;
return 0;
}
}
- tap_ioctl(pmd, SIOCGIFFLAGS, &ifr, 0, LOCAL_ONLY);
+ tap_ctrl(pmd, SIOCGIFFLAGS, &ifr, 0, LOCAL_ONLY);
dev_link->link_status =
((ifr.ifr_flags & IFF_UP) && (ifr.ifr_flags & IFF_RUNNING) ?
RTE_ETH_LINK_UP :
@@ -1256,7 +1263,7 @@ tap_promisc_enable(struct rte_eth_dev *dev)
struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
if (ret != 0)
return ret;
@@ -1266,7 +1273,7 @@ tap_promisc_enable(struct rte_eth_dev *dev)
ret = tap_flow_implicit_create(pmd, TAP_REMOTE_PROMISC);
if (ret != 0) {
/* Rollback promisc flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
/*
* rte_eth_dev_promiscuous_enable() rollback
* dev->data->promiscuous in the case of failure.
@@ -1285,7 +1292,7 @@ tap_promisc_disable(struct rte_eth_dev *dev)
struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ ret = tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
if (ret != 0)
return ret;
@@ -1295,7 +1302,7 @@ tap_promisc_disable(struct rte_eth_dev *dev)
ret = tap_flow_implicit_destroy(pmd, TAP_REMOTE_PROMISC);
if (ret != 0) {
/* Rollback promisc flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
/*
* rte_eth_dev_promiscuous_disable() rollback
* dev->data->promiscuous in the case of failure.
@@ -1315,7 +1322,7 @@ tap_allmulti_enable(struct rte_eth_dev *dev)
struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
if (ret != 0)
return ret;
@@ -1325,7 +1332,7 @@ tap_allmulti_enable(struct rte_eth_dev *dev)
ret = tap_flow_implicit_create(pmd, TAP_REMOTE_ALLMULTI);
if (ret != 0) {
/* Rollback allmulti flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
/*
* rte_eth_dev_allmulticast_enable() rollback
* dev->data->all_multicast in the case of failure.
@@ -1345,7 +1352,7 @@ tap_allmulti_disable(struct rte_eth_dev *dev)
struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ ret = tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
if (ret != 0)
return ret;
@@ -1355,7 +1362,7 @@ tap_allmulti_disable(struct rte_eth_dev *dev)
ret = tap_flow_implicit_destroy(pmd, TAP_REMOTE_ALLMULTI);
if (ret != 0) {
/* Rollback allmulti flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ tap_ctrl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
/*
* rte_eth_dev_allmulticast_disable() rollback
* dev->data->all_multicast in the case of failure.
@@ -1372,7 +1379,7 @@ static int
tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
{
struct pmd_internals *pmd = dev->data->dev_private;
- enum ioctl_mode mode = LOCAL_ONLY;
+ enum ctrl_mode mode = LOCAL_ONLY;
struct ifreq ifr;
int ret;
@@ -1388,7 +1395,7 @@ tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
return -EINVAL;
}
/* Check the actual current MAC address on the tap netdevice */
- ret = tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, LOCAL_ONLY);
+ ret = tap_ctrl(pmd, SIOCGIFHWADDR, &ifr, 0, LOCAL_ONLY);
if (ret < 0)
return ret;
if (rte_is_same_ether_addr(
@@ -1396,7 +1403,7 @@ tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
mac_addr))
return 0;
/* Check the current MAC address on the remote */
- ret = tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY);
+ ret = tap_ctrl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY);
if (ret < 0)
return ret;
if (!rte_is_same_ether_addr(
@@ -1406,7 +1413,7 @@ tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
ifr.ifr_hwaddr.sa_family = AF_LOCAL;
rte_ether_addr_copy(mac_addr, (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data);
- ret = tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 1, mode);
+ ret = tap_ctrl(pmd, SIOCSIFHWADDR, &ifr, 1, mode);
if (ret < 0)
return ret;
@@ -1660,7 +1667,7 @@ tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
struct pmd_internals *pmd = dev->data->dev_private;
struct ifreq ifr = { .ifr_mtu = mtu };
- return tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE);
+ return tap_ctrl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE);
}
static int
@@ -2014,14 +2021,14 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
TAP_LOG(DEBUG, "allocated %s", pmd->name);
ifr.ifr_mtu = dev->data->mtu;
- if (tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE) < 0)
+ if (tap_ctrl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE) < 0)
goto error_exit;
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
memset(&ifr, 0, sizeof(struct ifreq));
ifr.ifr_hwaddr.sa_family = AF_LOCAL;
rte_ether_addr_copy(&pmd->eth_addr, (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data);
- if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0)
+ if (tap_ctrl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0)
goto error_exit;
}
@@ -2071,10 +2078,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
strlcpy(pmd->remote_iface, remote_iface, RTE_ETH_NAME_MAX_LEN);
/* Save state of remote device */
- tap_ioctl(pmd, SIOCGIFFLAGS, &pmd->remote_initial_flags, 0, REMOTE_ONLY);
+ tap_ctrl(pmd, SIOCGIFFLAGS, &pmd->remote_initial_flags, 0, REMOTE_ONLY);
/* Replicate remote MAC address */
- if (tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY) < 0) {
+ if (tap_ctrl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY) < 0) {
TAP_LOG(ERR, "%s: failed to get %s MAC address.",
pmd->name, pmd->remote_iface);
goto error_remote;
@@ -2082,7 +2089,7 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
rte_ether_addr_copy((struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data, &pmd->eth_addr);
/* The desired MAC is already in ifreq after SIOCGIFHWADDR. */
- if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0) {
+ if (tap_ctrl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0) {
TAP_LOG(ERR, "%s: failed to get %s MAC address.",
pmd->name, remote_iface);
goto error_remote;
--
2.51.0
* [PATCH dpdk 3/4] net/tap: use netlink if possible
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 1/4] net/tap: add netlink helpers Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 2/4] net/tap: rename internal ioctl wrapper Robin Jarry
@ 2025-10-27 15:37 ` Robin Jarry
2025-10-27 16:06 ` Stephen Hemminger
2025-10-27 15:37 ` [PATCH dpdk 4/4] net/tap: detect namespace change Robin Jarry
` (2 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 15:37 UTC
To: dev, Stephen Hemminger
Make the netlink socket available unconditionally, not just for rte_flow.
Use netlink for get/set operations on link flags, MAC address, and MTU
when it is available. Fall back to ioctl if the netlink socket cannot be
created or the interface index cannot be resolved.
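The selection order that tap_ctrl() follows after this patch can be summarized as a pure decision function. This is a sketch only: enum ctrl_backend and pick_backend() are hypothetical, and enum ctrl_mode is redeclared here to mirror the driver's.

```c
#include <assert.h>

/* Hypothetical names; the patch itself calls tap_nl_ctrl() or tap_ioctl(). */
enum ctrl_backend { CTRL_NONE, CTRL_NETLINK, CTRL_IOCTL };
enum ctrl_mode { LOCAL_AND_REMOTE, LOCAL_ONLY, REMOTE_ONLY };

/* Mirror the checks done at the top of tap_ctrl(). */
static enum ctrl_backend
pick_backend(int nlsk_fd, int if_index, int remote_if_index, enum ctrl_mode mode)
{
	/* Nothing to do when only the (absent) remote device is targeted. */
	if (!remote_if_index && mode == REMOTE_ONLY)
		return CTRL_NONE;
	/* Netlink needs both an open socket and a resolved ifindex. */
	if (nlsk_fd >= 0 && if_index > 0)
		return CTRL_NETLINK;
	return CTRL_IOCTL;
}
```

Because a failed if_nametoindex() at init closes the socket, in practice the netlink branch is taken only when both conditions hold.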
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/rte_eth_tap.c | 143 ++++++++++++++++++++++++++++------
drivers/net/tap/rte_eth_tap.h | 2 +-
2 files changed, 122 insertions(+), 23 deletions(-)
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 5b98e381b424..b53c85746056 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -774,6 +774,89 @@ tap_ctrl_req2str(unsigned long request)
return "UNKNOWN";
}
+static int
+tap_nl_ctrl(struct pmd_internals *pmd, unsigned long request,
+ struct ifreq *ifr, int set, enum ctrl_mode mode)
+{
+ bool remote = pmd->remote_if_index && (mode == REMOTE_ONLY || mode == LOCAL_AND_REMOTE);
+ struct rte_ether_addr *mac;
+ int ret = 0;
+
+ switch (request) {
+ case SIOCSIFFLAGS:
+ if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE) {
+ ret = tap_nl_set_link_flags(pmd->nlsk_fd, pmd->if_index,
+ ifr->ifr_flags, set);
+ if (ret < 0)
+ return ret;
+ }
+ if (remote)
+ ret = tap_nl_set_link_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ ifr->ifr_flags, set);
+ break;
+
+ case SIOCGIFFLAGS:
+ if (mode == REMOTE_ONLY && pmd->remote_if_index) {
+ unsigned int flags = 0;
+ ret = tap_nl_get_link_flags(pmd->nlsk_fd, pmd->remote_if_index, &flags);
+ if (ret == 0)
+ ifr->ifr_flags = flags;
+ } else {
+ unsigned int flags = 0;
+ ret = tap_nl_get_link_flags(pmd->nlsk_fd, pmd->if_index, &flags);
+ if (ret == 0)
+ ifr->ifr_flags = flags;
+ }
+ break;
+
+ case SIOCGIFHWADDR:
+ mac = (struct rte_ether_addr *)ifr->ifr_hwaddr.sa_data;
+ if (mode == REMOTE_ONLY && pmd->remote_if_index) {
+ ret = tap_nl_get_link_mac(pmd->nlsk_fd, pmd->remote_if_index, mac);
+ if (ret == 0)
+ ifr->ifr_hwaddr.sa_family = AF_LOCAL;
+ } else {
+ ret = tap_nl_get_link_mac(pmd->nlsk_fd, pmd->if_index, mac);
+ if (ret == 0)
+ ifr->ifr_hwaddr.sa_family = AF_LOCAL;
+ }
+ break;
+
+ case SIOCSIFHWADDR:
+ mac = (struct rte_ether_addr *)ifr->ifr_hwaddr.sa_data;
+ if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE) {
+ ret = tap_nl_set_link_mac(pmd->nlsk_fd, pmd->if_index, mac);
+ if (ret < 0)
+ return ret;
+ }
+ if (remote)
+ ret = tap_nl_set_link_mac(pmd->nlsk_fd, pmd->remote_if_index, mac);
+ break;
+
+ case SIOCSIFMTU:
+ if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE) {
+ ret = tap_nl_set_link_mtu(pmd->nlsk_fd, pmd->if_index, ifr->ifr_mtu);
+ if (ret < 0)
+ return ret;
+ }
+ if (remote)
+ ret = tap_nl_set_link_mtu(pmd->nlsk_fd, pmd->remote_if_index, ifr->ifr_mtu);
+ break;
+
+ default:
+ TAP_LOG(WARNING, "%s: unsupported netlink request", pmd->name);
+ return -EINVAL;
+ }
+
+ if (ret < 0) {
+ TAP_LOG(DEBUG, "%s: netlink %s failed: %s(%d)", pmd->name,
+ tap_ctrl_req2str(request), strerror(errno), errno);
+ return -errno;
+ }
+
+ return 0;
+}
+
static int
tap_ioctl(struct pmd_internals *pmd, unsigned long request,
struct ifreq *ifr, int set, enum ctrl_mode mode)
@@ -782,8 +865,6 @@ tap_ioctl(struct pmd_internals *pmd, unsigned long request,
int remote = pmd->remote_if_index &&
(mode == REMOTE_ONLY || mode == LOCAL_AND_REMOTE);
- if (!pmd->remote_if_index && mode == REMOTE_ONLY)
- return 0;
/*
* If there is a remote netdevice, apply ioctl on it, then apply it on
* the tap netdevice.
@@ -829,6 +910,14 @@ static int
tap_ctrl(struct pmd_internals *pmd, unsigned long request,
struct ifreq *ifr, int set, enum ctrl_mode mode)
{
+ if (!pmd->remote_if_index && mode == REMOTE_ONLY)
+ return 0;
+
+ /* Use netlink if available */
+ if (pmd->nlsk_fd >= 0 && pmd->if_index > 0)
+ return tap_nl_ctrl(pmd, request, ifr, set, mode);
+
+ /* Otherwise, fall back to ioctl */
return tap_ioctl(pmd, request, ifr, set, mode);
}
@@ -1138,12 +1227,15 @@ tap_dev_close(struct rte_eth_dev *dev)
if (internals->nlsk_fd != -1) {
tap_flow_flush(dev, NULL);
tap_flow_implicit_flush(internals, NULL);
- tap_nl_final(internals->nlsk_fd);
- internals->nlsk_fd = -1;
tap_flow_bpf_destroy(internals);
}
#endif
+ if (internals->nlsk_fd != -1) {
+ tap_nl_final(internals->nlsk_fd);
+ internals->nlsk_fd = -1;
+ }
+
for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
struct rx_queue *rxq = &internals->rxq[i];
@@ -1953,10 +2045,7 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
strlcpy(pmd->name, tap_name, sizeof(pmd->name));
pmd->type = type;
pmd->ka_fd = -1;
-
-#ifdef HAVE_TCA_FLOWER
pmd->nlsk_fd = -1;
-#endif
pmd->gso_ctx_mp = NULL;
pmd->ioctl_sock = socket(AF_INET, SOCK_DGRAM, 0);
@@ -2035,26 +2124,38 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
/* Make network device persist after application exit */
pmd->persist = persist;
-#ifdef HAVE_TCA_FLOWER
/*
- * Set up everything related to rte_flow:
- * - netlink socket
- * - tap / remote if_index
- * - mandatory QDISCs
- * - rte_flow actual/implicit lists
- * - implicit rules
+ * Try to create netlink socket for better interface control.
+ * This provides ifindex-based operations and is more namespace-safe.
+ * Fall back to ioctl if netlink is not available.
*/
pmd->nlsk_fd = tap_nl_init(0);
if (pmd->nlsk_fd == -1) {
- TAP_LOG(WARNING, "%s: failed to create netlink socket.",
+ TAP_LOG(INFO, "%s: netlink unavailable, using ioctl fallback.",
+ pmd->name);
+ } else {
+ pmd->if_index = if_nametoindex(pmd->name);
+ if (!pmd->if_index) {
+ TAP_LOG(WARNING, "%s: failed to get if_index.",
+ pmd->name);
+ close(pmd->nlsk_fd);
+ pmd->nlsk_fd = -1;
+ }
+ }
+
+#ifdef HAVE_TCA_FLOWER
+ /*
+ * Set up everything related to rte_flow:
+ * - mandatory QDISCs (requires netlink)
+ * - rte_flow actual/implicit lists
+ * - implicit rules
+ */
+ if (pmd->nlsk_fd == -1) {
+ TAP_LOG(WARNING, "%s: rte_flow requires netlink support.",
pmd->name);
goto disable_rte_flow;
}
- pmd->if_index = if_nametoindex(pmd->name);
- if (!pmd->if_index) {
- TAP_LOG(ERR, "%s: failed to get if_index.", pmd->name);
- goto disable_rte_flow;
- }
+
if (qdisc_create_multiq(pmd->nlsk_fd, pmd->if_index) < 0) {
TAP_LOG(ERR, "%s: failed to create multiq qdisc.",
pmd->name);
@@ -2141,10 +2242,8 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
#endif
error_exit:
-#ifdef HAVE_TCA_FLOWER
if (pmd->nlsk_fd != -1)
close(pmd->nlsk_fd);
-#endif
if (pmd->ka_fd != -1)
close(pmd->ka_fd);
if (pmd->ioctl_sock != -1)
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index ce4322ad046e..bb5aa8966bb0 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -77,9 +77,9 @@ struct pmd_internals {
int remote_if_index; /* remote netdevice IF_INDEX */
int if_index; /* IF_INDEX for the port */
int ioctl_sock; /* socket for ioctl calls */
+ int nlsk_fd; /* Netlink socket fd */
#ifdef HAVE_TCA_FLOWER
- int nlsk_fd; /* Netlink socket fd */
int flow_isolate; /* 1 if flow isolation is enabled */
struct tap_rss *rss; /* BPF program */
--
2.51.0
* [PATCH dpdk 4/4] net/tap: detect namespace change
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
` (2 preceding siblings ...)
2025-10-27 15:37 ` [PATCH dpdk 3/4] net/tap: use netlink if possible Robin Jarry
@ 2025-10-27 15:37 ` Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 " Robin Jarry
5 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 15:37 UTC
To: dev, Stephen Hemminger
When an interface is moved to another network namespace, the kernel
sends RTM_DELLINK in the original namespace, just as it does when the
interface is deleted. Tell the two cases apart with the TUNGETDEVNETNS
ioctl on the keep-alive fd: if it succeeds, the interface still exists,
but in a different namespace.
To handle this, temporarily switch to the new namespace using setns(),
query the new ifindex, recreate netlink and LSC interrupt sockets in
that namespace, then switch back. Replace the old netlink socket with
the new one so subsequent operations work in the target namespace.
This allows the driver to track interfaces across namespace changes
without losing control.
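The save/enter/restore sequence described above can be sketched as follows, assuming /proc is mounted. ifindex_in_netns() is a hypothetical helper, not the patch's tap_netns_change(). Note that setns() requires CAP_SYS_ADMIN and switches only the calling thread, which here would be the thread running the LSC callback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up ifname inside the network namespace referred
 * to by netns_fd, then return to the caller's namespace. Returns the ifindex,
 * or 0 with errno set on failure (including failure to switch back, since
 * the thread would then be stuck in the wrong namespace).
 */
static unsigned int
ifindex_in_netns(int netns_fd, const char *ifname)
{
	unsigned int ifindex = 0;
	int orig_fd, err = 0;

	/* Remember the current namespace so we can come back. */
	orig_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_fd < 0)
		return 0;

	if (setns(netns_fd, CLONE_NEWNET) < 0) {
		err = errno;
		goto out;
	}

	/* Sockets created at this point would also live in the target netns. */
	ifindex = if_nametoindex(ifname);
	if (ifindex == 0)
		err = errno;

	/* Always try to switch back, even if the lookup failed. */
	if (setns(orig_fd, CLONE_NEWNET) < 0 && err == 0)
		err = errno;
out:
	close(orig_fd);
	errno = err;
	return err ? 0 : ifindex;
}
```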
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/rte_eth_tap.c | 114 +++++++++++++++++++++++++++++++++-
1 file changed, 111 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index b53c85746056..a9edf585e131 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -37,6 +37,7 @@
#include <net/if.h>
#include <linux/if_tun.h>
#include <linux/if_ether.h>
+#include <linux/sched.h>
#include <fcntl.h>
#include <ctype.h>
@@ -1774,17 +1775,118 @@ tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused,
return 0;
}
+#ifdef TUNGETDEVNETNS
+static void tap_dev_intr_handler(void *cb_arg);
+static int tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set);
+
+static int
+tap_netns_change(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int netns_fd, orig_netns_fd, new_nlsk_fd;
+
+ netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS);
+ if (netns_fd < 0) {
+ TAP_LOG(INFO, "%s: interface deleted", pmd->name);
+ return 0;
+ }
+
+ /* Interface was moved to another namespace */
+ pmd->if_index = 0;
+
+ /* Save current namespace */
+ orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+ if (orig_netns_fd < 0) {
+ TAP_LOG(ERR, "%s: failed to open original netns: %s",
+ pmd->name, strerror(errno));
+ close(netns_fd);
+ return -1;
+ }
+
+ /* Switch to new namespace */
+ if (setns(netns_fd, CLONE_NEWNET) < 0) {
+ TAP_LOG(ERR, "%s: failed to enter new netns: %s",
+ pmd->name, strerror(errno));
+ close(netns_fd);
+ close(orig_netns_fd);
+ return -1;
+ }
+
+ /*
+ * Update ifindex by querying interface name.
+ * The interface now has a new ifindex in the new namespace.
+ */
+ pmd->if_index = if_nametoindex(pmd->name);
+
+ /* Recreate netlink socket in new namespace */
+ new_nlsk_fd = tap_nl_init(0);
+
+ /* Recreate LSC interrupt netlink socket in new namespace */
+ rte_intr_callback_unregister_pending(pmd->intr_handle, tap_dev_intr_handler, dev, NULL);
+ if (tap_lsc_intr_handle_set(dev, 1) < 0)
+ TAP_LOG(WARNING, "%s: failed to recreate LSC interrupt socket",
+ pmd->name);
+
+ /* Switch back to original namespace */
+ if (setns(orig_netns_fd, CLONE_NEWNET) < 0)
+ TAP_LOG(ERR, "%s: failed to return to original netns: %s",
+ pmd->name, strerror(errno));
+
+ close(orig_netns_fd);
+ close(netns_fd);
+
+ if (pmd->if_index == 0) {
+ TAP_LOG(WARNING, "%s: interface moved to another namespace, "
+ "failed to get new ifindex",
+ pmd->name);
+ if (new_nlsk_fd >= 0)
+ close(new_nlsk_fd);
+ return -1;
+ }
+
+ if (new_nlsk_fd < 0) {
+ TAP_LOG(WARNING, "%s: failed to recreate netlink socket in new namespace",
+ pmd->name);
+ return -1;
+ }
+
+ /* Close old netlink socket and replace with new one */
+ if (pmd->nlsk_fd >= 0)
+ tap_nl_final(pmd->nlsk_fd);
+ pmd->nlsk_fd = new_nlsk_fd;
+
+ TAP_LOG(INFO, "%s: interface moved to another namespace, new ifindex: %u",
+ pmd->name, pmd->if_index);
+
+ return 0;
+}
+#endif
+
static int
tap_nl_msg_handler(struct nlmsghdr *nh, void *arg)
{
struct rte_eth_dev *dev = arg;
struct pmd_internals *pmd = dev->data->dev_private;
struct ifinfomsg *info = NLMSG_DATA(nh);
+ int is_local = (info->ifi_index == pmd->if_index);
+ int is_remote = (info->ifi_index == pmd->remote_if_index);
- if (nh->nlmsg_type != RTM_NEWLINK ||
- (info->ifi_index != pmd->if_index &&
- info->ifi_index != pmd->remote_if_index))
+ /* Ignore messages not for our interfaces */
+ if (!is_local && !is_remote)
return 0;
+
+#ifdef TUNGETDEVNETNS
+ if (nh->nlmsg_type == RTM_DELLINK && is_local) {
+ /*
+ * RTM_DELLINK may indicate the interface was moved to another
+ * network namespace. Check if the device still exists by
+ * querying its namespace via the keep-alive fd.
+ */
+ int ret = tap_netns_change(dev);
+ if (ret < 0)
+ return ret;
+ }
+#endif
return tap_link_update(dev, 0);
}
@@ -1813,6 +1915,12 @@ tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set)
return 0;
}
if (set) {
+ /*
+ * Subscribe to RTMGRP_LINK to receive RTM_NEWLINK (link state
+ * changes) events. Also receives RTM_DELLINK events which are
+ * used for namespace change detection when TUNGETDEVNETNS is
+ * available.
+ */
rte_intr_fd_set(pmd->intr_handle, tap_nl_init(RTMGRP_LINK));
if (unlikely(rte_intr_fd_get(pmd->intr_handle) == -1))
return -EBADF;
--
2.51.0
* Re: [PATCH dpdk 3/4] net/tap: use netlink if possible
2025-10-27 15:37 ` [PATCH dpdk 3/4] net/tap: use netlink if possible Robin Jarry
@ 2025-10-27 16:06 ` Stephen Hemminger
2025-10-27 16:10 ` Robin Jarry
0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2025-10-27 16:06 UTC (permalink / raw)
To: Robin Jarry; +Cc: dev
On Mon, 27 Oct 2025 16:37:54 +0100
Robin Jarry <rjarry@redhat.com> wrote:
> Make netlink socket available unconditionally, not just for rte_flow.
>
> Use netlink for get/set operations on link flags, MAC, and MTU when
> available. Fall back to ioctl if netlink socket creation fails.
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
Netlink has been available since linux 2.4!
Rather than having two code paths, only one of which gets tested,
better to commit to netlink and use it.
Since dealing with netlink is a nuisance, I wonder if using
libmnl would be better. Yes, it creates new dependency but netlink
handling has been plagued with lots of Coverity overruns etc.
* Re: [PATCH dpdk 3/4] net/tap: use netlink if possible
2025-10-27 16:06 ` Stephen Hemminger
@ 2025-10-27 16:10 ` Robin Jarry
2025-10-27 16:58 ` Stephen Hemminger
0 siblings, 1 reply; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 16:10 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Stephen Hemminger, Oct 27, 2025 at 17:06:
> Netlink has been available since linux 2.4!
> Rather than having two code paths, only one of which gets tested,
> better to commit to netlink and use it.
>
> Since dealing with netlink is a nuisance, I wonder if using
> libmnl would be better. Yes, it creates new dependency but netlink
> handling has been plagued with lots of Coverity overruns etc.
I don't mind replacing ioctl with netlink calls but adding a libmnl
dependency just for one driver seems overkill.
--
Robin
> Many suitcases look alike.
* Re: [PATCH dpdk 3/4] net/tap: use netlink if possible
2025-10-27 16:10 ` Robin Jarry
@ 2025-10-27 16:58 ` Stephen Hemminger
0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2025-10-27 16:58 UTC (permalink / raw)
To: Robin Jarry; +Cc: dev
On Mon, 27 Oct 2025 17:10:24 +0100
"Robin Jarry" <rjarry@redhat.com> wrote:
> Stephen Hemminger, Oct 27, 2025 at 17:06:
> > Netlink has been available since linux 2.4!
> > Rather than having two code paths, only one of which gets tested,
> > better to commit to netlink and use it.
> >
> > Since dealing with netlink is a nuisance, I wonder if using
> > libmnl would be better. Yes, it creates new dependency but netlink
> > handling has been plagued with lots of Coverity overruns etc.
>
> I don't mind replacing ioctl with netlink calls but adding a libmnl
> dependency just for one driver seems overkill.
>
There is netlink handling in a couple drivers but probably not worth
bothering for now.
* [PATCH dpdk v2 0/3] net/tap: add network namespace support
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
` (3 preceding siblings ...)
2025-10-27 15:37 ` [PATCH dpdk 4/4] net/tap: detect namespace change Robin Jarry
@ 2025-10-27 18:19 ` Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 1/3] net/tap: add netlink helpers Robin Jarry
` (3 more replies)
2025-10-27 22:16 ` [PATCH dpdk v3 " Robin Jarry
5 siblings, 4 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 18:19 UTC (permalink / raw)
To: dev
The TAP driver currently uses ioctl operations which are name-based and
namespace-unaware. When an interface is moved to another namespace, the
driver loses control and cannot track the device.
This series migrates to netlink-based interface control using ifindex
instead of names, making operations namespace-safe. When an interface
moves to another namespace, the driver detects RTM_DELLINK, queries the
new namespace using TUNGETDEVNETNS, and recreates netlink sockets in
that namespace to maintain control.
Netlink is now required: the ioctl fallback from v1 has been removed,
since netlink has been available since Linux 2.4.
Tested by moving TAP interfaces between namespaces while running
testpmd. All link operations continue to work transparently after
namespace changes.
v2: completely removed ioctl-based implementation
Robin Jarry (3):
net/tap: add netlink helpers
net/tap: replace ioctl with netlink
net/tap: detect namespace change
drivers/net/tap/rte_eth_tap.c | 412 +++++++++++++++++++---------------
drivers/net/tap/rte_eth_tap.h | 5 +-
drivers/net/tap/tap_netlink.c | 291 ++++++++++++++++++++++++
drivers/net/tap/tap_netlink.h | 10 +-
4 files changed, 534 insertions(+), 184 deletions(-)
--
2.51.0
* [PATCH dpdk v2 1/3] net/tap: add netlink helpers
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
@ 2025-10-27 18:19 ` Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 2/3] net/tap: replace ioctl with netlink Robin Jarry
` (2 subsequent siblings)
3 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 18:19 UTC (permalink / raw)
To: dev, Stephen Hemminger
Add functions to get/set link flags, MAC address, and MTU using netlink
RTM_GETLINK/RTM_SETLINK messages instead of ioctl.
These will be used in the next commits for a more robust solution that
does not rely on interface names.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/tap_netlink.c | 291 ++++++++++++++++++++++++++++++++++
drivers/net/tap/tap_netlink.h | 10 +-
2 files changed, 299 insertions(+), 2 deletions(-)
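tap_nl_set_mtu() and tap_nl_set_mac() below write one attribute into a fixed `char buf[64]` tail without checking how much space is left. Static analyzers tend to flag this kind of hand-rolled netlink construction. A bounds-checked alternative could look like the following sketch. It is not part of this patch: `nl_setlink_init()`, `nl_attr_put()` and `nl_link_attr_find()` are hypothetical names, and the sketch relies only on the standard `<linux/rtnetlink.h>` macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical: start an RTM_SETLINK request for ifindex in buf. */
static struct nlmsghdr *
nl_setlink_init(void *buf, size_t bufsize, int ifindex)
{
	struct nlmsghdr *nh = buf;
	struct ifinfomsg *ifi;

	assert(bufsize >= NLMSG_LENGTH(sizeof(struct ifinfomsg)));
	memset(buf, 0, bufsize);
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nh->nlmsg_type = RTM_SETLINK;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	ifi = NLMSG_DATA(nh);
	ifi->ifi_family = AF_UNSPEC;
	ifi->ifi_index = ifindex;
	return nh;
}

/*
 * Hypothetical: append one attribute after the current end of nh, where
 * bufsize is the total size of the buffer starting at nh. Returns -1,
 * leaving the message untouched, if the attribute does not fit.
 */
static int
nl_attr_put(struct nlmsghdr *nh, size_t bufsize, unsigned short type,
	    const void *data, size_t len)
{
	size_t off = NLMSG_ALIGN(nh->nlmsg_len);
	size_t need = RTA_LENGTH(len);
	struct rtattr *rta;

	if (need > 0xffff || off + RTA_ALIGN(need) > bufsize)
		return -1; /* refuse instead of overrunning the buffer */
	rta = (struct rtattr *)((char *)nh + off);
	rta->rta_type = type;
	rta->rta_len = (unsigned short)need;
	memcpy(RTA_DATA(rta), data, len);
	/* zero the alignment padding that follows the payload */
	memset((char *)rta + need, 0, RTA_ALIGN(need) - need);
	nh->nlmsg_len = (__u32)(off + RTA_ALIGN(need));
	return 0;
}

/* Hypothetical: first attribute of the given type in a link message. */
static struct rtattr *
nl_link_attr_find(struct nlmsghdr *nh, unsigned short type)
{
	struct rtattr *rta = IFLA_RTA((struct ifinfomsg *)NLMSG_DATA(nh));
	int len = IFLA_PAYLOAD(nh);

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
		if (rta->rta_type == type)
			return rta;
	return NULL;
}
```

A caller would pass `sizeof(req)` as `bufsize`. An attribute that does not fit then makes the helper return -1 instead of writing past the end of the request.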
diff --git a/drivers/net/tap/tap_netlink.c b/drivers/net/tap/tap_netlink.c
index 5ff60f41d426..0682ba87e0da 100644
--- a/drivers/net/tap/tap_netlink.c
+++ b/drivers/net/tap/tap_netlink.c
@@ -6,6 +6,7 @@
#include <errno.h>
#include <inttypes.h>
#include <linux/netlink.h>
+#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
@@ -411,3 +412,293 @@ tap_nlattr_nested_finish(struct tap_nlmsg *msg)
rte_free(tail);
}
+
+/**
+ * Helper structure to pass data between netlink request and callback
+ */
+struct link_info_ctx {
+ struct ifinfomsg *info;
+ struct rte_ether_addr *mac;
+ unsigned int *flags;
+ unsigned int ifindex;
+ int found;
+};
+
+/**
+ * Callback to extract link information from RTM_GETLINK response
+ */
+static int
+tap_nl_link_cb(struct nlmsghdr *nh, void *arg)
+{
+ struct link_info_ctx *ctx = arg;
+ struct ifinfomsg *ifi = NLMSG_DATA(nh);
+ struct rtattr *rta;
+ int rta_len;
+
+ if (nh->nlmsg_type != RTM_NEWLINK)
+ return 0;
+
+ /* Check if this is the interface we're looking for */
+ if (ifi->ifi_index != (int)ctx->ifindex)
+ return 0;
+
+ ctx->found = 1;
+
+ /* Copy basic info if requested */
+ if (ctx->info)
+ *ctx->info = *ifi;
+
+ /* Extract flags if requested */
+ if (ctx->flags)
+ *ctx->flags = ifi->ifi_flags;
+
+ /* Parse attributes for MAC address if requested */
+ if (ctx->mac) {
+ rta = IFLA_RTA(ifi);
+ rta_len = IFLA_PAYLOAD(nh);
+
+ for (; RTA_OK(rta, rta_len); rta = RTA_NEXT(rta, rta_len)) {
+ if (rta->rta_type == IFLA_ADDRESS) {
+ if (RTA_PAYLOAD(rta) >= RTE_ETHER_ADDR_LEN)
+ memcpy(ctx->mac, RTA_DATA(rta),
+ RTE_ETHER_ADDR_LEN);
+ break;
+ }
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * Get interface flags by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param flags
+ * Pointer to store interface flags
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_get_flags(int nlsk_fd, unsigned int ifindex, unsigned int *flags)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_GETLINK,
+ .nlmsg_flags = NLM_F_REQUEST,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct link_info_ctx ctx = {
+ .flags = flags,
+ .ifindex = ifindex,
+ .found = 0,
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ if (tap_nl_recv(nlsk_fd, tap_nl_link_cb, &ctx) < 0)
+ return -1;
+
+ if (!ctx.found) {
+ errno = ENODEV;
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Set interface flags by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param flags
+ * Flags to set/unset
+ * @param set
+ * 1 to set flags, 0 to unset them
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_flags(int nlsk_fd, unsigned int ifindex, unsigned int flags, int set)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ .ifi_flags = set ? flags : 0,
+ .ifi_change = flags, /* mask of flags to change */
+ },
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
+
+/**
+ * Set interface MTU by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mtu
+ * New MTU value
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_mtu(int nlsk_fd, unsigned int ifindex, unsigned int mtu)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ char buf[64];
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct rtattr *rta;
+
+ /* Add MTU attribute */
+ rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
+ rta->rta_type = IFLA_MTU;
+ rta->rta_len = RTA_LENGTH(sizeof(mtu));
+ memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
+ req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
+
+/**
+ * Get interface MAC address by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mac
+ * Pointer to store MAC address
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_get_mac(int nlsk_fd, unsigned int ifindex, struct rte_ether_addr *mac)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_GETLINK,
+ .nlmsg_flags = NLM_F_REQUEST,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct link_info_ctx ctx = {
+ .mac = mac,
+ .ifindex = ifindex,
+ .found = 0,
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ if (tap_nl_recv(nlsk_fd, tap_nl_link_cb, &ctx) < 0)
+ return -1;
+
+ if (!ctx.found) {
+ errno = ENODEV;
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Set interface MAC address by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mac
+ * New MAC address
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_mac(int nlsk_fd, unsigned int ifindex, const struct rte_ether_addr *mac)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ char buf[64];
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct rtattr *rta;
+
+ /* Add MAC address attribute */
+ rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
+ rta->rta_type = IFLA_ADDRESS;
+ rta->rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN);
+ memcpy(RTA_DATA(rta), mac, RTE_ETHER_ADDR_LEN);
+ req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
diff --git a/drivers/net/tap/tap_netlink.h b/drivers/net/tap/tap_netlink.h
index 5eff6edbb1cd..b85be166245e 100644
--- a/drivers/net/tap/tap_netlink.h
+++ b/drivers/net/tap/tap_netlink.h
@@ -6,12 +6,11 @@
#ifndef _TAP_NETLINK_H_
#define _TAP_NETLINK_H_
-#include <ctype.h>
#include <inttypes.h>
#include <linux/rtnetlink.h>
#include <linux/netlink.h>
-#include <stdio.h>
+#include <rte_ether.h>
#include <rte_log.h>
#define NLMSG_BUF 512
@@ -39,4 +38,11 @@ void tap_nlattr_add32(struct tap_nlmsg *msg, unsigned short type, uint32_t data)
int tap_nlattr_nested_start(struct tap_nlmsg *msg, uint16_t type);
void tap_nlattr_nested_finish(struct tap_nlmsg *msg);
+/* Link management functions using netlink */
+int tap_nl_get_flags(int nlsk_fd, unsigned int ifindex, unsigned int *flags);
+int tap_nl_set_flags(int nlsk_fd, unsigned int ifindex, unsigned int flags, int set);
+int tap_nl_set_mtu(int nlsk_fd, unsigned int ifindex, unsigned int mtu);
+int tap_nl_set_mac(int nlsk_fd, unsigned int ifindex, const struct rte_ether_addr *mac);
+int tap_nl_get_mac(int nlsk_fd, unsigned int ifindex, struct rte_ether_addr *mac);
+
#endif /* _TAP_NETLINK_H_ */
--
2.51.0
* [PATCH dpdk v2 2/3] net/tap: replace ioctl with netlink
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 1/3] net/tap: add netlink helpers Robin Jarry
@ 2025-10-27 18:19 ` Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 3/3] net/tap: detect namespace change Robin Jarry
2025-10-27 21:55 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Stephen Hemminger
3 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 18:19 UTC (permalink / raw)
To: dev, Stephen Hemminger
Remove ioctl-based link control implementation. All interface operations
now use netlink exclusively via direct tap_nl_* calls.
Remove the tap_ioctl() and tap_ioctl_req2str() helpers, enum ioctl_mode,
and the ioctl_sock field. Make the netlink socket mandatory: the driver
fails if netlink is unavailable.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/rte_eth_tap.c | 298 ++++++++++++++--------------------
drivers/net/tap/rte_eth_tap.h | 5 +-
2 files changed, 124 insertions(+), 179 deletions(-)
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 1bc8ae51cf6b..e006c71989a8 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -22,9 +22,7 @@
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
-#include <sys/socket.h>
#include <sys/ioctl.h>
-#include <sys/utsname.h>
#include <sys/mman.h>
#include <errno.h>
#include <signal.h>
@@ -33,12 +31,9 @@
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
-#include <arpa/inet.h>
#include <net/if.h>
#include <linux/if_tun.h>
-#include <linux/if_ether.h>
#include <fcntl.h>
-#include <ctype.h>
#include <tap_rss.h>
#include <rte_eth_tap.h>
@@ -116,13 +111,6 @@ tap_trigger_cb(int sig __rte_unused)
tap_trigger = (tap_trigger + 1) | 0x80000000;
}
-/* Specifies on what netdevices the ioctl should be applied */
-enum ioctl_mode {
- LOCAL_AND_REMOTE,
- LOCAL_ONLY,
- REMOTE_ONLY,
-};
-
/* Message header to synchronize queues via IPC */
struct ipc_queues {
char port_name[RTE_DEV_NAME_MAX_LEN];
@@ -756,93 +744,28 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
return num_tx;
}
-static const char *
-tap_ioctl_req2str(unsigned long request)
-{
- switch (request) {
- case SIOCSIFFLAGS:
- return "SIOCSIFFLAGS";
- case SIOCGIFFLAGS:
- return "SIOCGIFFLAGS";
- case SIOCGIFHWADDR:
- return "SIOCGIFHWADDR";
- case SIOCSIFHWADDR:
- return "SIOCSIFHWADDR";
- case SIOCSIFMTU:
- return "SIOCSIFMTU";
- }
- return "UNKNOWN";
-}
-
-static int
-tap_ioctl(struct pmd_internals *pmd, unsigned long request,
- struct ifreq *ifr, int set, enum ioctl_mode mode)
-{
- short req_flags = ifr->ifr_flags;
- int remote = pmd->remote_if_index &&
- (mode == REMOTE_ONLY || mode == LOCAL_AND_REMOTE);
-
- if (!pmd->remote_if_index && mode == REMOTE_ONLY)
- return 0;
- /*
- * If there is a remote netdevice, apply ioctl on it, then apply it on
- * the tap netdevice.
- */
-apply:
- if (remote)
- strlcpy(ifr->ifr_name, pmd->remote_iface, IFNAMSIZ);
- else if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE)
- strlcpy(ifr->ifr_name, pmd->name, IFNAMSIZ);
- switch (request) {
- case SIOCSIFFLAGS:
- /* fetch current flags to leave other flags untouched */
- if (ioctl(pmd->ioctl_sock, SIOCGIFFLAGS, ifr) < 0)
- goto error;
- if (set)
- ifr->ifr_flags |= req_flags;
- else
- ifr->ifr_flags &= ~req_flags;
- break;
- case SIOCGIFFLAGS:
- case SIOCGIFHWADDR:
- case SIOCSIFHWADDR:
- case SIOCSIFMTU:
- break;
- default:
- TAP_LOG(WARNING, "%s: ioctl() called with wrong arg",
- pmd->name);
- return -EINVAL;
- }
- if (ioctl(pmd->ioctl_sock, request, ifr) < 0)
- goto error;
- if (remote-- && mode == LOCAL_AND_REMOTE)
- goto apply;
- return 0;
-
-error:
- TAP_LOG(DEBUG, "%s(%s) failed: %s(%d)", ifr->ifr_name,
- tap_ioctl_req2str(request), strerror(errno), errno);
- return -errno;
-}
-
static int
tap_link_set_down(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_UP };
dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
- return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_ONLY);
+ return tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_UP, 0);
}
static int
tap_link_set_up(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_UP };
+ int ret;
dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
- return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_UP, 1);
+ if (ret < 0)
+ return ret;
+ if (pmd->remote_if_index)
+ return tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_UP, 1);
+ return 0;
}
static int
@@ -1131,8 +1054,6 @@ tap_dev_close(struct rte_eth_dev *dev)
if (internals->nlsk_fd != -1) {
tap_flow_flush(dev, NULL);
tap_flow_implicit_flush(internals, NULL);
- tap_nl_final(internals->nlsk_fd);
- internals->nlsk_fd = -1;
tap_flow_bpf_destroy(internals);
}
#endif
@@ -1150,11 +1071,10 @@ tap_dev_close(struct rte_eth_dev *dev)
if (internals->remote_if_index) {
/* Restore initial remote state */
- int ret = ioctl(internals->ioctl_sock, SIOCSIFFLAGS,
- &internals->remote_initial_flags);
+ int ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index,
+ internals->remote_initial_flags, 1);
if (ret)
TAP_LOG(ERR, "restore remote state failed: %d", ret);
-
}
rte_mempool_free(internals->gso_ctx_mp);
@@ -1174,9 +1094,9 @@ tap_dev_close(struct rte_eth_dev *dev)
rte_intr_instance_free(internals->intr_handle);
- if (internals->ioctl_sock != -1) {
- close(internals->ioctl_sock);
- internals->ioctl_sock = -1;
+ if (internals->nlsk_fd != -1) {
+ tap_nl_final(internals->nlsk_fd);
+ internals->nlsk_fd = -1;
}
free(dev->process_private);
dev->process_private = NULL;
@@ -1231,21 +1151,22 @@ tap_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
{
struct rte_eth_link *dev_link = &dev->data->dev_link;
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = 0 };
+ unsigned int flags = 0;
if (pmd->remote_if_index) {
- tap_ioctl(pmd, SIOCGIFFLAGS, &ifr, 0, REMOTE_ONLY);
- if (!(ifr.ifr_flags & IFF_UP) ||
- !(ifr.ifr_flags & IFF_RUNNING)) {
- dev_link->link_status = RTE_ETH_LINK_DOWN;
- return 0;
+ if (tap_nl_get_flags(pmd->nlsk_fd, pmd->remote_if_index, &flags) == 0) {
+ if (!(flags & IFF_UP) || !(flags & IFF_RUNNING)) {
+ dev_link->link_status = RTE_ETH_LINK_DOWN;
+ return 0;
+ }
}
}
- tap_ioctl(pmd, SIOCGIFFLAGS, &ifr, 0, LOCAL_ONLY);
- dev_link->link_status =
- ((ifr.ifr_flags & IFF_UP) && (ifr.ifr_flags & IFF_RUNNING) ?
- RTE_ETH_LINK_UP :
- RTE_ETH_LINK_DOWN);
+ if (tap_nl_get_flags(pmd->nlsk_fd, pmd->if_index, &flags) == 0) {
+ if ((flags & IFF_UP) && (flags & IFF_RUNNING))
+ dev_link->link_status = RTE_ETH_LINK_UP;
+ else
+ dev_link->link_status = RTE_ETH_LINK_DOWN;
+ }
return 0;
}
@@ -1253,20 +1174,28 @@ static int
tap_promisc_enable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 1);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_PROMISC, 1);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->promiscuous = 1;
ret = tap_flow_implicit_create(pmd, TAP_REMOTE_PROMISC);
if (ret != 0) {
/* Rollback promisc flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 0);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_PROMISC, 0);
/*
* rte_eth_dev_promiscuous_enable() rollback
* dev->data->promiscuous in the case of failure.
@@ -1282,20 +1211,28 @@ static int
tap_promisc_disable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 0);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_PROMISC, 0);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->promiscuous = 0;
ret = tap_flow_implicit_destroy(pmd, TAP_REMOTE_PROMISC);
if (ret != 0) {
/* Rollback promisc flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 1);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_PROMISC, 1);
/*
* rte_eth_dev_promiscuous_disable() rollback
* dev->data->promiscuous in the case of failure.
@@ -1312,20 +1249,28 @@ static int
tap_allmulti_enable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 1);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_ALLMULTI, 1);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->all_multicast = 1;
ret = tap_flow_implicit_create(pmd, TAP_REMOTE_ALLMULTI);
if (ret != 0) {
/* Rollback allmulti flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 0);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_ALLMULTI, 0);
/*
* rte_eth_dev_allmulticast_enable() rollback
* dev->data->all_multicast in the case of failure.
@@ -1342,20 +1287,28 @@ static int
tap_allmulti_disable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 0);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_ALLMULTI, 0);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->all_multicast = 0;
ret = tap_flow_implicit_destroy(pmd, TAP_REMOTE_ALLMULTI);
if (ret != 0) {
/* Rollback allmulti flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 1);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_ALLMULTI, 1);
/*
* rte_eth_dev_allmulticast_disable() rollback
* dev->data->all_multicast in the case of failure.
@@ -1372,8 +1325,8 @@ static int
tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
{
struct pmd_internals *pmd = dev->data->dev_private;
- enum ioctl_mode mode = LOCAL_ONLY;
- struct ifreq ifr;
+ struct rte_ether_addr current_mac;
+ bool set_remote = false;
int ret;
if (pmd->type == ETH_TUNTAP_TYPE_TUN) {
@@ -1388,28 +1341,31 @@ tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
return -EINVAL;
}
/* Check the actual current MAC address on the tap netdevice */
- ret = tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, LOCAL_ONLY);
+ ret = tap_nl_get_mac(pmd->nlsk_fd, pmd->if_index, &current_mac);
if (ret < 0)
return ret;
- if (rte_is_same_ether_addr(
- (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data,
- mac_addr))
+ if (rte_is_same_ether_addr(&current_mac, mac_addr))
return 0;
- /* Check the current MAC address on the remote */
- ret = tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY);
- if (ret < 0)
- return ret;
- if (!rte_is_same_ether_addr(
- (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data,
- mac_addr))
- mode = LOCAL_AND_REMOTE;
- ifr.ifr_hwaddr.sa_family = AF_LOCAL;
- rte_ether_addr_copy(mac_addr, (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data);
- ret = tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 1, mode);
+ /* Check the current MAC address on the remote */
+ if (pmd->remote_if_index) {
+ ret = tap_nl_get_mac(pmd->nlsk_fd, pmd->remote_if_index, &current_mac);
+ if (ret < 0)
+ return ret;
+ if (!rte_is_same_ether_addr(&current_mac, mac_addr))
+ set_remote = true;
+ }
+
+ ret = tap_nl_set_mac(pmd->nlsk_fd, pmd->if_index, mac_addr);
if (ret < 0)
return ret;
+ if (set_remote) {
+ ret = tap_nl_set_mac(pmd->nlsk_fd, pmd->remote_if_index, mac_addr);
+ if (ret < 0)
+ return ret;
+ }
+
rte_ether_addr_copy(mac_addr, &pmd->eth_addr);
#ifdef HAVE_TCA_FLOWER
@@ -1658,9 +1614,16 @@ static int
tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_mtu = mtu };
+ int ret;
- return tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_mtu(pmd->nlsk_fd, pmd->if_index, mtu);
+ if (ret < 0)
+ return ret;
+
+ if (pmd->remote_if_index)
+ return tap_nl_set_mtu(pmd->nlsk_fd, pmd->remote_if_index, mtu);
+
+ return 0;
}
static int
@@ -1921,7 +1884,6 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
struct pmd_process_private *process_private;
const char *tuntap_name = tuntap_types[type];
struct rte_eth_dev_data *data;
- struct ifreq ifr;
int i;
TAP_LOG(DEBUG, "%s device on numa %u", tuntap_name, rte_socket_id());
@@ -1946,20 +1908,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
strlcpy(pmd->name, tap_name, sizeof(pmd->name));
pmd->type = type;
pmd->ka_fd = -1;
-
-#ifdef HAVE_TCA_FLOWER
pmd->nlsk_fd = -1;
-#endif
pmd->gso_ctx_mp = NULL;
- pmd->ioctl_sock = socket(AF_INET, SOCK_DGRAM, 0);
- if (pmd->ioctl_sock == -1) {
- TAP_LOG(ERR,
- "%s Unable to get a socket for management: %s",
- tuntap_name, strerror(errno));
- goto error_exit;
- }
-
/* Allocate interrupt instance */
pmd->intr_handle = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
if (pmd->intr_handle == NULL) {
@@ -2013,15 +1964,27 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
}
TAP_LOG(DEBUG, "allocated %s", pmd->name);
- ifr.ifr_mtu = dev->data->mtu;
- if (tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE) < 0)
+ /*
+ * Create netlink socket for interface control.
+ * Netlink provides ifindex-based operations and is namespace-safe.
+ */
+ pmd->nlsk_fd = tap_nl_init(0);
+ if (pmd->nlsk_fd == -1) {
+ TAP_LOG(ERR, "%s: failed to create netlink socket.", pmd->name);
+ goto error_exit;
+ }
+
+ pmd->if_index = if_nametoindex(pmd->name);
+ if (!pmd->if_index) {
+ TAP_LOG(ERR, "%s: failed to get if_index.", pmd->name);
+ goto error_exit;
+ }
+
+ if (tap_nl_set_mtu(pmd->nlsk_fd, pmd->if_index, dev->data->mtu) < 0)
goto error_exit;
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
- memset(&ifr, 0, sizeof(struct ifreq));
- ifr.ifr_hwaddr.sa_family = AF_LOCAL;
- rte_ether_addr_copy(&pmd->eth_addr, (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data);
- if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0)
+ if (tap_nl_set_mac(pmd->nlsk_fd, pmd->if_index, &pmd->eth_addr) < 0)
goto error_exit;
}
@@ -2031,23 +1994,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
#ifdef HAVE_TCA_FLOWER
/*
* Set up everything related to rte_flow:
- * - netlink socket
- * - tap / remote if_index
* - mandatory QDISCs
* - rte_flow actual/implicit lists
* - implicit rules
*/
- pmd->nlsk_fd = tap_nl_init(0);
- if (pmd->nlsk_fd == -1) {
- TAP_LOG(WARNING, "%s: failed to create netlink socket.",
- pmd->name);
- goto disable_rte_flow;
- }
- pmd->if_index = if_nametoindex(pmd->name);
- if (!pmd->if_index) {
- TAP_LOG(ERR, "%s: failed to get if_index.", pmd->name);
- goto disable_rte_flow;
- }
if (qdisc_create_multiq(pmd->nlsk_fd, pmd->if_index) < 0) {
TAP_LOG(ERR, "%s: failed to create multiq qdisc.",
pmd->name);
@@ -2071,19 +2021,19 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
strlcpy(pmd->remote_iface, remote_iface, RTE_ETH_NAME_MAX_LEN);
/* Save state of remote device */
- tap_ioctl(pmd, SIOCGIFFLAGS, &pmd->remote_initial_flags, 0, REMOTE_ONLY);
+ if (tap_nl_get_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ &pmd->remote_initial_flags) < 0)
+ pmd->remote_initial_flags = 0;
/* Replicate remote MAC address */
- if (tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY) < 0) {
+ if (tap_nl_get_mac(pmd->nlsk_fd, pmd->remote_if_index, &pmd->eth_addr) < 0) {
TAP_LOG(ERR, "%s: failed to get %s MAC address.",
pmd->name, pmd->remote_iface);
goto error_remote;
}
- rte_ether_addr_copy((struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data, &pmd->eth_addr);
- /* The desired MAC is already in ifreq after SIOCGIFHWADDR. */
- if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0) {
- TAP_LOG(ERR, "%s: failed to get %s MAC address.",
+ if (tap_nl_set_mac(pmd->nlsk_fd, pmd->if_index, &pmd->eth_addr) < 0) {
+ TAP_LOG(ERR, "%s: failed to set %s MAC address.",
pmd->name, remote_iface);
goto error_remote;
}
@@ -2134,14 +2084,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
#endif
error_exit:
-#ifdef HAVE_TCA_FLOWER
if (pmd->nlsk_fd != -1)
close(pmd->nlsk_fd);
-#endif
if (pmd->ka_fd != -1)
close(pmd->ka_fd);
- if (pmd->ioctl_sock != -1)
- close(pmd->ioctl_sock);
/* mac_addrs must not be freed alone because part of dev_private */
dev->data->mac_addrs = NULL;
rte_intr_instance_free(pmd->intr_handle);
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index ce4322ad046e..218ee1b811d8 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -73,13 +73,12 @@ struct pmd_internals {
int type; /* Type field - TUN|TAP */
int persist; /* 1 if keep link up, else 0 */
struct rte_ether_addr eth_addr; /* Mac address of the device port */
- struct ifreq remote_initial_flags;/* Remote netdevice flags on init */
+ unsigned int remote_initial_flags;/* Remote netdevice flags on init */
int remote_if_index; /* remote netdevice IF_INDEX */
int if_index; /* IF_INDEX for the port */
- int ioctl_sock; /* socket for ioctl calls */
+ int nlsk_fd; /* Netlink socket fd */
#ifdef HAVE_TCA_FLOWER
- int nlsk_fd; /* Netlink socket fd */
int flow_isolate; /* 1 if flow isolation is enabled */
struct tap_rss *rss; /* BPF program */
--
2.51.0
* [PATCH dpdk v2 3/3] net/tap: detect namespace change
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 1/3] net/tap: add netlink helpers Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 2/3] net/tap: replace ioctl with netlink Robin Jarry
@ 2025-10-27 18:19 ` Robin Jarry
2025-10-27 21:55 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Stephen Hemminger
3 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 18:19 UTC (permalink / raw)
To: dev, Stephen Hemminger
When an interface is moved to another network namespace, the kernel
sends RTM_DELLINK. Detect this case by issuing the TUNGETDEVNETNS ioctl
on the keep-alive fd. If it succeeds, the interface still exists, but in
a different namespace.
To handle this, temporarily switch to the new namespace using setns(),
query the new ifindex, recreate netlink and LSC interrupt sockets in
that namespace, then switch back. Replace the old netlink socket with
the new one so subsequent operations work in the target namespace.
This allows the driver to track interfaces across namespace changes
without losing control.
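The save/switch/restore sequence described above can be sketched in isolation. This is a minimal, hypothetical helper (run_in_netns() is not part of the driver), assuming glibc's setns() wrapper and a kernel that exposes /proc/self/ns/net. The callback is the point where tap_netns_change() would call if_nametoindex() and tap_nl_init().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: run fn(arg) with the calling thread inside the
 * network namespace referred to by target_fd, then switch back.
 * Returns fn's result, or -1 (errno set) if a namespace switch fails.
 * setns(CLONE_NEWNET) typically needs CAP_SYS_ADMIN, otherwise EPERM.
 */
static int
run_in_netns(int target_fd, int (*fn)(void *), void *arg)
{
	int orig_fd, ret, err = 0;

	/* Save the current namespace, as tap_netns_change() does. */
	orig_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_fd < 0)
		return -1;

	if (setns(target_fd, CLONE_NEWNET) < 0) {
		err = errno;
		close(orig_fd);
		errno = err;
		return -1;
	}

	/* Sockets created here (e.g. netlink) stay bound to the target
	 * namespace after this thread leaves it. */
	ret = fn(arg);

	if (setns(orig_fd, CLONE_NEWNET) < 0) {
		err = errno;
		ret = -1; /* this thread is left in the target namespace */
	}
	close(orig_fd);
	if (err)
		errno = err;
	return ret;
}
```

/proc/self/ns/net names the namespace of the process's main thread, while setns() only moves the calling thread. When the switch happens on another thread (for example a DPDK interrupt-handling thread), /proc/thread-self/ns/net (Linux 3.17 and later) is probably the more precise handle to return to.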
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/rte_eth_tap.c | 114 +++++++++++++++++++++++++++++++++-
1 file changed, 111 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index e006c71989a8..bb96aa7e61ec 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -33,6 +33,7 @@
#include <unistd.h>
#include <net/if.h>
#include <linux/if_tun.h>
+#include <linux/sched.h>
#include <fcntl.h>
#include <tap_rss.h>
@@ -1638,17 +1639,118 @@ tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused,
return 0;
}
+#ifdef TUNGETDEVNETNS
+static void tap_dev_intr_handler(void *cb_arg);
+static int tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set);
+
+static int
+tap_netns_change(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int netns_fd, orig_netns_fd, new_nlsk_fd;
+
+ netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS);
+ if (netns_fd < 0) {
+ TAP_LOG(INFO, "%s: interface deleted", pmd->name);
+ return 0;
+ }
+
+ /* Interface was moved to another namespace */
+ pmd->if_index = 0;
+
+ /* Save current namespace */
+ orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+ if (orig_netns_fd < 0) {
+ TAP_LOG(ERR, "%s: failed to open original netns: %s",
+ pmd->name, strerror(errno));
+ close(netns_fd);
+ return -1;
+ }
+
+ /* Switch to new namespace */
+ if (setns(netns_fd, CLONE_NEWNET) < 0) {
+ TAP_LOG(ERR, "%s: failed to enter new netns: %s",
+ pmd->name, strerror(errno));
+ close(netns_fd);
+ close(orig_netns_fd);
+ return -1;
+ }
+
+ /*
+ * Update ifindex by querying interface name.
+ * The interface now has a new ifindex in the new namespace.
+ */
+ pmd->if_index = if_nametoindex(pmd->name);
+
+ /* Recreate netlink socket in new namespace */
+ new_nlsk_fd = tap_nl_init(0);
+
+ /* Recreate LSC interrupt netlink socket in new namespace */
+ rte_intr_callback_unregister_pending(pmd->intr_handle, tap_dev_intr_handler, dev, NULL);
+ if (tap_lsc_intr_handle_set(dev, 1) < 0)
+ TAP_LOG(WARNING, "%s: failed to recreate LSC interrupt socket",
+ pmd->name);
+
+ /* Switch back to original namespace */
+ if (setns(orig_netns_fd, CLONE_NEWNET) < 0)
+ TAP_LOG(ERR, "%s: failed to return to original netns: %s",
+ pmd->name, strerror(errno));
+
+ close(orig_netns_fd);
+ close(netns_fd);
+
+ if (pmd->if_index == 0) {
+ TAP_LOG(WARNING, "%s: interface moved to another namespace, "
+ "failed to get new ifindex",
+ pmd->name);
+ if (new_nlsk_fd >= 0)
+ close(new_nlsk_fd);
+ return -1;
+ }
+
+ if (new_nlsk_fd < 0) {
+ TAP_LOG(WARNING, "%s: failed to recreate netlink socket in new namespace",
+ pmd->name);
+ return -1;
+ }
+
+ /* Close old netlink socket and replace with new one */
+ if (pmd->nlsk_fd >= 0)
+ tap_nl_final(pmd->nlsk_fd);
+ pmd->nlsk_fd = new_nlsk_fd;
+
+ TAP_LOG(INFO, "%s: interface moved to another namespace, new ifindex: %u",
+ pmd->name, pmd->if_index);
+
+ return 0;
+}
+#endif
+
static int
tap_nl_msg_handler(struct nlmsghdr *nh, void *arg)
{
struct rte_eth_dev *dev = arg;
struct pmd_internals *pmd = dev->data->dev_private;
struct ifinfomsg *info = NLMSG_DATA(nh);
+ int is_local = (info->ifi_index == pmd->if_index);
+ int is_remote = (info->ifi_index == pmd->remote_if_index);
- if (nh->nlmsg_type != RTM_NEWLINK ||
- (info->ifi_index != pmd->if_index &&
- info->ifi_index != pmd->remote_if_index))
+ /* Ignore messages not for our interfaces */
+ if (!is_local && !is_remote)
return 0;
+
+#ifdef TUNGETDEVNETNS
+ if (nh->nlmsg_type == RTM_DELLINK && is_local) {
+ /*
+ * RTM_DELLINK may indicate the interface was moved to another
+ * network namespace. Check if the device still exists by
+ * querying its namespace via the keep-alive fd.
+ */
+ int ret = tap_netns_change(dev);
+ if (ret < 0)
+ return ret;
+ }
+#endif
return tap_link_update(dev, 0);
}
@@ -1677,6 +1779,12 @@ tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set)
return 0;
}
if (set) {
+ /*
+ * Subscribe to RTMGRP_LINK to receive RTM_NEWLINK (link state
+ * changes) events. Also receives RTM_DELLINK events which are
+ * used for namespace change detection when TUNGETDEVNETNS is
+ * available.
+ */
rte_intr_fd_set(pmd->intr_handle, tap_nl_init(RTMGRP_LINK));
if (unlikely(rte_intr_fd_get(pmd->intr_handle) == -1))
return -EBADF;
--
2.51.0
* Re: [PATCH dpdk v2 0/3] net/tap: add network namespace support
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
` (2 preceding siblings ...)
2025-10-27 18:19 ` [PATCH dpdk v2 3/3] net/tap: detect namespace change Robin Jarry
@ 2025-10-27 21:55 ` Stephen Hemminger
3 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2025-10-27 21:55 UTC (permalink / raw)
To: Robin Jarry; +Cc: dev
On Mon, 27 Oct 2025 19:19:27 +0100
Robin Jarry <rjarry@redhat.com> wrote:
> The TAP driver currently uses ioctl operations which are name-based and
> namespace-unaware. When an interface is moved to another namespace, the
> driver loses control and cannot track the device.
>
> This series migrates to netlink-based interface control using ifindex
> instead of names, making operations namespace-safe. When an interface
> moves to another namespace, the driver detects RTM_DELLINK, queries the
> new namespace using TUNGETDEVNETNS, and recreates netlink sockets in
> that namespace to maintain control.
>
> The implementation falls back to ioctl when netlink is unavailable,
> preserving compatibility with older kernels.
>
> Tested by moving TAP interfaces between namespaces while running
> testpmd. All link operations continue to work transparently after
> namespace changes.
>
> v2: completely removed ioctl-based implementation
>
> Robin Jarry (3):
> net/tap: add netlink helpers
> net/tap: replace ioctl with netlink
> net/tap: detect namespace change
>
> drivers/net/tap/rte_eth_tap.c | 412 +++++++++++++++++++---------------
> drivers/net/tap/rte_eth_tap.h | 5 +-
> drivers/net/tap/tap_netlink.c | 291 ++++++++++++++++++++++++
> drivers/net/tap/tap_netlink.h | 10 +-
> 4 files changed, 534 insertions(+), 184 deletions(-)
>
Any documentation or release notes?
* [PATCH dpdk v3 0/3] net/tap: add network namespace support
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
` (4 preceding siblings ...)
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
@ 2025-10-27 22:16 ` Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 1/3] net/tap: add netlink helpers Robin Jarry
` (2 more replies)
5 siblings, 3 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 22:16 UTC (permalink / raw)
To: dev
The TAP driver currently uses ioctl operations which are name-based and
namespace-unaware. When an interface is moved to another namespace, the
driver loses control and cannot track the device.
This series migrates to netlink-based interface control using ifindex
instead of names, making operations namespace-safe. When an interface
moves to another namespace, the driver detects RTM_DELLINK, queries the
new namespace using TUNGETDEVNETNS, and recreates netlink sockets in
that namespace to maintain control.
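The message dispatch this implies can be modelled as a pure function. The sketch below is illustrative only: classify_link_event() and its enum are hypothetical names, and the logic mirrors the is_local/is_remote checks that the namespace-change patch adds to tap_nl_msg_handler(). The explicit remote_ifindex != 0 guard is an assumption added here for clarity.

```c
#include <linux/rtnetlink.h>

/* Hypothetical: what the handler should do with one link message. */
enum link_event_action {
	LINK_EV_IGNORE,      /* not one of our interfaces */
	LINK_EV_UPDATE,      /* refresh link status only */
	LINK_EV_CHECK_NETNS, /* local device vanished: deleted or moved? */
};

static enum link_event_action
classify_link_event(unsigned short nlmsg_type, int ifi_index,
		    int local_ifindex, int remote_ifindex)
{
	int is_local = ifi_index == local_ifindex;
	int is_remote = remote_ifindex != 0 && ifi_index == remote_ifindex;

	if (!is_local && !is_remote)
		return LINK_EV_IGNORE;
	/* Only the local TAP has a keep-alive fd to probe with TUNGETDEVNETNS. */
	if (nlmsg_type == RTM_DELLINK && is_local)
		return LINK_EV_CHECK_NETNS;
	return LINK_EV_UPDATE;
}
```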
Netlink is now mandatory: the ioctl-based implementation has been removed
entirely, and device creation fails if a netlink socket cannot be created.
Tested by moving TAP interfaces between namespaces while running
testpmd. All link operations continue to work transparently after
namespace changes.
v3: added release notes
v2: completely removed ioctl-based implementation
Robin Jarry (3):
net/tap: add netlink helpers
net/tap: replace ioctl with netlink
net/tap: detect namespace change
doc/guides/rel_notes/release_25_11.rst | 6 +
drivers/net/tap/rte_eth_tap.c | 412 ++++++++++++++-----------
drivers/net/tap/rte_eth_tap.h | 5 +-
drivers/net/tap/tap_netlink.c | 291 +++++++++++++++++
drivers/net/tap/tap_netlink.h | 10 +-
5 files changed, 540 insertions(+), 184 deletions(-)
--
2.51.0
* [PATCH dpdk v3 1/3] net/tap: add netlink helpers
2025-10-27 22:16 ` [PATCH dpdk v3 " Robin Jarry
@ 2025-10-27 22:16 ` Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 2/3] net/tap: replace ioctl with netlink Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 3/3] net/tap: detect namespace change Robin Jarry
2 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 22:16 UTC (permalink / raw)
To: dev, Stephen Hemminger
Add functions to get/set link flags, MAC address, and MTU using netlink
RTM_GETLINK/RTM_SETLINK messages instead of ioctl.
These will be used in the next commits for a more robust solution that
does not rely on interface names.
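For reference, here is a minimal sketch of how such an RTM_SETLINK request is laid out, assuming only the standard NLMSG_* and RTA_* macros from <linux/netlink.h> and <linux/rtnetlink.h>. build_setlink_mtu() and struct setlink_req are hypothetical names that mirror the request built by tap_nl_set_mtu(). No socket is opened; the function only fills the buffer.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request buffer, shaped like the one in tap_nl_set_mtu(). */
struct setlink_req {
	struct nlmsghdr nh;
	struct ifinfomsg ifi;
	char buf[64];
};

/* Fill an RTM_SETLINK request that sets IFLA_MTU on ifindex. */
static void
build_setlink_mtu(struct setlink_req *req, int ifindex, unsigned int mtu)
{
	struct rtattr *rta;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nh.nlmsg_type = RTM_SETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->ifi.ifi_family = AF_UNSPEC;
	req->ifi.ifi_index = ifindex; /* target chosen by index, not by name */

	/* Append the attribute at the aligned end of the current message. */
	rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
	rta->rta_type = IFLA_MTU;
	rta->rta_len = RTA_LENGTH(sizeof(mtu));
	memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```

With a 16-byte nlmsghdr and a 16-byte ifinfomsg, the IFLA_MTU attribute starts at offset 32 and the finished message is 40 bytes long.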
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
drivers/net/tap/tap_netlink.c | 291 ++++++++++++++++++++++++++++++++++
drivers/net/tap/tap_netlink.h | 10 +-
2 files changed, 299 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tap/tap_netlink.c b/drivers/net/tap/tap_netlink.c
index 5ff60f41d426..0682ba87e0da 100644
--- a/drivers/net/tap/tap_netlink.c
+++ b/drivers/net/tap/tap_netlink.c
@@ -6,6 +6,7 @@
#include <errno.h>
#include <inttypes.h>
#include <linux/netlink.h>
+#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
@@ -411,3 +412,293 @@ tap_nlattr_nested_finish(struct tap_nlmsg *msg)
rte_free(tail);
}
+
+/**
+ * Helper structure to pass data between netlink request and callback
+ */
+struct link_info_ctx {
+ struct ifinfomsg *info;
+ struct rte_ether_addr *mac;
+ unsigned int *flags;
+ unsigned int ifindex;
+ int found;
+};
+
+/**
+ * Callback to extract link information from RTM_GETLINK response
+ */
+static int
+tap_nl_link_cb(struct nlmsghdr *nh, void *arg)
+{
+ struct link_info_ctx *ctx = arg;
+ struct ifinfomsg *ifi = NLMSG_DATA(nh);
+ struct rtattr *rta;
+ int rta_len;
+
+ if (nh->nlmsg_type != RTM_NEWLINK)
+ return 0;
+
+ /* Check if this is the interface we're looking for */
+ if (ifi->ifi_index != (int)ctx->ifindex)
+ return 0;
+
+ ctx->found = 1;
+
+ /* Copy basic info if requested */
+ if (ctx->info)
+ *ctx->info = *ifi;
+
+ /* Extract flags if requested */
+ if (ctx->flags)
+ *ctx->flags = ifi->ifi_flags;
+
+ /* Parse attributes for MAC address if requested */
+ if (ctx->mac) {
+ rta = IFLA_RTA(ifi);
+ rta_len = IFLA_PAYLOAD(nh);
+
+ for (; RTA_OK(rta, rta_len); rta = RTA_NEXT(rta, rta_len)) {
+ if (rta->rta_type == IFLA_ADDRESS) {
+ if (RTA_PAYLOAD(rta) >= RTE_ETHER_ADDR_LEN)
+ memcpy(ctx->mac, RTA_DATA(rta),
+ RTE_ETHER_ADDR_LEN);
+ break;
+ }
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * Get interface flags by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param flags
+ * Pointer to store interface flags
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_get_flags(int nlsk_fd, unsigned int ifindex, unsigned int *flags)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_GETLINK,
+ .nlmsg_flags = NLM_F_REQUEST,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct link_info_ctx ctx = {
+ .flags = flags,
+ .ifindex = ifindex,
+ .found = 0,
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ if (tap_nl_recv(nlsk_fd, tap_nl_link_cb, &ctx) < 0)
+ return -1;
+
+ if (!ctx.found) {
+ errno = ENODEV;
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Set interface flags by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param flags
+ * Flags to set/unset
+ * @param set
+ * 1 to set flags, 0 to unset them
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_flags(int nlsk_fd, unsigned int ifindex, unsigned int flags, int set)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ .ifi_flags = set ? flags : 0,
+ .ifi_change = flags, /* mask of flags to change */
+ },
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
+
+/**
+ * Set interface MTU by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mtu
+ * New MTU value
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_mtu(int nlsk_fd, unsigned int ifindex, unsigned int mtu)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ char buf[64];
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct rtattr *rta;
+
+ /* Add MTU attribute */
+ rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
+ rta->rta_type = IFLA_MTU;
+ rta->rta_len = RTA_LENGTH(sizeof(mtu));
+ memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
+ req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
+
+/**
+ * Get interface MAC address by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mac
+ * Pointer to store MAC address
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_get_mac(int nlsk_fd, unsigned int ifindex, struct rte_ether_addr *mac)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_GETLINK,
+ .nlmsg_flags = NLM_F_REQUEST,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct link_info_ctx ctx = {
+ .mac = mac,
+ .ifindex = ifindex,
+ .found = 0,
+ };
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ if (tap_nl_recv(nlsk_fd, tap_nl_link_cb, &ctx) < 0)
+ return -1;
+
+ if (!ctx.found) {
+ errno = ENODEV;
+ return -1;
+ }
+
+ return 0;
+}
+
+/**
+ * Set interface MAC address by ifindex
+ *
+ * @param nlsk_fd
+ * Netlink socket file descriptor
+ * @param ifindex
+ * Interface index
+ * @param mac
+ * New MAC address
+ *
+ * @return
+ * 0 on success, -1 on error
+ */
+int
+tap_nl_set_mac(int nlsk_fd, unsigned int ifindex, const struct rte_ether_addr *mac)
+{
+ struct {
+ struct nlmsghdr nh;
+ struct ifinfomsg ifi;
+ char buf[64];
+ } req = {
+ .nh = {
+ .nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
+ .nlmsg_type = RTM_SETLINK,
+ .nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK,
+ },
+ .ifi = {
+ .ifi_family = AF_UNSPEC,
+ .ifi_index = ifindex,
+ },
+ };
+ struct rtattr *rta;
+
+ /* Add MAC address attribute */
+ rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nh.nlmsg_len));
+ rta->rta_type = IFLA_ADDRESS;
+ rta->rta_len = RTA_LENGTH(RTE_ETHER_ADDR_LEN);
+ memcpy(RTA_DATA(rta), mac, RTE_ETHER_ADDR_LEN);
+ req.nh.nlmsg_len = NLMSG_ALIGN(req.nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
+
+ if (tap_nl_send(nlsk_fd, &req.nh) < 0)
+ return -1;
+
+ return tap_nl_recv_ack(nlsk_fd);
+}
diff --git a/drivers/net/tap/tap_netlink.h b/drivers/net/tap/tap_netlink.h
index 5eff6edbb1cd..b85be166245e 100644
--- a/drivers/net/tap/tap_netlink.h
+++ b/drivers/net/tap/tap_netlink.h
@@ -6,12 +6,11 @@
#ifndef _TAP_NETLINK_H_
#define _TAP_NETLINK_H_
-#include <ctype.h>
#include <inttypes.h>
#include <linux/rtnetlink.h>
#include <linux/netlink.h>
-#include <stdio.h>
+#include <rte_ether.h>
#include <rte_log.h>
#define NLMSG_BUF 512
@@ -39,4 +38,11 @@ void tap_nlattr_add32(struct tap_nlmsg *msg, unsigned short type, uint32_t data)
int tap_nlattr_nested_start(struct tap_nlmsg *msg, uint16_t type);
void tap_nlattr_nested_finish(struct tap_nlmsg *msg);
+/* Link management functions using netlink */
+int tap_nl_get_flags(int nlsk_fd, unsigned int ifindex, unsigned int *flags);
+int tap_nl_set_flags(int nlsk_fd, unsigned int ifindex, unsigned int flags, int set);
+int tap_nl_set_mtu(int nlsk_fd, unsigned int ifindex, unsigned int mtu);
+int tap_nl_set_mac(int nlsk_fd, unsigned int ifindex, const struct rte_ether_addr *mac);
+int tap_nl_get_mac(int nlsk_fd, unsigned int ifindex, struct rte_ether_addr *mac);
+
#endif /* _TAP_NETLINK_H_ */
--
2.51.0
* [PATCH dpdk v3 2/3] net/tap: replace ioctl with netlink
2025-10-27 22:16 ` [PATCH dpdk v3 " Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 1/3] net/tap: add netlink helpers Robin Jarry
@ 2025-10-27 22:16 ` Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 3/3] net/tap: detect namespace change Robin Jarry
2 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 22:16 UTC (permalink / raw)
To: dev, Stephen Hemminger
Remove the ioctl-based link control implementation. All interface
operations now use netlink exclusively via direct tap_nl_* calls.
Remove the tap_ioctl() wrapper and its tap_ioctl_req2str() helper, enum
ioctl_mode, and the ioctl_sock field. Make the netlink socket mandatory:
device creation fails if netlink is unavailable.
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
doc/guides/rel_notes/release_25_11.rst | 5 +
drivers/net/tap/rte_eth_tap.c | 298 ++++++++++---------------
drivers/net/tap/rte_eth_tap.h | 5 +-
3 files changed, 129 insertions(+), 179 deletions(-)
diff --git a/doc/guides/rel_notes/release_25_11.rst b/doc/guides/rel_notes/release_25_11.rst
index c5ba335cfca3..41b6131c80f3 100644
--- a/doc/guides/rel_notes/release_25_11.rst
+++ b/doc/guides/rel_notes/release_25_11.rst
@@ -167,6 +167,11 @@ New Features
The built-in help text function is available as a public function which can be reused by custom functions,
if so desired.
+* **Updated TAP ethernet driver.**
+
+ * Replaced ``ioctl`` based link control with a Netlink based implementation.
+ * Linux net devices can now be renamed without breaking link control.
+
Removed Items
-------------
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 1bc8ae51cf6b..e006c71989a8 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -22,9 +22,7 @@
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
-#include <sys/socket.h>
#include <sys/ioctl.h>
-#include <sys/utsname.h>
#include <sys/mman.h>
#include <errno.h>
#include <signal.h>
@@ -33,12 +31,9 @@
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>
-#include <arpa/inet.h>
#include <net/if.h>
#include <linux/if_tun.h>
-#include <linux/if_ether.h>
#include <fcntl.h>
-#include <ctype.h>
#include <tap_rss.h>
#include <rte_eth_tap.h>
@@ -116,13 +111,6 @@ tap_trigger_cb(int sig __rte_unused)
tap_trigger = (tap_trigger + 1) | 0x80000000;
}
-/* Specifies on what netdevices the ioctl should be applied */
-enum ioctl_mode {
- LOCAL_AND_REMOTE,
- LOCAL_ONLY,
- REMOTE_ONLY,
-};
-
/* Message header to synchronize queues via IPC */
struct ipc_queues {
char port_name[RTE_DEV_NAME_MAX_LEN];
@@ -756,93 +744,28 @@ pmd_tx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
return num_tx;
}
-static const char *
-tap_ioctl_req2str(unsigned long request)
-{
- switch (request) {
- case SIOCSIFFLAGS:
- return "SIOCSIFFLAGS";
- case SIOCGIFFLAGS:
- return "SIOCGIFFLAGS";
- case SIOCGIFHWADDR:
- return "SIOCGIFHWADDR";
- case SIOCSIFHWADDR:
- return "SIOCSIFHWADDR";
- case SIOCSIFMTU:
- return "SIOCSIFMTU";
- }
- return "UNKNOWN";
-}
-
-static int
-tap_ioctl(struct pmd_internals *pmd, unsigned long request,
- struct ifreq *ifr, int set, enum ioctl_mode mode)
-{
- short req_flags = ifr->ifr_flags;
- int remote = pmd->remote_if_index &&
- (mode == REMOTE_ONLY || mode == LOCAL_AND_REMOTE);
-
- if (!pmd->remote_if_index && mode == REMOTE_ONLY)
- return 0;
- /*
- * If there is a remote netdevice, apply ioctl on it, then apply it on
- * the tap netdevice.
- */
-apply:
- if (remote)
- strlcpy(ifr->ifr_name, pmd->remote_iface, IFNAMSIZ);
- else if (mode == LOCAL_ONLY || mode == LOCAL_AND_REMOTE)
- strlcpy(ifr->ifr_name, pmd->name, IFNAMSIZ);
- switch (request) {
- case SIOCSIFFLAGS:
- /* fetch current flags to leave other flags untouched */
- if (ioctl(pmd->ioctl_sock, SIOCGIFFLAGS, ifr) < 0)
- goto error;
- if (set)
- ifr->ifr_flags |= req_flags;
- else
- ifr->ifr_flags &= ~req_flags;
- break;
- case SIOCGIFFLAGS:
- case SIOCGIFHWADDR:
- case SIOCSIFHWADDR:
- case SIOCSIFMTU:
- break;
- default:
- TAP_LOG(WARNING, "%s: ioctl() called with wrong arg",
- pmd->name);
- return -EINVAL;
- }
- if (ioctl(pmd->ioctl_sock, request, ifr) < 0)
- goto error;
- if (remote-- && mode == LOCAL_AND_REMOTE)
- goto apply;
- return 0;
-
-error:
- TAP_LOG(DEBUG, "%s(%s) failed: %s(%d)", ifr->ifr_name,
- tap_ioctl_req2str(request), strerror(errno), errno);
- return -errno;
-}
-
static int
tap_link_set_down(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_UP };
dev->data->dev_link.link_status = RTE_ETH_LINK_DOWN;
- return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_ONLY);
+ return tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_UP, 0);
}
static int
tap_link_set_up(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_UP };
+ int ret;
dev->data->dev_link.link_status = RTE_ETH_LINK_UP;
- return tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_UP, 1);
+ if (ret < 0)
+ return ret;
+ if (pmd->remote_if_index)
+ return tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_UP, 1);
+ return 0;
}
static int
@@ -1131,8 +1054,6 @@ tap_dev_close(struct rte_eth_dev *dev)
if (internals->nlsk_fd != -1) {
tap_flow_flush(dev, NULL);
tap_flow_implicit_flush(internals, NULL);
- tap_nl_final(internals->nlsk_fd);
- internals->nlsk_fd = -1;
tap_flow_bpf_destroy(internals);
}
#endif
@@ -1150,11 +1071,10 @@ tap_dev_close(struct rte_eth_dev *dev)
if (internals->remote_if_index) {
/* Restore initial remote state */
- int ret = ioctl(internals->ioctl_sock, SIOCSIFFLAGS,
- &internals->remote_initial_flags);
+ int ret = tap_nl_set_flags(internals->nlsk_fd, internals->remote_if_index,
+ internals->remote_initial_flags, 1);
if (ret)
TAP_LOG(ERR, "restore remote state failed: %d", ret);
-
}
rte_mempool_free(internals->gso_ctx_mp);
@@ -1174,9 +1094,9 @@ tap_dev_close(struct rte_eth_dev *dev)
rte_intr_instance_free(internals->intr_handle);
- if (internals->ioctl_sock != -1) {
- close(internals->ioctl_sock);
- internals->ioctl_sock = -1;
+ if (internals->nlsk_fd != -1) {
+ tap_nl_final(internals->nlsk_fd);
+ internals->nlsk_fd = -1;
}
free(dev->process_private);
dev->process_private = NULL;
@@ -1231,21 +1151,22 @@ tap_link_update(struct rte_eth_dev *dev, int wait_to_complete __rte_unused)
{
struct rte_eth_link *dev_link = &dev->data->dev_link;
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = 0 };
+ unsigned int flags = 0;
if (pmd->remote_if_index) {
- tap_ioctl(pmd, SIOCGIFFLAGS, &ifr, 0, REMOTE_ONLY);
- if (!(ifr.ifr_flags & IFF_UP) ||
- !(ifr.ifr_flags & IFF_RUNNING)) {
- dev_link->link_status = RTE_ETH_LINK_DOWN;
- return 0;
+ if (tap_nl_get_flags(pmd->nlsk_fd, pmd->remote_if_index, &flags) == 0) {
+ if (!(flags & IFF_UP) || !(flags & IFF_RUNNING)) {
+ dev_link->link_status = RTE_ETH_LINK_DOWN;
+ return 0;
+ }
}
}
- tap_ioctl(pmd, SIOCGIFFLAGS, &ifr, 0, LOCAL_ONLY);
- dev_link->link_status =
- ((ifr.ifr_flags & IFF_UP) && (ifr.ifr_flags & IFF_RUNNING) ?
- RTE_ETH_LINK_UP :
- RTE_ETH_LINK_DOWN);
+ if (tap_nl_get_flags(pmd->nlsk_fd, pmd->if_index, &flags) == 0) {
+ if ((flags & IFF_UP) && (flags & IFF_RUNNING))
+ dev_link->link_status = RTE_ETH_LINK_UP;
+ else
+ dev_link->link_status = RTE_ETH_LINK_DOWN;
+ }
return 0;
}
@@ -1253,20 +1174,28 @@ static int
tap_promisc_enable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 1);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_PROMISC, 1);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->promiscuous = 1;
ret = tap_flow_implicit_create(pmd, TAP_REMOTE_PROMISC);
if (ret != 0) {
/* Rollback promisc flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 0);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_PROMISC, 0);
/*
* rte_eth_dev_promiscuous_enable() rollback
* dev->data->promiscuous in the case of failure.
@@ -1282,20 +1211,28 @@ static int
tap_promisc_disable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_PROMISC };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 0);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_PROMISC, 0);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->promiscuous = 0;
ret = tap_flow_implicit_destroy(pmd, TAP_REMOTE_PROMISC);
if (ret != 0) {
/* Rollback promisc flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_PROMISC, 1);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_PROMISC, 1);
/*
* rte_eth_dev_promiscuous_disable() rollback
* dev->data->promiscuous in the case of failure.
@@ -1312,20 +1249,28 @@ static int
tap_allmulti_enable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 1);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_ALLMULTI, 1);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->all_multicast = 1;
ret = tap_flow_implicit_create(pmd, TAP_REMOTE_ALLMULTI);
if (ret != 0) {
/* Rollback allmulti flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 0);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_ALLMULTI, 0);
/*
* rte_eth_dev_allmulticast_enable() rollback
* dev->data->all_multicast in the case of failure.
@@ -1342,20 +1287,28 @@ static int
tap_allmulti_disable(struct rte_eth_dev *dev)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_flags = IFF_ALLMULTI };
int ret;
- ret = tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 0, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 0);
if (ret != 0)
return ret;
+ if (pmd->remote_if_index) {
+ ret = tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index, IFF_ALLMULTI, 0);
+ if (ret != 0)
+ return ret;
+ }
+
#ifdef HAVE_TCA_FLOWER
if (pmd->remote_if_index && !pmd->flow_isolate) {
dev->data->all_multicast = 0;
ret = tap_flow_implicit_destroy(pmd, TAP_REMOTE_ALLMULTI);
if (ret != 0) {
/* Rollback allmulti flag */
- tap_ioctl(pmd, SIOCSIFFLAGS, &ifr, 1, LOCAL_AND_REMOTE);
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->if_index, IFF_ALLMULTI, 1);
+ if (pmd->remote_if_index)
+ tap_nl_set_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ IFF_ALLMULTI, 1);
/*
* rte_eth_dev_allmulticast_disable() rollback
* dev->data->all_multicast in the case of failure.
@@ -1372,8 +1325,8 @@ static int
tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
{
struct pmd_internals *pmd = dev->data->dev_private;
- enum ioctl_mode mode = LOCAL_ONLY;
- struct ifreq ifr;
+ struct rte_ether_addr current_mac;
+ bool set_remote = false;
int ret;
if (pmd->type == ETH_TUNTAP_TYPE_TUN) {
@@ -1388,28 +1341,31 @@ tap_mac_set(struct rte_eth_dev *dev, struct rte_ether_addr *mac_addr)
return -EINVAL;
}
/* Check the actual current MAC address on the tap netdevice */
- ret = tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, LOCAL_ONLY);
+ ret = tap_nl_get_mac(pmd->nlsk_fd, pmd->if_index, &current_mac);
if (ret < 0)
return ret;
- if (rte_is_same_ether_addr(
- (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data,
- mac_addr))
+ if (rte_is_same_ether_addr(&current_mac, mac_addr))
return 0;
- /* Check the current MAC address on the remote */
- ret = tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY);
- if (ret < 0)
- return ret;
- if (!rte_is_same_ether_addr(
- (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data,
- mac_addr))
- mode = LOCAL_AND_REMOTE;
- ifr.ifr_hwaddr.sa_family = AF_LOCAL;
- rte_ether_addr_copy(mac_addr, (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data);
- ret = tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 1, mode);
+ /* Check the current MAC address on the remote */
+ if (pmd->remote_if_index) {
+ ret = tap_nl_get_mac(pmd->nlsk_fd, pmd->remote_if_index, &current_mac);
+ if (ret < 0)
+ return ret;
+ if (!rte_is_same_ether_addr(&current_mac, mac_addr))
+ set_remote = true;
+ }
+
+ ret = tap_nl_set_mac(pmd->nlsk_fd, pmd->if_index, mac_addr);
if (ret < 0)
return ret;
+ if (set_remote) {
+ ret = tap_nl_set_mac(pmd->nlsk_fd, pmd->remote_if_index, mac_addr);
+ if (ret < 0)
+ return ret;
+ }
+
rte_ether_addr_copy(mac_addr, &pmd->eth_addr);
#ifdef HAVE_TCA_FLOWER
@@ -1658,9 +1614,16 @@ static int
tap_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
{
struct pmd_internals *pmd = dev->data->dev_private;
- struct ifreq ifr = { .ifr_mtu = mtu };
+ int ret;
- return tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE);
+ ret = tap_nl_set_mtu(pmd->nlsk_fd, pmd->if_index, mtu);
+ if (ret < 0)
+ return ret;
+
+ if (pmd->remote_if_index)
+ return tap_nl_set_mtu(pmd->nlsk_fd, pmd->remote_if_index, mtu);
+
+ return 0;
}
static int
@@ -1921,7 +1884,6 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
struct pmd_process_private *process_private;
const char *tuntap_name = tuntap_types[type];
struct rte_eth_dev_data *data;
- struct ifreq ifr;
int i;
TAP_LOG(DEBUG, "%s device on numa %u", tuntap_name, rte_socket_id());
@@ -1946,20 +1908,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
strlcpy(pmd->name, tap_name, sizeof(pmd->name));
pmd->type = type;
pmd->ka_fd = -1;
-
-#ifdef HAVE_TCA_FLOWER
pmd->nlsk_fd = -1;
-#endif
pmd->gso_ctx_mp = NULL;
- pmd->ioctl_sock = socket(AF_INET, SOCK_DGRAM, 0);
- if (pmd->ioctl_sock == -1) {
- TAP_LOG(ERR,
- "%s Unable to get a socket for management: %s",
- tuntap_name, strerror(errno));
- goto error_exit;
- }
-
/* Allocate interrupt instance */
pmd->intr_handle = rte_intr_instance_alloc(RTE_INTR_INSTANCE_F_SHARED);
if (pmd->intr_handle == NULL) {
@@ -2013,15 +1964,27 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
}
TAP_LOG(DEBUG, "allocated %s", pmd->name);
- ifr.ifr_mtu = dev->data->mtu;
- if (tap_ioctl(pmd, SIOCSIFMTU, &ifr, 1, LOCAL_AND_REMOTE) < 0)
+ /*
+ * Create netlink socket for interface control.
+ * Netlink provides ifindex-based operations and is namespace-safe.
+ */
+ pmd->nlsk_fd = tap_nl_init(0);
+ if (pmd->nlsk_fd == -1) {
+ TAP_LOG(ERR, "%s: failed to create netlink socket.", pmd->name);
+ goto error_exit;
+ }
+
+ pmd->if_index = if_nametoindex(pmd->name);
+ if (!pmd->if_index) {
+ TAP_LOG(ERR, "%s: failed to get if_index.", pmd->name);
+ goto error_exit;
+ }
+
+ if (tap_nl_set_mtu(pmd->nlsk_fd, pmd->if_index, dev->data->mtu) < 0)
goto error_exit;
if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
- memset(&ifr, 0, sizeof(struct ifreq));
- ifr.ifr_hwaddr.sa_family = AF_LOCAL;
- rte_ether_addr_copy(&pmd->eth_addr, (struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data);
- if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0)
+ if (tap_nl_set_mac(pmd->nlsk_fd, pmd->if_index, &pmd->eth_addr) < 0)
goto error_exit;
}
@@ -2031,23 +1994,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
#ifdef HAVE_TCA_FLOWER
/*
* Set up everything related to rte_flow:
- * - netlink socket
- * - tap / remote if_index
* - mandatory QDISCs
* - rte_flow actual/implicit lists
* - implicit rules
*/
- pmd->nlsk_fd = tap_nl_init(0);
- if (pmd->nlsk_fd == -1) {
- TAP_LOG(WARNING, "%s: failed to create netlink socket.",
- pmd->name);
- goto disable_rte_flow;
- }
- pmd->if_index = if_nametoindex(pmd->name);
- if (!pmd->if_index) {
- TAP_LOG(ERR, "%s: failed to get if_index.", pmd->name);
- goto disable_rte_flow;
- }
if (qdisc_create_multiq(pmd->nlsk_fd, pmd->if_index) < 0) {
TAP_LOG(ERR, "%s: failed to create multiq qdisc.",
pmd->name);
@@ -2071,19 +2021,19 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
strlcpy(pmd->remote_iface, remote_iface, RTE_ETH_NAME_MAX_LEN);
/* Save state of remote device */
- tap_ioctl(pmd, SIOCGIFFLAGS, &pmd->remote_initial_flags, 0, REMOTE_ONLY);
+ if (tap_nl_get_flags(pmd->nlsk_fd, pmd->remote_if_index,
+ &pmd->remote_initial_flags) < 0)
+ pmd->remote_initial_flags = 0;
/* Replicate remote MAC address */
- if (tap_ioctl(pmd, SIOCGIFHWADDR, &ifr, 0, REMOTE_ONLY) < 0) {
+ if (tap_nl_get_mac(pmd->nlsk_fd, pmd->remote_if_index, &pmd->eth_addr) < 0) {
TAP_LOG(ERR, "%s: failed to get %s MAC address.",
pmd->name, pmd->remote_iface);
goto error_remote;
}
- rte_ether_addr_copy((struct rte_ether_addr *)&ifr.ifr_hwaddr.sa_data, &pmd->eth_addr);
- /* The desired MAC is already in ifreq after SIOCGIFHWADDR. */
- if (tap_ioctl(pmd, SIOCSIFHWADDR, &ifr, 0, LOCAL_ONLY) < 0) {
- TAP_LOG(ERR, "%s: failed to get %s MAC address.",
+ if (tap_nl_set_mac(pmd->nlsk_fd, pmd->if_index, &pmd->eth_addr) < 0) {
+ TAP_LOG(ERR, "%s: failed to set %s MAC address.",
pmd->name, remote_iface);
goto error_remote;
}
@@ -2134,14 +2084,10 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
#endif
error_exit:
-#ifdef HAVE_TCA_FLOWER
if (pmd->nlsk_fd != -1)
close(pmd->nlsk_fd);
-#endif
if (pmd->ka_fd != -1)
close(pmd->ka_fd);
- if (pmd->ioctl_sock != -1)
- close(pmd->ioctl_sock);
/* mac_addrs must not be freed alone because part of dev_private */
dev->data->mac_addrs = NULL;
rte_intr_instance_free(pmd->intr_handle);
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index ce4322ad046e..218ee1b811d8 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -73,13 +73,12 @@ struct pmd_internals {
int type; /* Type field - TUN|TAP */
int persist; /* 1 if keep link up, else 0 */
struct rte_ether_addr eth_addr; /* Mac address of the device port */
- struct ifreq remote_initial_flags;/* Remote netdevice flags on init */
+ unsigned int remote_initial_flags;/* Remote netdevice flags on init */
int remote_if_index; /* remote netdevice IF_INDEX */
int if_index; /* IF_INDEX for the port */
- int ioctl_sock; /* socket for ioctl calls */
+ int nlsk_fd; /* Netlink socket fd */
#ifdef HAVE_TCA_FLOWER
- int nlsk_fd; /* Netlink socket fd */
int flow_isolate; /* 1 if flow isolation is enabled */
struct tap_rss *rss; /* BPF program */
--
2.51.0
^ permalink raw reply [flat|nested] 17+ messages in thread
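For reference, the body of tap_nl_set_mtu() is not shown in this part of the thread. A minimal sketch of what an ifindex-based MTU request can look like, assuming a plain RTM_NEWLINK message carrying a single IFLA_MTU attribute; build_setlink_mtu() is a hypothetical name, and the message is only built here, not sent over pmd->nlsk_fd:

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fill buf with an RTM_NEWLINK request that sets the
 * MTU of the interface identified by ifindex. Returns the message length,
 * or 0 if buf is too small. Sending the request is not covered here.
 */
static size_t
build_setlink_mtu(void *buf, size_t buflen, int ifindex, uint32_t mtu, uint32_t seq)
{
	size_t len = NLMSG_LENGTH(sizeof(struct ifinfomsg)) + RTA_LENGTH(sizeof(mtu));
	struct nlmsghdr *nh = buf;
	struct ifinfomsg *ifi;
	struct rtattr *rta;

	if (buflen < NLMSG_ALIGN(len))
		return 0;
	memset(buf, 0, NLMSG_ALIGN(len));

	nh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifi));
	nh->nlmsg_type = RTM_NEWLINK;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	nh->nlmsg_seq = seq;

	/* The interface is addressed by index, not by name. */
	ifi = NLMSG_DATA(nh);
	ifi->ifi_family = AF_UNSPEC;
	ifi->ifi_index = ifindex;

	/* Append the IFLA_MTU attribute after the aligned ifinfomsg header. */
	rta = (struct rtattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	rta->rta_type = IFLA_MTU;
	rta->rta_len = RTA_LENGTH(sizeof(mtu));
	memcpy(RTA_DATA(rta), &mtu, sizeof(mtu));
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(rta->rta_len);

	return nh->nlmsg_len;
}
```

Because the request carries ifi_index rather than a name, it keeps addressing the same device after a rename, which is the property the ioctl removal relies on. Reading back the NLMSG_ERROR acknowledgement requested by NLM_F_ACK is left out of the sketch.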
* [PATCH dpdk v3 3/3] net/tap: detect namespace change
2025-10-27 22:16 ` [PATCH dpdk v3 " Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 1/3] net/tap: add netlink helpers Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 2/3] net/tap: replace ioctl with netlink Robin Jarry
@ 2025-10-27 22:16 ` Robin Jarry
2 siblings, 0 replies; 17+ messages in thread
From: Robin Jarry @ 2025-10-27 22:16 UTC (permalink / raw)
To: dev, Stephen Hemminger
When an interface is moved to another network namespace, the kernel
sends RTM_DELLINK in the original namespace. Detect this case by calling
the TUNGETDEVNETNS ioctl on the keep-alive fd. If it succeeds, the
interface still exists, but in a different namespace.
To handle this, temporarily switch to the new namespace using setns(),
query the new ifindex, recreate the netlink and LSC interrupt sockets in
that namespace, then switch back. Replace the old netlink socket with
the new one so that subsequent operations work in the target namespace.
This allows the driver to track interfaces across namespace changes
without losing control.
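The switch-and-restore sequence described above can be sketched as a small wrapper. This is an illustration under assumptions, not the code from the patch: run_in_netns() and lookup_ifindex() are hypothetical names, and the wrapper needs CAP_SYS_ADMIN, otherwise setns() fails with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: run fn(arg) inside the network namespace referred
 * to by target_fd, then move the calling thread back to the namespace it
 * started in. Returns fn's result, or -1 with errno set if a switch fails.
 */
static int
run_in_netns(int target_fd, int (*fn)(void *), void *arg)
{
	int orig_fd, ret, saved_errno;

	/* Save the current namespace, as the patch does. */
	orig_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (orig_fd < 0)
		return -1;

	if (setns(target_fd, CLONE_NEWNET) < 0) {
		saved_errno = errno;
		close(orig_fd);
		errno = saved_errno;
		return -1;
	}

	ret = fn(arg);

	/* Always try to switch back, even if fn failed. */
	if (setns(orig_fd, CLONE_NEWNET) < 0) {
		saved_errno = errno;
		close(orig_fd);
		errno = saved_errno;
		return -1;
	}
	close(orig_fd);
	return ret;
}

/* Example callback: resolve an interface name to its index inside the
 * namespace, similar to the if_nametoindex(pmd->name) call in the patch. */
static int
lookup_ifindex(void *arg)
{
	const char *name = arg;

	return (int)if_nametoindex(name);
}
```

setns() changes the namespace of the calling thread only, so in the driver this happens on whichever thread delivers the netlink event (presumably the EAL interrupt thread for tap_nl_msg_handler()). A socket's namespace is fixed when it is created, which is why a netlink socket opened inside the target namespace keeps working after switching back. The sketch saves /proc/self/ns/net like the patch; /proc/thread-self/ns/net would capture the calling thread's namespace even if threads had diverged.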
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
doc/guides/rel_notes/release_25_11.rst | 1 +
drivers/net/tap/rte_eth_tap.c | 114 ++++++++++++++++++++++++-
2 files changed, 112 insertions(+), 3 deletions(-)
diff --git a/doc/guides/rel_notes/release_25_11.rst b/doc/guides/rel_notes/release_25_11.rst
index 41b6131c80f3..fe191bd78aa2 100644
--- a/doc/guides/rel_notes/release_25_11.rst
+++ b/doc/guides/rel_notes/release_25_11.rst
@@ -171,6 +171,7 @@ New Features
* Replaced ``ioctl`` based link control with a Netlink based implementation.
* Linux net devices can now be renamed without breaking link control.
+ * Linux net devices can now be moved to different namespaces without breaking link control.
Removed Items
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index e006c71989a8..bb96aa7e61ec 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -33,6 +33,7 @@
#include <unistd.h>
#include <net/if.h>
#include <linux/if_tun.h>
+#include <linux/sched.h>
#include <fcntl.h>
#include <tap_rss.h>
@@ -1638,17 +1639,118 @@ tap_set_mc_addr_list(struct rte_eth_dev *dev __rte_unused,
return 0;
}
+#ifdef TUNGETDEVNETNS
+static void tap_dev_intr_handler(void *cb_arg);
+static int tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set);
+
+static int
+tap_netns_change(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int netns_fd, orig_netns_fd, new_nlsk_fd;
+
+ netns_fd = ioctl(pmd->ka_fd, TUNGETDEVNETNS);
+ if (netns_fd < 0) {
+ TAP_LOG(INFO, "%s: interface deleted", pmd->name);
+ return 0;
+ }
+
+ /* Interface was moved to another namespace */
+ pmd->if_index = 0;
+
+ /* Save current namespace */
+ orig_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+ if (orig_netns_fd < 0) {
+ TAP_LOG(ERR, "%s: failed to open original netns: %s",
+ pmd->name, strerror(errno));
+ close(netns_fd);
+ return -1;
+ }
+
+ /* Switch to new namespace */
+ if (setns(netns_fd, CLONE_NEWNET) < 0) {
+ TAP_LOG(ERR, "%s: failed to enter new netns: %s",
+ pmd->name, strerror(errno));
+ close(netns_fd);
+ close(orig_netns_fd);
+ return -1;
+ }
+
+ /*
+ * Update ifindex by querying interface name.
+ * The interface now has a new ifindex in the new namespace.
+ */
+ pmd->if_index = if_nametoindex(pmd->name);
+
+ /* Recreate netlink socket in new namespace */
+ new_nlsk_fd = tap_nl_init(0);
+
+ /* Recreate LSC interrupt netlink socket in new namespace */
+ rte_intr_callback_unregister_pending(pmd->intr_handle, tap_dev_intr_handler, dev, NULL);
+ if (tap_lsc_intr_handle_set(dev, 1) < 0)
+ TAP_LOG(WARNING, "%s: failed to recreate LSC interrupt socket",
+ pmd->name);
+
+ /* Switch back to original namespace */
+ if (setns(orig_netns_fd, CLONE_NEWNET) < 0)
+ TAP_LOG(ERR, "%s: failed to return to original netns: %s",
+ pmd->name, strerror(errno));
+
+ close(orig_netns_fd);
+ close(netns_fd);
+
+ if (pmd->if_index == 0) {
+ TAP_LOG(WARNING, "%s: interface moved to another namespace, "
+ "failed to get new ifindex",
+ pmd->name);
+ if (new_nlsk_fd >= 0)
+ close(new_nlsk_fd);
+ return -1;
+ }
+
+ if (new_nlsk_fd < 0) {
+ TAP_LOG(WARNING, "%s: failed to recreate netlink socket in new namespace",
+ pmd->name);
+ return -1;
+ }
+
+ /* Close old netlink socket and replace with new one */
+ if (pmd->nlsk_fd >= 0)
+ tap_nl_final(pmd->nlsk_fd);
+ pmd->nlsk_fd = new_nlsk_fd;
+
+ TAP_LOG(INFO, "%s: interface moved to another namespace, new ifindex: %u",
+ pmd->name, pmd->if_index);
+
+ return 0;
+}
+#endif
+
static int
tap_nl_msg_handler(struct nlmsghdr *nh, void *arg)
{
struct rte_eth_dev *dev = arg;
struct pmd_internals *pmd = dev->data->dev_private;
struct ifinfomsg *info = NLMSG_DATA(nh);
+ int is_local = (info->ifi_index == pmd->if_index);
+ int is_remote = (info->ifi_index == pmd->remote_if_index);
- if (nh->nlmsg_type != RTM_NEWLINK ||
- (info->ifi_index != pmd->if_index &&
- info->ifi_index != pmd->remote_if_index))
+ /* Ignore messages not for our interfaces */
+ if (!is_local && !is_remote)
return 0;
+
+#ifdef TUNGETDEVNETNS
+ if (nh->nlmsg_type == RTM_DELLINK && is_local) {
+ /*
+ * RTM_DELLINK may indicate the interface was moved to another
+ * network namespace. Check if the device still exists by
+ * querying its namespace via the keep-alive fd.
+ */
+ int ret = tap_netns_change(dev);
+ if (ret < 0)
+ return ret;
+ }
+#endif
return tap_link_update(dev, 0);
}
@@ -1677,6 +1779,12 @@ tap_lsc_intr_handle_set(struct rte_eth_dev *dev, int set)
return 0;
}
if (set) {
+ /*
+ * Subscribe to RTMGRP_LINK to receive RTM_NEWLINK (link state
+ * changes) events. Also receives RTM_DELLINK events which are
+ * used for namespace change detection when TUNGETDEVNETNS is
+ * available.
+ */
rte_intr_fd_set(pmd->intr_handle, tap_nl_init(RTMGRP_LINK));
if (unlikely(rte_intr_fd_get(pmd->intr_handle) == -1))
return -EBADF;
--
2.51.0
^ permalink raw reply [flat|nested] 17+ messages in thread
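The new filtering at the top of tap_nl_msg_handler() can be restated as a pure function, which makes its three outcomes easy to unit test without a netlink socket. classify_link_event() and its enum are hypothetical names for illustration; in the patch, the namespace probe is followed by tap_link_update() unless tap_netns_change() fails.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

enum link_event_action {
	LINK_EVENT_IGNORE,      /* message is not about our interfaces */
	LINK_EVENT_UPDATE,      /* refresh the link status */
	LINK_EVENT_CHECK_NETNS, /* local device deleted or moved: probe netns */
};

/* Hypothetical pure-function version of the checks in tap_nl_msg_handler(). */
static enum link_event_action
classify_link_event(const struct nlmsghdr *nh, int if_index, int remote_if_index)
{
	const struct ifinfomsg *info = NLMSG_DATA(nh);
	int is_local = info->ifi_index == if_index;
	int is_remote = info->ifi_index == remote_if_index;

	if (!is_local && !is_remote)
		return LINK_EVENT_IGNORE;
	/* Only the local TAP device has a keep-alive fd to query. */
	if (nh->nlmsg_type == RTM_DELLINK && is_local)
		return LINK_EVENT_CHECK_NETNS;
	return LINK_EVENT_UPDATE;
}
```

An RTM_DELLINK for the remote interface therefore falls through to a plain link update, since TUNGETDEVNETNS can only be issued on the TAP device's own keep-alive fd.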
Thread overview: 17+ messages
2025-10-27 15:37 [PATCH dpdk 0/4] net/tap: add network namespace support Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 1/4] net/tap: add netlink helpers Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 2/4] net/tap: rename internal ioctl wrapper Robin Jarry
2025-10-27 15:37 ` [PATCH dpdk 3/4] net/tap: use netlink if possible Robin Jarry
2025-10-27 16:06 ` Stephen Hemminger
2025-10-27 16:10 ` Robin Jarry
2025-10-27 16:58 ` Stephen Hemminger
2025-10-27 15:37 ` [PATCH dpdk 4/4] net/tap: detect namespace change Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 1/3] net/tap: add netlink helpers Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 2/3] net/tap: replace ioctl with netlink Robin Jarry
2025-10-27 18:19 ` [PATCH dpdk v2 3/3] net/tap: detect namespace change Robin Jarry
2025-10-27 21:55 ` [PATCH dpdk v2 0/3] net/tap: add network namespace support Stephen Hemminger
2025-10-27 22:16 ` [PATCH dpdk v3 " Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 1/3] net/tap: add netlink helpers Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 2/3] net/tap: replace ioctl with netlink Robin Jarry
2025-10-27 22:16 ` [PATCH dpdk v3 3/3] net/tap: detect namespace change Robin Jarry