From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrzej Ostruszka
To: , Thomas Monjalon
Date: Tue, 10 Mar 2020 12:10:34 +0100
Message-ID: <20200310111037.31451-2-aostruszka@marvell.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200310111037.31451-1-aostruszka@marvell.com>
References: <20200306164104.15528-1-aostruszka@marvell.com> <20200310111037.31451-1-aostruszka@marvell.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Subject: [dpdk-dev] [PATCH v2 1/4] lib: introduce IF Proxy library
List-Id: DPDK patches and discussions

This library allows designating ports visible to the system (such as
Tun/Tap or KNI) as port representors serving as proxies for other DPDK
ports. When such a proxy is configured, the library initially queries
the network configuration from the system and later monitors it for
changes.

The information gathered is passed to the application either via a set
of user-registered callbacks or as an event added to the configured
notification queue (or a combination of these two mechanisms). This way
the user can use normal network utilities (like those from the iproute2
suite) to configure DPDK ports.

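To illustrate the "callbacks or queue (or both)" delivery described above, here is a minimal, self-contained sketch in C. It only models the rule that ifpx_notify_event() in this patch appears to follow: a registered callback is tried first, and the event is queued only when no callback consumed it (that is, returned non-zero). All demo_* names are hypothetical stand-ins and not part of the rte_if_proxy API, the fixed-size array stands in for the rte_ring used by the library, and the per-port fan-out done by the real function is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration, not the rte_if_proxy.h types. */
enum demo_event_type { DEMO_LINK_CHANGE, DEMO_MTU_CHANGE, DEMO_NUM_EVENTS };

struct demo_event {
	enum demo_event_type type;
	int port_id;
	int value;	/* link state flag or new MTU, depending on type */
};

/* A callback returns non-zero when it has consumed the event. */
typedef int (*demo_cb)(const struct demo_event *ev);

#define DEMO_QUEUE_LEN 8

struct demo_dispatcher {
	demo_cb cbs[DEMO_NUM_EVENTS];	/* NULL: no callback registered */
	struct demo_event queue[DEMO_QUEUE_LEN];	/* fallback queue */
	size_t queued;
};

/* Returns 1 if a callback consumed the event, 0 if the event was queued
 * and -1 if it had to be dropped because the queue is full.
 */
static int demo_notify(struct demo_dispatcher *d, const struct demo_event *ev)
{
	assert(d != NULL && ev != NULL && ev->type < DEMO_NUM_EVENTS);

	/* Try the callback first. */
	if (d->cbs[ev->type] != NULL && d->cbs[ev->type](ev) != 0)
		return 1;
	/* Not consumed: fall back to the notification queue. */
	if (d->queued == DEMO_QUEUE_LEN)
		return -1;
	d->queue[d->queued++] = *ev;
	return 0;
}

/* Example callback: consumes link events for port 0 only. */
static int demo_on_link(const struct demo_event *ev)
{
	return ev->port_id == 0;
}
```

With demo_on_link registered for DEMO_LINK_CHANGE, a link event for port 0 is consumed by the callback, while a link event for any other port, or an MTU event with no callback registered, ends up in the queue. An application can therefore handle a few events inline and drain the rest from the queue later.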
Signed-off-by: Andrzej Ostruszka --- MAINTAINERS | 3 + config/common_base | 5 + config/common_linux | 1 + lib/Makefile | 2 + .../common/include/rte_eal_interrupts.h | 2 + lib/librte_eal/linux/eal/eal_interrupts.c | 14 +- lib/librte_if_proxy/Makefile | 29 + lib/librte_if_proxy/if_proxy_common.c | 494 +++++++++++++++ lib/librte_if_proxy/if_proxy_priv.h | 97 +++ lib/librte_if_proxy/linux/Makefile | 4 + lib/librte_if_proxy/linux/if_proxy.c | 550 +++++++++++++++++ lib/librte_if_proxy/meson.build | 19 + lib/librte_if_proxy/rte_if_proxy.h | 561 ++++++++++++++++++ lib/librte_if_proxy/rte_if_proxy_version.map | 19 + lib/meson.build | 2 +- 15 files changed, 1797 insertions(+), 5 deletions(-) create mode 100644 lib/librte_if_proxy/Makefile create mode 100644 lib/librte_if_proxy/if_proxy_common.c create mode 100644 lib/librte_if_proxy/if_proxy_priv.h create mode 100644 lib/librte_if_proxy/linux/Makefile create mode 100644 lib/librte_if_proxy/linux/if_proxy.c create mode 100644 lib/librte_if_proxy/meson.build create mode 100644 lib/librte_if_proxy/rte_if_proxy.h create mode 100644 lib/librte_if_proxy/rte_if_proxy_version.map diff --git a/MAINTAINERS b/MAINTAINERS index f4e0ed8e0..aec7326ca 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1469,6 +1469,9 @@ F: examples/bpf/ F: app/test/test_bpf.c F: doc/guides/prog_guide/bpf_lib.rst +IF Proxy - EXPERIMENTAL +M: Andrzej Ostruszka +F: lib/librte_if_proxy/ Test Applications ----------------- diff --git a/config/common_base b/config/common_base index 7ca2f28b1..dcc0a0650 100644 --- a/config/common_base +++ b/config/common_base @@ -1075,6 +1075,11 @@ CONFIG_RTE_LIBRTE_BPF_ELF=n # CONFIG_RTE_LIBRTE_IPSEC=y +# +# Compile librte_if_proxy +# +CONFIG_RTE_LIBRTE_IF_PROXY=n + # # Compile the test application # diff --git a/config/common_linux b/config/common_linux index 816810671..1244eb0ae 100644 --- a/config/common_linux +++ b/config/common_linux @@ -16,6 +16,7 @@ CONFIG_RTE_LIBRTE_VHOST_NUMA=y CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n 
CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_IFC_PMD=y +CONFIG_RTE_LIBRTE_IF_PROXY=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_MEMIF=y CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y diff --git a/lib/Makefile b/lib/Makefile index 46b91ae1a..6a20806f1 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -118,6 +118,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_TELEMETRY) += librte_telemetry DEPDIRS-librte_telemetry := librte_eal librte_metrics librte_ethdev DIRS-$(CONFIG_RTE_LIBRTE_RCU) += librte_rcu DEPDIRS-librte_rcu := librte_eal +DIRS-$(CONFIG_RTE_LIBRTE_IF_PROXY) += librte_if_proxy +DEPDIRS-librte_if_proxy := librte_eal librte_ethdev ifeq ($(CONFIG_RTE_EXEC_ENV_LINUX),y) DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni diff --git a/lib/librte_eal/common/include/rte_eal_interrupts.h b/lib/librte_eal/common/include/rte_eal_interrupts.h index 773a34a42..296a3853d 100644 --- a/lib/librte_eal/common/include/rte_eal_interrupts.h +++ b/lib/librte_eal/common/include/rte_eal_interrupts.h @@ -36,6 +36,8 @@ enum rte_intr_handle_type { RTE_INTR_HANDLE_VDEV, /**< virtual device */ RTE_INTR_HANDLE_DEV_EVENT, /**< device event handle */ RTE_INTR_HANDLE_VFIO_REQ, /**< VFIO request handle */ + RTE_INTR_HANDLE_NETLINK, /**< netlink notification handle */ + RTE_INTR_HANDLE_MAX /**< count of elements */ }; diff --git a/lib/librte_eal/linux/eal/eal_interrupts.c b/lib/librte_eal/linux/eal/eal_interrupts.c index cb8e10709..16236a8c4 100644 --- a/lib/librte_eal/linux/eal/eal_interrupts.c +++ b/lib/librte_eal/linux/eal/eal_interrupts.c @@ -680,6 +680,9 @@ rte_intr_enable(const struct rte_intr_handle *intr_handle) break; /* not used at this moment */ case RTE_INTR_HANDLE_ALARM: +#if RTE_LIBRTE_IF_PROXY + case RTE_INTR_HANDLE_NETLINK: +#endif return -1; #ifdef VFIO_PRESENT case RTE_INTR_HANDLE_VFIO_MSIX: @@ -796,6 +799,9 @@ rte_intr_disable(const struct rte_intr_handle *intr_handle) break; /* not used at this moment */ case RTE_INTR_HANDLE_ALARM: +#if RTE_LIBRTE_IF_PROXY + case RTE_INTR_HANDLE_NETLINK: +#endif 
return -1; #ifdef VFIO_PRESENT case RTE_INTR_HANDLE_VFIO_MSIX: @@ -889,12 +895,12 @@ eal_intr_process_interrupts(struct epoll_event *events, int nfds) break; #endif #endif - case RTE_INTR_HANDLE_VDEV: case RTE_INTR_HANDLE_EXT: - bytes_read = 0; - call = true; - break; + case RTE_INTR_HANDLE_VDEV: case RTE_INTR_HANDLE_DEV_EVENT: +#if RTE_LIBRTE_IF_PROXY + case RTE_INTR_HANDLE_NETLINK: +#endif bytes_read = 0; call = true; break; diff --git a/lib/librte_if_proxy/Makefile b/lib/librte_if_proxy/Makefile new file mode 100644 index 000000000..43cb702a2 --- /dev/null +++ b/lib/librte_if_proxy/Makefile @@ -0,0 +1,29 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(C) 2020 Marvell International Ltd. + +include $(RTE_SDK)/mk/rte.vars.mk + +# library name +LIB = librte_if_proxy.a + +CFLAGS += -DALLOW_EXPERIMENTAL_API +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) +LDLIBS += -lrte_eal -lrte_ethdev + +EXPORT_MAP := rte_if_proxy_version.map + +LIBABIVER := 1 + +# all source are stored in SRCS-y +SRCS-$(CONFIG_RTE_LIBRTE_IF_PROXY) := if_proxy_common.c + +SYSDIR := $(patsubst "%app",%,$(CONFIG_RTE_EXEC_ENV)) +include $(SRCDIR)/$(SYSDIR)/Makefile + +SRCS-$(CONFIG_RTE_LIBRTE_IF_PROXY) += $(addprefix $(SYSDIR)/,$(SRCS)) + +# install this header file +SYMLINK-$(CONFIG_RTE_LIBRTE_IF_PROXY)-include := rte_if_proxy.h + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_if_proxy/if_proxy_common.c b/lib/librte_if_proxy/if_proxy_common.c new file mode 100644 index 000000000..546dc7810 --- /dev/null +++ b/lib/librte_if_proxy/if_proxy_common.c @@ -0,0 +1,494 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell International Ltd. + */ + +#include +#include + + +/* Definitions of data mentioned in if_proxy_priv.h and local ones. 
*/ +int ifpx_log_type; + +uint16_t ifpx_ports[RTE_MAX_ETHPORTS]; + +rte_spinlock_t ifpx_lock = RTE_SPINLOCK_INITIALIZER; + +struct ifpx_proxies_head ifpx_proxies = TAILQ_HEAD_INITIALIZER(ifpx_proxies); + +struct ifpx_queue_node { + TAILQ_ENTRY(ifpx_queue_node) elem; + uint16_t state; + struct rte_ring *r; +}; +static +TAILQ_HEAD(ifpx_queues_head, ifpx_queue_node) ifpx_queues = + TAILQ_HEAD_INITIALIZER(ifpx_queues); + +/* All function pointers have the same size - so use this one to typecast + * different callbacks in rte_ifpx_callbacks and test their presence in a + * generic way. + */ +union cb_ptr_t { + int (*f_ptr)(void *ev); /* type for normal event notification */ + int (*cfg_done)(void); /* lib notification for finished config */ +}; +union { + struct rte_ifpx_callbacks cbs; + union cb_ptr_t funcs[RTE_IFPX_NUM_EVENTS]; +} ifpx_callbacks; + +uint64_t rte_ifpx_events_available(void) +{ + /* All events are supported on Linux. */ + return (1ULL << RTE_IFPX_NUM_EVENTS) - 1; +} + +uint16_t rte_ifpx_proxy_create(enum rte_ifpx_proxy_type type) +{ + char devargs[16] = { '\0' }; + int dev_cnt = 0, nlen; + uint16_t port_id; + + switch (type) { + case RTE_IFPX_DEFAULT: + case RTE_IFPX_TAP: + nlen = strlcpy(devargs, "net_tap", sizeof(devargs)); + break; + case RTE_IFPX_KNI: + nlen = strlcpy(devargs, "net_kni", sizeof(devargs)); + break; + default: + IFPX_LOG(ERR, "Unknown proxy type: %d", type); + return RTE_MAX_ETHPORTS; + } + + RTE_ETH_FOREACH_DEV(port_id) { + if (strcmp(rte_eth_devices[port_id].device->driver->name, + devargs) == 0) + ++dev_cnt; + } + snprintf(devargs+nlen, sizeof(devargs)-nlen, "%d", dev_cnt); + + return rte_ifpx_proxy_create_by_devarg(devargs); +} + +uint16_t rte_ifpx_proxy_create_by_devarg(const char *devarg) +{ + uint16_t port_id = RTE_MAX_ETHPORTS; + struct rte_dev_iterator iter; + + if (rte_dev_probe(devarg) < 0) { + IFPX_LOG(ERR, "Failed to create proxy port %s", devarg); + return RTE_MAX_ETHPORTS; + } + + if (rte_eth_iterator_init(&iter, 
devarg) == 0) { + port_id = rte_eth_iterator_next(&iter); + if (port_id != RTE_MAX_ETHPORTS) + rte_eth_iterator_cleanup(&iter); + } + + return port_id; +} + +int ifpx_proxy_destroy(struct ifpx_proxy_node *px) +{ + unsigned int i; + uint16_t proxy_id = px->proxy_id; + + TAILQ_REMOVE(&ifpx_proxies, px, elem); + free(px); + + /* Clear any bindings for this proxy. */ + for (i = 0; i < RTE_DIM(ifpx_ports); ++i) { + if (ifpx_ports[i] == proxy_id) { + if (i == proxy_id) /* this entry is for proxy itself */ + ifpx_ports[i] = RTE_MAX_ETHPORTS; + else + rte_ifpx_port_unbind(i); + } + } + + return rte_dev_remove(rte_eth_devices[proxy_id].device); +} + +int rte_ifpx_proxy_destroy(uint16_t proxy_id) +{ + struct ifpx_proxy_node *px; + int ec = 0; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->proxy_id == proxy_id) + break; + } + if (!px) { + ec = -EINVAL; + goto exit; + } + if (px->state & IN_USE) + px->state |= DEL_PENDING; + else + ec = ifpx_proxy_destroy(px); +exit: + rte_spinlock_unlock(&ifpx_lock); + return ec; +} + +int rte_ifpx_queue_add(struct rte_ring *r) +{ + struct ifpx_queue_node *node; + int ec = 0; + + if (!r) + return -EINVAL; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(node, &ifpx_queues, elem) { + if (node->r == r) { + ec = -EEXIST; + goto exit; + } + } + + node = malloc(sizeof(*node)); + if (!node) { + ec = -ENOMEM; + goto exit; + } + + node->r = r; + TAILQ_INSERT_TAIL(&ifpx_queues, node, elem); +exit: + rte_spinlock_unlock(&ifpx_lock); + + return ec; +} + +int rte_ifpx_queue_remove(struct rte_ring *r) +{ + struct ifpx_queue_node *node, *next; + int ec = -EINVAL; + + if (!r) + return ec; + + rte_spinlock_lock(&ifpx_lock); + for (node = TAILQ_FIRST(&ifpx_queues); node; node = next) { + next = TAILQ_NEXT(node, elem); + if (node->r != r) + continue; + TAILQ_REMOVE(&ifpx_queues, node, elem); + free(node); + ec = 0; + break; + } + rte_spinlock_unlock(&ifpx_lock); + + return ec; +} + +int rte_ifpx_port_bind(uint16_t 
port_id, uint16_t proxy_id) +{ + struct rte_eth_dev_info proxy_eth_info; + struct ifpx_proxy_node *px; + int ec; + + if (port_id >= RTE_MAX_ETHPORTS || proxy_id >= RTE_MAX_ETHPORTS || + /* port is a proxy */ + ifpx_ports[port_id] == port_id) { + IFPX_LOG(ERR, "Invalid port_id: %d", port_id); + return -EINVAL; + } + + /* Do automatic rebinding but issue a warning since this is not + * considered to be a valid behaviour. + */ + if (ifpx_ports[port_id] != RTE_MAX_ETHPORTS) { + IFPX_LOG(WARNING, "Port already bound: %d -> %d", port_id, + ifpx_ports[port_id]); + } + + /* Search for existing proxy - if not found add one to the list. */ + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->proxy_id == proxy_id) + break; + } + if (!px) { + ec = rte_eth_dev_info_get(proxy_id, &proxy_eth_info); + if (ec < 0 || proxy_eth_info.if_index == 0) { + IFPX_LOG(ERR, "Invalid proxy: %d", proxy_id); + rte_spinlock_unlock(&ifpx_lock); + return ec < 0 ? ec : -EINVAL; + } + px = calloc(1, sizeof(*px)); + if (!px) { + rte_spinlock_unlock(&ifpx_lock); + return -ENOMEM; + } + px->proxy_id = proxy_id; + px->info.if_index = proxy_eth_info.if_index; + rte_eth_dev_get_mtu(proxy_id, &px->info.mtu); + rte_eth_macaddr_get(proxy_id, &px->info.mac); + memset(px->info.if_name, 0, sizeof(px->info.if_name)); + TAILQ_INSERT_TAIL(&ifpx_proxies, px, elem); + ifpx_ports[proxy_id] = proxy_id; + } + rte_spinlock_unlock(&ifpx_lock); + ifpx_ports[port_id] = proxy_id; + + /* Add proxy MAC to the port - since port will often just forward + * packets from the proxy/system they will be sent with proxy MAC as + * src. In order to pass communication in other direction we should be + * accepting packets with proxy MAC as dst. 
+ */ + rte_eth_dev_mac_addr_add(port_id, &px->info.mac, 0); + + if (ifpx_platform.get_info) + ifpx_platform.get_info(px->info.if_index); + + return 0; +} + +int rte_ifpx_port_unbind(uint16_t port_id) +{ + if (port_id >= RTE_MAX_ETHPORTS || + ifpx_ports[port_id] == RTE_MAX_ETHPORTS || + /* port is a proxy */ + ifpx_ports[port_id] == port_id) + return -EINVAL; + + ifpx_ports[port_id] = RTE_MAX_ETHPORTS; + /* Proxy without any port bound is OK - that is the state of the proxy + * that has just been created, and it can still report routing + * information. So we do not even check if this is the case. + */ + + return 0; +} + +int rte_ifpx_callbacks_register(const struct rte_ifpx_callbacks *cbs) +{ + if (!cbs) + return -EINVAL; + + rte_spinlock_lock(&ifpx_lock); + ifpx_callbacks.cbs = *cbs; + rte_spinlock_unlock(&ifpx_lock); + + return 0; +} + +void rte_ifpx_callbacks_unregister(void) +{ + rte_spinlock_lock(&ifpx_lock); + memset(&ifpx_callbacks.cbs, 0, sizeof(ifpx_callbacks.cbs)); + rte_spinlock_unlock(&ifpx_lock); +} + +uint16_t rte_ifpx_proxy_get(uint16_t port_id) +{ + if (port_id >= RTE_MAX_ETHPORTS) + return RTE_MAX_ETHPORTS; + + return ifpx_ports[port_id]; +} + +unsigned int rte_ifpx_port_get(uint16_t proxy_id, + uint16_t *ports, unsigned int num) +{ + unsigned int p, cnt = 0; + + for (p = 0; p < RTE_DIM(ifpx_ports); ++p) { + if (ifpx_ports[p] == proxy_id && ifpx_ports[p] != p) { + ++cnt; + if (ports && num > 0) { + *ports++ = p; + --num; + } + } + } + return cnt; +} + +const struct rte_ifpx_info *rte_ifpx_info_get(uint16_t port_id) +{ + struct ifpx_proxy_node *px; + + if (port_id >= RTE_MAX_ETHPORTS || + ifpx_ports[port_id] == RTE_MAX_ETHPORTS) + return NULL; + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->proxy_id == ifpx_ports[port_id]) + break; + } + rte_spinlock_unlock(&ifpx_lock); + RTE_ASSERT(px && "Internal IF Proxy library error"); + + return &px->info; +} + +static +void queue_event(const struct rte_ifpx_event *ev, 
struct rte_ring *r) +{ + struct rte_ifpx_event *e = malloc(sizeof(*ev)); + + if (!e) { + IFPX_LOG(ERR, "Failed to allocate event!"); + return; + } + RTE_ASSERT(r); + + *e = *ev; + rte_ring_sp_enqueue(r, e); +} + +void ifpx_notify_event(struct rte_ifpx_event *ev, struct ifpx_proxy_node *px) +{ + struct ifpx_queue_node *q; + int done = 0; + uint16_t p, proxy_id; + + if (px) { + if (px->state & DEL_PENDING) + return; + proxy_id = px->proxy_id; + RTE_ASSERT(proxy_id != RTE_MAX_ETHPORTS); + px->state |= IN_USE; + } else + proxy_id = RTE_MAX_ETHPORTS; + + RTE_ASSERT(ev); + /* This function is expected to be called with a lock held. */ + RTE_ASSERT(rte_spinlock_trylock(&ifpx_lock) == 0); + + if (ifpx_callbacks.funcs[ev->type].f_ptr) { + union cb_ptr_t cb = ifpx_callbacks.funcs[ev->type]; + + /* Drop the lock for the time of callback call. */ + rte_spinlock_unlock(&ifpx_lock); + if (px) { + for (p = 0; p < RTE_DIM(ifpx_ports); ++p) { + if (ifpx_ports[p] != proxy_id || + ifpx_ports[p] == p) + continue; + ev->data.port_id = p; + done = cb.f_ptr(&ev->data) || done; + } + } else { + RTE_ASSERT(ev->type == RTE_IFPX_CFG_DONE); + done = cb.cfg_done(); + } + rte_spinlock_lock(&ifpx_lock); + } + if (done) + goto exit; + + /* Event not "consumed" yet so try to notify via queues. */ + TAILQ_FOREACH(q, &ifpx_queues, elem) { + if (px) { + for (p = 0; p < RTE_DIM(ifpx_ports); ++p) { + if (ifpx_ports[p] != proxy_id || + ifpx_ports[p] == p) + continue; + /* Set the port_id - the remaining params should + * be filled before calling this function. 
+ */ + ev->data.port_id = p; + queue_event(ev, q->r); + } + } else + queue_event(ev, q->r); + } +exit: + if (px) + px->state &= ~IN_USE; +} + +void ifpx_cleanup_proxies(void) +{ + struct ifpx_proxy_node *px, *next; + for (px = TAILQ_FIRST(&ifpx_proxies); px; px = next) { + next = TAILQ_NEXT(px, elem); + if (px->state & DEL_PENDING) + ifpx_proxy_destroy(px); + } +} + +int rte_ifpx_listen(void) +{ + int ec; + + if (!ifpx_platform.listen) + return -ENOTSUP; + + ec = ifpx_platform.listen(); + if (ec == 0 && ifpx_platform.get_info) + ifpx_platform.get_info(0); + + return ec; +} + +int rte_ifpx_close(void) +{ + struct ifpx_proxy_node *px; + struct ifpx_queue_node *q; + unsigned int p; + int ec = 0; + + if (ifpx_platform.close) { + ec = ifpx_platform.close(); + if (ec != 0) + IFPX_LOG(ERR, "Platform 'close' callback failed."); + } + + rte_spinlock_lock(&ifpx_lock); + /* Remove queues. */ + while (!TAILQ_EMPTY(&ifpx_queues)) { + q = TAILQ_FIRST(&ifpx_queues); + TAILQ_REMOVE(&ifpx_queues, q, elem); + free(q); + } + + /* Clear callbacks. */ + memset(&ifpx_callbacks.cbs, 0, sizeof(ifpx_callbacks.cbs)); + + /* Unbind ports. */ + for (p = 0; p < RTE_DIM(ifpx_ports); ++p) { + if (ifpx_ports[p] == RTE_MAX_ETHPORTS) + continue; + if (ifpx_ports[p] == p) + /* port is a proxy - just clear entry */ + ifpx_ports[p] = RTE_MAX_ETHPORTS; + else + rte_ifpx_port_unbind(p); + } + + /* Clear proxies. 
*/ + while (!TAILQ_EMPTY(&ifpx_proxies)) { + px = TAILQ_FIRST(&ifpx_proxies); + TAILQ_REMOVE(&ifpx_proxies, px, elem); + free(px); + } + + rte_spinlock_unlock(&ifpx_lock); + + return ec; +} + +RTE_INIT(if_proxy_init) +{ + unsigned int i; + for (i = 0; i < RTE_DIM(ifpx_ports); ++i) + ifpx_ports[i] = RTE_MAX_ETHPORTS; + + ifpx_log_type = rte_log_register("lib.if_proxy"); + if (ifpx_log_type >= 0) + rte_log_set_level(ifpx_log_type, RTE_LOG_WARNING); + + if (ifpx_platform.init) + ifpx_platform.init(); +} diff --git a/lib/librte_if_proxy/if_proxy_priv.h b/lib/librte_if_proxy/if_proxy_priv.h new file mode 100644 index 000000000..dd7468891 --- /dev/null +++ b/lib/librte_if_proxy/if_proxy_priv.h @@ -0,0 +1,97 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell International Ltd. + */ +#ifndef _IF_PROXY_PRIV_H_ +#define _IF_PROXY_PRIV_H_ + +#include +#include + +extern int ifpx_log_type; +#define IFPX_LOG(level, fmt, args...) \ + rte_log(RTE_LOG_ ## level, ifpx_log_type, "%s(): " fmt "\n", \ + __func__, ##args) + +/* Table keeping the mapping between ports and their proxies. */ +extern +uint16_t ifpx_ports[RTE_MAX_ETHPORTS]; + +/* Callbacks and proxies are kept in linked lists. Since this library is really + * a slow/config path we guard them with a lock - and only one for all of them + * should be enough. We don't expect a need to protect other data structures - + * e.g. data for given port is expected to be accessed/modified from one thread. + */ +extern rte_spinlock_t ifpx_lock; + +enum ifpx_node_status { + IN_USE = 1U << 0, + DEL_PENDING = 1U << 1, +}; + +/* List of configured proxies */ +struct ifpx_proxy_node { + TAILQ_ENTRY(ifpx_proxy_node) elem; + uint16_t proxy_id; + uint16_t state; + struct rte_ifpx_info info; +}; +extern +TAILQ_HEAD(ifpx_proxies_head, ifpx_proxy_node) ifpx_proxies; + +/* This function should be called by the implementation whenever it notices a + * change in the network configuration. 
The arguments are: + * - ev : pointer to filled event data structure (all fields are expected to be + * filled, with the exception of 'port_id' for all proxy/port related + * events: this function clones the event notification for each bound port + * and fills 'port_id' appropriately). + * - px : proxy node when given event is proxy/port related, otherwise pass NULL + */ +void ifpx_notify_event(struct rte_ifpx_event *ev, struct ifpx_proxy_node *px); + +/* This function should be called by the implementation whenever it is done with + * notification about network configuration change. It is only really needed + * for the case of callback based API - from the callback the user might attempt + * to remove callbacks/proxies. Removing of callbacks is handled by the + * ifpx_notify_event() function above, however only implementation really knows + * when notification for given proxy is finished so it is its duty to call + * this function to clean up all proxies that have been marked for deletion. + */ +void ifpx_cleanup_proxies(void); + +/* This is the internal function removing the proxy from the list. It is + * related to the notification function above and intended to be used by the + * platform implementation for the case of callback based API. + * During notification via callback the internal lock is released so that + * operation would not deadlock on an attempt to take a lock. However + * modification (destruction) is not really performed - instead the + * callbacks/proxies are marked as "to be deleted". + * Handling of callbacks that are "to be deleted" is done by the + * ifpx_notify_event() function itself however it cannot delete the proxies (in + * particular the proxy passed as an argument) since they might still be + * referenced by the calling function. So it is a responsibility of the platform + * implementation to check after calling notification function if there are any + * proxies to be removed and use ifpx_proxy_destroy() to actually release them. 
+ */ +int ifpx_proxy_destroy(struct ifpx_proxy_node *px); + +/* Every implementation should provide definition of this structure: + * - init : called during library initialization (NULL when not needed) + * - listen : this function should start service listening to the network + * configuration events/changes, + * - close : this function should close the service started by listen() + * - get_info : this function should query system for current configuration of + * interface with index 'if_index'. After successful initialization of + * listening service this function is called with 0 as an argument. In that + * case configuration of all ports should be obtained - and when this + * procedure completes a RTE_IFPX_CFG_DONE event should be signaled via + * ifpx_notify_event(). + */ +extern +struct ifpx_platform_callbacks { + void (*init)(void); + int (*listen)(void); + int (*close)(void); + void (*get_info)(int if_index); +} ifpx_platform; + +#endif /* _IF_PROXY_PRIV_H_ */ diff --git a/lib/librte_if_proxy/linux/Makefile b/lib/librte_if_proxy/linux/Makefile new file mode 100644 index 000000000..275b7e1e3 --- /dev/null +++ b/lib/librte_if_proxy/linux/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(C) 2020 Marvell International Ltd. + +SRCS += if_proxy.c diff --git a/lib/librte_if_proxy/linux/if_proxy.c b/lib/librte_if_proxy/linux/if_proxy.c new file mode 100644 index 000000000..0204505e3 --- /dev/null +++ b/lib/librte_if_proxy/linux/if_proxy.c @@ -0,0 +1,550 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2020 Marvell International Ltd. 
+ */ +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +static +struct rte_intr_handle ifpx_irq = { + .type = RTE_INTR_HANDLE_NETLINK, + .fd = -1, +}; + +static +unsigned int ifpx_pid; + +static +int request_info(int type, int index) +{ + static rte_spinlock_t send_lock = RTE_SPINLOCK_INITIALIZER; + struct info_get { + struct nlmsghdr h; + union { + struct ifinfomsg ifm; + struct ifaddrmsg ifa; + struct rtmsg rtm; + struct ndmsg ndm; + } __rte_aligned(NLMSG_ALIGNTO); + } info_req; + int ret; + + memset(&info_req, 0, sizeof(info_req)); + /* First byte of these messages is family, so just make sure that this + * memset is enough to get all families. + */ + RTE_ASSERT(AF_UNSPEC == 0); + + info_req.h.nlmsg_pid = ifpx_pid; + info_req.h.nlmsg_type = type; + info_req.h.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; + info_req.h.nlmsg_len = offsetof(struct info_get, ifm); + + switch (type) { + case RTM_GETLINK: + info_req.h.nlmsg_len += sizeof(info_req.ifm); + info_req.ifm.ifi_index = index; + break; + case RTM_GETADDR: + info_req.h.nlmsg_len += sizeof(info_req.ifa); + info_req.ifa.ifa_index = index; + break; + case RTM_GETROUTE: + info_req.h.nlmsg_len += sizeof(info_req.rtm); + break; + case RTM_GETNEIGH: + info_req.h.nlmsg_len += sizeof(info_req.ndm); + break; + default: + IFPX_LOG(WARNING, "Unhandled message type: %d", type); + return -EINVAL; + } + /* Store request type (and if it is global or link specific) in 'seq'. + * Later it is used during handling of reply to continue requesting of + * information dump from system - if needed. 
+ */ + info_req.h.nlmsg_seq = index << 8 | type; + + IFPX_LOG(DEBUG, "\tRequesting msg %d for: %u", type, index); + + rte_spinlock_lock(&send_lock); + ret = send(ifpx_irq.fd, &info_req, info_req.h.nlmsg_len, 0); + if (ret < 0) { + IFPX_LOG(ERR, "Failed to send netlink msg: %d", errno); + rte_errno = errno; + } + rte_spinlock_unlock(&send_lock); + + return ret; +} + +static +void handle_link(const struct nlmsghdr *h) +{ + const struct ifinfomsg *ifi = NLMSG_DATA(h); + int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)); + const struct rtattr *attrs[IFLA_MAX+1] = { NULL }; + const struct rtattr *attr; + struct ifpx_proxy_node *px; + struct rte_ifpx_event ev; + + IFPX_LOG(DEBUG, "\tLink action (%u): %u, 0x%x/0x%x (flags/changed)", + ifi->ifi_index, h->nlmsg_type, ifi->ifi_flags, + ifi->ifi_change); + + rte_spinlock_lock(&ifpx_lock); + TAILQ_FOREACH(px, &ifpx_proxies, elem) { + if (px->info.if_index == (unsigned int)ifi->ifi_index) + break; + } + + /* Drop messages that are not associated with any proxy */ + if (!px) + goto exit; + /* When message is a reply to request for specific interface then keep + * it only when it contains info for this interface. 
+ */ + if (h->nlmsg_pid == ifpx_pid && h->nlmsg_seq >> 8 && + (h->nlmsg_seq >> 8) != (unsigned int)ifi->ifi_index) + goto exit; + + for (attr = IFLA_RTA(ifi); RTA_OK(attr, alen); + attr = RTA_NEXT(attr, alen)) { + if (attr->rta_type > IFLA_MAX) + continue; + attrs[attr->rta_type] = attr; + } + + if (ifi->ifi_change & IFF_UP) { + ev.type = RTE_IFPX_LINK_CHANGE; + ev.link_change.is_up = ifi->ifi_flags & IFF_UP; + ifpx_notify_event(&ev, px); + } + if (attrs[IFLA_MTU]) { + uint16_t mtu = *(const int *)RTA_DATA(attrs[IFLA_MTU]); + if (mtu != px->info.mtu) { + px->info.mtu = mtu; + ev.type = RTE_IFPX_MTU_CHANGE; + ev.mtu_change.mtu = mtu; + ifpx_notify_event(&ev, px); + } + } + if (attrs[IFLA_ADDRESS]) { + const struct rte_ether_addr *mac = + RTA_DATA(attrs[IFLA_ADDRESS]); + + RTE_ASSERT(RTA_PAYLOAD(attrs[IFLA_ADDRESS]) == + RTE_ETHER_ADDR_LEN); + if (memcmp(mac, &px->info.mac, RTE_ETHER_ADDR_LEN) != 0) { + rte_ether_addr_copy(mac, &px->info.mac); + ev.type = RTE_IFPX_MAC_CHANGE; + rte_ether_addr_copy(mac, &ev.mac_change.mac); + ifpx_notify_event(&ev, px); + } + } + if (h->nlmsg_pid == ifpx_pid) { + RTE_ASSERT((h->nlmsg_seq & 0xFF) == RTM_GETLINK); + /* If this is reply for specific link request (not initial + * global dump) then follow up with address request, otherwise + * just store the interface name. 
+		 */
+		if (h->nlmsg_seq >> 8)
+			request_info(RTM_GETADDR, ifi->ifi_index);
+		else if (!px->info.if_name[0] && attrs[IFLA_IFNAME])
+			strlcpy(px->info.if_name, RTA_DATA(attrs[IFLA_IFNAME]),
+				sizeof(px->info.if_name));
+	}
+
+	ifpx_cleanup_proxies();
+exit:
+	rte_spinlock_unlock(&ifpx_lock);
+}
+
+static
+void handle_addr(const struct nlmsghdr *h, bool needs_del)
+{
+	const struct ifaddrmsg *ifa = NLMSG_DATA(h);
+	int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*ifa));
+	const struct rtattr *attrs[IFA_MAX+1] = { NULL };
+	const struct rtattr *attr;
+	struct ifpx_proxy_node *px;
+	struct rte_ifpx_event ev;
+	const uint8_t *ip;
+
+	IFPX_LOG(DEBUG, "\tAddr action (%u): %u, family: %u",
+		 ifa->ifa_index, h->nlmsg_type, ifa->ifa_family);
+
+	rte_spinlock_lock(&ifpx_lock);
+	TAILQ_FOREACH(px, &ifpx_proxies, elem) {
+		if (px->info.if_index == ifa->ifa_index)
+			break;
+	}
+
+	/* Drop messages that are not associated with any proxy */
+	if (!px)
+		goto exit;
+	/* When message is a reply to request for specific interface then keep
+	 * it only when it contains info for this interface.
+	 */
+	if (h->nlmsg_pid == ifpx_pid && h->nlmsg_seq >> 8 &&
+	    (h->nlmsg_seq >> 8) != ifa->ifa_index)
+		goto exit;
+
+	for (attr = IFA_RTA(ifa); RTA_OK(attr, alen);
+				  attr = RTA_NEXT(attr, alen)) {
+		if (attr->rta_type > IFA_MAX)
+			continue;
+		attrs[attr->rta_type] = attr;
+	}
+
+	if (attrs[IFA_ADDRESS]) {
+		ip = RTA_DATA(attrs[IFA_ADDRESS]);
+		if (ifa->ifa_family == AF_INET) {
+			ev.type = needs_del ? RTE_IFPX_ADDR_DEL
+					    : RTE_IFPX_ADDR_ADD;
+			ev.addr_change.ip =
+					RTE_IPV4(ip[0], ip[1], ip[2], ip[3]);
+		} else {
+			ev.type = needs_del ? RTE_IFPX_ADDR6_DEL
+					    : RTE_IFPX_ADDR6_ADD;
+			memcpy(ev.addr6_change.ip, ip, 16);
+		}
+		ifpx_notify_event(&ev, px);
+		ifpx_cleanup_proxies();
+	}
+exit:
+	rte_spinlock_unlock(&ifpx_lock);
+}
+
+static
+void handle_route(const struct nlmsghdr *h, bool needs_del)
+{
+	const struct rtmsg *r = NLMSG_DATA(h);
+	int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*r));
+	const struct rtattr *attrs[RTA_MAX+1] = { NULL };
+	const struct rtattr *attr;
+	struct rte_ifpx_event ev;
+	struct ifpx_proxy_node *px = NULL;
+	const uint8_t *ip;
+
+	IFPX_LOG(DEBUG, "\tRoute action: %u, family: %u",
+		 h->nlmsg_type, r->rtm_family);
+
+	for (attr = RTM_RTA(r); RTA_OK(attr, alen);
+				attr = RTA_NEXT(attr, alen)) {
+		if (attr->rta_type > RTA_MAX)
+			continue;
+		attrs[attr->rta_type] = attr;
+	}
+
+	memset(&ev, 0, sizeof(ev));
+	ev.type = RTE_IFPX_NUM_EVENTS;
+
+	rte_spinlock_lock(&ifpx_lock);
+	if (attrs[RTA_OIF]) {
+		int if_index = *((int32_t *)RTA_DATA(attrs[RTA_OIF]));
+
+		if (if_index > 0) {
+			TAILQ_FOREACH(px, &ifpx_proxies, elem) {
+				if (px->info.if_index == (uint32_t)if_index)
+					break;
+			}
+		}
+	}
+	/* We are only interested in routes related to the proxy interfaces and
+	 * we need to have dst - otherwise skip the message.
+	 */
+	if (!px || !attrs[RTA_DST])
+		goto exit;
+
+	ip = RTA_DATA(attrs[RTA_DST]);
+	/* This is common to both IPv4/6. */
+	ev.route_change.depth = r->rtm_dst_len;
+	if (r->rtm_family == AF_INET) {
+		ev.type = needs_del ? RTE_IFPX_ROUTE_DEL
+				    : RTE_IFPX_ROUTE_ADD;
+		ev.route_change.ip = RTE_IPV4(ip[0], ip[1], ip[2], ip[3]);
+	} else {
+		ev.type = needs_del ? RTE_IFPX_ROUTE6_DEL
+				    : RTE_IFPX_ROUTE6_ADD;
+		memcpy(ev.route6_change.ip, ip, 16);
+	}
+	if (attrs[RTA_GATEWAY]) {
+		ip = RTA_DATA(attrs[RTA_GATEWAY]);
+		if (r->rtm_family == AF_INET)
+			ev.route_change.gateway =
+					RTE_IPV4(ip[0], ip[1], ip[2], ip[3]);
+		else
+			memcpy(ev.route6_change.gateway, ip, 16);
+	}
+
+	ifpx_notify_event(&ev, px);
+	/* Let's check for proxies to remove here too - just in case somebody
+	 * removed the non-proxy related callback.
+	 */
+	ifpx_cleanup_proxies();
+exit:
+	rte_spinlock_unlock(&ifpx_lock);
+}
+
+/* Link, addr and route related messages seem to have this macro defined but not
+ * neighbour one.  Define one if it is missing - const qualifiers added just to
+ * silence compiler - for some reason it is not needed in equivalent macros for
+ * other messages and here compiler is complaining about (char*) cast on pointer
+ * to const.
+ */
+#ifndef NDA_RTA
+#define NDA_RTA(r) ((const struct rtattr *)(((const char *)(r)) + \
+			NLMSG_ALIGN(sizeof(struct ndmsg))))
+#endif
+
+static
+void handle_neigh(const struct nlmsghdr *h, bool needs_del)
+{
+	const struct ndmsg *n = NLMSG_DATA(h);
+	int alen = h->nlmsg_len - NLMSG_LENGTH(sizeof(*n));
+	const struct rtattr *attrs[NDA_MAX+1] = { NULL };
+	const struct rtattr *attr;
+	struct ifpx_proxy_node *px;
+	struct rte_ifpx_event ev;
+	const uint8_t *ip;
+
+	IFPX_LOG(DEBUG, "\tNeighbour action: %u, family: %u, state: %u, if: %d",
+		 h->nlmsg_type, n->ndm_family, n->ndm_state, n->ndm_ifindex);
+
+	for (attr = NDA_RTA(n); RTA_OK(attr, alen);
+				attr = RTA_NEXT(attr, alen)) {
+		if (attr->rta_type > NDA_MAX)
+			continue;
+		attrs[attr->rta_type] = attr;
+	}
+
+	memset(&ev, 0, sizeof(ev));
+	ev.type = RTE_IFPX_NUM_EVENTS;
+
+	rte_spinlock_lock(&ifpx_lock);
+	TAILQ_FOREACH(px, &ifpx_proxies, elem) {
+		if (px->info.if_index == (unsigned int)n->ndm_ifindex)
+			break;
+	}
+	/* We need only subset of neighbourhood related to proxy interfaces.
+	 * lladdr seems to be needed only for adding new entry - modifications
+	 * (also reported via RTM_NEWNEIGH) and deletion include only dst.
+	 */
+	if (!px || !attrs[NDA_DST] || (!needs_del && !attrs[NDA_LLADDR]))
+		goto exit;
+
+	ip = RTA_DATA(attrs[NDA_DST]);
+	if (n->ndm_family == AF_INET) {
+		ev.type = needs_del ? RTE_IFPX_NEIGH_DEL
+				    : RTE_IFPX_NEIGH_ADD;
+		ev.neigh_change.ip = RTE_IPV4(ip[0], ip[1], ip[2], ip[3]);
+	} else {
+		ev.type = needs_del ? RTE_IFPX_NEIGH6_DEL
+				    : RTE_IFPX_NEIGH6_ADD;
+		memcpy(ev.neigh6_change.ip, ip, 16);
+	}
+	if (attrs[NDA_LLADDR])
+		rte_ether_addr_copy(RTA_DATA(attrs[NDA_LLADDR]),
+				    &ev.neigh_change.mac);
+
+	ifpx_notify_event(&ev, px);
+	/* Let's check for proxies to remove here too - just in case somebody
+	 * removed the non-proxy related callback.
+	 */
+	ifpx_cleanup_proxies();
+exit:
+	rte_spinlock_unlock(&ifpx_lock);
+}
+
+static
+void if_proxy_intr_callback(void *arg __rte_unused)
+{
+	struct nlmsghdr *h;
+	struct sockaddr_nl addr;
+	socklen_t addr_len = sizeof(addr); /* value-result arg of recvfrom() */
+	char buf[8192];
+	ssize_t len;
+
+restart:
+	len = recvfrom(ifpx_irq.fd, buf, sizeof(buf), 0,
+		       (struct sockaddr *)&addr, &addr_len);
+	if (len < 0) {
+		if (errno == EINTR) {
+			IFPX_LOG(DEBUG, "recvfrom() interrupted");
+			goto restart;
+		}
+		IFPX_LOG(ERR, "Failed to read netlink msg: %zd (errno %d)",
+			 len, errno);
+		return;
+	}
+	if (addr_len != sizeof(addr)) {
+		IFPX_LOG(ERR, "Invalid netlink addr size: %u", addr_len);
+		return;
+	}
+	IFPX_LOG(DEBUG, "Read %zd bytes (buf %zu) from %u/%u", len,
+		 sizeof(buf), addr.nl_pid, addr.nl_groups);
+
+	for (h = (struct nlmsghdr *)buf; NLMSG_OK(h, len);
+					 h = NLMSG_NEXT(h, len)) {
+		IFPX_LOG(DEBUG, "Recv msg: %u (%u/%u/%u seq/flags/pid)",
+			 h->nlmsg_type, h->nlmsg_seq, h->nlmsg_flags,
+			 h->nlmsg_pid);
+
+		switch (h->nlmsg_type) {
+		case RTM_NEWLINK:
+		case RTM_DELLINK:
+			handle_link(h);
+			break;
+		case RTM_NEWADDR:
+		case RTM_DELADDR:
+			handle_addr(h, h->nlmsg_type == RTM_DELADDR);
+			break;
+		case RTM_NEWROUTE:
+		case RTM_DELROUTE:
+			handle_route(h, h->nlmsg_type == RTM_DELROUTE);
+			break;
+		case RTM_NEWNEIGH:
+		case RTM_DELNEIGH:
+			handle_neigh(h, h->nlmsg_type == RTM_DELNEIGH);
+			break;
+		}
+
+		/* If this is a reply for global request then follow up with
+		 * additional requests and notify about finish.
+		 */
+		if (h->nlmsg_pid == ifpx_pid && (h->nlmsg_seq >> 8) == 0 &&
+		    h->nlmsg_type == NLMSG_DONE) {
+			if ((h->nlmsg_seq & 0xFF) == RTM_GETLINK)
+				request_info(RTM_GETADDR, 0);
+			else if ((h->nlmsg_seq & 0xFF) == RTM_GETADDR)
+				request_info(RTM_GETROUTE, 0);
+			else if ((h->nlmsg_seq & 0xFF) == RTM_GETROUTE)
+				request_info(RTM_GETNEIGH, 0);
+			else {
+				struct rte_ifpx_event ev = {
+					.type = RTE_IFPX_CFG_DONE
+				};
+
+				RTE_ASSERT((h->nlmsg_seq & 0xFF) ==
+						RTM_GETNEIGH);
+				rte_spinlock_lock(&ifpx_lock);
+				ifpx_notify_event(&ev, NULL);
+				rte_spinlock_unlock(&ifpx_lock);
+			}
+		}
+	}
+	IFPX_LOG(DEBUG, "Finished msg loop: %zd bytes left", len);
+}
+
+static
+int nlink_listen(void)
+{
+	struct sockaddr_nl addr = {
+		.nl_family = AF_NETLINK,
+		.nl_pid = 0,
+	};
+	socklen_t addr_len = sizeof(addr);
+	int ret;
+
+	if (ifpx_irq.fd != -1) {
+		rte_errno = EBUSY;
+		return -1;
+	}
+
+	addr.nl_groups = 1 << (RTNLGRP_LINK-1)
+			| 1 << (RTNLGRP_NEIGH-1)
+			| 1 << (RTNLGRP_IPV4_IFADDR-1)
+			| 1 << (RTNLGRP_IPV6_IFADDR-1)
+			| 1 << (RTNLGRP_IPV4_ROUTE-1)
+			| 1 << (RTNLGRP_IPV6_ROUTE-1);
+
+	ifpx_irq.fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
+			     NETLINK_ROUTE);
+	if (ifpx_irq.fd == -1) {
+		IFPX_LOG(ERR, "Failed to create netlink socket: %d", errno);
+		goto error;
+	}
+	/* Starting with kernel 4.19 you can request dump for a specific
+	 * interface and kernel will filter out and send only relevant info.
+	 * Otherwise NLM_F_DUMP will generate info for all interfaces and you
+	 * need to filter them yourself.
+	 */
+#ifdef NETLINK_GET_STRICT_CHK
+	ret = 1; /* use this var also as an input param */
+	/* Netlink socket options live at the SOL_NETLINK level. */
+	ret = setsockopt(ifpx_irq.fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK,
+			 &ret, sizeof(ret));
+	if (ret < 0) {
+		IFPX_LOG(ERR, "Failed to set socket option: %d", errno);
+		goto error;
+	}
+#endif
+
+	ret = bind(ifpx_irq.fd, (struct sockaddr *)&addr, addr_len);
+	if (ret < 0) {
+		IFPX_LOG(ERR, "Failed to bind socket: %d", errno);
+		goto error;
+	}
+	ret = getsockname(ifpx_irq.fd, (struct sockaddr *)&addr, &addr_len);
+	if (ret < 0) {
+		IFPX_LOG(ERR, "Failed to get socket addr: %d", errno);
+		goto error;
+	} else {
+		ifpx_pid = addr.nl_pid;
+		IFPX_LOG(DEBUG, "Assigned port ID: %u", addr.nl_pid);
+	}
+
+	ret = rte_intr_callback_register(&ifpx_irq, if_proxy_intr_callback,
+					 NULL);
+	if (ret == 0)
+		return 0;
+
+error:
+	rte_errno = errno;
+	if (ifpx_irq.fd != -1) {
+		close(ifpx_irq.fd);
+		ifpx_irq.fd = -1;
+	}
+	return -1;
+}
+
+static
+int nlink_close(void)
+{
+	int ec;
+
+	if (ifpx_irq.fd < 0)
+		return -EBADFD;
+
+	do
+		ec = rte_intr_callback_unregister(&ifpx_irq,
+						  if_proxy_intr_callback, NULL);
+	while (ec == -EAGAIN); /* unlikely but possible - at least I think so */
+
+	close(ifpx_irq.fd);
+	ifpx_irq.fd = -1;
+	ifpx_pid = 0;
+
+	return 0;
+}
+
+static
+void nlink_get_info(int if_index)
+{
+	if (ifpx_irq.fd != -1)
+		request_info(RTM_GETLINK, if_index);
+}
+
+struct ifpx_platform_callbacks ifpx_platform = {
+	.init = NULL,
+	.listen = nlink_listen,
+	.close = nlink_close,
+	.get_info = nlink_get_info,
+};
diff --git a/lib/librte_if_proxy/meson.build b/lib/librte_if_proxy/meson.build
new file mode 100644
index 000000000..f0c1a6e15
--- /dev/null
+++ b/lib/librte_if_proxy/meson.build
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(C) 2020 Marvell International Ltd.
+
+# Currently only implemented on Linux
+if not is_linux
+	build = false
+	reason = 'only supported on linux'
+endif
+
+version = 1
+allow_experimental_apis = true
+
+deps += ['ethdev']
+sources = files('if_proxy_common.c')
+headers = files('rte_if_proxy.h')
+
+if is_linux
+	sources += files('linux/if_proxy.c')
+endif
diff --git a/lib/librte_if_proxy/rte_if_proxy.h b/lib/librte_if_proxy/rte_if_proxy.h
new file mode 100644
index 000000000..70f701719
--- /dev/null
+++ b/lib/librte_if_proxy/rte_if_proxy.h
@@ -0,0 +1,561 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2020 Marvell International Ltd.
+ */
+
+#ifndef _RTE_IF_PROXY_H_
+#define _RTE_IF_PROXY_H_
+
+/**
+ * @file
+ * RTE IF Proxy library
+ *
+ * The IF Proxy library allows for monitoring of system network configuration
+ * and configuration of DPDK ports by using usual system utilities (like the
+ * ones from iproute2 package).
+ *
+ * It is based on the notion of "proxy interface" which actually can be any DPDK
+ * port which is also visible to the system - that is it has non-zero 'if_index'
+ * field in 'rte_eth_dev_info' structure.
+ *
+ * If application doesn't have any such port (or doesn't want to use it for
+ * proxy) it can create one by calling:
+ *
+ *   proxy_id = rte_ifpx_proxy_create(RTE_IFPX_DEFAULT);
+ *
+ * This function is just a wrapper that constructs valid 'devargs' string based
+ * on the proxy type chosen (currently Tap or KNI) and creates the interface by
+ * calling rte_ifpx_proxy_create_by_devarg().
+ *
+ * Once one has DPDK port capable of being proxy one can bind target DPDK port
+ * to it by calling:
+ *
+ *   rte_ifpx_port_bind(port_id, proxy_id);
+ *
+ * This binding is a logical one - there is no automatic packet forwarding
+ * between port and its proxy since the library doesn't know the structure of
+ * application's packet processing.  It remains the application's
+ * responsibility to forward the packets from/to proxy port (by calling the
+ * usual DPDK RX/TX burst API).  However when the library notices some change
+ * to the proxy interface it
+ * will simply call appropriate callback with 'port_id' of the DPDK port that is
+ * bound to this proxy interface.  The binding can be 1 to many - that is many
+ * ports can point to one proxy - in that case registered callbacks will be
+ * called for every bound port.
+ *
+ * The callbacks that are used for notifications are described by the
+ * 'rte_ifpx_callbacks' structure and they are registered by calling:
+ *
+ *   rte_ifpx_callbacks_register(&cbs);
+ *
+ * Finally the application should call:
+ *
+ *   rte_ifpx_listen();
+ *
+ * which will query system for present network configuration and start listening
+ * to its changes.
+ */
+
+#include
+#include
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Enum naming the type of proxy to create.
+ *
+ * @see rte_ifpx_proxy_create()
+ */
+enum rte_ifpx_proxy_type {
+	RTE_IFPX_DEFAULT,	/**< Use default proxy type for given arch. */
+	RTE_IFPX_TAP,		/**< Use Tap based port for proxy. */
+	RTE_IFPX_KNI		/**< Use KNI based port for proxy. */
+};
+
+/**
+ * Create DPDK port that can serve as an interface proxy.
+ *
+ * This function is just a wrapper around rte_ifpx_proxy_create_by_devarg() that
+ * constructs its 'devarg' argument based on type of proxy requested.
+ *
+ * @param type
+ *   A type of proxy to create.
+ *
+ * @return
+ *   DPDK port id on success, RTE_MAX_ETHPORTS otherwise.
+ *
+ * @see enum rte_ifpx_proxy_type
+ * @see rte_ifpx_proxy_create_by_devarg()
+ */
+__rte_experimental
+uint16_t rte_ifpx_proxy_create(enum rte_ifpx_proxy_type type);
+
+/**
+ * Create DPDK port that can serve as an interface proxy.
+ *
+ * @param devarg
+ *   A string passed to rte_dev_probe() to create proxy port.
+ *
+ * @return
+ *   DPDK port id on success, RTE_MAX_ETHPORTS otherwise.
+ */
+__rte_experimental
+uint16_t rte_ifpx_proxy_create_by_devarg(const char *devarg);
+
+/**
+ * Remove DPDK proxy port.
+ *
+ * In addition to removing the proxy port the bindings (if any) are cleared.
+ *
+ * @param proxy_id
+ *   Port id of the proxy that should be removed.
+ *
+ * @return
+ *   0 on success, negative on error.
+ */
+__rte_experimental
+int rte_ifpx_proxy_destroy(uint16_t proxy_id);
+
+/**
+ * The rte_ifpx_event_type enum lists all possible event types that can be
+ * signaled by this library.  To learn what events are supported on your
+ * platform call rte_ifpx_events_available().
+ *
+ * NOTE - do not reorder these enums freely, their values need to correspond to
+ * the order of the callbacks in struct rte_ifpx_callbacks.
+ */
+enum rte_ifpx_event_type {
+	RTE_IFPX_MAC_CHANGE,  /**< @see struct rte_ifpx_mac_change */
+	RTE_IFPX_MTU_CHANGE,  /**< @see struct rte_ifpx_mtu_change */
+	RTE_IFPX_LINK_CHANGE, /**< @see struct rte_ifpx_link_change */
+	RTE_IFPX_ADDR_ADD,    /**< @see struct rte_ifpx_addr_change */
+	RTE_IFPX_ADDR_DEL,    /**< @see struct rte_ifpx_addr_change */
+	RTE_IFPX_ADDR6_ADD,   /**< @see struct rte_ifpx_addr6_change */
+	RTE_IFPX_ADDR6_DEL,   /**< @see struct rte_ifpx_addr6_change */
+	RTE_IFPX_ROUTE_ADD,   /**< @see struct rte_ifpx_route_change */
+	RTE_IFPX_ROUTE_DEL,   /**< @see struct rte_ifpx_route_change */
+	RTE_IFPX_ROUTE6_ADD,  /**< @see struct rte_ifpx_route6_change */
+	RTE_IFPX_ROUTE6_DEL,  /**< @see struct rte_ifpx_route6_change */
+	RTE_IFPX_NEIGH_ADD,   /**< @see struct rte_ifpx_neigh_change */
+	RTE_IFPX_NEIGH_DEL,   /**< @see struct rte_ifpx_neigh_change */
+	RTE_IFPX_NEIGH6_ADD,  /**< @see struct rte_ifpx_neigh6_change */
+	RTE_IFPX_NEIGH6_DEL,  /**< @see struct rte_ifpx_neigh6_change */
+	RTE_IFPX_CFG_DONE,    /**< This event is a lib specific event - it is
+			       * signaled when initial network configuration
+			       * query is finished and has no event data.
+			       */
+	RTE_IFPX_NUM_EVENTS,
+};
+
+/**
+ * Get the bit mask of implemented events/callbacks for this platform.
+ *
+ * @return
+ *   Bit mask of events/callbacks implemented: each event type can be tested by
+ *   checking bit (1 << ev) where 'ev' is one of the rte_ifpx_event_type enum
+ *   values.
+ * @see enum rte_ifpx_event_type
+ */
+__rte_experimental
+uint64_t rte_ifpx_events_available(void);
+
+/**
+ * The rte_ifpx_event defines structure used to pass notification event to
+ * application.  Each event type has its own dedicated inner structure - these
+ * structures are also used when using callbacks notifications.
+ */
+struct rte_ifpx_event {
+	enum rte_ifpx_event_type type;
+	union {
+		/** Structure used to pass notification about MAC change of the
+		 * proxy interface.
+		 * @see RTE_IFPX_MAC_CHANGE
+		 */
+		struct rte_ifpx_mac_change {
+			uint16_t port_id;
+			struct rte_ether_addr mac;
+		} mac_change;
+		/** Structure used to pass notification about MTU change.
+		 * @see RTE_IFPX_MTU_CHANGE
+		 */
+		struct rte_ifpx_mtu_change {
+			uint16_t port_id;
+			uint16_t mtu;
+		} mtu_change;
+		/** Structure used to pass notification about link going
+		 * up/down.
+		 * @see RTE_IFPX_LINK_CHANGE
+		 */
+		struct rte_ifpx_link_change {
+			uint16_t port_id;
+			int is_up;
+		} link_change;
+		/** Structure used to pass notification about IPv4 address being
+		 * added/removed.  All IPv4 addresses reported by this library
+		 * are in host order.
+		 * @see RTE_IFPX_ADDR_ADD
+		 * @see RTE_IFPX_ADDR_DEL
+		 */
+		struct rte_ifpx_addr_change {
+			uint16_t port_id;
+			uint32_t ip;
+		} addr_change;
+		/** Structure used to pass notification about IPv6 address being
+		 * added/removed.
+		 * @see RTE_IFPX_ADDR6_ADD
+		 * @see RTE_IFPX_ADDR6_DEL
+		 */
+		struct rte_ifpx_addr6_change {
+			uint16_t port_id;
+			uint8_t ip[16];
+		} addr6_change;
+		/** Structure used to pass notification about IPv4 route being
+		 * added/removed.
+		 * @see RTE_IFPX_ROUTE_ADD
+		 * @see RTE_IFPX_ROUTE_DEL
+		 */
+		struct rte_ifpx_route_change {
+			uint16_t port_id;
+			uint8_t depth;
+			uint32_t ip;
+			uint32_t gateway;
+		} route_change;
+		/** Structure used to pass notification about IPv6 route being
+		 * added/removed.
+		 * @see RTE_IFPX_ROUTE6_ADD
+		 * @see RTE_IFPX_ROUTE6_DEL
+		 */
+		struct rte_ifpx_route6_change {
+			uint16_t port_id;
+			uint8_t depth;
+			uint8_t ip[16];
+			uint8_t gateway[16];
+		} route6_change;
+		/** Structure used to pass notification about IPv4 neighbour
+		 * info changes.
+		 * @see RTE_IFPX_NEIGH_ADD
+		 * @see RTE_IFPX_NEIGH_DEL
+		 */
+		struct rte_ifpx_neigh_change {
+			uint16_t port_id;
+			struct rte_ether_addr mac;
+			uint32_t ip;
+		} neigh_change;
+		/** Structure used to pass notification about IPv6 neighbour
+		 * info changes.
+		 * @see RTE_IFPX_NEIGH6_ADD
+		 * @see RTE_IFPX_NEIGH6_DEL
+		 */
+		struct rte_ifpx_neigh6_change {
+			uint16_t port_id;
+			struct rte_ether_addr mac;
+			uint8_t ip[16];
+		} neigh6_change;
+		/* This structure is used internally - to abstract common parts
+		 * of proxy/port related events and to be able to refer to this
+		 * union without giving it a name.
+		 */
+		struct {
+			uint16_t port_id;
+		} data;
+	};
+};
+
+/**
+ * This library can deliver notification about network configuration changes
+ * either by the use of registered callbacks and/or by queueing change events to
+ * configured notification queues.  The logic used is:
+ * 1. If there is callback registered for given event type it is called.  In
+ *   case of many ports to one proxy binding, this callback is called for every
+ *   port bound.
+ * 2. If this callback returns non-zero value (for any of ports in case of
+ *   many-1 bindings) the handling of an event is considered as complete.
+ * 3. Otherwise the event is added to each configured event queue.  The event is
+ *   allocated with malloc() so after dequeueing and handling the application
+ *   should deallocate it with free().
+ *
+ * This dual notification mechanism is meant to provide some flexibility to
+ * application writer.  For example, if you store your data in a single writer/
+ * many readers coherent data structure you could just update this structure
+ * from the callback.  If you keep separate copy per lcore/port you could make
+ * some common preparations (if applicable) in the callback, return 0 and use
+ * notification queues to pick up the change and update data structures.  Or you
+ * could skip the callbacks altogether and just use notification queues - and
+ * configure them at the level appropriate for your application design (one
+ * global / one per lcore / one per port ...).
+ */
+
+/**
+ * Add notification queue to the list of queues.
+ *
+ * @param r
+ *   Ring used for queueing of notification events - application can assume that
+ *   there is only one producer.
+ * @return
+ *   0 on success, negative otherwise.
+ */
+int rte_ifpx_queue_add(struct rte_ring *r);
+
+/**
+ * Remove notification queue from the list of queues.
+ *
+ * @param r
+ *   Notification ring used for queueing of notification events (previously
+ *   added via rte_ifpx_queue_add()).
+ * @return
+ *   0 on success, negative otherwise.
+ */
+int rte_ifpx_queue_remove(struct rte_ring *r);
+
+/**
+ * This structure groups the callbacks that might be called as a notification
+ * events for changing network configuration.  Not every platform might
+ * implement all of them and you can query the availability with
+ * rte_ifpx_events_available() function.
+ * @see rte_ifpx_events_available()
+ * @see rte_ifpx_callbacks_register()
+ */
+struct rte_ifpx_callbacks {
+	int (*mac_change)(const struct rte_ifpx_mac_change *event);
+	/**< Callback for notification about MAC change of the proxy interface.
+	 * This callback (as all other port related callbacks) is called for
+	 * each port (with its port_id as a first argument) bound to the proxy
+	 * interface for which change has been observed.
+	 * @see struct rte_ifpx_mac_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*mtu_change)(const struct rte_ifpx_mtu_change *event);
+	/**< Callback for notification about MTU change.
+	 * @see struct rte_ifpx_mtu_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*link_change)(const struct rte_ifpx_link_change *event);
+	/**< Callback for notification about link going up/down.
+	 * @see struct rte_ifpx_link_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*addr_add)(const struct rte_ifpx_addr_change *event);
+	/**< Callback for notification about IPv4 address being added.
+	 * @see struct rte_ifpx_addr_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*addr_del)(const struct rte_ifpx_addr_change *event);
+	/**< Callback for notification about IPv4 address removal.
+	 * @see struct rte_ifpx_addr_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*addr6_add)(const struct rte_ifpx_addr6_change *event);
+	/**< Callback for notification about IPv6 address being added.
+	 * @see struct rte_ifpx_addr6_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*addr6_del)(const struct rte_ifpx_addr6_change *event);
+	/**< Callback for notification about IPv6 address removal.
+	 * @see struct rte_ifpx_addr6_change
+	 * @return non-zero if event handling is finished
+	 */
+	/* Please note that "route" callbacks might be also called when user
+	 * adds address to the interface (that is in addition to address related
+	 * callbacks).
+	 */
+	int (*route_add)(const struct rte_ifpx_route_change *event);
+	/**< Callback for notification about IPv4 route being added.
+	 * @see struct rte_ifpx_route_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*route_del)(const struct rte_ifpx_route_change *event);
+	/**< Callback for notification about IPv4 route removal.
+	 * @see struct rte_ifpx_route_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*route6_add)(const struct rte_ifpx_route6_change *event);
+	/**< Callback for notification about IPv6 route being added.
+	 * @see struct rte_ifpx_route6_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*route6_del)(const struct rte_ifpx_route6_change *event);
+	/**< Callback for notification about IPv6 route removal.
+	 * @see struct rte_ifpx_route6_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*neigh_add)(const struct rte_ifpx_neigh_change *event);
+	/**< Callback for notification about IPv4 neighbour being added.
+	 * @see struct rte_ifpx_neigh_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*neigh_del)(const struct rte_ifpx_neigh_change *event);
+	/**< Callback for notification about IPv4 neighbour removal.
+	 * @see struct rte_ifpx_neigh_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*neigh6_add)(const struct rte_ifpx_neigh6_change *event);
+	/**< Callback for notification about IPv6 neighbour being added.
+	 * @see struct rte_ifpx_neigh6_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*neigh6_del)(const struct rte_ifpx_neigh6_change *event);
+	/**< Callback for notification about IPv6 neighbour removal.
+	 * @see struct rte_ifpx_neigh6_change
+	 * @return non-zero if event handling is finished
+	 */
+	int (*cfg_done)(void);
+	/**< Lib specific callback - called when initial network configuration
+	 * query is finished.
+	 * @return non-zero if event handling is finished
+	 */
+};
+
+/**
+ * Register proxy callbacks.
+ *
+ * This function registers callbacks to be called upon appropriate network
+ * event notification.
+ *
+ * @param cbs
+ *   Set of callbacks that will be called.  The library does not take any
+ *   ownership of the pointer passed - the callbacks are stored internally.
+ *
+ * @return
+ *   0 on success, negative otherwise.
+ */
+__rte_experimental
+int rte_ifpx_callbacks_register(const struct rte_ifpx_callbacks *cbs);
+
+/**
+ * Unregister proxy callbacks.
+ *
+ * This function unregisters callbacks previously registered with
+ * rte_ifpx_callbacks_register().  It takes no arguments and returns no value.
+ */
+__rte_experimental
+void rte_ifpx_callbacks_unregister(void);
+
+/**
+ * Bind the port to its proxy.
+ *
+ * After calling this function all network configuration of the proxy (and its
+ * changes) will be passed to given port by calling registered callbacks with
+ * 'port_id' as an argument.
+ *
+ * Note: since both arguments are of the same type in order to not mix them and
+ * ease remembering the order the first one is kept the same for bind/unbind.
+ *
+ * @param port_id
+ *   Id of the port to be bound.
+ * @param proxy_id
+ *   Id of the proxy the port needs to be bound to.
+ * @return
+ *   0 on success, negative on error.
+ */
+__rte_experimental
+int rte_ifpx_port_bind(uint16_t port_id, uint16_t proxy_id);
+
+/**
+ * Unbind the port from its proxy.
+ *
+ * After calling this function registered callbacks will no longer be called for
+ * this port (but they might be called for other ports in one to many binding
+ * scenario).
+ *
+ * @param port_id
+ *   Id of the port to unbind.
+ * @return
+ *   0 on success, negative on error.
+ */
+__rte_experimental
+int rte_ifpx_port_unbind(uint16_t port_id);
+
+/**
+ * Get the system network configuration and start listening to its changes.
+ *
+ * @return
+ *   0 on success, negative otherwise.
+ */
+__rte_experimental
+int rte_ifpx_listen(void);
+
+/**
+ * Remove all bindings/callbacks and stop listening to network configuration.
+ *
+ * @return
+ *   0 on success, negative otherwise.
+ */
+__rte_experimental
+int rte_ifpx_close(void);
+
+/**
+ * Get the id of the proxy the port is bound to.
+ *
+ * @param port_id
+ *   Id of the port for which to get proxy.
+ * @return
+ *   Port id of the proxy on success, RTE_MAX_ETHPORTS on error.
+ */
+__rte_experimental
+uint16_t rte_ifpx_proxy_get(uint16_t port_id);
+
+/**
+ * Test for port acting as a proxy.
+ *
+ * @param port_id
+ *   Id of the port.
+ * @return
+ *   1 if port acts as a proxy, 0 otherwise.
+ */
+static inline
+int rte_ifpx_is_proxy(uint16_t port_id)
+{
+	return rte_ifpx_proxy_get(port_id) == port_id;
+}
+
+/**
+ * Get the ids of the ports bound to the proxy.
+ *
+ * @param proxy_id
+ *   Id of the proxy for which to get ports.
+ * @param ports
+ *   Array where to store the port ids.
+ * @param num
+ *   Size of the 'ports' array.
+ * @return
+ *   The number of ports bound to given proxy.  Note that bound ports are filled
+ *   in 'ports' array up to its size but the return value is always the total
+ *   number of ports bound - so you can make call first with NULL/0 to query for
+ *   the size of the buffer to create or call it with the buffer you have and
+ *   later check if it was large enough.
+ */
+__rte_experimental
+unsigned int rte_ifpx_port_get(uint16_t proxy_id,
+			       uint16_t *ports, unsigned int num);
+
+/**
+ * The structure containing some properties of the proxy interface.
+ */
+struct rte_ifpx_info {
+	unsigned int if_index; /* entry valid iff if_index != 0 */
+	uint16_t mtu;
+	struct rte_ether_addr mac;
+	char if_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+/**
+ * Get the properties of the proxy interface.  Argument can be either id of the
+ * proxy or an id of a port that is bound to it.
+ *
+ * @param port_id
+ *   Id of the port (or proxy) for which to get proxy properties.
+ * @return
+ *   Pointer to the proxy information structure.
+ */
+__rte_experimental
+const struct rte_ifpx_info *rte_ifpx_info_get(uint16_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_IF_PROXY_H_ */
diff --git a/lib/librte_if_proxy/rte_if_proxy_version.map b/lib/librte_if_proxy/rte_if_proxy_version.map
new file mode 100644
index 000000000..e2093137d
--- /dev/null
+++ b/lib/librte_if_proxy/rte_if_proxy_version.map
@@ -0,0 +1,19 @@
+EXPERIMENTAL {
+	global:
+
+	rte_ifpx_proxy_create;
+	rte_ifpx_proxy_create_by_devarg;
+	rte_ifpx_proxy_destroy;
+	rte_ifpx_events_available;
+	rte_ifpx_callbacks_register;
+	rte_ifpx_callbacks_unregister;
+	rte_ifpx_port_bind;
+	rte_ifpx_port_unbind;
+	rte_ifpx_listen;
+	rte_ifpx_close;
+	rte_ifpx_proxy_get;
+	rte_ifpx_port_get;
+	rte_ifpx_info_get;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 0af3efab2..c913b33dd 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -19,7 +19,7 @@ libraries = [
 	'acl', 'bbdev', 'bitratestats', 'cfgfile',
 	'compressdev', 'cryptodev', 'distributor', 'efd', 'eventdev',
-	'gro', 'gso', 'ip_frag', 'jobstats',
+	'gro', 'gso', 'if_proxy', 'ip_frag', 'jobstats',
 	'kni', 'latencystats', 'lpm', 'member', 'power', 'pdump',
 	'rawdev', 'rcu', 'rib', 'reorder', 'sched', 'security', 'stack',
 	'vhost',
-- 
2.17.1
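
Note on the netlink sequence numbers used in linux/if_proxy.c: the handlers
read the interface index from 'nlmsg_seq >> 8' and the request type from
'nlmsg_seq & 0xFF', and the NLMSG_DONE branch chains the global dumps in the
order link, addr, route, neigh.  The sketch below only illustrates that
encoding.  The helper names (seq_encode, seq_if_index, seq_req_type,
next_global_request) and the REQ_* constants are hypothetical and are not part
of the patch; the packing side is an assumption about what request_info() does,
since that function is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Request types.  The values mirror RTM_GETLINK, RTM_GETADDR, RTM_GETROUTE
 * and RTM_GETNEIGH from <linux/rtnetlink.h>; they are repeated here only to
 * keep the sketch free of Linux specific headers.
 */
enum { REQ_LINK = 18, REQ_ADDR = 22, REQ_ROUTE = 26, REQ_NEIGH = 30 };

/* Hypothetical helper: pack interface index (upper 24 bits) and request type
 * (lower 8 bits) into one sequence number.  if_index == 0 means "global".
 */
static inline uint32_t seq_encode(uint32_t if_index, uint8_t req_type)
{
	assert(if_index < (1u << 24)); /* must fit above the type byte */
	return (if_index << 8) | req_type;
}

/* Decoding as done by the handlers in the patch. */
static inline uint32_t seq_if_index(uint32_t seq) { return seq >> 8; }
static inline uint8_t seq_req_type(uint32_t seq) { return seq & 0xFF; }

/* Next request to issue after NLMSG_DONE of a global dump; 0 means the
 * initial configuration query is complete (RTE_IFPX_CFG_DONE in the patch).
 */
static inline uint8_t next_global_request(uint8_t done_type)
{
	switch (done_type) {
	case REQ_LINK:
		return REQ_ADDR;
	case REQ_ADDR:
		return REQ_ROUTE;
	case REQ_ROUTE:
		return REQ_NEIGH;
	default:
		return 0;
	}
}
```

With this layout a reply carrying a non-zero upper part can be matched against
'ifa_index' (as handle_addr() does), while a zero upper part marks a reply to a
global dump.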