* [dpdk-dev] [RFC] Introduce virtual PMD for Hyper-V/Azure platforms
       [not found] ` <20171124164812.GV4062@6wind.com>
@ 2017-11-24 17:21 ` Adrien Mazarguil
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 0/3] " Adrien Mazarguil
  0 siblings, 1 reply; 112+ messages in thread
From: Adrien Mazarguil @ 2017-11-24 17:21 UTC
To: dev; +Cc: Stephen Hemminger

Virtual machines hosted by Hyper-V/Azure platforms are fitted with
simplified virtual network devices named NetVSC that are used for fast
communication between VM to VM, VM to hypervisor, and the outside. They
appear as standard system netdevices to user-land applications, the main
difference being they are implemented on top of VMBUS [1] instead of
emulated PCI devices.

While this reads like a case for a standard DPDK PMD, there is more to it.

To accelerate outside communication, NetVSC devices as they appear in a VM
can be paired with physical SR-IOV virtual function (VF) devices owned by
that same VM [2]. Both netdevices share the same MAC address in that case.

When paired, egress and most of the ingress traffic flow through the VF
device, while part of it (e.g. multicasts, hypervisor control data) still
flows through NetVSC. Moreover VF devices are not retained and disappear
during VM migration; from a VM standpoint, they can be hot-plugged anytime
with NetVSC acting as a fallback.

Running DPDK applications in such a context involves driving VF devices
using their dedicated PMDs in a vendor-independent fashion (to benefit from
maximum performance without writing dedicated code) while simultaneously
listening to NetVSC and handling the related hot-plug events.

This new virtual PMD (referred to as "hyper-v" from this point on)
automatically coordinates the Hyper-V/Azure-specific management part
described above by relying on vendor-specific, fail-safe and tap PMDs to
expose a single consolidated Ethernet device usable directly by existing
applications, as summarized by the following diagram:

          .-------------.
          | DPDK ethdev |
          `------+------'
                 |
          .------+------.
          | hyper-v PMD |
          `------+------'
                 |
    .------------+------------.
    |      fail-safe PMD      |
    `--+-------------------+--'
       |                   |
       |          .........|.........
       |          :        |        :
  .----+----.     : .------+------. :
  | tap PMD |     : | $vendor PMD | :
  `----+----'     : `------+------' :--- hot-pluggable
       |          :        |        :
.------+-------.  :   .-----+-----. :
| NetVSC-based |  :   | SR-IOV VF | :
|  netdevice   |  :   |  device   | :
`--------------'  :   `-----------' :
                  :.................:

Given this RFC targets DPDK 18.02, this approach has the least impact on
applications while work is being performed to enhance public DPDK APIs to
improve it (e.g. hot-plug notification, vdev bus scanning and so on).

Some highlights:

- Enables existing applications to run unmodified with maximum performance
  on Hyper-V/Azure platforms.

- All changes should be restricted to the hyper-v PMD (possibly a few in
  fail-safe PMD), no API change in DPDK.

- Modular approach with little maintenance overhead (not much code) that
  will rely on existing PMDs for all the heavy lifting.

[1] http://dpdk.org/ml/archives/dev/2017-January/054165.html
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms
  2017-11-24 17:21 ` [dpdk-dev] [RFC] Introduce virtual PMD for Hyper-V/Azure platforms Adrien Mazarguil
@ 2017-12-18 16:46 ` Adrien Mazarguil
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver Adrien Mazarguil
                     ` (4 more replies)
  0 siblings, 5 replies; 112+ messages in thread
From: Adrien Mazarguil @ 2017-12-18 16:46 UTC
To: Ferruh Yigit; +Cc: dev, Stephen Hemminger

Virtual machines hosted by Hyper-V/Azure platforms are fitted with
simplified virtual network devices named NetVSC that are used for fast
communication between VM to VM, VM to hypervisor, and the outside. They
appear as standard system netdevices to user-land applications, the main
difference being they are implemented on top of VMBUS [1] instead of
emulated PCI devices.

While this reads like a case for a standard DPDK PMD, there is more to it.

To accelerate outside communication, NetVSC devices as they appear in a VM
can be paired with physical SR-IOV virtual function (VF) devices owned by
that same VM [2]. Both netdevices share the same MAC address in that case.

When paired, egress and most of the ingress traffic flow through the VF
device, while part of it (e.g. multicasts, hypervisor control data) still
flows through NetVSC. Moreover VF devices are not retained and disappear
during VM migration; from a VM standpoint, they can be hot-plugged anytime
with NetVSC acting as a fallback.

Running DPDK applications in such a context involves driving VF devices
using their dedicated PMDs in a vendor-independent fashion (to benefit from
maximum performance without writing dedicated code) while simultaneously
listening to NetVSC and handling the related hot-plug events.

This new virtual PMD (referred to as "hyperv" from this point on)
automatically coordinates the Hyper-V/Azure-specific management part
described above by relying on vendor-specific, failsafe and tap PMDs to
expose a single consolidated Ethernet device usable directly by existing
applications.

        .------------------.
        | DPDK application |
        `--------+---------'
                 |
          .------+------.
          | DPDK ethdev |
          `------+------'       Control
                 |                 |
    .------------+------------.    v    .------------.
    |       failsafe PMD      +---------+ hyperv PMD |
    `--+-------------------+--'         `------------'
       |                   |
       |          .........|.........
       |          :        |        :
  .----+----.     :   .----+----.   :
  | tap PMD |     :   | any PMD |   :
  `----+----'     :   `----+----'   : <-- Hot-pluggable
       |          :        |        :
.------+-------.  :   .-----+-----. :
| NetVSC-based |  :   | SR-IOV VF | :
|  netdevice   |  :   |  device   | :
`--------------'  :   `-----------' :
                  :.................:

Note this diagram differs from that of the original RFC [3], with hyperv no
longer acting as a data plane layer.

This initial version of the driver only works in whitelist mode. Users have
to provide the --vdev net_hyperv EAL option at least once to trigger it.

Subsequent work will add support for blacklist mode based on automatic
detection of the host environment.

[1] http://dpdk.org/ml/archives/dev/2017-January/054165.html
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v
[3] http://dpdk.org/ml/archives/dev/2017-November/082339.html

Adrien Mazarguil (3):
  net/hyperv: introduce MS Hyper-V platform driver
  net/hyperv: implement core functionality
  net/hyperv: add "force" parameter

 MAINTAINERS                                   |   6 +
 config/common_base                            |   6 +
 config/common_linuxapp                        |   1 +
 doc/guides/nics/features/hyperv.ini           |  12 +
 doc/guides/nics/hyperv.rst                    | 119 +++
 doc/guides/nics/index.rst                     |   1 +
 drivers/net/Makefile                          |   1 +
 drivers/net/hyperv/Makefile                   |  58 ++
 drivers/net/hyperv/hyperv.c                   | 799 +++++++++++++++++++++
 drivers/net/hyperv/rte_pmd_hyperv_version.map |   4 +
 mk/rte.app.mk                                 |   1 +
 11 files changed, 1008 insertions(+)
 create mode 100644 doc/guides/nics/features/hyperv.ini
 create mode 100644 doc/guides/nics/hyperv.rst
 create mode 100644 drivers/net/hyperv/Makefile
 create mode 100644 drivers/net/hyperv/hyperv.c
 create mode 100644 drivers/net/hyperv/rte_pmd_hyperv_version.map

-- 
2.11.0
* [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver
  2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 0/3] " Adrien Mazarguil
@ 2017-12-18 16:46   ` Adrien Mazarguil
  2017-12-18 18:28     ` Stephen Hemminger
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Adrien Mazarguil @ 2017-12-18 16:46 UTC
To: Ferruh Yigit; +Cc: dev, Stephen Hemminger

This patch lays the groundwork for this PMD (draft documentation, copyright
notices, code base skeleton and build system hooks). While it can be
successfully compiled and invoked, it's an empty shell at this stage.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 MAINTAINERS                                   |   6 +
 config/common_base                            |   6 +
 config/common_linuxapp                        |   1 +
 doc/guides/nics/features/hyperv.ini           |  12 ++
 doc/guides/nics/hyperv.rst                    |  49 ++++++++
 doc/guides/nics/index.rst                     |   1 +
 drivers/net/Makefile                          |   1 +
 drivers/net/hyperv/Makefile                   |  54 +++++++++
 drivers/net/hyperv/hyperv.c                   | 135 +++++++++++++++++++++
 drivers/net/hyperv/rte_pmd_hyperv_version.map |   4 +
 mk/rte.app.mk                                 |   1 +
 11 files changed, 270 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a63b40c2..fe686f4c5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -451,6 +451,12 @@ F: drivers/net/mrvl/
 F: doc/guides/nics/mrvl.rst
 F: doc/guides/nics/features/mrvl.ini
 
+Microsoft hyperv
+M: Adrien Mazarguil <adrien.mazarguil@6wind.com>
+F: drivers/net/hyperv/
+F: doc/guides/nics/hyperv.rst
+F: doc/guides/nics/features/hyperv.ini
+
 Netcope szedata2
 M: Matej Vido <vido@cesnet.cz>
 F: drivers/net/szedata2/
diff --git a/config/common_base b/config/common_base
index b8ee8f91c..8bc83c8c9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -280,6 +280,12 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG=n
 CONFIG_RTE_LIBRTE_MRVL_PMD=n
 
 #
+# Compile Microsoft Hyper-V/Azure driver
+#
+CONFIG_RTE_LIBRTE_HYPERV_PMD=n
+CONFIG_RTE_LIBRTE_HYPERV_DEBUG=n
+
+#
 # Compile burst-oriented Broadcom BNXT PMD driver
 #
 CONFIG_RTE_LIBRTE_BNXT_PMD=y
diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74c7d64ec..fac6cb172 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y
 CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
 CONFIG_RTE_LIBRTE_PMD_TAP=y
 CONFIG_RTE_LIBRTE_AVP_PMD=y
+CONFIG_RTE_LIBRTE_HYPERV_PMD=y
 CONFIG_RTE_LIBRTE_NFP_PMD=y
 CONFIG_RTE_LIBRTE_POWER=y
 CONFIG_RTE_VIRTIO_USER=y
diff --git a/doc/guides/nics/features/hyperv.ini b/doc/guides/nics/features/hyperv.ini
new file mode 100644
index 000000000..170912c25
--- /dev/null
+++ b/doc/guides/nics/features/hyperv.ini
@@ -0,0 +1,12 @@
+;
+; Supported features of the 'hyperv' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+ARMv7 = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/hyperv.rst b/doc/guides/nics/hyperv.rst
new file mode 100644
index 000000000..28c4443d6
--- /dev/null
+++ b/doc/guides/nics/hyperv.rst
@@ -0,0 +1,49 @@
+..  BSD LICENSE
+    Copyright 2017 6WIND S.A.
+    Copyright 2017 Mellanox
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+    * Neither the name of 6WIND S.A. nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+HYPERV poll mode driver
+=======================
+
+The HYPERV PMD (librte_pmd_hyperv) provides support for NetVSC interfaces
+and associated SR-IOV virtual function (VF) devices found in Linux virtual
+machines running on Microsoft Hyper-V_ (including Azure) platforms.
+
+.. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v
+
+Build options
+-------------
+
+- ``CONFIG_RTE_LIBRTE_HYPERV_PMD`` (default ``y``)
+
+   Toggle compilation of this driver.
+
+- ``CONFIG_RTE_LIBRTE_HYPERV_DEBUG`` (default ``n``)
+
+   Toggle additional debugging code.
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 23babe933..9d66353a1 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -49,6 +49,7 @@ Network Interface Controller Drivers
     ena
     enic
     fm10k
+    hyperv
     i40e
     ixgbe
     intel_vf
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index ef09b4e16..5bcc37cb3 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -55,6 +55,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
 DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
 DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
 DIRS-$(CONFIG_RTE_LIBRTE_MRVL_PMD) += mrvl
+DIRS-$(CONFIG_RTE_LIBRTE_HYPERV_PMD) += hyperv
 DIRS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp
 DIRS-$(CONFIG_RTE_LIBRTE_BNXT_PMD) += bnxt
 DIRS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += null
diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile
new file mode 100644
index 000000000..82c720353
--- /dev/null
+++ b/drivers/net/hyperv/Makefile
@@ -0,0 +1,54 @@
+#   BSD LICENSE
+#
+#   Copyright 2017 6WIND S.A.
+#   Copyright 2017 Mellanox
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of 6WIND S.A. nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# Properties of the generated library.
+LIB = librte_pmd_hyperv.a
+LIBABIVER := 1
+EXPORT_MAP := rte_pmd_hyperv_version.map
+
+# Additional compilation flags.
+CFLAGS += -O3
+CFLAGS += -g
+CFLAGS += -std=c11 -pedantic -Wall -Wextra
+CFLAGS += $(WERROR_FLAGS)
+
+# Dependencies.
+LDLIBS += -lrte_bus_vdev
+LDLIBS += -lrte_eal
+LDLIBS += -lrte_ethdev
+LDLIBS += -lrte_kvargs
+
+# Source files.
+SRCS-$(CONFIG_RTE_LIBRTE_HYPERV_PMD) += hyperv.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/hyperv/hyperv.c b/drivers/net/hyperv/hyperv.c
new file mode 100644
index 000000000..2f940c76f
--- /dev/null
+++ b/drivers/net/hyperv/hyperv.c
@@ -0,0 +1,135 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright 2017 6WIND S.A.
+ *   Copyright 2017 Mellanox
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of 6WIND S.A. nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stddef.h>
+#include <string.h>
+
+#include <rte_bus_vdev.h>
+#include <rte_config.h>
+#include <rte_kvargs.h>
+#include <rte_log.h>
+
+#define HYPERV_DRIVER net_hyperv
+#define HYPERV_ARG_IFACE "iface"
+#define HYPERV_ARG_MAC "mac"
+
+#ifdef RTE_LIBRTE_HYPERV_DEBUG
+
+#define PMD_DRV_LOG(level, ...) \
+	RTE_LOG(level, PMD, \
+		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
+			strrchr("/" __FILE__, '/') + 1, \
+			__LINE__, \
+			__func__, \
+			RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#else /* RTE_LIBRTE_HYPERV_DEBUG */
+
+#define PMD_DRV_LOG(level, ...) \
+	RTE_LOG(level, PMD, \
+		RTE_FMT(RTE_STR(HYPERV_DRIVER) ": " \
+			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
+		RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#endif /* RTE_LIBRTE_HYPERV_DEBUG */
+
+#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
+#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
+#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
+#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
+
+/** Number of PMD instances relying on context list. */
+static unsigned int hyperv_ctx_inst;
+
+/**
+ * Probe NetVSC interfaces.
+ *
+ * @param dev
+ *   Virtual device context for PMD instance.
+ *
+ * @return
+ *   Always 0, even in case of errors.
+ */
+static int
+hyperv_vdev_probe(struct rte_vdev_device *dev)
+{
+	static const char *const hyperv_arg[] = {
+		HYPERV_ARG_IFACE,
+		HYPERV_ARG_MAC,
+		NULL,
+	};
+	const char *name = rte_vdev_device_name(dev);
+	const char *args = rte_vdev_device_args(dev);
+	struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "",
+						     hyperv_arg);
+
+	DEBUG("invoked as \"%s\", using arguments \"%s\"", name, args);
+	if (!kvargs) {
+		ERROR("cannot parse arguments list");
+		goto error;
+	}
+error:
+	if (kvargs)
+		rte_kvargs_free(kvargs);
+	++hyperv_ctx_inst;
+	return 0;
+}
+
+/**
+ * Remove PMD instance.
+ *
+ * @param dev
+ *   Virtual device context for PMD instance.
+ *
+ * @return
+ *   Always 0.
+ */
+static int
+hyperv_vdev_remove(struct rte_vdev_device *dev)
+{
+	(void)dev;
+	--hyperv_ctx_inst;
+	return 0;
+}
+
+/** Virtual device descriptor. */
+static struct rte_vdev_driver hyperv_vdev = {
+	.probe = hyperv_vdev_probe,
+	.remove = hyperv_vdev_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(HYPERV_DRIVER, hyperv_vdev);
+RTE_PMD_REGISTER_ALIAS(HYPERV_DRIVER, eth_hyperv);
+RTE_PMD_REGISTER_PARAM_STRING(net_hyperv,
+	HYPERV_ARG_IFACE "=<string> "
+	HYPERV_ARG_MAC "=<string>");
diff --git a/drivers/net/hyperv/rte_pmd_hyperv_version.map b/drivers/net/hyperv/rte_pmd_hyperv_version.map
new file mode 100644
index 000000000..179140fb8
--- /dev/null
+++ b/drivers/net/hyperv/rte_pmd_hyperv_version.map
@@ -0,0 +1,4 @@
+DPDK_18.02 {
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 6a6a7452e..b0701c49f 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -134,6 +134,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD) += -lrte_pmd_e1000
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += -lrte_pmd_ena
 _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += -lrte_pmd_enic
 _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += -lrte_pmd_fm10k
+_LDLIBS-$(CONFIG_RTE_LIBRTE_HYPERV_PMD) += -lrte_pmd_hyperv
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += -lrte_pmd_failsafe
 _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += -lrte_pmd_i40e
 _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += -lrte_pmd_ixgbe
-- 
2.11.0
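
The probe function in this patch restricts device arguments to the keys
"iface" and "mac" by handing a NULL-terminated key list to
rte_kvargs_parse(). As a rough illustration of what that kind of check
involves, here is a standalone sketch in plain C. It is not how librte_kvargs
is implemented; demo_check_args() is a hypothetical helper introduced only
for this example, and it assumes a simple "key=value,key=value" syntax.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the validation step, not DPDK code.
 * Walk a "key=value,key=value" string and check every key against a
 * NULL-terminated list of accepted keys.
 *
 * Returns the number of pairs found, or -1 if a pair lacks '=' or uses
 * a key that is not in valid_keys. NULL or empty input yields 0.
 */
static int
demo_check_args(const char *args, const char *const valid_keys[])
{
	int count = 0;

	while (args && *args) {
		/* Length of the current pair, up to the next comma. */
		size_t pair_len = strcspn(args, ",");
		const char *eq = memchr(args, '=', pair_len);
		size_t key_len;
		size_t i;

		if (!eq)
			return -1;
		key_len = (size_t)(eq - args);
		for (i = 0; valid_keys[i]; ++i)
			if (strlen(valid_keys[i]) == key_len &&
			    !strncmp(valid_keys[i], args, key_len))
				break;
		if (!valid_keys[i])
			return -1;
		++count;
		args += pair_len;
		if (*args == ',')
			++args;
	}
	return count;
}
```

With a key list of { "iface", "mac", NULL }, a string such as
"iface=eth0,mac=00:11:22:33:44:55" is accepted, while "bogus=1" is rejected.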
* Re: [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver Adrien Mazarguil
@ 2017-12-18 18:28     ` Stephen Hemminger
  2017-12-18 19:54       ` Thomas Monjalon
  0 siblings, 1 reply; 112+ messages in thread
From: Stephen Hemminger @ 2017-12-18 18:28 UTC
To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev

On Mon, 18 Dec 2017 17:46:21 +0100
Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:

> +#ifdef RTE_LIBRTE_HYPERV_DEBUG
> +
> +#define PMD_DRV_LOG(level, ...) \
> +	RTE_LOG(level, PMD, \
> +		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
> +			strrchr("/" __FILE__, '/') + 1, \
> +			__LINE__, \
> +			__func__, \
> +			RTE_FMT_TAIL(__VA_ARGS__,)))
> +
> +#else /* RTE_LIBRTE_HYPERV_DEBUG */
> +
> +#define PMD_DRV_LOG(level, ...) \
> +	RTE_LOG(level, PMD, \
> +		RTE_FMT(RTE_STR(HYPERV_DRIVER) ": " \
> +			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
> +		RTE_FMT_TAIL(__VA_ARGS__,)))
> +
> +#endif /* RTE_LIBRTE_HYPERV_DEBUG */
> +
> +#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
> +#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
> +#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
> +#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
> +

Please don't use DEBUG() etc macros. It makes it easier for tools that do
global updates or scans if all drivers use the same model of PMD_DRV_LOG
* Re: [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver
  2017-12-18 18:28     ` Stephen Hemminger
@ 2017-12-18 19:54       ` Thomas Monjalon
  2017-12-18 21:17         ` Stephen Hemminger
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Monjalon @ 2017-12-18 19:54 UTC
To: Stephen Hemminger, Adrien Mazarguil; +Cc: dev, Ferruh Yigit

18/12/2017 19:28, Stephen Hemminger:
> On Mon, 18 Dec 2017 17:46:21 +0100
> Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
>
> > +#ifdef RTE_LIBRTE_HYPERV_DEBUG
> > +
> > +#define PMD_DRV_LOG(level, ...) \
> > +	RTE_LOG(level, PMD, \
> > +		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
> > +			strrchr("/" __FILE__, '/') + 1, \
> > +			__LINE__, \
> > +			__func__, \
> > +			RTE_FMT_TAIL(__VA_ARGS__,)))
> > +
> > +#else /* RTE_LIBRTE_HYPERV_DEBUG */
> > +
> > +#define PMD_DRV_LOG(level, ...) \
> > +	RTE_LOG(level, PMD, \
> > +		RTE_FMT(RTE_STR(HYPERV_DRIVER) ": " \
> > +			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
> > +		RTE_FMT_TAIL(__VA_ARGS__,)))
> > +
> > +#endif /* RTE_LIBRTE_HYPERV_DEBUG */
> > +
> > +#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
> > +#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
> > +#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
> > +#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
> > +
>
> Please don't use DEBUG() etc macros. It makes it easier for tools that do
> global updates or scans if all drivers use the same model of PMD_DRV_LOG

The new standard is to use dynamic logtype.
* Re: [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver
  2017-12-18 19:54       ` Thomas Monjalon
@ 2017-12-18 21:17         ` Stephen Hemminger
  2017-12-19 10:01           ` Adrien Mazarguil
  0 siblings, 1 reply; 112+ messages in thread
From: Stephen Hemminger @ 2017-12-18 21:17 UTC
To: Thomas Monjalon; +Cc: Adrien Mazarguil, dev, Ferruh Yigit

On Mon, 18 Dec 2017 20:54:16 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> > > +#endif /* RTE_LIBRTE_HYPERV_DEBUG */
> > > +
> > > +#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
> > > +#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
> > > +#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
> > > +#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
> > > +
> >
> > Please don't use DEBUG() etc macros. It makes it easier for tools that do
> > global updates or scans if all drivers use the same model of PMD_DRV_LOG
>
> The new standard is to use dynamic logtype.

Agree, please use dynamic logging, and also don't redefine new macros like
DEBUG/INFO/WARN/ERROR. Instead use PMD_DRV_LOG or equivalent macros.

The base rule here is that all drivers should look the same as much
as reasonably possible. This makes reviewers of other subsystems more likely
to see problems. It also allows for later changes where some developer does
a global improvement across many PMD's.

Drivers should not be snowflakes, each one is not unique.
* Re: [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver
  2017-12-18 21:17         ` Stephen Hemminger
@ 2017-12-19 10:01           ` Adrien Mazarguil
  2017-12-19 11:15             ` Thomas Monjalon
  0 siblings, 1 reply; 112+ messages in thread
From: Adrien Mazarguil @ 2017-12-19 10:01 UTC
To: Stephen Hemminger; +Cc: Thomas Monjalon, dev, Ferruh Yigit

On Mon, Dec 18, 2017 at 01:17:51PM -0800, Stephen Hemminger wrote:
> On Mon, 18 Dec 2017 20:54:16 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > > > +#endif /* RTE_LIBRTE_HYPERV_DEBUG */
> > > > +
> > > > +#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
> > > > +#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
> > > > +#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
> > > > +#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
> > > > +
> > >
> > > Please don't use DEBUG() etc macros. It makes it easier for tools that do
> > > global updates or scans if all drivers use the same model of PMD_DRV_LOG
> >
> > The new standard is to use dynamic logtype.
>
> Agree, please use dynamic logging, and also don't redefine new macros like DEBUG/INFO/WARN/ERROR.
> Instead use PMD_DRV_LOG or equivalent macros.

Wait, the above definitions are only convenience wrappers to PMD_DRV_LOG(),
itself a wrapper to RTE_LOG(), itself a wrapper to rte_log(), their presence
is not triggered according to compilation options, did I miss something?

Let me bring back some context from the original patch:

 #ifdef RTE_LIBRTE_HYPERV_DEBUG

 #define PMD_DRV_LOG(level, ...) \
 	RTE_LOG(level, PMD, \
 		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
 			strrchr("/" __FILE__, '/') + 1, \
 			__LINE__, \
 			__func__, \
 			RTE_FMT_TAIL(__VA_ARGS__,)))

 #else /* RTE_LIBRTE_HYPERV_DEBUG */

 #define PMD_DRV_LOG(level, ...) \
 	RTE_LOG(level, PMD, \
 		RTE_FMT(RTE_STR(HYPERV_DRIVER) ": " \
 			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
 		RTE_FMT_TAIL(__VA_ARGS__,)))

 #endif /* RTE_LIBRTE_HYPERV_DEBUG */

 #define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
 #define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
 #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
 #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)

Enabling RTE_LIBRTE_HYPERV_DEBUG adds file and line information to log
output, messages are otherwise unaffected by that compilation option. Adding
this information required some sort of wrapper to avoid needless clutter.

Nothing against outputting file/line information when compiled in debug mode
right?

> The base rule here is that all drivers should look the same as much
> as reasonably possible. This makes reviewers of other subsystems more likely
> to see problems. It also allows for later changes where some developer does a global
> improvement across many PMD's.
>
> Drivers should not be snowflakes, each one is not unique.

Point taken, do you confirm replacing i.e. WARN(...) with
PMD_DRV_LOG(WARN, ...) and friends is all that's needed?

-- 
Adrien Mazarguil
6WIND
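
For readers unfamiliar with the idiom in the debug variant of PMD_DRV_LOG()
quoted above: prepending "/" to __FILE__ guarantees that strrchr() always
finds a slash, so the expression yields the file's base name whether or not
the compiler supplies a directory prefix. A minimal standalone sketch follows
(plain C, no DPDK required). BASENAME() and format_location() are
hypothetical names introduced for this illustration only, and BASENAME()
works on string literals only since it relies on literal concatenation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Base name of a path given as a string literal. The "/" prefix is
 * concatenated at compile time, so strrchr() can never return NULL here.
 * BASENAME() is a hypothetical helper, not part of DPDK.
 */
#define BASENAME(path) (strrchr("/" path, '/') + 1)

/*
 * Build a "file:line: func(): " prefix similar to the one the debug
 * variant prepends to each message. Returns the snprintf() result.
 */
static int
format_location(char *buf, size_t len, const char *file, unsigned int line,
		const char *func)
{
	return snprintf(buf, len, "%s:%u: %s(): ", file, line, func);
}
```

In a real source file the call would typically look like
format_location(buf, sizeof(buf), BASENAME(__FILE__), __LINE__, __func__).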
* Re: [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver
  2017-12-19 10:01           ` Adrien Mazarguil
@ 2017-12-19 11:15             ` Thomas Monjalon
  2017-12-19 13:13               ` Adrien Mazarguil
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Monjalon @ 2017-12-19 11:15 UTC
To: Adrien Mazarguil; +Cc: dev, Stephen Hemminger, Ferruh Yigit

19/12/2017 11:01, Adrien Mazarguil:
> On Mon, Dec 18, 2017 at 01:17:51PM -0800, Stephen Hemminger wrote:
> > On Mon, 18 Dec 2017 20:54:16 +0100
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > > > > +#endif /* RTE_LIBRTE_HYPERV_DEBUG */
> > > > > +
> > > > > +#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
> > > > > +#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
> > > > > +#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
> > > > > +#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
> > > > > +
> > > >
> > > > Please don't use DEBUG() etc macros. It makes it easier for tools that do
> > > > global updates or scans if all drivers use the same model of PMD_DRV_LOG
> > >
> > > The new standard is to use dynamic logtype.
> >
> > Agree, please use dynamic logging, and also don't redefine new macros like DEBUG/INFO/WARN/ERROR.
> > Instead use PMD_DRV_LOG or equivalent macros.
>
> Wait, the above definitions are only convenience wrappers to PMD_DRV_LOG(),
> itself a wrapper to RTE_LOG(), itself a wrapper to rte_log(), their presence
> is not triggered according to compilation options, did I miss something?
>
> Let me bring back some context from the original patch:
>
>  #ifdef RTE_LIBRTE_HYPERV_DEBUG
>
>  #define PMD_DRV_LOG(level, ...) \
>  	RTE_LOG(level, PMD, \
>  		RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
>  			strrchr("/" __FILE__, '/') + 1, \
>  			__LINE__, \
>  			__func__, \
>  			RTE_FMT_TAIL(__VA_ARGS__,)))
>
>  #else /* RTE_LIBRTE_HYPERV_DEBUG */
>
>  #define PMD_DRV_LOG(level, ...) \
>  	RTE_LOG(level, PMD, \
>  		RTE_FMT(RTE_STR(HYPERV_DRIVER) ": " \
>  			RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
>  		RTE_FMT_TAIL(__VA_ARGS__,)))
>
>  #endif /* RTE_LIBRTE_HYPERV_DEBUG */
>
>  #define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__)
>  #define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__)
>  #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__)
>  #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__)
>
> Enabling RTE_LIBRTE_HYPERV_DEBUG adds file and line information to log
> output, messages are otherwise unaffected by that compilation option. Adding
> this information required some sort of wrapper to avoid needless clutter.
>
> Nothing against outputting file/line information when compiled in debug mode
> right?

I am not sure __FILE__, __LINE__ and __func__ are so much useful.
The log message should be unique enough.

> > The base rule here is that all drivers should look the same as much
> > as reasonably possible. This makes reviewers of other subsystems more likely
> > to see problems. It also allows for later changes where some developer does a global
> > improvement across many PMD's.
> >
> > Drivers should not be snowflakes, each one is not unique.
>
> Point taken, do you confirm replacing i.e. WARN(...) with
> PMD_DRV_LOG(WARN, ...) and friends is all that's needed?

You need to remove the compile-time option for DEBUG,
and rely on dynamic log type, thanks to rte_log_register().
* Re: [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver 2017-12-19 11:15 ` Thomas Monjalon @ 2017-12-19 13:13 ` Adrien Mazarguil 0 siblings, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-19 13:13 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, Stephen Hemminger, Ferruh Yigit On Tue, Dec 19, 2017 at 12:15:38PM +0100, Thomas Monjalon wrote: > 19/12/2017 11:01, Adrien Mazarguil: > > On Mon, Dec 18, 2017 at 01:17:51PM -0800, Stephen Hemminger wrote: > > > On Mon, 18 Dec 2017 20:54:16 +0100 > > > Thomas Monjalon <thomas@monjalon.net> wrote: > > > > > > > > > +#endif /* RTE_LIBRTE_HYPERV_DEBUG */ > > > > > > + > > > > > > +#define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__) > > > > > > +#define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__) > > > > > > +#define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__) > > > > > > +#define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__) > > > > > > + > > > > > > > > > > Please don't use DEBUG() etc macros. It makes it easier for tools that do > > > > > global updates or scans if all drivers use the same model of PMD_DRV_LOG > > > > > > > > The new standard is to use dynamic logtype. > > > > > > Agree, please use dynamic logging, and also don't redefine new macros like DEBUG/INFO/WARN/ERROR. > > > Instead use PMD_DRV_LOG or equivalent macros. > > > > Wait, the above definitions are only convenience wrappers to PMD_DRV_LOG(), > > itself a wrapper to RTE_LOG(), itself a wrapper to rte_log(), their presence > > is not triggered according to compilation options, did I miss something? > > > > Let me bring back some context from the original patch: > > > > #ifdef RTE_LIBRTE_HYPERV_DEBUG > > > > #define PMD_DRV_LOG(level, ...) \ > > RTE_LOG(level, PMD, \ > > RTE_FMT("%s:%u: %s(): " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ > > strrchr("/" __FILE__, '/') + 1, \ > > __LINE__, \ > > __func__, \ > > RTE_FMT_TAIL(__VA_ARGS__,))) > > > > #else /* RTE_LIBRTE_HYPERV_DEBUG */ > > > > #define PMD_DRV_LOG(level, ...) 
\ > > RTE_LOG(level, PMD, \ > > RTE_FMT(RTE_STR(HYPERV_DRIVER) ": " \ > > RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ > > RTE_FMT_TAIL(__VA_ARGS__,))) > > > > #endif /* RTE_LIBRTE_HYPERV_DEBUG */ > > > > #define DEBUG(...) PMD_DRV_LOG(DEBUG, __VA_ARGS__) > > #define INFO(...) PMD_DRV_LOG(INFO, __VA_ARGS__) > > #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__) > > #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__) > > > > Enabling RTE_LIBRTE_HYPERV_DEBUG adds file and line information to log > > output, messages are otherwise unaffected by that compilation option. Adding > > this information required some sort of wrapper to avoid needless clutter. > > > > Nothing against outputting file/line information when compiled in debug mode > > right? > > I am not sure __FILE__, __LINE__ and __func__ are so much useful. > The log message should be unique enough. I don't share your opinion. mlx4/mlx5 PMDs output similar information when compiled in debug mode and that proved quite useful during development and when tracking down bugs. Thing is, mere users are not the target audience, it's a development tool that doesn't need to be part of distributed binaries, hence the compilation option. > > > The base rule here is that all drivers should look the same as much > > > as reasonably possible. This makes reviewers of other subsystems more likely > > > to see problems. It also allows for later changes where some developer does a global > > > improvement across many PMD's. > > > > > > Drivers should not be snowflakes, each one is not unique. > > > > Point taken, do you confirm replacing i.e. WARN(...) with > > PMD_DRV_LOG(WARN, ...) and friends is all that's needed? > > You need to remove the compile-time option for DEBUG, > and rely on dynamic log type, thanks to rte_log_register(). OK, I didn't know about rte_log_register() which may explain some of the confusion, I'll add it in v2 then. To summarize what needs to be done for v2: - Call rte_log_register() during init. 
- Use its return value in place of the second argument to RTE_LOG().
- Replace DEBUG/WARN/INFO/ERROR() wrappers with direct calls to
  PMD_DRV_LOG() for consistency with other PMDs.
- Finally, remove debugging code/information and related compilation
  option since they're useless to end users.

-- 
Adrien Mazarguil
6WIND
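Note on dynamic log types: the point made by the reviewers is that verbosity should be selected per log type at run time rather than through a build option. The sketch below is a stand-alone model of that idea only. It is not the DPDK API: demo_log_register(), demo_log_set_level() and demo_log() are hypothetical stand-ins written so the example builds without DPDK, and the level values are arbitrary (higher means more verbose).

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEMO_LOG_MAX_TYPES 8u

/* Severity values for this model: higher means more verbose. */
enum {
	DEMO_LOG_ERR = 4,
	DEMO_LOG_WARNING = 5,
	DEMO_LOG_INFO = 7,
	DEMO_LOG_DEBUG = 8,
};

struct demo_log_type {
	const char *name;   /* Caller-owned, must outlive the registry. */
	unsigned int level; /* Most verbose level currently enabled. */
};

static struct demo_log_type demo_log_types[DEMO_LOG_MAX_TYPES];
static unsigned int demo_log_count;

/* Return the ID bound to name, registering it on first use; -1 when full. */
static int
demo_log_register(const char *name)
{
	unsigned int i;

	for (i = 0; i != demo_log_count; ++i)
		if (!strcmp(demo_log_types[i].name, name))
			return (int)i;
	if (demo_log_count == DEMO_LOG_MAX_TYPES)
		return -1;
	demo_log_types[i].name = name;
	demo_log_types[i].level = DEMO_LOG_INFO; /* Debug is off by default. */
	++demo_log_count;
	return (int)i;
}

/* Change the verbosity of one log type at run time; 0 on success. */
static int
demo_log_set_level(int type, unsigned int level)
{
	if (type < 0 || (unsigned int)type >= demo_log_count)
		return -1;
	demo_log_types[type].level = level;
	return 0;
}

/* Print to stderr when enabled; return 0 when filtered or type unknown. */
static int
demo_log(unsigned int level, int type, const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (type < 0 || (unsigned int)type >= demo_log_count ||
	    level > demo_log_types[type].level)
		return 0;
	va_start(ap, fmt);
	ret = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return ret;
}
```

In this model a driver would register its type once during init and pass the returned ID with every message, so debug output can be enabled on a distributed binary without rebuilding it.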
* [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality
From: Adrien Mazarguil @ 2017-12-18 16:46 UTC
To: Ferruh Yigit; +Cc: dev, Stephen Hemminger

As described in more detail in the attached documentation (see patch
contents), this virtual device driver manages NetVSC interfaces in virtual
machines hosted by Hyper-V/Azure platforms.

This driver manages neither traffic nor Ethernet devices directly; it acts
as a thin configuration layer that automatically instantiates and controls
fail-safe PMD instances combining tap and PCI sub-devices, so that each
NetVSC interface is exposed as a single consolidated port to DPDK
applications.

PCI sub-devices being hot-pluggable (e.g. during VM migration),
applications automatically benefit from increased throughput when present
and automatic fallback on NetVSC otherwise without interruption thanks to
fail-safe's hot-plug handling.

Once initialized, the sole job of the hyperv driver is to regularly scan
for PCI devices to associate with NetVSC interfaces and feed their
addresses to corresponding fail-safe instances.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> --- doc/guides/nics/hyperv.rst | 65 ++++ drivers/net/hyperv/Makefile | 4 + drivers/net/hyperv/hyperv.c | 654 ++++++++++++++++++++++++++++++++++++++- 3 files changed, 722 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/hyperv.rst b/doc/guides/nics/hyperv.rst index 28c4443d6..8f7a8b153 100644 --- a/doc/guides/nics/hyperv.rst +++ b/doc/guides/nics/hyperv.rst @@ -37,6 +37,50 @@ machines running on Microsoft Hyper-V_ (including Azure) platforms. .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v +Implementation details +---------------------- + +Each instance of this driver effectively needs to drive two devices: the +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from +this point on) counterpart sharing the same MAC address. + +Physical devices are part of the host system and cannot be maintained during +VM migration. From a VM standpoint they appear as hot-plug devices that come +and go without prior notice. + +When the physical device is present, egress and most of the ingress traffic +flows through it; only multicasts and other hypervisor control still flow +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. + +To avoid unnecessary code duplication and ensure maximum performance, +handling of physical devices is left to their original PMDs; this virtual +device driver (also known as *vdev*) manages other PMDs as summarized by the +following block diagram:: + + .------------------. + | DPDK application | + `--------+---------' + | + .------+------. + | DPDK ethdev | + `------+------' Control + | | + .------------+------------. v .------------. + | failsafe PMD +---------+ hyperv PMD | + `--+-------------------+--' `------------' + | | + | .........|......... + | : | : + .----+----. : .----+----. : + | tap PMD | : | any PMD | : + `----+----' : `----+----' : <-- Hot-pluggable + | : | : + .------+-------. 
: .-----+-----. : + | NetVSC-based | : | SR-IOV VF | : + | netdevice | : | device | : + `--------------' : `-----------' : + :.................: + Build options ------------- @@ -47,3 +91,24 @@ Build options - ``CONFIG_RTE_LIBRTE_HYPERV_DEBUG`` (default ``n``) Toggle additional debugging code. + +Run-time parameters +------------------- + +To invoke this PMD, applications have to explicitly provide the +``--vdev=net_hyperv`` EAL option. + +The following device parameters are supported: + +- ``iface`` [string] + + Provide a specific NetVSC interface (netdevice) name to attach this PMD + to. Can be provided multiple times for additional instances. + +- ``mac`` [string] + + Same as ``iface`` except a suitable NetVSC interface is located using its + MAC address. + +Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to +all NetVSC interfaces found on the system. diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile index 82c720353..0a7d2986c 100644 --- a/drivers/net/hyperv/Makefile +++ b/drivers/net/hyperv/Makefile @@ -40,6 +40,9 @@ EXPORT_MAP := rte_pmd_hyperv_version.map CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += -D_XOPEN_SOURCE=600 +CFLAGS += -D_BSD_SOURCE +CFLAGS += -D_DEFAULT_SOURCE CFLAGS += $(WERROR_FLAGS) # Dependencies. @@ -47,6 +50,7 @@ LDLIBS += -lrte_bus_vdev LDLIBS += -lrte_eal LDLIBS += -lrte_ethdev LDLIBS += -lrte_kvargs +LDLIBS += -lrte_net # Source files. SRCS-$(CONFIG_RTE_LIBRTE_HYPERV_PMD) += hyperv.c diff --git a/drivers/net/hyperv/hyperv.c b/drivers/net/hyperv/hyperv.c index 2f940c76f..bad224be9 100644 --- a/drivers/net/hyperv/hyperv.c +++ b/drivers/net/hyperv/hyperv.c @@ -31,17 +31,40 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ +#include <errno.h> +#include <fcntl.h> +#include <linux/sockios.h> +#include <net/if.h> +#include <netinet/ip.h> +#include <stdarg.h> #include <stddef.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdio.h> #include <string.h> +#include <sys/ioctl.h> +#include <sys/queue.h> +#include <sys/socket.h> +#include <unistd.h> +#include <rte_alarm.h> +#include <rte_bus.h> #include <rte_bus_vdev.h> +#include <rte_common.h> #include <rte_config.h> +#include <rte_dev.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> #include <rte_kvargs.h> #include <rte_log.h> #define HYPERV_DRIVER net_hyperv #define HYPERV_ARG_IFACE "iface" #define HYPERV_ARG_MAC "mac" +#define HYPERV_PROBE_MS 1000 + +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" #ifdef RTE_LIBRTE_HYPERV_DEBUG @@ -68,12 +91,603 @@ #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__) #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__) +/** + * Convert a MAC address string to binary form. + * + * Note: this function should be exposed by rte_ether.h as the reverse of + * ether_format_addr(). + * + * Several MAC string formats are supported on input for convenience: + * + * 1. "12:34:56:78:9a:bc" + * 2. "12-34-56-78-9a-bc" + * 3. "123456789abc" + * 4. Upper/lowercase hexadecimal. + * 5. Any combination of the above, e.g. "12:34-5678-9aBC". + * 6. Partial addresses are allowed, with low-order bytes filled first: + * - "5:6:78c" translates to "00:00:05:06:07:8c", + * - "5678c" translates to "00:00:00:05:67:8c". + * + * Non-hexadecimal characters, unknown separators and strings specifying + * more than 6 bytes are not allowed. + * + * @param[out] eth_addr + * Pointer to conversion result buffer. + * @param[in] str + * MAC address string to convert. + * + * @return + * 0 on success, -EINVAL in case of unsupported format. 
+ */ +static int +ether_addr_from_str(struct ether_addr *eth_addr, const char *str) +{ + static const uint8_t conv[0x100] = { + ['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83, + ['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87, + ['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b, + ['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f, + ['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d, + ['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40, + ['\0'] = 0x60, + }; + uint64_t addr = 0; + uint64_t buf = 0; + unsigned int i = 0; + unsigned int n = 0; + uint8_t tmp; + + do { + tmp = conv[(int)*(str++)]; + if (!tmp) + return -EINVAL; + if (tmp & 0x40) { + i += (i & 1) + (!i << 1); + addr = (addr << (i << 2)) | buf; + n += i; + buf = 0; + i = 0; + } else { + buf = (buf << 4) | (tmp & 0xf); + ++i; + } + } while (!(tmp & 0x20)); + if (n > 12) + return -EINVAL; + i = RTE_DIM(eth_addr->addr_bytes); + while (i) { + eth_addr->addr_bytes[--i] = addr & 0xff; + addr >>= 8; + } + return 0; +} + +/** Context structure for a hyperv instance. */ +struct hyperv_ctx { + LIST_ENTRY(hyperv_ctx) entry; /**< Next entry in list. */ + unsigned int id; /**< ID used to generate unique names. */ + char name[64]; /**< Unique name for hyperv instance. */ + char devname[64]; /**< Fail-safe PMD instance name. */ + char devargs[256]; /**< Fail-safe PMD instance device arguments. */ + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ + unsigned int if_index; /**< NetVSC netdevice index. */ + struct ether_addr if_addr; /**< NetVSC MAC address. */ + int pipe[2]; /**< Communication pipe with fail-safe instance. */ + char yield[256]; /**< Current device string used with fail-safe. */ +}; + +/** Context list is common to all PMD instances. */ +static LIST_HEAD(, hyperv_ctx) hyperv_ctx_list = + LIST_HEAD_INITIALIZER(hyperv_ctx_list); + +/** Number of entries in context list. */ +static unsigned int hyperv_ctx_count; + /** Number of PMD instances relying on context list. 
*/ static unsigned int hyperv_ctx_inst; /** + * Destroy a hyperv context instance. + * + * @param ctx + * Context to destroy. + */ +static void +hyperv_ctx_destroy(struct hyperv_ctx *ctx) +{ + if (ctx->pipe[0] != -1) + close(ctx->pipe[0]); + if (ctx->pipe[1] != -1) + close(ctx->pipe[1]); + /* Poisoning for debugging purposes. */ + memset(ctx, 0x22, sizeof(*ctx)); + free(ctx); +} + +/** + * Iterate over system network interfaces. + * + * This function runs a given callback function for each netdevice found on + * the system. + * + * @param func + * Callback function pointer. List traversal is aborted when this function + * returns a nonzero value. + * @param ... + * Variable parameter list passed as @p va_list to @p func. + * + * @return + * 0 when the entire list is traversed successfully, a negative error code + * in case or failure, or the nonzero value returned by @p func when list + * traversal is aborted. + */ +static int +hyperv_foreach_iface(int (*func)(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap), ...) 
+{ + struct if_nameindex *iface = if_nameindex(); + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + unsigned int i; + int ret = 0; + + if (!iface) { + ret = -ENOBUFS; + ERROR("cannot retrieve system network interfaces"); + goto error; + } + if (s == -1) { + ret = -errno; + ERROR("cannot open socket: %s", rte_strerror(errno)); + goto error; + } + for (i = 0; iface[i].if_name; ++i) { + struct ifreq req; + struct ether_addr eth_addr; + va_list ap; + + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { + WARN("cannot retrieve information about interface" + " \"%s\": %s", + req.ifr_name, rte_strerror(errno)); + continue; + } + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, + RTE_DIM(eth_addr.addr_bytes)); + va_start(ap, func); + ret = func(&iface[i], &eth_addr, ap); + va_end(ap); + if (ret) + break; + } +error: + if (s != -1) + close(s); + if (iface) + if_freenameindex(iface); + return ret; +} + +/** + * Determine if a network interface is NetVSC. + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * + * @return + * A nonzero value when interface is detected as NetVSC. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +hyperv_iface_is_netvsc(const struct if_nameindex *iface) +{ + static const char temp[] = "/sys/class/net/%s/device/class_id"; + char path[snprintf(NULL, 0, temp, iface->if_name) + 1]; + FILE *f; + int ret; + int len = 0; + + snprintf(path, sizeof(path), temp, iface->if_name); + f = fopen(path, "r"); + if (!f) { + rte_errno = errno; + return 0; + } + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); + if (ret == EOF) + rte_errno = errno; + ret = len == (int)strlen(NETVSC_CLASS_ID); + fclose(f); + return ret; +} + +/** + * Retrieve the last component of a path. + * + * This is a simplified basename() that does not modify its input buffer to + * handle trailing backslashes.
+ * + * @param[in] path + * Path to retrieve the last component from. + * + * @return + * Pointer to the last component. + */ +static const char * +hyperv_basename(const char *path) +{ + const char *tmp = path; + + while (*tmp) + if (*(tmp++) == '/') + path = tmp; + return path; +} + +/** + * Retrieve network interface data from sysfs symbolic link. + * + * @param[out] buf + * Output data buffer. + * @param size + * Output buffer size. + * @param[in] if_name + * Netdevice name. + * @param[in] relpath + * Symbolic link path relative to netdevice sysfs entry. + * + * @return + * 0 on success, a negative error code otherwise. + */ +static int +hyperv_sysfs_readlink(char *buf, size_t size, const char *if_name, + const char *relpath) +{ + int ret; + + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); + if (ret == -1 || (size_t)ret >= size - 1) + return -ENOBUFS; + ret = readlink(buf, buf, size); + if (ret == -1) + return -errno; + if ((size_t)ret >= size - 1) + return -ENOBUFS; + buf[ret] = '\0'; + return 0; +} + +/** + * Probe a network interface to associate with hyperv context. + * + * This function determines if the network device matches the properties of + * the NetVSC interface associated with the hyperv context and communicates + * its bus address to the fail-safe PMD instance if so. + * + * It is normally used with hyperv_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. + * @param ap + * Variable arguments list comprising: + * + * - struct hyperv_ctx *ctx: + * Context to associate network interface with. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. 
+ */ +static int +hyperv_device_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + struct hyperv_ctx *ctx = va_arg(ap, struct hyperv_ctx *); + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; + const char *addr; + size_t len; + int ret; + + /* Skip non-matching or unwanted NetVSC interfaces. */ + if (ctx->if_index == iface->if_index) { + if (!strcmp(ctx->if_name, iface->if_name)) + return 0; + DEBUG("NetVSC interface \"%s\" (index %u) renamed \"%s\"", + ctx->if_name, ctx->if_index, iface->if_name); + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + return 0; + } + if (hyperv_iface_is_netvsc(iface)) + return 0; + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) + return 0; + /* Look for associated PCI device. */ + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device/subsystem"); + if (ret) + return 0; + if (strcmp(hyperv_basename(buf), "pci")) + return 0; + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device"); + if (ret) + return 0; + addr = hyperv_basename(buf); + len = strlen(addr); + if (!len) + return 0; + /* Send PCI device argument to fail-safe PMD instance if updated. */ + if (!strcmp(addr, ctx->yield)) + return 1; + DEBUG("associating PCI device \"%s\" with NetVSC interface \"%s\"" + " (index %u)", + addr, ctx->if_name, ctx->if_index); + memmove(buf, addr, len + 1); + addr = buf; + buf[len] = '\n'; + ret = write(ctx->pipe[1], addr, len + 1); + buf[len] = '\0'; + if (ret == -1) { + if (errno == EINTR || errno == EAGAIN) + return 1; + WARN("cannot associate PCI device name \"%s\" with interface" + " \"%s\": %s", + addr, ctx->if_name, rte_strerror(errno)); + return 1; + } + if ((size_t)ret != len + 1) { + /* + * Attempt to override previous partial write, no need to + * recover if that fails. 
+ */ + ret = write(ctx->pipe[1], "\n", 1); + (void)ret; + return 1; + } + fsync(ctx->pipe[1]); + memcpy(ctx->yield, addr, len + 1); + return 1; +} + +/** + * Alarm callback that regularly probes system network interfaces. + * + * This callback runs at a frequency determined by HYPERV_PROBE_MS as long + * as an hyperv context instance exists. + * + * @param arg + * Ignored. + */ +static void +hyperv_alarm(void *arg) +{ + struct hyperv_ctx *ctx; + int ret; + + (void)arg; + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) { + ret = hyperv_foreach_iface(hyperv_device_probe, ctx); + if (ret) + break; + } + if (!hyperv_ctx_count) + return; + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); + if (ret < 0) { + ERROR("unable to reschedule alarm callback: %s", + rte_strerror(-ret)); + } +} + +/** + * Probe a NetVSC interface to generate a hyperv context from. + * + * This function instantiates hyperv contexts either for all NetVSC devices + * found on the system or only a subset provided as device arguments. + * + * It is normally used with hyperv_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. + * @param ap + * Variable arguments list comprising: + * + * - const char *name: + * Name associated with current driver instance. + * + * - struct rte_kvargs *kvargs: + * Device arguments provided to current driver instance. + * + * - unsigned int specified: + * Number of specific netdevices provided as device arguments. + * + * - unsigned int *matched: + * The number of specified netdevices matched by this function. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. 
+ */ +static int +hyperv_netvsc_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + const char *name = va_arg(ap, const char *); + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + unsigned int specified = va_arg(ap, unsigned int); + unsigned int *matched = va_arg(ap, unsigned int *); + unsigned int i; + struct hyperv_ctx *ctx; + uint16_t port_id; + int ret; + + /* Probe all interfaces when none are specified. */ + if (specified) { + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, HYPERV_ARG_IFACE)) { + if (!strcmp(pair->value, iface->if_name)) + break; + } else if (!strcmp(pair->key, HYPERV_ARG_MAC)) { + struct ether_addr tmp; + + if (ether_addr_from_str(&tmp, pair->value)) { + ERROR("invalid MAC address format" + " \"%s\"", + pair->value); + return -EINVAL; + } + if (is_same_ether_addr(eth_addr, &tmp)) + break; + } + } + if (i == kvargs->count) + return 0; + ++(*matched); + } + /* Weed out interfaces already handled. */ + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) + if (ctx->if_index == iface->if_index) + break; + if (ctx) { + if (!specified) + return 0; + WARN("interface \"%s\" (index %u) is already handled, skipping", + iface->if_name, iface->if_index); + return 0; + } + if (!hyperv_iface_is_netvsc(iface)) { + if (!specified) + return 0; + WARN("interface \"%s\" (index %u) is not NetVSC, skipping", + iface->if_name, iface->if_index); + return 0; + } + /* Create interface context.
*/ + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + ret = -errno; + ERROR("cannot allocate context for interface \"%s\": %s", + iface->if_name, rte_strerror(errno)); + goto error; + } + ctx->id = hyperv_ctx_count; + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + ctx->if_index = iface->if_index; + ctx->if_addr = *eth_addr; + ctx->pipe[0] = -1; + ctx->pipe[1] = -1; + ctx->yield[0] = '\0'; + if (pipe(ctx->pipe) == -1) { + ret = -errno; + ERROR("cannot allocate control pipe for interface \"%s\": %s", + ctx->if_name, rte_strerror(errno)); + goto error; + } + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { + int flf = fcntl(ctx->pipe[i], F_GETFL); + int fdf = fcntl(ctx->pipe[i], F_GETFD); + + if (flf != -1 && + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1 && + fdf != -1 && + fcntl(ctx->pipe[i], F_SETFD, + i ? fdf | FD_CLOEXEC : fdf & ~FD_CLOEXEC) != -1) + continue; + ret = -errno; + ERROR("cannot toggle non-blocking or close-on-exec flags on" + " control file descriptor #%u (%d): %s", + i, ctx->pipe[i], rte_strerror(errno)); + goto error; + } + /* Generate virtual device name and arguments. */ + i = 0; + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", + name, ctx->id); + if (ret == -1 || (size_t)ret >= sizeof(ctx->name) - 1) + ++i; + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", + ctx->name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname) - 1) + ++i; + /* + * Note: bash replaces the default sh interpreter used by popen() + * because as seen with dash, POSIX-compliant shells do not + * necessarily support redirections with file descriptor numbers + * above 9. 
+ */ + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), + "exec(exec bash -c " + "'while read -r tmp <&%u 2> /dev/null;" + " do dev=$tmp; done;" + " echo $dev" + "'),dev(net_tap_%s,remote=%s)", + ctx->pipe[0], ctx->name, ctx->if_name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs) - 1) + ++i; + if (i) { + ret = -ENOBUFS; + ERROR("generated virtual device name or argument list too long" + " for interface \"%s\"", ctx->if_name); + goto error; + } + /* + * Remove any competing rte_eth_dev entries sharing the same MAC + * address, fail-safe instances created by this PMD will handle them + * as sub-devices later. + */ + RTE_ETH_FOREACH_DEV(port_id) { + struct rte_device *dev = rte_eth_devices[port_id].device; + struct rte_bus *bus = rte_bus_find_by_device(dev); + struct ether_addr tmp; + + rte_eth_macaddr_get(port_id, &tmp); + if (!is_same_ether_addr(eth_addr, &tmp)) + continue; + WARN("removing device \"%s\" with identical MAC address to" + " re-create it as a fail-safe sub-device", + dev->name); + if (!bus) + ret = -EINVAL; + else + ret = rte_eal_hotplug_remove(bus->name, dev->name); + if (ret < 0) { + ERROR("unable to remove device \"%s\": %s", + dev->name, rte_strerror(-ret)); + goto error; + } + } + /* Request virtual device generation. */ + DEBUG("generating virtual device \"%s\" with arguments \"%s\"", + ctx->devname, ctx->devargs); + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); + if (ret) + goto error; + LIST_INSERT_HEAD(&hyperv_ctx_list, ctx, entry); + ++hyperv_ctx_count; + DEBUG("added NetVSC interface \"%s\" to context list", ctx->if_name); + return 0; +error: + if (ctx) + hyperv_ctx_destroy(ctx); + return ret; +} + +/** * Probe NetVSC interfaces. * + * This function probes system netdevices according to the specified device + * arguments and starts a periodic alarm callback to notify the resulting + * fail-safe PMD instances of their sub-devices whereabouts. + * * @param dev * Virtual device context for PMD instance. 
* @@ -92,12 +706,38 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) const char *args = rte_vdev_device_args(dev); struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", hyperv_arg); + unsigned int specified = 0; + unsigned int matched = 0; + unsigned int i; + int ret; DEBUG("invoked as \"%s\", using arguments \"%s\"", name, args); if (!kvargs) { ERROR("cannot parse arguments list"); goto error; } + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, HYPERV_ARG_IFACE) || + !strcmp(pair->key, HYPERV_ARG_MAC)) + ++specified; + } + rte_eal_alarm_cancel(hyperv_alarm, NULL); + /* Gather interfaces. */ + ret = hyperv_foreach_iface(hyperv_netvsc_probe, name, kvargs, + specified, &matched); + if (ret < 0) + goto error; + if (matched < specified) + WARN("some of the specified parameters did not match valid" + " network interfaces"); + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); + if (ret < 0) { + ERROR("unable to schedule alarm callback: %s", + rte_strerror(-ret)); + goto error; + } error: if (kvargs) rte_kvargs_free(kvargs); @@ -108,6 +748,9 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) /** * Remove PMD instance. * + * The alarm callback and underlying hyperv context instances are only + * destroyed after the last PMD instance is removed. + * * @param dev * Virtual device context for PMD instance. * @@ -118,7 +761,16 @@ static int hyperv_vdev_remove(struct rte_vdev_device *dev) { (void)dev; - --hyperv_ctx_inst; + if (--hyperv_ctx_inst) + return 0; + rte_eal_alarm_cancel(hyperv_alarm, NULL); + while (!LIST_EMPTY(&hyperv_ctx_list)) { + struct hyperv_ctx *ctx = LIST_FIRST(&hyperv_ctx_list); + + LIST_REMOVE(ctx, entry); + --hyperv_ctx_count; + hyperv_ctx_destroy(ctx); + } return 0; } -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
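Note on MAC string parsing: the input formats documented for ether_addr_from_str() in the patch above (mixed ":" and "-" separators, optional separators, partial addresses filled from the low-order bytes) can be cross-checked against a more explicit variant. This is only a sketch and not the patch's code: mac_from_str() and its plain uint8_t[6] output are stand-ins chosen so that it builds without rte_ether.h, and it rejects over-long digit runs early instead of relying on the final length check.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone variant of the patch's parser.
 * Returns 0 on success, -EINVAL on unsupported input.
 */
static int
mac_from_str(uint8_t mac[6], const char *str)
{
	uint64_t addr = 0;      /* Accumulated address, low-order byte last. */
	unsigned int total = 0; /* Nibbles consumed so far, padding included. */
	unsigned int i;

	for (;;) {
		uint64_t group = 0;
		unsigned int digits = 0;

		/* Collect one run of hexadecimal digits. */
		while (isxdigit((unsigned char)*str)) {
			int c = tolower((unsigned char)*str++);

			if (++digits > 12)
				return -EINVAL;
			group = (group << 4) |
				(unsigned int)(isdigit(c) ? c - '0' : c - 'a' + 10);
		}
		/* Pad to whole bytes; an empty group counts as one zero byte. */
		digits += (digits & 1) + (digits ? 0 : 2);
		total += digits;
		if (total > 12)
			return -EINVAL;
		addr = (addr << (digits * 4)) | group;
		if (*str == '\0')
			break;
		if (*str != ':' && *str != '-')
			return -EINVAL;
		++str;
	}
	/* Store in network order, missing high-order bytes remain zero. */
	for (i = 6; i-- > 0;) {
		mac[i] = addr & 0xff;
		addr >>= 8;
	}
	return 0;
}
```

With this variant, "5:6:78c" yields 00:00:05:06:07:8c and "5678c" yields 00:00:00:05:67:8c, matching the examples given in the patch's function comment.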
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil @ 2017-12-18 17:04 ` Wiles, Keith 2017-12-18 17:59 ` Adrien Mazarguil 2017-12-18 18:26 ` Stephen Hemminger ` (3 subsequent siblings) 4 siblings, 1 reply; 112+ messages in thread From: Wiles, Keith @ 2017-12-18 17:04 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Yigit, Ferruh, dev, Stephen Hemminger > On Dec 18, 2017, at 10:46 AM, Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > As described in more details in the attached documentation (see patch > contents), this virtual device driver manages NetVSC interfaces in virtual > machines hosted by Hyper-V/Azure platforms. > > This driver does not manage traffic nor Ethernet devices directly; it acts > as a thin configuration layer that automatically instantiates and controls > fail-safe PMD instances combining tap and PCI sub-devices, so that each > NetVSC interface is exposed as a single consolidated port to DPDK > applications. > > PCI sub-devices being hot-pluggable (e.g. during VM migration), > applications automatically benefit from increased throughput when present > and automatic fallback on NetVSC otherwise without interruption thanks to > fail-safe's hot-plug handling. > > Once initialized, the sole job of the hyperv driver is to regularly scan > for PCI devices to associate with NetVSC interfaces and feed their > addresses to corresponding fail-safe instances. 
> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > --- > doc/guides/nics/hyperv.rst | 65 ++++ > drivers/net/hyperv/Makefile | 4 + > drivers/net/hyperv/hyperv.c | 654 ++++++++++++++++++++++++++++++++++++++- > 3 files changed, 722 insertions(+), 1 deletion(-) > > diff --git a/doc/guides/nics/hyperv.rst b/doc/guides/nics/hyperv.rst > index 28c4443d6..8f7a8b153 100644 > --- a/doc/guides/nics/hyperv.rst > +++ b/doc/guides/nics/hyperv.rst > @@ -37,6 +37,50 @@ machines running on Microsoft Hyper-V_ (including Azure) platforms. > > .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v > > +Implementation details > +---------------------- > + > +Each instance of this driver effectively needs to drive two devices: the > +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from > +this point on) counterpart sharing the same MAC address. > + > +Physical devices are part of the host system and cannot be maintained during > +VM migration. From a VM standpoint they appear as hot-plug devices that come > +and go without prior notice. > + > +When the physical device is present, egress and most of the ingress traffic > +flows through it; only multicasts and other hypervisor control still flow > +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. > + > +To avoid unnecessary code duplication and ensure maximum performance, > +handling of physical devices is left to their original PMDs; this virtual > +device driver (also known as *vdev*) manages other PMDs as summarized by the > +following block diagram:: > + > + .------------------. > + | DPDK application | > + `--------+---------' > + | > + .------+------. > + | DPDK ethdev | > + `------+------' Control > + | | > + .------------+------------. v .------------. > + | failsafe PMD +---------+ hyperv PMD | > + `--+-------------------+--' `------------' > + | | > + | .........|......... > + | : | : > + .----+----. : .----+----. 
: > + | tap PMD | : | any PMD | : > + `----+----' : `----+----' : <-- Hot-pluggable > + | : | : > + .------+-------. : .-----+-----. : > + | NetVSC-based | : | SR-IOV VF | : > + | netdevice | : | device | : > + `--------------' : `-----------' : > + :.................: > + > Build options > ------------- > > @@ -47,3 +91,24 @@ Build options > - ``CONFIG_RTE_LIBRTE_HYPERV_DEBUG`` (default ``n``) > > Toggle additional debugging code. > + > +Run-time parameters > +------------------- > + > +To invoke this PMD, applications have to explicitly provide the > +``--vdev=net_hyperv`` EAL option. > + > +The following device parameters are supported: > + > +- ``iface`` [string] > + > + Provide a specific NetVSC interface (netdevice) name to attach this PMD > + to. Can be provided multiple times for additional instances. > + > +- ``mac`` [string] > + > + Same as ``iface`` except a suitable NetVSC interface is located using its > + MAC address. > + > +Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to > +all NetVSC interfaces found on the system. > diff --git a/drivers/net/hyperv/Makefile b/drivers/net/hyperv/Makefile > index 82c720353..0a7d2986c 100644 > --- a/drivers/net/hyperv/Makefile > +++ b/drivers/net/hyperv/Makefile > @@ -40,6 +40,9 @@ EXPORT_MAP := rte_pmd_hyperv_version.map > CFLAGS += -O3 > CFLAGS += -g > CFLAGS += -std=c11 -pedantic -Wall -Wextra > +CFLAGS += -D_XOPEN_SOURCE=600 > +CFLAGS += -D_BSD_SOURCE > +CFLAGS += -D_DEFAULT_SOURCE > CFLAGS += $(WERROR_FLAGS) > > # Dependencies. > @@ -47,6 +50,7 @@ LDLIBS += -lrte_bus_vdev > LDLIBS += -lrte_eal > LDLIBS += -lrte_ethdev > LDLIBS += -lrte_kvargs > +LDLIBS += -lrte_net > > # Source files. 
> SRCS-$(CONFIG_RTE_LIBRTE_HYPERV_PMD) += hyperv.c > diff --git a/drivers/net/hyperv/hyperv.c b/drivers/net/hyperv/hyperv.c > index 2f940c76f..bad224be9 100644 > --- a/drivers/net/hyperv/hyperv.c > +++ b/drivers/net/hyperv/hyperv.c > @@ -31,17 +31,40 @@ > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > */ > > +#include <errno.h> > +#include <fcntl.h> > +#include <linux/sockios.h> > +#include <net/if.h> > +#include <netinet/ip.h> > +#include <stdarg.h> > #include <stddef.h> > +#include <stdlib.h> > +#include <stdint.h> > +#include <stdio.h> > #include <string.h> > +#include <sys/ioctl.h> > +#include <sys/queue.h> > +#include <sys/socket.h> > +#include <unistd.h> > > +#include <rte_alarm.h> > +#include <rte_bus.h> > #include <rte_bus_vdev.h> > +#include <rte_common.h> > #include <rte_config.h> > +#include <rte_dev.h> > +#include <rte_errno.h> > +#include <rte_ethdev.h> > +#include <rte_ether.h> > #include <rte_kvargs.h> > #include <rte_log.h> > > #define HYPERV_DRIVER net_hyperv > #define HYPERV_ARG_IFACE "iface" > #define HYPERV_ARG_MAC "mac" > +#define HYPERV_PROBE_MS 1000 > + > +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > > #ifdef RTE_LIBRTE_HYPERV_DEBUG > > @@ -68,12 +91,603 @@ > #define WARN(...) PMD_DRV_LOG(WARNING, __VA_ARGS__) > #define ERROR(...) PMD_DRV_LOG(ERR, __VA_ARGS__) > > +/** > + * Convert a MAC address string to binary form. > + * > + * Note: this function should be exposed by rte_ether.h as the reverse of > + * ether_format_addr(). > + * > + * Several MAC string formats are supported on input for convenience: > + * > + * 1. "12:34:56:78:9a:bc" > + * 2. "12-34-56-78-9a-bc" > + * 3. "123456789abc" > + * 4. Upper/lowercase hexadecimal. > + * 5. Any combination of the above, e.g. "12:34-5678-9aBC". > + * 6. Partial addresses are allowed, with low-order bytes filled first: > + * - "5:6:78c" translates to "00:00:05:06:07:8c", > + * - "5678c" translates to "00:00:00:05:67:8c". 
> + * > + * Non-hexadecimal characters, unknown separators and strings specifying > + * more than 6 bytes are not allowed. > + * > + * @param[out] eth_addr > + * Pointer to conversion result buffer. > + * @param[in] str > + * MAC address string to convert. > + * > + * @return > + * 0 on success, -EINVAL in case of unsupported format. > + */ > +static int > +ether_addr_from_str(struct ether_addr *eth_addr, const char *str) > +{ > + static const uint8_t conv[0x100] = { > + ['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83, > + ['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87, > + ['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b, > + ['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f, > + ['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d, > + ['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40, > + ['\0'] = 0x60, > + }; > + uint64_t addr = 0; > + uint64_t buf = 0; > + unsigned int i = 0; > + unsigned int n = 0; > + uint8_t tmp; > + > + do { > + tmp = conv[(int)*(str++)]; > + if (!tmp) > + return -EINVAL; > + if (tmp & 0x40) { > + i += (i & 1) + (!i << 1); > + addr = (addr << (i << 2)) | buf; > + n += i; > + buf = 0; > + i = 0; > + } else { > + buf = (buf << 4) | (tmp & 0xf); > + ++i; > + } > + } while (!(tmp & 0x20)); > + if (n > 12) > + return -EINVAL; > + i = RTE_DIM(eth_addr->addr_bytes); > + while (i) { > + eth_addr->addr_bytes[--i] = addr & 0xff; > + addr >>= 8; > + } > + return 0; > +} You already called this out above, why not just push this into rte_ether.h file. I know I could use it if it were public. > + > +/** Context structure for a hyperv instance. */ > +struct hyperv_ctx { > + LIST_ENTRY(hyperv_ctx) entry; /**< Next entry in list. */ > + unsigned int id; /**< ID used to generate unique names. */ > + char name[64]; /**< Unique name for hyperv instance. */ > + char devname[64]; /**< Fail-safe PMD instance name. */ > + char devargs[256]; /**< Fail-safe PMD instance device arguments. 
*/ > + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ > + unsigned int if_index; /**< NetVSC netdevice index. */ > + struct ether_addr if_addr; /**< NetVSC MAC address. */ > + int pipe[2]; /**< Communication pipe with fail-safe instance. */ > + char yield[256]; /**< Current device string used with fail-safe. */ > +}; > + > +/** Context list is common to all PMD instances. */ > +static LIST_HEAD(, hyperv_ctx) hyperv_ctx_list = > + LIST_HEAD_INITIALIZER(hyperv_ctx_list); > + > +/** Number of entries in context list. */ > +static unsigned int hyperv_ctx_count; > + > /** Number of PMD instances relying on context list. */ > static unsigned int hyperv_ctx_inst; > > /** > + * Destroy a hyperv context instance. > + * > + * @param ctx > + * Context to destroy. > + */ > +static void > +hyperv_ctx_destroy(struct hyperv_ctx *ctx) > +{ > + if (ctx->pipe[0] != -1) > + close(ctx->pipe[0]); > + if (ctx->pipe[1] != -1) > + close(ctx->pipe[1]); > + /* Poisoning for debugging purposes. */ > + memset(ctx, 0x22, sizeof(*ctx)); > + free(ctx); > +} > + > +/** > + * Iterate over system network interfaces. > + * > + * This function runs a given callback function for each netdevice found on > + * the system. > + * > + * @param func > + * Callback function pointer. List traversal is aborted when this function > + * returns a nonzero value. > + * @param ... > + * Variable parameter list passed as @p va_list to @p func. > + * > + * @return > + * 0 when the entire list is traversed successfully, a negative error code > + * in case or failure, or the nonzero value returned by @p func when list > + * traversal is aborted. > + */ > +static int > +hyperv_foreach_iface(int (*func)(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap), ...) 
> +{ > + struct if_nameindex *iface = if_nameindex(); > + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); > + unsigned int i; > + int ret = 0; > + > + if (!iface) { > + ret = -ENOBUFS; > + ERROR("cannot retrieve system network interfaces"); > + goto error; > + } > + if (s == -1) { > + ret = -errno; > + ERROR("cannot open socket: %s", rte_strerror(errno)); > + goto error; > + } > + for (i = 0; iface[i].if_name; ++i) { > + struct ifreq req; > + struct ether_addr eth_addr; > + va_list ap; > + > + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); > + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { > + WARN("cannot retrieve information about interface" > + " \"%s\": %s", > + req.ifr_name, rte_strerror(errno)); > + continue; > + } > + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, > + RTE_DIM(eth_addr.addr_bytes)); > + va_start(ap, func); > + ret = func(&iface[i], &eth_addr, ap); > + va_end(ap); > + if (ret) > + break; > + } > +error: > + if (s != -1) > + close(s); > + if (iface) > + if_freenameindex(iface); > + return ret; > +} > + > +/** > + * Determine if a network interface is NetVSC. > + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * > + * @return > + * A nonzero value when interface is detected as NetVSC. In case of error, > + * rte_errno is updated and 0 returned. 
> + */ > +static int > +hyperv_iface_is_netvsc(const struct if_nameindex *iface) > +{ > + static const char temp[] = "/sys/class/net/%s/device/class_id"; > + char path[snprintf(NULL, 0, temp, iface->if_name) + 1]; > + FILE *f; > + int ret; > + int len = 0; > + > + snprintf(path, sizeof(path), temp, iface->if_name); > + f = fopen(path, "r"); > + if (!f) { > + rte_errno = errno; > + return 0; > + } > + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); > + if (ret == EOF) > + rte_errno = errno; > + ret = len == (int)strlen(NETVSC_CLASS_ID); > + fclose(f); > + return ret; > +} > + > +/** > + * Retrieve the last component of a path.
> + * > + * This is a simplified basename() that does not modify its input buffer to > + * handle trailing backslashes. > + * > + * @param[in] path > + * Path to retrieve the last component from. > + * > + * @return > + * Pointer to the last component. > + */ > +static const char * > +hyperv_basename(const char *path) > +{ > + const char *tmp = path; > + > + while (*tmp) > + if (*(tmp++) == '/') > + path = tmp; > + return path; > +}

Why not just use rindex() to find the last '/' instead of this routine? I know it is not performance critical.

> + > +/** > + * Retrieve network interface data from sysfs symbolic link. > + * > + * @param[out] buf > + * Output data buffer. > + * @param size > + * Output buffer size. > + * @param[in] if_name > + * Netdevice name. > + * @param[in] relpath > + * Symbolic link path relative to netdevice sysfs entry. > + * > + * @return > + * 0 on success, a negative error code otherwise. > + */ > +static int > +hyperv_sysfs_readlink(char *buf, size_t size, const char *if_name, > + const char *relpath) > +{ > + int ret; > + > + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); > + if (ret == -1 || (size_t)ret >= size - 1) > + return -ENOBUFS; > + ret = readlink(buf, buf, size); > + if (ret == -1) > + return -errno; > + if ((size_t)ret >= size - 1) > + return -ENOBUFS; > + buf[ret] = '\0'; > + return 0; > +} > + > +/** > + * Probe a network interface to associate with hyperv context. > + * > + * This function determines if the network device matches the properties of > + * the NetVSC interface associated with the hyperv context and communicates > + * its bus address to the fail-safe PMD instance if so. > + * > + * It is normally used with hyperv_foreach_iface(). > + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * @param[in] eth_addr > + * MAC address associated with @p iface.
> + * @param ap > + * Variable arguments list comprising: > + * > + * - struct hyperv_ctx *ctx: > + * Context to associate network interface with. > + * > + * @return > + * A nonzero value when interface matches, 0 otherwise or in case of > + * error. > + */ > +static int > +hyperv_device_probe(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap) > +{ > + struct hyperv_ctx *ctx = va_arg(ap, struct hyperv_ctx *); > + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; > + const char *addr; > + size_t len; > + int ret; > + > + /* Skip non-matching or unwanted NetVSC interfaces. */ > + if (ctx->if_index == iface->if_index) { > + if (!strcmp(ctx->if_name, iface->if_name)) > + return 0; > + DEBUG("NetVSC interface \"%s\" (index %u) renamed \"%s\"", > + ctx->if_name, ctx->if_index, iface->if_name); > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > + return 0; > + } > + if (hyperv_iface_is_netvsc(iface)) > + return 0; > + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) > + return 0; > + /* Look for associated PCI device. */ > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > + "device/subsystem"); > + if (ret) > + return 0; > + if (strcmp(hyperv_basename(buf), "pci")) > + return 0; > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > + "device"); > + if (ret) > + return 0; > + addr = hyperv_basename(buf); > + len = strlen(addr); > + if (!len) > + return 0; > + /* Send PCI device argument to fail-safe PMD instance if updated. 
*/ > + if (!strcmp(addr, ctx->yield)) > + return 1; > + DEBUG("associating PCI device \"%s\" with NetVSC interface \"%s\"" > + " (index %u)", > + addr, ctx->if_name, ctx->if_index); > + memmove(buf, addr, len + 1); > + addr = buf; > + buf[len] = '\n'; > + ret = write(ctx->pipe[1], addr, len + 1); > + buf[len] = '\0'; > + if (ret == -1) { > + if (errno == EINTR || errno == EAGAIN) > + return 1; > + WARN("cannot associate PCI device name \"%s\" with interface" > + " \"%s\": %s", > + addr, ctx->if_name, rte_strerror(errno)); > + return 1; > + } > + if ((size_t)ret != len + 1) { > + /* > + * Attempt to override previous partial write, no need to > + * recover if that fails. > + */ > + ret = write(ctx->pipe[1], "\n", 1); > + (void)ret; > + return 1; > + } > + fsync(ctx->pipe[1]); > + memcpy(ctx->yield, addr, len + 1); > + return 1; > +} Not to criticize style, but a few blank lines could help in readability for these files IMHO. Unless blank lines are illegal :-) > + > +/** > + * Alarm callback that regularly probes system network interfaces. > + * > + * This callback runs at a frequency determined by HYPERV_PROBE_MS as long > + * as an hyperv context instance exists. > + * > + * @param arg > + * Ignored. > + */ > +static void > +hyperv_alarm(void *arg) > +{ > + struct hyperv_ctx *ctx; > + int ret; > + > + (void)arg; > + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) { > + ret = hyperv_foreach_iface(hyperv_device_probe, ctx); > + if (ret) > + break; > + } > + if (!hyperv_ctx_count) > + return; > + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); > + if (ret < 0) { > + ERROR("unable to reschedule alarm callback: %s", > + rte_strerror(-ret)); > + } > +} > + > +/** > + * Probe a NetVSC interface to generate a hyperv context from. > + * > + * This function instantiates hyperv contexts either for all NetVSC devices > + * found on the system or only a subset provided as device arguments. > + * > + * It is normally used with hyperv_foreach_iface(). 
> + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * @param[in] eth_addr > + * MAC address associated with @p iface. > + * @param ap > + * Variable arguments list comprising: > + * > + * - const char *name: > + * Name associated with current driver instance. > + * > + * - struct rte_kvargs *kvargs: > + * Device arguments provided to current driver instance. > + * > + * - unsigned int specified: > + * Number of specific netdevices provided as device arguments. > + * > + * - unsigned int *matched: > + * The number of specified netdevices matched by this function. > + * > + * @return > + * A nonzero value when interface matches, 0 otherwise or in case of > + * error. > + */ > +static int > +hyperv_netvsc_probe(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap) > +{ > + const char *name = va_arg(ap, const char *); > + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); > + unsigned int specified = va_arg(ap, unsigned int); > + unsigned int *matched = va_arg(ap, unsigned int *); > + unsigned int i; > + struct hyperv_ctx *ctx; > + uint16_t port_id; > + int ret; > + > + /* Probe all interfaces when none are specified. */ > + if (specified) { > + for (i = 0; i != kvargs->count; ++i) { > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > + > + if (!strcmp(pair->key, HYPERV_ARG_IFACE)) { > + if (!strcmp(pair->value, iface->if_name)) > + break; > + } else if (!strcmp(pair->key, HYPERV_ARG_MAC)) { > + struct ether_addr tmp; > + > + if (ether_addr_from_str(&tmp, pair->value)) { > + ERROR("invalid MAC address format" > + " \"%s\"", > + pair->value); > + return -EINVAL; > + } > + if (!is_same_ether_addr(eth_addr, &tmp)) > + break; > + } > + } > + if (i == kvargs->count) > + return 0; > + ++(*matched); > + } > + /* Weed out interfaces already handled. 
*/ > + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) > + if (ctx->if_index == iface->if_index) > + break; > + if (ctx) { > + if (!specified) > + return 0; > + WARN("interface \"%s\" (index %u) is already handled, skipping", > + iface->if_name, iface->if_index); > + return 0; > + } > + if (!hyperv_iface_is_netvsc(iface)) { > + if (!specified) > + return 0; > + WARN("interface \"%s\" (index %u) is not NetVSC, skipping", > + iface->if_name, iface->if_index); > + return 0; > + } > + /* Create interface context. */ > + ctx = calloc(1, sizeof(*ctx)); > + if (!ctx) { > + ret = -errno; > + ERROR("cannot allocate context for interface \"%s\": %s", > + iface->if_name, rte_strerror(errno)); > + goto error; > + } > + ctx->id = hyperv_ctx_count; > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > + ctx->if_index = iface->if_index; > + ctx->if_addr = *eth_addr; > + ctx->pipe[0] = -1; > + ctx->pipe[1] = -1; > + ctx->yield[0] = '\0'; > + if (pipe(ctx->pipe) == -1) { > + ret = -errno; > + ERROR("cannot allocate control pipe for interface \"%s\": %s", > + ctx->if_name, rte_strerror(errno)); > + goto error; > + } > + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { > + int flf = fcntl(ctx->pipe[i], F_GETFL); > + int fdf = fcntl(ctx->pipe[i], F_GETFD); > + > + if (flf != -1 && > + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1 && > + fdf != -1 && > + fcntl(ctx->pipe[i], F_SETFD, > + i ? fdf | FD_CLOEXEC : fdf & ~FD_CLOEXEC) != -1) > + continue; > + ret = -errno; > + ERROR("cannot toggle non-blocking or close-on-exec flags on" > + " control file descriptor #%u (%d): %s", > + i, ctx->pipe[i], rte_strerror(errno)); > + goto error; > + } > + /* Generate virtual device name and arguments. 
*/ > + i = 0; > + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", > + name, ctx->id); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->name) - 1) > + ++i; > + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", > + ctx->name); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname) - 1) > + ++i; > + /* > + * Note: bash replaces the default sh interpreter used by popen() > + * because as seen with dash, POSIX-compliant shells do not > + * necessarily support redirections with file descriptor numbers > + * above 9. > + */ > + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), > + "exec(exec bash -c " > + "'while read -r tmp <&%u 2> /dev/null;" > + " do dev=$tmp; done;" > + " echo $dev" > + "'),dev(net_tap_%s,remote=%s)", > + ctx->pipe[0], ctx->name, ctx->if_name); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs) - 1) > + ++i; > + if (i) { > + ret = -ENOBUFS; > + ERROR("generated virtual device name or argument list too long" > + " for interface \"%s\"", ctx->if_name); > + goto error; > + } > + /* > + * Remove any competing rte_eth_dev entries sharing the same MAC > + * address, fail-safe instances created by this PMD will handle them > + * as sub-devices later. > + */ > + RTE_ETH_FOREACH_DEV(port_id) { > + struct rte_device *dev = rte_eth_devices[port_id].device; > + struct rte_bus *bus = rte_bus_find_by_device(dev); > + struct ether_addr tmp; > + > + rte_eth_macaddr_get(port_id, &tmp); > + if (!is_same_ether_addr(eth_addr, &tmp)) > + continue; > + WARN("removing device \"%s\" with identical MAC address to" > + " re-create it as a fail-safe sub-device", > + dev->name); > + if (!bus) > + ret = -EINVAL; > + else > + ret = rte_eal_hotplug_remove(bus->name, dev->name); > + if (ret < 0) { > + ERROR("unable to remove device \"%s\": %s", > + dev->name, rte_strerror(-ret)); > + goto error; > + } > + } > + /* Request virtual device generation. 
*/ > + DEBUG("generating virtual device \"%s\" with arguments \"%s\"", > + ctx->devname, ctx->devargs); > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > + if (ret) > + goto error; > + LIST_INSERT_HEAD(&hyperv_ctx_list, ctx, entry); > + ++hyperv_ctx_count; > + DEBUG("added NetVSC interface \"%s\" to context list", ctx->if_name); > + return 0; > +error: > + if (ctx) > + hyperv_ctx_destroy(ctx); > + return ret; > +} > + > +/** > * Probe NetVSC interfaces. > * > + * This function probes system netdevices according to the specified device > + * arguments and starts a periodic alarm callback to notify the resulting > + * fail-safe PMD instances of their sub-devices whereabouts. > + * > * @param dev > * Virtual device context for PMD instance. > * > @@ -92,12 +706,38 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) > const char *args = rte_vdev_device_args(dev); > struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", > hyperv_arg); > + unsigned int specified = 0; > + unsigned int matched = 0; > + unsigned int i; > + int ret; > > DEBUG("invoked as \"%s\", using arguments \"%s\"", name, args); > if (!kvargs) { > ERROR("cannot parse arguments list"); > goto error; > } > + for (i = 0; i != kvargs->count; ++i) { > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > + > + if (!strcmp(pair->key, HYPERV_ARG_IFACE) || > + !strcmp(pair->key, HYPERV_ARG_MAC)) > + ++specified; > + } > + rte_eal_alarm_cancel(hyperv_alarm, NULL); > + /* Gather interfaces. 
*/ > + ret = hyperv_foreach_iface(hyperv_netvsc_probe, name, kvargs, > + specified, &matched); > + if (ret < 0) > + goto error; > + if (matched < specified) > + WARN("some of the specified parameters did not match valid" > + " network interfaces"); > + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); > + if (ret < 0) { > + ERROR("unable to schedule alarm callback: %s", > + rte_strerror(-ret)); > + goto error; > + } > error: > if (kvargs) > rte_kvargs_free(kvargs); > @@ -108,6 +748,9 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) > /** > * Remove PMD instance. > * > + * The alarm callback and underlying hyperv context instances are only > + * destroyed after the last PMD instance is removed. > + * > * @param dev > * Virtual device context for PMD instance. > * > @@ -118,7 +761,16 @@ static int > hyperv_vdev_remove(struct rte_vdev_device *dev) > { > (void)dev; > - --hyperv_ctx_inst; > + if (--hyperv_ctx_inst) > + return 0; > + rte_eal_alarm_cancel(hyperv_alarm, NULL); > + while (!LIST_EMPTY(&hyperv_ctx_list)) { > + struct hyperv_ctx *ctx = LIST_FIRST(&hyperv_ctx_list); > + > + LIST_REMOVE(ctx, entry); > + --hyperv_ctx_count; > + hyperv_ctx_destroy(ctx); > + } > return 0; > } > > -- > 2.11.0 Regards, Keith ^ permalink raw reply [flat|nested] 112+ messages in thread
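The MAC string formats listed in the ether_addr_from_str() comment can be checked outside DPDK. The following is a minimal standalone sketch adapted from the quoted function, not part of the patch: mac_from_str(), MAC_LEN and mac_from_str_selftest() are hypothetical names, a plain byte array stands in for struct ether_addr, and the unsigned char cast and per-field length guard are additions that keep the table index and the shift width in range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* Hypothetical stand-in for RTE_DIM(addr_bytes). */

/* Same lookup table and field handling as the quoted function. */
static int
mac_from_str(uint8_t out[MAC_LEN], const char *str)
{
	/* 0x80: hex digit, 0x40: field separator, 0x20: end of string. */
	static const uint8_t conv[0x100] = {
		['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83,
		['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87,
		['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b,
		['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f,
		['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d,
		['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40,
		['\0'] = 0x60,
	};
	uint64_t addr = 0;
	uint64_t buf = 0;
	unsigned int i = 0; /* Hex digits seen in the current field. */
	unsigned int n = 0; /* Total digits once fields are padded. */
	uint8_t tmp;

	do {
		/* Cast added here so the index can never be negative. */
		tmp = conv[(unsigned char)*(str++)];
		if (!tmp)
			return -EINVAL;
		if (tmp & 0x40) {
			/* Pad the field to whole bytes; empty means one byte. */
			i += (i & 1) + (!i << 1);
			addr = (addr << (i << 2)) | buf;
			n += i;
			buf = 0;
			i = 0;
		} else {
			buf = (buf << 4) | (tmp & 0xf);
			/* Guard added here: keeps the shift above below 64. */
			if (++i > 12)
				return -EINVAL;
		}
	} while (!(tmp & 0x20));
	if (n > 12)
		return -EINVAL;
	for (i = MAC_LEN; i; addr >>= 8)
		out[--i] = addr & 0xff;
	return 0;
}

/* Checks taken from the examples in the function's comment. */
static void
mac_from_str_selftest(void)
{
	static const uint8_t full[MAC_LEN] = {
		0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc,
	};
	static const uint8_t part[MAC_LEN] = {
		0x00, 0x00, 0x00, 0x05, 0x67, 0x8c,
	};
	uint8_t mac[MAC_LEN];

	assert(!mac_from_str(mac, "12:34-5678-9aBC"));
	assert(!memcmp(mac, full, MAC_LEN));
	assert(!mac_from_str(mac, "5678c"));
	assert(!memcmp(mac, part, MAC_LEN));
	assert(mac_from_str(mac, "12:34:zz") == -EINVAL);
}
```

Calling mac_from_str_selftest() from a small main() is enough to exercise it. A helper along these lines would be easy to test in isolation if it were moved to rte_ether.h as suggested in the review.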
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 17:04 ` Wiles, Keith @ 2017-12-18 17:59 ` Adrien Mazarguil 2017-12-18 18:43 ` Wiles, Keith 0 siblings, 1 reply; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-18 17:59 UTC (permalink / raw) To: Wiles, Keith; +Cc: Yigit, Ferruh, dev, Stephen Hemminger On Mon, Dec 18, 2017 at 05:04:23PM +0000, Wiles, Keith wrote: > > On Dec 18, 2017, at 10:46 AM, Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > > > As described in more details in the attached documentation (see patch > > contents), this virtual device driver manages NetVSC interfaces in virtual > > machines hosted by Hyper-V/Azure platforms. > > > > This driver does not manage traffic nor Ethernet devices directly; it acts > > as a thin configuration layer that automatically instantiates and controls > > fail-safe PMD instances combining tap and PCI sub-devices, so that each > > NetVSC interface is exposed as a single consolidated port to DPDK > > applications. > > > > PCI sub-devices being hot-pluggable (e.g. during VM migration), > > applications automatically benefit from increased throughput when present > > and automatic fallback on NetVSC otherwise without interruption thanks to > > fail-safe's hot-plug handling. > > > > Once initialized, the sole job of the hyperv driver is to regularly scan > > for PCI devices to associate with NetVSC interfaces and feed their > > addresses to corresponding fail-safe instances. 
> > > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > --- > > doc/guides/nics/hyperv.rst | 65 ++++ > > drivers/net/hyperv/Makefile | 4 + > > drivers/net/hyperv/hyperv.c | 654 ++++++++++++++++++++++++++++++++++++++- > > 3 files changed, 722 insertions(+), 1 deletion(-) <snip> > > diff --git a/drivers/net/hyperv/hyperv.c b/drivers/net/hyperv/hyperv.c > > index 2f940c76f..bad224be9 100644 > > --- a/drivers/net/hyperv/hyperv.c > > +++ b/drivers/net/hyperv/hyperv.c <snip> > > +/** > > + * Convert a MAC address string to binary form. > > + * > > + * Note: this function should be exposed by rte_ether.h as the reverse of > > + * ether_format_addr(). > > + * > > + * Several MAC string formats are supported on input for convenience: > > + * > > + * 1. "12:34:56:78:9a:bc" > > + * 2. "12-34-56-78-9a-bc" > > + * 3. "123456789abc" > > + * 4. Upper/lowercase hexadecimal. > > + * 5. Any combination of the above, e.g. "12:34-5678-9aBC". > > + * 6. Partial addresses are allowed, with low-order bytes filled first: > > + * - "5:6:78c" translates to "00:00:05:06:07:8c", > > + * - "5678c" translates to "00:00:00:05:67:8c". > > + * > > + * Non-hexadecimal characters, unknown separators and strings specifying > > + * more than 6 bytes are not allowed. > > + * > > + * @param[out] eth_addr > > + * Pointer to conversion result buffer. > > + * @param[in] str > > + * MAC address string to convert. > > + * > > + * @return > > + * 0 on success, -EINVAL in case of unsupported format. 
> > + */ > > +static int > > +ether_addr_from_str(struct ether_addr *eth_addr, const char *str) > > +{ > > + static const uint8_t conv[0x100] = { > > + ['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83, > > + ['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87, > > + ['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b, > > + ['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f, > > + ['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d, > > + ['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40, > > + ['\0'] = 0x60, > > + }; > > + uint64_t addr = 0; > > + uint64_t buf = 0; > > + unsigned int i = 0; > > + unsigned int n = 0; > > + uint8_t tmp; > > + > > + do { > > + tmp = conv[(int)*(str++)]; > > + if (!tmp) > > + return -EINVAL; > > + if (tmp & 0x40) { > > + i += (i & 1) + (!i << 1); > > + addr = (addr << (i << 2)) | buf; > > + n += i; > > + buf = 0; > > + i = 0; > > + } else { > > + buf = (buf << 4) | (tmp & 0xf); > > + ++i; > > + } > > + } while (!(tmp & 0x20)); > > + if (n > 12) > > + return -EINVAL; > > + i = RTE_DIM(eth_addr->addr_bytes); > > + while (i) { > > + eth_addr->addr_bytes[--i] = addr & 0xff; > > + addr >>= 8; > > + } > > + return 0; > > +} > > You already called this out above, why not just push this into rte_ether.h file. I know I could use it if it were public. Hehe, that was to highlight how this driver didn't require any modifications in public APIs. I planned to do just that in v2 or in a subsequent patch. <snip> > > +/** > > + * Retrieve the last component of a path. > > + * > > + * This is a simplified basename() that does not modify its input buffer to > > + * handle trailing backslashes. > > + * > > + * @param[in] path > > + * Path to retrieve the last component from. > > + * > > + * @return > > + * Pointer to the last component. 
> > + */ > > +static const char * > > +hyperv_basename(const char *path) > > +{ > > + const char *tmp = path; > > + > > + while (*tmp) > > + if (*(tmp++) == '/') > > + path = tmp; > > + return path; > > +} > > Why not just user rindex() to find the last ‘/‘ instead of this routine? I know it is not performance critical. Right, however both rindex() and strrchr() return NULL when no '/' is present. strchrnul() works but is GNU-specific (i.e. probably not found on BSD), I didn't want to perform an additional check for that, so actually given the size of that function I didn't give it a second thought. I can modify that if needed. <snip> > > +/** > > + * Probe a network interface to associate with hyperv context. > > + * > > + * This function determines if the network device matches the properties of > > + * the NetVSC interface associated with the hyperv context and communicates > > + * its bus address to the fail-safe PMD instance if so. > > + * > > + * It is normally used with hyperv_foreach_iface(). > > + * > > + * @param[in] iface > > + * Pointer to netdevice description structure (name and index). > > + * @param[in] eth_addr > > + * MAC address associated with @p iface. > > + * @param ap > > + * Variable arguments list comprising: > > + * > > + * - struct hyperv_ctx *ctx: > > + * Context to associate network interface with. > > + * > > + * @return > > + * A nonzero value when interface matches, 0 otherwise or in case of > > + * error. > > + */ > > +static int > > +hyperv_device_probe(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap) > > +{ > > + struct hyperv_ctx *ctx = va_arg(ap, struct hyperv_ctx *); > > + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; > > + const char *addr; > > + size_t len; > > + int ret; > > + > > + /* Skip non-matching or unwanted NetVSC interfaces. 
*/ > > + if (ctx->if_index == iface->if_index) { > > + if (!strcmp(ctx->if_name, iface->if_name)) > > + return 0; > > + DEBUG("NetVSC interface \"%s\" (index %u) renamed \"%s\"", > > + ctx->if_name, ctx->if_index, iface->if_name); > > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > > + return 0; > > + } > > + if (hyperv_iface_is_netvsc(iface)) > > + return 0; > > + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) > > + return 0; > > + /* Look for associated PCI device. */ > > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > > + "device/subsystem"); > > + if (ret) > > + return 0; > > + if (strcmp(hyperv_basename(buf), "pci")) > > + return 0; > > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > > + "device"); > > + if (ret) > > + return 0; > > + addr = hyperv_basename(buf); > > + len = strlen(addr); > > + if (!len) > > + return 0; > > + /* Send PCI device argument to fail-safe PMD instance if updated. */ > > + if (!strcmp(addr, ctx->yield)) > > + return 1; > > + DEBUG("associating PCI device \"%s\" with NetVSC interface \"%s\"" > > + " (index %u)", > > + addr, ctx->if_name, ctx->if_index); > > + memmove(buf, addr, len + 1); > > + addr = buf; > > + buf[len] = '\n'; > > + ret = write(ctx->pipe[1], addr, len + 1); > > + buf[len] = '\0'; > > + if (ret == -1) { > > + if (errno == EINTR || errno == EAGAIN) > > + return 1; > > + WARN("cannot associate PCI device name \"%s\" with interface" > > + " \"%s\": %s", > > + addr, ctx->if_name, rte_strerror(errno)); > > + return 1; > > + } > > + if ((size_t)ret != len + 1) { > > + /* > > + * Attempt to override previous partial write, no need to > > + * recover if that fails. > > + */ > > + ret = write(ctx->pipe[1], "\n", 1); > > + (void)ret; > > + return 1; > > + } > > + fsync(ctx->pipe[1]); > > + memcpy(ctx->yield, addr, len + 1); > > + return 1; > > +} > > Not to criticize style, but a few blank lines could help in readability for these files IMHO. 
Unless blank lines are illegal :-)

It's a matter of taste, I think people tend to add random blank lines where they think doing so clarifies things for themselves, resulting in inconsistent coding style not much clearer for everyone after several iterations.

As a maintainer I've grown tired of discussions related to blank lines while reviewing patches. That's why except for a few special cases, I now enforce exactly the bare minimum of one blank line between variable declarations and the rest of the code inside each block.

If doing so makes a function unreadable then perhaps it needs to be split :) I'm sure you'll understand!

Regards,

--
Adrien Mazarguil
6WIND

^ permalink raw reply [flat|nested] 112+ messages in thread
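The strrchr() alternative raised in the review can be sketched as follows, assuming the extra NULL check mentioned in the reply is acceptable. basename_strrchr(), basename_loop() and basename_selftest() are hypothetical names; basename_loop() reproduces the loop from the quoted hyperv_basename() so that both can be compared on the same inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the last component of path without modifying it.
 * strrchr() returns NULL when path contains no '/', hence the fallback.
 */
static const char *
basename_strrchr(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/* Loop from the quoted patch, kept here only for comparison. */
static const char *
basename_loop(const char *path)
{
	const char *tmp = path;

	while (*tmp)
		if (*(tmp++) == '/')
			path = tmp;
	return path;
}

/* Both variants should agree, including with no '/' or a trailing one. */
static void
basename_selftest(void)
{
	static const char *const samples[] = {
		"/sys/bus/pci",
		"../../../0002:00:02.0",
		"0000:00:02.0",
		"trailing/",
		"",
	};
	size_t i;

	for (i = 0; i != sizeof(samples) / sizeof(samples[0]); ++i)
		assert(!strcmp(basename_strrchr(samples[i]),
			       basename_loop(samples[i])));
}
```

Both versions return a pointer into the caller's string, so neither needs a scratch buffer. The only difference is the explicit NULL handling in the strrchr() variant.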
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 17:59 ` Adrien Mazarguil @ 2017-12-18 18:43 ` Wiles, Keith 2017-12-19 8:25 ` Nelio Laranjeiro 0 siblings, 1 reply; 112+ messages in thread From: Wiles, Keith @ 2017-12-18 18:43 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Yigit, Ferruh, dev, Stephen Hemminger

> On Dec 18, 2017, at 11:59 AM, Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: >> Not to criticize style, but a few blank lines could help in readability for these files IMHO. Unless blank lines are illegal :-) > > It's a matter of taste, I think people tend to add random blank lines where > they think doing so clarifies things for themselves, resulting in > inconsistent coding style not much clearer for everyone after several > iterations. > > As a maintainer I've grown tired of discussions related to blank lines while > reviewing patches. That's why except for a few special cases, I now enforce > exactly the bare minimum of one blank line between variable declarations and > the rest of the code inside each block. > > If doing so makes a function unreadable then perhaps it needs to be split :) > I'm sure you'll understand!

I do not really understand the problem, as I have not seen any complaints about blank lines unless there are two or more in a row. I have never seen someone complain about a given blank line in a function, unless one is missing between the declared variables and the code in a function or block of code.

It is a shame you have decided to take the minimum approach to blank lines; IMO it does not make a lot of sense. I only bring it up to help others, like our customers, with reading your code.

We do not have a rule for this, so I cannot force anyone to add blank lines for readability and I have to live with it. :-(

> > Regards, > > -- > Adrien Mazarguil > 6WIND

Regards,
Keith

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality
  2017-12-18 18:43 ` Wiles, Keith
@ 2017-12-19  8:25 ` Nelio Laranjeiro
  0 siblings, 0 replies; 112+ messages in thread
From: Nelio Laranjeiro @ 2017-12-19 8:25 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: Adrien Mazarguil, Yigit, Ferruh, dev, Stephen Hemminger

Hi Keith,

On Mon, Dec 18, 2017 at 06:43:35PM +0000, Wiles, Keith wrote:
>
> > On Dec 18, 2017, at 11:59 AM, Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
>
> >> Not to criticize style, but a few blank lines could help in
> >> readability for these files IMHO. Unless blank lines are illegal
> >> :-)
> >
> > It's a matter of taste, I think people tend to add random blank lines where
> > they think doing so clarifies things for themselves, resulting in
> > inconsistent coding style not much clearer for everyone after several
> > iterations.
> >
> > As a maintainer I've grown tired of discussions related to blank lines while
> > reviewing patches. That's why except for a few special cases, I now enforce
> > exactly the bare minimum of one blank line between variable declarations and
> > the rest of the code inside each block.
> >
> > If doing so makes a function unreadable then perhaps it needs to be split :)
> > I'm sure you'll understand!
>
> I do not really understand the problem as I have not seen any
> complaints about blank lines unless two or more in a row. I have never
> seen someone complain about a given blank line in a function, unless a
> missing one to split up the declared variables and code in a function
> or block of code.

That holds when the blank lines are few and logical, but we regularly see
patches where random blank lines are added to a file without any logic,
generally to make it easy to identify where the modifications were made.

> It is a shame you have decided to take the minimum approach to blank
> lines, IMO it does not make a lot of sense. I only bring it up to help
> others with reading your code like our customers.
>
> We do not have a rule for this so I can not force anyone to add blank
> lines for readability, so I have to live with it. :-(

As there are no clear rules, the best one is to limit blank lines to the
strict minimum. Otherwise, explaining the logic behind them is very
difficult since it will differ from one maintainer to another, and it will
increase the number of patches refused due to coding style issues.

> > Regards,
> >
> > --
> > Adrien Mazarguil
> > 6WIND
>
> Regards,
> Keith
>

Regards,

--
Nélio Laranjeiro
6WIND
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality
  2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil
  2017-12-18 17:04 ` Wiles, Keith
@ 2017-12-18 18:26 ` Stephen Hemminger
  2017-12-18 20:21 ` Adrien Mazarguil
  2017-12-18 18:34 ` Stephen Hemminger
  ` (2 subsequent siblings)
  4 siblings, 1 reply; 112+ messages in thread
From: Stephen Hemminger @ 2017-12-18 18:26 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev

On Mon, 18 Dec 2017 17:46:23 +0100
Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:

> +static int
> +ether_addr_from_str(struct ether_addr *eth_addr, const char *str)
> +{
> +	static const uint8_t conv[0x100] = {
> +		['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83,
> +		['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87,
> +		['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b,
> +		['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f,
> +		['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d,
> +		['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40,
> +		['\0'] = 0x60,
> +	};
> +	uint64_t addr = 0;
> +	uint64_t buf = 0;
> +	unsigned int i = 0;
> +	unsigned int n = 0;
> +	uint8_t tmp;
> +
> +	do {
> +		tmp = conv[(int)*(str++)];
> +		if (!tmp)
> +			return -EINVAL;
> +		if (tmp & 0x40) {
> +			i += (i & 1) + (!i << 1);
> +			addr = (addr << (i << 2)) | buf;
> +			n += i;
> +			buf = 0;
> +			i = 0;
> +		} else {
> +			buf = (buf << 4) | (tmp & 0xf);
> +			++i;
> +		}
> +	} while (!(tmp & 0x20));
> +	if (n > 12)
> +		return -EINVAL;
> +	i = RTE_DIM(eth_addr->addr_bytes);
> +	while (i) {
> +		eth_addr->addr_bytes[--i] = addr & 0xff;
> +		addr >>= 8;
> +	}
> +	return 0;
> +}
> +

Why not ether_ntoa?
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality
  2017-12-18 18:26 ` Stephen Hemminger
@ 2017-12-18 20:21 ` Adrien Mazarguil
  2017-12-18 21:03 ` Thomas Monjalon
  0 siblings, 1 reply; 112+ messages in thread
From: Adrien Mazarguil @ 2017-12-18 20:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ferruh Yigit, dev

On Mon, Dec 18, 2017 at 10:26:29AM -0800, Stephen Hemminger wrote:
> On Mon, 18 Dec 2017 17:46:23 +0100
> Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
>
> > +static int
> > +ether_addr_from_str(struct ether_addr *eth_addr, const char *str)
> > +{
> > +	static const uint8_t conv[0x100] = {
> > +		['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83,
> > +		['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87,
> > +		['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b,
> > +		['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f,
> > +		['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d,
> > +		['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40,
> > +		['\0'] = 0x60,
> > +	};
> > +	uint64_t addr = 0;
> > +	uint64_t buf = 0;
> > +	unsigned int i = 0;
> > +	unsigned int n = 0;
> > +	uint8_t tmp;
> > +
> > +	do {
> > +		tmp = conv[(int)*(str++)];
> > +		if (!tmp)
> > +			return -EINVAL;
> > +		if (tmp & 0x40) {
> > +			i += (i & 1) + (!i << 1);
> > +			addr = (addr << (i << 2)) | buf;
> > +			n += i;
> > +			buf = 0;
> > +			i = 0;
> > +		} else {
> > +			buf = (buf << 4) | (tmp & 0xf);
> > +			++i;
> > +		}
> > +	} while (!(tmp & 0x20));
> > +	if (n > 12)
> > +		return -EINVAL;
> > +	i = RTE_DIM(eth_addr->addr_bytes);
> > +	while (i) {
> > +		eth_addr->addr_bytes[--i] = addr & 0xff;
> > +		addr >>= 8;
> > +	}
> > +	return 0;
> > +}
> > +
>
>
> Why not ether_ntoa?

Good question. For the following reasons:

- I forgot about the existence of ether_ntoa() and didn't look it up seeing
  struct ether_addr is (re-)defined by rte_ether.h. What happens when one
  includes netinet/ether.h together with that file results in various
  conflicts that trigger a compilation error. This problem should be
  addressed first.

- ether_ntoa() returns a static buffer and is not reentrant, ether_ntoa_r()
  is but as a GNU extension, I'm not sure it exists on other OSes. Even if
  this driver is currently targeted at Linux, this is likely not the case
  for other DPDK code relying on rte_ether.h.

- I had ether_addr_from_str()'s code already ready and lying around for a
  future update in testpmd's flow command parser. No other MAC-48 conversion
  function I know of is as flexible as this version. The ability to omit ":"
  and entering partial addresses is a big plus IMO.

I think both can coexist on their own merits. Since rte_ether.h needs to be
fixed either way, how about I move this function in a separate commit and
address the conflict with netinet/ether.h while there?

--
Adrien Mazarguil
6WIND
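To make the "flexible" parsing described above concrete, here is a minimal standalone sketch; it is not the code from the patch. It assumes the behaviour that can be read from the quoted function: ":" or "-" separators are optional, a group with an odd number of digits is padded to a whole byte, an empty group counts as one zero byte, and a partial address ends up right-aligned in the 48-bit result. The names struct mac48, hex_value() and mac48_from_str() are hypothetical stand-ins for struct ether_addr and ether_addr_from_str(). As a side note, the libc routine that converts text to an address is ether_aton(); ether_ntoa() goes the other way.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ether_addr from rte_ether.h. */
struct mac48 {
	uint8_t bytes[6];
};

/* Return the value of a hex digit, or -1 if c is not one. */
static int
hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse a MAC-48 address with optional ':' or '-' separators.
 * Returns 0 on success, -EINVAL on malformed or oversized input.
 */
static int
mac48_from_str(struct mac48 *mac, const char *str)
{
	uint64_t addr = 0;       /* Bytes accumulated so far. */
	uint64_t group = 0;      /* Value of the current digit group. */
	unsigned int digits = 0; /* Hex digits in the current group. */
	unsigned int total = 0;  /* Digits consumed overall, after padding. */
	unsigned int i;
	char c;

	do {
		int v;

		c = *str++;
		v = hex_value(c);
		if (v >= 0) {
			if (++digits > 12)
				return -EINVAL;
			group = (group << 4) | (unsigned int)v;
			continue;
		}
		if (c != ':' && c != '-' && c != '\0')
			return -EINVAL;
		/* Pad to whole bytes; an empty group is one zero byte. */
		digits += (digits & 1) + (digits ? 0 : 2);
		total += digits;
		if (total > 12)
			return -EINVAL;
		addr = (addr << (digits * 4)) | group;
		group = 0;
		digits = 0;
	} while (c != '\0');
	/* Store the low 48 bits, most significant byte first. */
	for (i = 6; i-- > 0;) {
		mac->bytes[i] = addr & 0xff;
		addr >>= 8;
	}
	return 0;
}
```

With this sketch, "00:11:22:33:44:55" and "001122334455" yield the same six bytes, and the partial address "1:2" yields 00:00:00:00:01:02. Using hex_value() instead of a lookup table is a readability choice for the example; it also avoids indexing an array with a plain char, which can be negative on platforms where char is signed.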
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality
  2017-12-18 20:21 ` Adrien Mazarguil
@ 2017-12-18 21:03 ` Thomas Monjalon
  2017-12-18 21:19 ` Stephen Hemminger
  0 siblings, 1 reply; 112+ messages in thread
From: Thomas Monjalon @ 2017-12-18 21:03 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev, Stephen Hemminger, Ferruh Yigit

18/12/2017 21:21, Adrien Mazarguil:
> On Mon, Dec 18, 2017 at 10:26:29AM -0800, Stephen Hemminger wrote:
> > On Mon, 18 Dec 2017 17:46:23 +0100
> > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:
> >
> > > +static int
> > > +ether_addr_from_str(struct ether_addr *eth_addr, const char *str)
> > > +{
> > > +	static const uint8_t conv[0x100] = {
> > > +		['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83,
> > > +		['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87,
> > > +		['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b,
> > > +		['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f,
> > > +		['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d,
> > > +		['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40,
> > > +		['\0'] = 0x60,
> > > +	};
> > > +	uint64_t addr = 0;
> > > +	uint64_t buf = 0;
> > > +	unsigned int i = 0;
> > > +	unsigned int n = 0;
> > > +	uint8_t tmp;
> > > +
> > > +	do {
> > > +		tmp = conv[(int)*(str++)];
> > > +		if (!tmp)
> > > +			return -EINVAL;
> > > +		if (tmp & 0x40) {
> > > +			i += (i & 1) + (!i << 1);
> > > +			addr = (addr << (i << 2)) | buf;
> > > +			n += i;
> > > +			buf = 0;
> > > +			i = 0;
> > > +		} else {
> > > +			buf = (buf << 4) | (tmp & 0xf);
> > > +			++i;
> > > +		}
> > > +	} while (!(tmp & 0x20));
> > > +	if (n > 12)
> > > +		return -EINVAL;
> > > +	i = RTE_DIM(eth_addr->addr_bytes);
> > > +	while (i) {
> > > +		eth_addr->addr_bytes[--i] = addr & 0xff;
> > > +		addr >>= 8;
> > > +	}
> > > +	return 0;
> > > +}
> > > +
> >
> >
> > Why not ether_ntoa?
>
> Good question. For the following reasons:
>
> - I forgot about the existence of ether_ntoa() and didn't look it up seeing
>   struct ether_addr is (re-)defined by rte_ether.h. What happens when one
>   includes netinet/ether.h together with that file results in various
>   conflicts that trigger a compilation error. This problem should be
>   addressed first.
>
> - ether_ntoa() returns a static buffer and is not reentrant, ether_ntoa_r()
>   is but as a GNU extension, I'm not sure it exists on other OSes. Even if
>   this driver is currently targeted at Linux, this is likely not the case
>   for other DPDK code relying on rte_ether.h.
>
> - I had ether_addr_from_str()'s code already ready and lying around for a
>   future update in testpmd's flow command parser. No other MAC-48 conversion
>   function I know of is as flexible as this version. The ability to omit ":"
>   and entering partial addresses is a big plus IMO.
>
> I think both can coexist on their own merits. Since rte_ether.h needs to be
> fixed either way, how about I move this function in a separate commit and
> address the conflict with netinet/ether.h while there?

Looks to be a good plan.
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality
  2017-12-18 21:03 ` Thomas Monjalon
@ 2017-12-18 21:19 ` Stephen Hemminger
  0 siblings, 0 replies; 112+ messages in thread
From: Stephen Hemminger @ 2017-12-18 21:19 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Adrien Mazarguil, dev, Ferruh Yigit

On Mon, 18 Dec 2017 22:03:55 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> >
> > Good question. For the following reasons:
> >
> > - I forgot about the existence of ether_ntoa() and didn't look it up seeing
> >   struct ether_addr is (re-)defined by rte_ether.h. What happens when one
> >   includes netinet/ether.h together with that file results in various
> >   conflicts that trigger a compilation error. This problem should be
> >   addressed first.
> >
> > - ether_ntoa() returns a static buffer and is not reentrant, ether_ntoa_r()
> >   is but as a GNU extension, I'm not sure it exists on other OSes. Even if
> >   this driver is currently targeted at Linux, this is likely not the case
> >   for other DPDK code relying on rte_ether.h.
> >
> > - I had ether_addr_from_str()'s code already ready and lying around for a
> >   future update in testpmd's flow command parser. No other MAC-48 conversion
> >   function I know of is as flexible as this version. The ability to omit ":"
> >   and entering partial addresses is a big plus IMO.
> >
> > I think both can coexist on their own merits. Since rte_ether.h needs to be
> > fixed either way, how about I move this function in a separate commit and
> > address the conflict with netinet/ether.h while there?
>
> Looks to be a good plan.

Agree, rte_ether is where it should go. Please put functions for parsing
there.

The name and logic conflict between netinet/ether.h and rte is both a
blessing and a curse. Although the definitions of ether_addr overlap, they
are equivalent.
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil 2017-12-18 17:04 ` Wiles, Keith 2017-12-18 18:26 ` Stephen Hemminger @ 2017-12-18 18:34 ` Stephen Hemminger 2017-12-18 20:23 ` Adrien Mazarguil 2017-12-18 23:59 ` Stephen Hemminger 2017-12-19 1:54 ` Ferruh Yigit 4 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2017-12-18 18:34 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev On Mon, 18 Dec 2017 17:46:23 +0100 Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > /** > + * Destroy a hyperv context instance. > + * > + * @param ctx > + * Context to destroy. > + */ > +static void > +hyperv_ctx_destroy(struct hyperv_ctx *ctx) > +{ > + if (ctx->pipe[0] != -1) > + close(ctx->pipe[0]); > + if (ctx->pipe[1] != -1) > + close(ctx->pipe[1]); > + /* Poisoning for debugging purposes. */ > + memset(ctx, 0x22, sizeof(*ctx)); Don't leave debug code in submitted drivers > + free(ctx); > +} > + > +/** > + * Iterate over system network interfaces. > + * > + * This function runs a given callback function for each netdevice found on > + * the system. > + * > + * @param func > + * Callback function pointer. List traversal is aborted when this function > + * returns a nonzero value. > + * @param ... > + * Variable parameter list passed as @p va_list to @p func. > + * > + * @return > + * 0 when the entire list is traversed successfully, a negative error code > + * in case or failure, or the nonzero value returned by @p func when list > + * traversal is aborted. > + */ > +static int > +hyperv_foreach_iface(int (*func)(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap), ...) 
> +{
> +	struct if_nameindex *iface = if_nameindex();
> +	int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
> +	unsigned int i;
> +	int ret = 0;
> +
> +	if (!iface) {
> +		ret = -ENOBUFS;
> +		ERROR("cannot retrieve system network interfaces");
> +		goto error;
> +	}
> +	if (s == -1) {
> +		ret = -errno;
> +		ERROR("cannot open socket: %s", rte_strerror(errno));
> +		goto error;
> +	}
> +	for (i = 0; iface[i].if_name; ++i) {
> +		struct ifreq req;
> +		struct ether_addr eth_addr;
> +		va_list ap;
> +
> +		strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name));
> +		if (ioctl(s, SIOCGIFHWADDR, &req) == -1) {
> +			WARN("cannot retrieve information about interface"
> +			     " \"%s\": %s",
> +			     req.ifr_name, rte_strerror(errno));
> +			continue;
> +		}
> +		memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
> +		       RTE_DIM(eth_addr.addr_bytes));
> +		va_start(ap, func);
> +		ret = func(&iface[i], &eth_addr, ap);
> +		va_end(ap);
> +		if (ret)
> +			break;
> +	}
> +error:
> +	if (s != -1)
> +		close(s);
> +	if (iface)
> +		if_freenameindex(iface);
> +	return ret;
> +}
> +
> +/**
> + * Determine if a network interface is NetVSC.
> + *
> + * @param[in] iface
> + *   Pointer to netdevice description structure (name and index).
> + *
> + * @return
> + *   A nonzero value when interface is detected as NetVSC. In case of error,
> + *   rte_errno is updated and 0 returned.
> + */
> +static int
> +hyperv_iface_is_netvsc(const struct if_nameindex *iface)
> +{
> +	static const char temp[] = "/sys/class/net/%s/device/class_id";
> +	char path[snprintf(NULL, 0, temp, iface->if_name) + 1];

Doing this snprintf is gross.
Either use PATH_MAX or asprintf > + FILE *f; > + int ret; > + int len = 0; > + > + snprintf(path, sizeof(path), temp, iface->if_name); > + f = fopen(path, "r"); > + if (!f) { > + rte_errno = errno; > + return 0; > + } > + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); > + if (ret == EOF) > + rte_errno = errno; > + ret = len == (int)strlen(NETVSC_CLASS_ID); > + fclose(f); > + return ret; > +} > + > +/** > + * Retrieve the last component of a path. > + * > + * This is a simplified basename() that does not modify its input buffer to > + * handle trailing backslashes. > + * > + * @param[in] path > + * Path to retrieve the last component from. > + * > + * @return > + * Pointer to the last component. > + */ > +static const char * > +hyperv_basename(const char *path) > +{ > + const char *tmp = path; > + > + while (*tmp) > + if (*(tmp++) == '/') Too may () > + path = tmp; > + return path; > +} > + > +/** > + * Retrieve network interface data from sysfs symbolic link. > + * > + * @param[out] buf > + * Output data buffer. > + * @param size > + * Output buffer size. > + * @param[in] if_name > + * Netdevice name. > + * @param[in] relpath > + * Symbolic link path relative to netdevice sysfs entry. > + * > + * @return > + * 0 on success, a negative error code otherwise. > + */ > +static int > +hyperv_sysfs_readlink(char *buf, size_t size, const char *if_name, > + const char *relpath) > +{ > + int ret; > + > + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); > + if (ret == -1 || (size_t)ret >= size - 1) > + return -ENOBUFS; > + ret = readlink(buf, buf, size); > + if (ret == -1) > + return -errno; > + if ((size_t)ret >= size - 1) > + return -ENOBUFS; > + buf[ret] = '\0'; > + return 0; > +} > + > +/** > + * Probe a network interface to associate with hyperv context. 
> + * > + * This function determines if the network device matches the properties of > + * the NetVSC interface associated with the hyperv context and communicates > + * its bus address to the fail-safe PMD instance if so. > + * > + * It is normally used with hyperv_foreach_iface(). > + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * @param[in] eth_addr > + * MAC address associated with @p iface. > + * @param ap > + * Variable arguments list comprising: > + * > + * - struct hyperv_ctx *ctx: > + * Context to associate network interface with. > + * > + * @return > + * A nonzero value when interface matches, 0 otherwise or in case of > + * error. > + */ > +static int > +hyperv_device_probe(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap) > +{ > + struct hyperv_ctx *ctx = va_arg(ap, struct hyperv_ctx *); > + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; > + const char *addr; > + size_t len; > + int ret; > + > + /* Skip non-matching or unwanted NetVSC interfaces. */ > + if (ctx->if_index == iface->if_index) { > + if (!strcmp(ctx->if_name, iface->if_name)) > + return 0; > + DEBUG("NetVSC interface \"%s\" (index %u) renamed \"%s\"", > + ctx->if_name, ctx->if_index, iface->if_name); > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > + return 0; > + } > + if (hyperv_iface_is_netvsc(iface)) > + return 0; > + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) > + return 0; > + /* Look for associated PCI device. */ > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > + "device/subsystem"); > + if (ret) > + return 0; > + if (strcmp(hyperv_basename(buf), "pci")) > + return 0; > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > + "device"); > + if (ret) > + return 0; > + addr = hyperv_basename(buf); > + len = strlen(addr); > + if (!len) > + return 0; > + /* Send PCI device argument to fail-safe PMD instance if updated. 
*/ > + if (!strcmp(addr, ctx->yield)) > + return 1; > + DEBUG("associating PCI device \"%s\" with NetVSC interface \"%s\"" > + " (index %u)", > + addr, ctx->if_name, ctx->if_index); > + memmove(buf, addr, len + 1); > + addr = buf; > + buf[len] = '\n'; > + ret = write(ctx->pipe[1], addr, len + 1); > + buf[len] = '\0'; > + if (ret == -1) { > + if (errno == EINTR || errno == EAGAIN) > + return 1; > + WARN("cannot associate PCI device name \"%s\" with interface" > + " \"%s\": %s", > + addr, ctx->if_name, rte_strerror(errno)); > + return 1; > + } > + if ((size_t)ret != len + 1) { > + /* > + * Attempt to override previous partial write, no need to > + * recover if that fails. > + */ > + ret = write(ctx->pipe[1], "\n", 1); > + (void)ret; > + return 1; > + } > + fsync(ctx->pipe[1]); > + memcpy(ctx->yield, addr, len + 1); > + return 1; > +} > + > +/** > + * Alarm callback that regularly probes system network interfaces. > + * > + * This callback runs at a frequency determined by HYPERV_PROBE_MS as long > + * as an hyperv context instance exists. > + * > + * @param arg > + * Ignored. > + */ > +static void > +hyperv_alarm(void *arg) > +{ > + struct hyperv_ctx *ctx; > + int ret; > + > + (void)arg; I assume you are trying to suppress unused warnings. The DPDK method of doing this __rte_unused > + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) { > + ret = hyperv_foreach_iface(hyperv_device_probe, ctx); > + if (ret) > + break; > + } > + if (!hyperv_ctx_count) > + return; > + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); > + if (ret < 0) { > + ERROR("unable to reschedule alarm callback: %s", > + rte_strerror(-ret)); > + } > +} > + > +/** > + * Probe a NetVSC interface to generate a hyperv context from. > + * > + * This function instantiates hyperv contexts either for all NetVSC devices > + * found on the system or only a subset provided as device arguments. > + * > + * It is normally used with hyperv_foreach_iface(). 
> + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * @param[in] eth_addr > + * MAC address associated with @p iface. > + * @param ap > + * Variable arguments list comprising: > + * > + * - const char *name: > + * Name associated with current driver instance. > + * > + * - struct rte_kvargs *kvargs: > + * Device arguments provided to current driver instance. > + * > + * - unsigned int specified: > + * Number of specific netdevices provided as device arguments. > + * > + * - unsigned int *matched: > + * The number of specified netdevices matched by this function. > + * > + * @return > + * A nonzero value when interface matches, 0 otherwise or in case of > + * error. > + */ > +static int > +hyperv_netvsc_probe(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap) > +{ > + const char *name = va_arg(ap, const char *); > + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); > + unsigned int specified = va_arg(ap, unsigned int); > + unsigned int *matched = va_arg(ap, unsigned int *); > + unsigned int i; > + struct hyperv_ctx *ctx; > + uint16_t port_id; > + int ret; > + > + /* Probe all interfaces when none are specified. */ > + if (specified) { > + for (i = 0; i != kvargs->count; ++i) { > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > + > + if (!strcmp(pair->key, HYPERV_ARG_IFACE)) { > + if (!strcmp(pair->value, iface->if_name)) > + break; > + } else if (!strcmp(pair->key, HYPERV_ARG_MAC)) { > + struct ether_addr tmp; > + > + if (ether_addr_from_str(&tmp, pair->value)) { > + ERROR("invalid MAC address format" > + " \"%s\"", > + pair->value); > + return -EINVAL; > + } > + if (!is_same_ether_addr(eth_addr, &tmp)) > + break; > + } > + } > + if (i == kvargs->count) > + return 0; > + ++(*matched); > + } > + /* Weed out interfaces already handled. 
*/ > + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) > + if (ctx->if_index == iface->if_index) > + break; > + if (ctx) { > + if (!specified) > + return 0; > + WARN("interface \"%s\" (index %u) is already handled, skipping", > + iface->if_name, iface->if_index); > + return 0; > + } > + if (!hyperv_iface_is_netvsc(iface)) { > + if (!specified) > + return 0; > + WARN("interface \"%s\" (index %u) is not NetVSC, skipping", > + iface->if_name, iface->if_index); > + return 0; > + } > + /* Create interface context. */ > + ctx = calloc(1, sizeof(*ctx)); > + if (!ctx) { > + ret = -errno; > + ERROR("cannot allocate context for interface \"%s\": %s", > + iface->if_name, rte_strerror(errno)); > + goto error; > + } > + ctx->id = hyperv_ctx_count; > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > + ctx->if_index = iface->if_index; > + ctx->if_addr = *eth_addr; > + ctx->pipe[0] = -1; > + ctx->pipe[1] = -1; > + ctx->yield[0] = '\0'; > + if (pipe(ctx->pipe) == -1) { > + ret = -errno; > + ERROR("cannot allocate control pipe for interface \"%s\": %s", > + ctx->if_name, rte_strerror(errno)); > + goto error; > + } > + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { > + int flf = fcntl(ctx->pipe[i], F_GETFL); > + int fdf = fcntl(ctx->pipe[i], F_GETFD); > + > + if (flf != -1 && > + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1 && > + fdf != -1 && > + fcntl(ctx->pipe[i], F_SETFD, > + i ? fdf | FD_CLOEXEC : fdf & ~FD_CLOEXEC) != -1) > + continue; > + ret = -errno; > + ERROR("cannot toggle non-blocking or close-on-exec flags on" > + " control file descriptor #%u (%d): %s", > + i, ctx->pipe[i], rte_strerror(errno)); > + goto error; > + } > + /* Generate virtual device name and arguments. 
*/ > + i = 0; > + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", > + name, ctx->id); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->name) - 1) > + ++i; > + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", > + ctx->name); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname) - 1) > + ++i; > + /* > + * Note: bash replaces the default sh interpreter used by popen() > + * because as seen with dash, POSIX-compliant shells do not > + * necessarily support redirections with file descriptor numbers > + * above 9. > + */ > + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), > + "exec(exec bash -c " > + "'while read -r tmp <&%u 2> /dev/null;" > + " do dev=$tmp; done;" > + " echo $dev" > + "'),dev(net_tap_%s,remote=%s)", > + ctx->pipe[0], ctx->name, ctx->if_name); Write real code. Shelling out to bash is messy, error prone and potential security issue. > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs) - 1) > + ++i; > + if (i) { > + ret = -ENOBUFS; > + ERROR("generated virtual device name or argument list too long" > + " for interface \"%s\"", ctx->if_name); > + goto error; > + } > + /* > + * Remove any competing rte_eth_dev entries sharing the same MAC > + * address, fail-safe instances created by this PMD will handle them > + * as sub-devices later. > + */ > + RTE_ETH_FOREACH_DEV(port_id) { > + struct rte_device *dev = rte_eth_devices[port_id].device; > + struct rte_bus *bus = rte_bus_find_by_device(dev); > + struct ether_addr tmp; > + > + rte_eth_macaddr_get(port_id, &tmp); > + if (!is_same_ether_addr(eth_addr, &tmp)) > + continue; > + WARN("removing device \"%s\" with identical MAC address to" > + " re-create it as a fail-safe sub-device", > + dev->name); > + if (!bus) > + ret = -EINVAL; > + else > + ret = rte_eal_hotplug_remove(bus->name, dev->name); > + if (ret < 0) { > + ERROR("unable to remove device \"%s\": %s", > + dev->name, rte_strerror(-ret)); > + goto error; > + } > + } > + /* Request virtual device generation. 
*/ > + DEBUG("generating virtual device \"%s\" with arguments \"%s\"", > + ctx->devname, ctx->devargs); > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > + if (ret) > + goto error; > + LIST_INSERT_HEAD(&hyperv_ctx_list, ctx, entry); > + ++hyperv_ctx_count; > + DEBUG("added NetVSC interface \"%s\" to context list", ctx->if_name); > + return 0; > +error: > + if (ctx) > + hyperv_ctx_destroy(ctx); > + return ret; > +} > + > +/** > * Probe NetVSC interfaces. > * > + * This function probes system netdevices according to the specified device > + * arguments and starts a periodic alarm callback to notify the resulting > + * fail-safe PMD instances of their sub-devices whereabouts. > + * > * @param dev > * Virtual device context for PMD instance. > * > @@ -92,12 +706,38 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) > const char *args = rte_vdev_device_args(dev); > struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", > hyperv_arg); > + unsigned int specified = 0; > + unsigned int matched = 0; > + unsigned int i; > + int ret; > > DEBUG("invoked as \"%s\", using arguments \"%s\"", name, args); > if (!kvargs) { > ERROR("cannot parse arguments list"); > goto error; > } > + for (i = 0; i != kvargs->count; ++i) { > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > + > + if (!strcmp(pair->key, HYPERV_ARG_IFACE) || > + !strcmp(pair->key, HYPERV_ARG_MAC)) > + ++specified; > + } > + rte_eal_alarm_cancel(hyperv_alarm, NULL); > + /* Gather interfaces. 
*/ > + ret = hyperv_foreach_iface(hyperv_netvsc_probe, name, kvargs, > + specified, &matched); > + if (ret < 0) > + goto error; > + if (matched < specified) > + WARN("some of the specified parameters did not match valid" > + " network interfaces"); > + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); > + if (ret < 0) { > + ERROR("unable to schedule alarm callback: %s", > + rte_strerror(-ret)); > + goto error; > + } > error: > if (kvargs) > rte_kvargs_free(kvargs); > @@ -108,6 +748,9 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) > /** > * Remove PMD instance. > * > + * The alarm callback and underlying hyperv context instances are only > + * destroyed after the last PMD instance is removed. > + * > * @param dev > * Virtual device context for PMD instance. > * > @@ -118,7 +761,16 @@ static int > hyperv_vdev_remove(struct rte_vdev_device *dev) > { > (void)dev; > - --hyperv_ctx_inst; > + if (--hyperv_ctx_inst) > + return 0; > + rte_eal_alarm_cancel(hyperv_alarm, NULL); > + while (!LIST_EMPTY(&hyperv_ctx_list)) { > + struct hyperv_ctx *ctx = LIST_FIRST(&hyperv_ctx_list); > + > + LIST_REMOVE(ctx, entry); > + --hyperv_ctx_count; > + hyperv_ctx_destroy(ctx); > + } > return 0; > } > ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 18:34 ` Stephen Hemminger @ 2017-12-18 20:23 ` Adrien Mazarguil 2017-12-19 9:53 ` Bruce Richardson 0 siblings, 1 reply; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-18 20:23 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, dev On Mon, Dec 18, 2017 at 10:34:12AM -0800, Stephen Hemminger wrote: > On Mon, 18 Dec 2017 17:46:23 +0100 > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > > > > /** > > + * Destroy a hyperv context instance. > > + * > > + * @param ctx > > + * Context to destroy. > > + */ > > +static void > > +hyperv_ctx_destroy(struct hyperv_ctx *ctx) > > +{ > > + if (ctx->pipe[0] != -1) > > + close(ctx->pipe[0]); > > + if (ctx->pipe[1] != -1) > > + close(ctx->pipe[1]); > > + /* Poisoning for debugging purposes. */ > > + memset(ctx, 0x22, sizeof(*ctx)); > > Don't leave debug code in submitted drivers Granted this line should be behind #ifdef RTE_LIBRTE_HYPERV_DEBUG. Surely you don't mean *no* debugging code at all? This memset() allows an application to crash early in case its control path parallelizes things it shouldn't. > > > + free(ctx); > > +} > > + > > +/** > > + * Iterate over system network interfaces. > > + * > > + * This function runs a given callback function for each netdevice found on > > + * the system. > > + * > > + * @param func > > + * Callback function pointer. List traversal is aborted when this function > > + * returns a nonzero value. > > + * @param ... > > + * Variable parameter list passed as @p va_list to @p func. > > + * > > + * @return > > + * 0 when the entire list is traversed successfully, a negative error code > > + * in case or failure, or the nonzero value returned by @p func when list > > + * traversal is aborted. > > + */ > > +static int > > +hyperv_foreach_iface(int (*func)(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap), ...) 
> > +{
> > +	struct if_nameindex *iface = if_nameindex();
> > +	int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
> > +	unsigned int i;
> > +	int ret = 0;
> > +
> > +	if (!iface) {
> > +		ret = -ENOBUFS;
> > +		ERROR("cannot retrieve system network interfaces");
> > +		goto error;
> > +	}
> > +	if (s == -1) {
> > +		ret = -errno;
> > +		ERROR("cannot open socket: %s", rte_strerror(errno));
> > +		goto error;
> > +	}
> > +	for (i = 0; iface[i].if_name; ++i) {
> > +		struct ifreq req;
> > +		struct ether_addr eth_addr;
> > +		va_list ap;
> > +
> > +		strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name));
> > +		if (ioctl(s, SIOCGIFHWADDR, &req) == -1) {
> > +			WARN("cannot retrieve information about interface"
> > +			     " \"%s\": %s",
> > +			     req.ifr_name, rte_strerror(errno));
> > +			continue;
> > +		}
> > +		memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data,
> > +		       RTE_DIM(eth_addr.addr_bytes));
> > +		va_start(ap, func);
> > +		ret = func(&iface[i], &eth_addr, ap);
> > +		va_end(ap);
> > +		if (ret)
> > +			break;
> > +	}
> > +error:
> > +	if (s != -1)
> > +		close(s);
> > +	if (iface)
> > +		if_freenameindex(iface);
> > +	return ret;
> > +}
> > +
> > +/**
> > + * Determine if a network interface is NetVSC.
> > + *
> > + * @param[in] iface
> > + *   Pointer to netdevice description structure (name and index).
> > + *
> > + * @return
> > + *   A nonzero value when interface is detected as NetVSC. In case of error,
> > + *   rte_errno is updated and 0 returned.
> > + */
> > +static int
> > +hyperv_iface_is_netvsc(const struct if_nameindex *iface)
> > +{
> > +	static const char temp[] = "/sys/class/net/%s/device/class_id";
> > +	char path[snprintf(NULL, 0, temp, iface->if_name) + 1];
>
> Doing this snprintf is gross. Either use PATH_MAX or asprintf

I don't think allocating more stack space than necessary or on the heap with
a possible allocation failure to deal with is any better, sorry. Prove this
snprintf() call can fail and you'll have a point.
> > + FILE *f; > > + int ret; > > + int len = 0; > > + > > + snprintf(path, sizeof(path), temp, iface->if_name); > > + f = fopen(path, "r"); > > + if (!f) { > > + rte_errno = errno; > > + return 0; > > + } > > + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); > > + if (ret == EOF) > > + rte_errno = errno; > > + ret = len == (int)strlen(NETVSC_CLASS_ID); > > + fclose(f); > > + return ret; > > +} > > + > > +/** > > + * Retrieve the last component of a path. > > + * > > + * This is a simplified basename() that does not modify its input buffer to > > + * handle trailing backslashes. > > + * > > + * @param[in] path > > + * Path to retrieve the last component from. > > + * > > + * @return > > + * Pointer to the last component. > > + */ > > +static const char * > > +hyperv_basename(const char *path) > > +{ > > + const char *tmp = path; > > + > > + while (*tmp) > > + if (*(tmp++) == '/') > > Too may () Will remove it, I'm considering using strrchr() in the caller and remove this function entirely following Keith's comment. > > > + path = tmp; > > + return path; > > +} > > + > > +/** > > + * Retrieve network interface data from sysfs symbolic link. > > + * > > + * @param[out] buf > > + * Output data buffer. > > + * @param size > > + * Output buffer size. > > + * @param[in] if_name > > + * Netdevice name. > > + * @param[in] relpath > > + * Symbolic link path relative to netdevice sysfs entry. > > + * > > + * @return > > + * 0 on success, a negative error code otherwise. 
> > + */ > > +static int > > +hyperv_sysfs_readlink(char *buf, size_t size, const char *if_name, > > + const char *relpath) > > +{ > > + int ret; > > + > > + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); > > + if (ret == -1 || (size_t)ret >= size - 1) > > + return -ENOBUFS; > > + ret = readlink(buf, buf, size); > > + if (ret == -1) > > + return -errno; > > + if ((size_t)ret >= size - 1) > > + return -ENOBUFS; > > + buf[ret] = '\0'; > > + return 0; > > +} > > + > > +/** > > + * Probe a network interface to associate with hyperv context. > > + * > > + * This function determines if the network device matches the properties of > > + * the NetVSC interface associated with the hyperv context and communicates > > + * its bus address to the fail-safe PMD instance if so. > > + * > > + * It is normally used with hyperv_foreach_iface(). > > + * > > + * @param[in] iface > > + * Pointer to netdevice description structure (name and index). > > + * @param[in] eth_addr > > + * MAC address associated with @p iface. > > + * @param ap > > + * Variable arguments list comprising: > > + * > > + * - struct hyperv_ctx *ctx: > > + * Context to associate network interface with. > > + * > > + * @return > > + * A nonzero value when interface matches, 0 otherwise or in case of > > + * error. > > + */ > > +static int > > +hyperv_device_probe(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap) > > +{ > > + struct hyperv_ctx *ctx = va_arg(ap, struct hyperv_ctx *); > > + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; > > + const char *addr; > > + size_t len; > > + int ret; > > + > > + /* Skip non-matching or unwanted NetVSC interfaces. 
*/ > > + if (ctx->if_index == iface->if_index) { > > + if (!strcmp(ctx->if_name, iface->if_name)) > > + return 0; > > + DEBUG("NetVSC interface \"%s\" (index %u) renamed \"%s\"", > > + ctx->if_name, ctx->if_index, iface->if_name); > > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > > + return 0; > > + } > > + if (hyperv_iface_is_netvsc(iface)) > > + return 0; > > + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) > > + return 0; > > + /* Look for associated PCI device. */ > > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > > + "device/subsystem"); > > + if (ret) > > + return 0; > > + if (strcmp(hyperv_basename(buf), "pci")) > > + return 0; > > + ret = hyperv_sysfs_readlink(buf, sizeof(buf), iface->if_name, > > + "device"); > > + if (ret) > > + return 0; > > + addr = hyperv_basename(buf); > > + len = strlen(addr); > > + if (!len) > > + return 0; > > + /* Send PCI device argument to fail-safe PMD instance if updated. */ > > + if (!strcmp(addr, ctx->yield)) > > + return 1; > > + DEBUG("associating PCI device \"%s\" with NetVSC interface \"%s\"" > > + " (index %u)", > > + addr, ctx->if_name, ctx->if_index); > > + memmove(buf, addr, len + 1); > > + addr = buf; > > + buf[len] = '\n'; > > + ret = write(ctx->pipe[1], addr, len + 1); > > + buf[len] = '\0'; > > + if (ret == -1) { > > + if (errno == EINTR || errno == EAGAIN) > > + return 1; > > + WARN("cannot associate PCI device name \"%s\" with interface" > > + " \"%s\": %s", > > + addr, ctx->if_name, rte_strerror(errno)); > > + return 1; > > + } > > + if ((size_t)ret != len + 1) { > > + /* > > + * Attempt to override previous partial write, no need to > > + * recover if that fails. > > + */ > > + ret = write(ctx->pipe[1], "\n", 1); > > + (void)ret; > > + return 1; > > + } > > + fsync(ctx->pipe[1]); > > + memcpy(ctx->yield, addr, len + 1); > > + return 1; > > +} > > + > > +/** > > + * Alarm callback that regularly probes system network interfaces. 
> > + * > > + * This callback runs at a frequency determined by HYPERV_PROBE_MS as long > > + * as an hyperv context instance exists. > > + * > > + * @param arg > > + * Ignored. > > + */ > > +static void > > +hyperv_alarm(void *arg) > > +{ > > + struct hyperv_ctx *ctx; > > + int ret; > > + > > + (void)arg; > > I assume you are trying to suppress unused warnings. > The DPDK method of doing this __rte_unused This syntax is the standard method for suppressing such warnings, __rte_unused relies on a GNU syntax extension for that, and I usually tend to favor standard forms when they exist. Given DPDK coding rules don't say anything about this, I don't mind to update it if you really insist. > > > + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) { > > + ret = hyperv_foreach_iface(hyperv_device_probe, ctx); > > + if (ret) > > + break; > > + } > > + if (!hyperv_ctx_count) > > + return; > > + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); > > + if (ret < 0) { > > + ERROR("unable to reschedule alarm callback: %s", > > + rte_strerror(-ret)); > > + } > > +} > > + > > +/** > > + * Probe a NetVSC interface to generate a hyperv context from. > > + * > > + * This function instantiates hyperv contexts either for all NetVSC devices > > + * found on the system or only a subset provided as device arguments. > > + * > > + * It is normally used with hyperv_foreach_iface(). > > + * > > + * @param[in] iface > > + * Pointer to netdevice description structure (name and index). > > + * @param[in] eth_addr > > + * MAC address associated with @p iface. > > + * @param ap > > + * Variable arguments list comprising: > > + * > > + * - const char *name: > > + * Name associated with current driver instance. > > + * > > + * - struct rte_kvargs *kvargs: > > + * Device arguments provided to current driver instance. > > + * > > + * - unsigned int specified: > > + * Number of specific netdevices provided as device arguments. 
> > + * > > + * - unsigned int *matched: > > + * The number of specified netdevices matched by this function. > > + * > > + * @return > > + * A nonzero value when interface matches, 0 otherwise or in case of > > + * error. > > + */ > > +static int > > +hyperv_netvsc_probe(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap) > > +{ > > + const char *name = va_arg(ap, const char *); > > + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); > > + unsigned int specified = va_arg(ap, unsigned int); > > + unsigned int *matched = va_arg(ap, unsigned int *); > > + unsigned int i; > > + struct hyperv_ctx *ctx; > > + uint16_t port_id; > > + int ret; > > + > > + /* Probe all interfaces when none are specified. */ > > + if (specified) { > > + for (i = 0; i != kvargs->count; ++i) { > > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > > + > > + if (!strcmp(pair->key, HYPERV_ARG_IFACE)) { > > + if (!strcmp(pair->value, iface->if_name)) > > + break; > > + } else if (!strcmp(pair->key, HYPERV_ARG_MAC)) { > > + struct ether_addr tmp; > > + > > + if (ether_addr_from_str(&tmp, pair->value)) { > > + ERROR("invalid MAC address format" > > + " \"%s\"", > > + pair->value); > > + return -EINVAL; > > + } > > + if (!is_same_ether_addr(eth_addr, &tmp)) > > + break; > > + } > > + } > > + if (i == kvargs->count) > > + return 0; > > + ++(*matched); > > + } > > + /* Weed out interfaces already handled. */ > > + LIST_FOREACH(ctx, &hyperv_ctx_list, entry) > > + if (ctx->if_index == iface->if_index) > > + break; > > + if (ctx) { > > + if (!specified) > > + return 0; > > + WARN("interface \"%s\" (index %u) is already handled, skipping", > > + iface->if_name, iface->if_index); > > + return 0; > > + } > > + if (!hyperv_iface_is_netvsc(iface)) { > > + if (!specified) > > + return 0; > > + WARN("interface \"%s\" (index %u) is not NetVSC, skipping", > > + iface->if_name, iface->if_index); > > + return 0; > > + } > > + /* Create interface context. 
*/ > > + ctx = calloc(1, sizeof(*ctx)); > > + if (!ctx) { > > + ret = -errno; > > + ERROR("cannot allocate context for interface \"%s\": %s", > > + iface->if_name, rte_strerror(errno)); > > + goto error; > > + } > > + ctx->id = hyperv_ctx_count; > > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > > + ctx->if_index = iface->if_index; > > + ctx->if_addr = *eth_addr; > > + ctx->pipe[0] = -1; > > + ctx->pipe[1] = -1; > > + ctx->yield[0] = '\0'; > > + if (pipe(ctx->pipe) == -1) { > > + ret = -errno; > > + ERROR("cannot allocate control pipe for interface \"%s\": %s", > > + ctx->if_name, rte_strerror(errno)); > > + goto error; > > + } > > + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { > > + int flf = fcntl(ctx->pipe[i], F_GETFL); > > + int fdf = fcntl(ctx->pipe[i], F_GETFD); > > + > > + if (flf != -1 && > > + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1 && > > + fdf != -1 && > > + fcntl(ctx->pipe[i], F_SETFD, > > + i ? fdf | FD_CLOEXEC : fdf & ~FD_CLOEXEC) != -1) > > + continue; > > + ret = -errno; > > + ERROR("cannot toggle non-blocking or close-on-exec flags on" > > + " control file descriptor #%u (%d): %s", > > + i, ctx->pipe[i], rte_strerror(errno)); > > + goto error; > > + } > > + /* Generate virtual device name and arguments. */ > > + i = 0; > > + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", > > + name, ctx->id); > > + if (ret == -1 || (size_t)ret >= sizeof(ctx->name) - 1) > > + ++i; > > + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", > > + ctx->name); > > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname) - 1) > > + ++i; > > + /* > > + * Note: bash replaces the default sh interpreter used by popen() > > + * because as seen with dash, POSIX-compliant shells do not > > + * necessarily support redirections with file descriptor numbers > > + * above 9. 
> > + */ > > + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), > > + "exec(exec bash -c " > > + "'while read -r tmp <&%u 2> /dev/null;" > > + " do dev=$tmp; done;" > > + " echo $dev" > > + "'),dev(net_tap_%s,remote=%s)", > > + ctx->pipe[0], ctx->name, ctx->if_name); > > > Write real code. Shelling out to bash is messy, error prone and potential > security issue. Right, this code brings the basic idea. I forgot to mention it in the cover letter, I plan a subsequent commit in fail-safe PMD to add file descriptors as a possible control means in addition to its exec() parameter. > > > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs) - 1) > > + ++i; > > + if (i) { > > + ret = -ENOBUFS; > > + ERROR("generated virtual device name or argument list too long" > > + " for interface \"%s\"", ctx->if_name); > > + goto error; > > + } > > + /* > > + * Remove any competing rte_eth_dev entries sharing the same MAC > > + * address, fail-safe instances created by this PMD will handle them > > + * as sub-devices later. > > + */ > > + RTE_ETH_FOREACH_DEV(port_id) { > > + struct rte_device *dev = rte_eth_devices[port_id].device; > > + struct rte_bus *bus = rte_bus_find_by_device(dev); > > + struct ether_addr tmp; > > + > > + rte_eth_macaddr_get(port_id, &tmp); > > + if (!is_same_ether_addr(eth_addr, &tmp)) > > + continue; > > + WARN("removing device \"%s\" with identical MAC address to" > > + " re-create it as a fail-safe sub-device", > > + dev->name); > > + if (!bus) > > + ret = -EINVAL; > > + else > > + ret = rte_eal_hotplug_remove(bus->name, dev->name); > > + if (ret < 0) { > > + ERROR("unable to remove device \"%s\": %s", > > + dev->name, rte_strerror(-ret)); > > + goto error; > > + } > > + } > > + /* Request virtual device generation. 
*/ > > + DEBUG("generating virtual device \"%s\" with arguments \"%s\"", > > + ctx->devname, ctx->devargs); > > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > > + if (ret) > > + goto error; > > + LIST_INSERT_HEAD(&hyperv_ctx_list, ctx, entry); > > + ++hyperv_ctx_count; > > + DEBUG("added NetVSC interface \"%s\" to context list", ctx->if_name); > > + return 0; > > +error: > > + if (ctx) > > + hyperv_ctx_destroy(ctx); > > + return ret; > > +} > > + > > +/** > > * Probe NetVSC interfaces. > > * > > + * This function probes system netdevices according to the specified device > > + * arguments and starts a periodic alarm callback to notify the resulting > > + * fail-safe PMD instances of their sub-devices whereabouts. > > + * > > * @param dev > > * Virtual device context for PMD instance. > > * > > @@ -92,12 +706,38 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) > > const char *args = rte_vdev_device_args(dev); > > struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", > > hyperv_arg); > > + unsigned int specified = 0; > > + unsigned int matched = 0; > > + unsigned int i; > > + int ret; > > > > DEBUG("invoked as \"%s\", using arguments \"%s\"", name, args); > > if (!kvargs) { > > ERROR("cannot parse arguments list"); > > goto error; > > } > > + for (i = 0; i != kvargs->count; ++i) { > > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > > + > > + if (!strcmp(pair->key, HYPERV_ARG_IFACE) || > > + !strcmp(pair->key, HYPERV_ARG_MAC)) > > + ++specified; > > + } > > + rte_eal_alarm_cancel(hyperv_alarm, NULL); > > + /* Gather interfaces. 
*/ > > + ret = hyperv_foreach_iface(hyperv_netvsc_probe, name, kvargs, > > + specified, &matched); > > + if (ret < 0) > > + goto error; > > + if (matched < specified) > > + WARN("some of the specified parameters did not match valid" > > + " network interfaces"); > > + ret = rte_eal_alarm_set(HYPERV_PROBE_MS * 1000, hyperv_alarm, NULL); > > + if (ret < 0) { > > + ERROR("unable to schedule alarm callback: %s", > > + rte_strerror(-ret)); > > + goto error; > > + } > > error: > > if (kvargs) > > rte_kvargs_free(kvargs); > > @@ -108,6 +748,9 @@ hyperv_vdev_probe(struct rte_vdev_device *dev) > > /** > > * Remove PMD instance. > > * > > + * The alarm callback and underlying hyperv context instances are only > > + * destroyed after the last PMD instance is removed. > > + * > > * @param dev > > * Virtual device context for PMD instance. > > * > > @@ -118,7 +761,16 @@ static int > > hyperv_vdev_remove(struct rte_vdev_device *dev) > > { > > (void)dev; > > - --hyperv_ctx_inst; > > + if (--hyperv_ctx_inst) > > + return 0; > > + rte_eal_alarm_cancel(hyperv_alarm, NULL); > > + while (!LIST_EMPTY(&hyperv_ctx_list)) { > > + struct hyperv_ctx *ctx = LIST_FIRST(&hyperv_ctx_list); > > + > > + LIST_REMOVE(ctx, entry); > > + --hyperv_ctx_count; > > + hyperv_ctx_destroy(ctx); > > + } > > return 0; > > } > > > In any case, thanks for the quick review! -- Adrien Mazarguil 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
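Note on the strrchr() idea mentioned in the reply above: the following is only a minimal sketch of what a strrchr()-based replacement for hyperv_basename() might look like. The helper name last_component() is hypothetical and does not come from the patch; it is written so that it keeps the behaviour of the original loop, including returning an empty string when the path ends with '/'.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): return a pointer to the
 * last component of a path without modifying the input buffer.
 * A path with no '/' is returned unchanged; a path ending with '/'
 * yields an empty string, like the loop in hyperv_basename().
 */
static const char *
last_component(const char *path)
{
	const char *sep = strrchr(path, '/');

	return sep ? sep + 1 : path;
}
```

With this, a sysfs link target such as "../../bus/pci" would yield "pci", which is what the subsystem check in hyperv_device_probe() compares against.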
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 20:23 ` Adrien Mazarguil @ 2017-12-19 9:53 ` Bruce Richardson 2017-12-19 10:15 ` Adrien Mazarguil 0 siblings, 1 reply; 112+ messages in thread From: Bruce Richardson @ 2017-12-19 9:53 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Stephen Hemminger, Ferruh Yigit, dev On Mon, Dec 18, 2017 at 09:23:41PM +0100, Adrien Mazarguil wrote: > On Mon, Dec 18, 2017 at 10:34:12AM -0800, Stephen Hemminger wrote: > > On Mon, 18 Dec 2017 17:46:23 +0100 > > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > <snip> > > > +static int > > > +hyperv_iface_is_netvsc(const struct if_nameindex *iface) > > > +{ > > > + static const char temp[] = "/sys/class/net/%s/device/class_id"; > > > + char path[snprintf(NULL, 0, temp, iface->if_name) + 1]; > > > > Doing this snprintf is gross. Either use PATH_MAX or asprintf > > I don't think allocating more stack space than necessary or on the heap with > a possible allocation failure to deal with is any better, sorry. > > Prove this snprintf() call can fail and you'll have a point. > While I get your point, I'd tend to go with Stephen's view on this that it's looking a bit "gross". What's the problem with allocating a bit more stack space for it? /Bruce ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 9:53 ` Bruce Richardson @ 2017-12-19 10:15 ` Adrien Mazarguil 2017-12-19 15:31 ` Stephen Hemminger 0 siblings, 1 reply; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-19 10:15 UTC (permalink / raw) To: Bruce Richardson; +Cc: Stephen Hemminger, Ferruh Yigit, dev On Tue, Dec 19, 2017 at 09:53:27AM +0000, Bruce Richardson wrote: > On Mon, Dec 18, 2017 at 09:23:41PM +0100, Adrien Mazarguil wrote: > > On Mon, Dec 18, 2017 at 10:34:12AM -0800, Stephen Hemminger wrote: > > > On Mon, 18 Dec 2017 17:46:23 +0100 > > > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > > > <snip> > > > > +static int > > > > +hyperv_iface_is_netvsc(const struct if_nameindex *iface) > > > > +{ > > > > + static const char temp[] = "/sys/class/net/%s/device/class_id"; > > > > + char path[snprintf(NULL, 0, temp, iface->if_name) + 1]; > > > > > > Doing this snprintf is gross. Either use PATH_MAX or asprintf > > > > I don't think allocating more stack space than necessary or on the heap with > > a possible allocation failure to deal with is any better, sorry. > > > > Prove this snprintf() call can fail and you'll have a point. > > > While I get your point, I'd tend to go with Stephen's view on this that > it's looking a bit "gross". What's the problem with allocating a bit > more stack space for it? Well, apart from making a stand, none really. Too "unusual" perhaps, but I don't think "gross" is a valid argument to reject a perfectly valid piece of code that doesn't rely on obscure knowledge nor weird side effects. I'll update this in v2 to make it look more acceptable in any case. -- Adrien Mazarguil 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
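For readers unfamiliar with the construct under discussion: calling snprintf() with a NULL buffer and a size of 0 writes nothing and returns the number of characters the formatted string would need, excluding the terminating NUL. The sketch below isolates that two-pass pattern. The function names are hypothetical, it assumes a compiler with variable-length array support, and it is an illustration rather than the code from the patch.

```c
#include <stdio.h>
#include <string.h>

static const char class_id_fmt[] = "/sys/class/net/%s/device/class_id";

/*
 * Hypothetical helper: measure the formatted path length without
 * writing anything. Returns 0 if snprintf() reports an error.
 */
static size_t
class_id_path_len(const char *if_name)
{
	int ret = snprintf(NULL, 0, class_id_fmt, if_name);

	return ret < 0 ? 0 : (size_t)ret;
}

/*
 * Hypothetical helper: format the path into an exactly sized
 * variable-length array and compare it with an expected string.
 */
static int
class_id_path_matches(const char *if_name, const char *expected)
{
	size_t len = class_id_path_len(if_name);
	char path[len + 1]; /* VLA: measured length plus terminating NUL. */

	snprintf(path, sizeof(path), class_id_fmt, if_name);
	return strcmp(path, expected) == 0;
}
```

The trade-off debated in this thread is visible here: the buffer is never larger than needed, at the cost of formatting the string twice and relying on a variable-length array.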
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 10:15 ` Adrien Mazarguil @ 2017-12-19 15:31 ` Stephen Hemminger 0 siblings, 0 replies; 112+ messages in thread From: Stephen Hemminger @ 2017-12-19 15:31 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Bruce Richardson, Ferruh Yigit, dev On Tue, 19 Dec 2017 11:15:38 +0100 Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > On Tue, Dec 19, 2017 at 09:53:27AM +0000, Bruce Richardson wrote: > > On Mon, Dec 18, 2017 at 09:23:41PM +0100, Adrien Mazarguil wrote: > > > On Mon, Dec 18, 2017 at 10:34:12AM -0800, Stephen Hemminger wrote: > > > > On Mon, 18 Dec 2017 17:46:23 +0100 > > > > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > > > > > <snip> > > > > > +static int > > > > > +hyperv_iface_is_netvsc(const struct if_nameindex *iface) > > > > > +{ > > > > > + static const char temp[] = "/sys/class/net/%s/device/class_id"; > > > > > + char path[snprintf(NULL, 0, temp, iface->if_name) + 1]; > > > > > > > > Doing this snprintf is gross. Either use PATH_MAX or asprintf > > > > > > I don't think allocating more stack space than necessary or on the heap with > > > a possible allocation failure to deal with is any better, sorry. > > > > > > Prove this snprintf() call can fail and you'll have a point. > > > > > While I get your point, I'd tend to go with Stephen's view on this that > > it's looking a bit "gross". What's the problem with allocating a bit > > more stack space for it? > > Well, apart from making a stand, none really. Too "unusual" perhaps, but I > don't think "gross" is a valid argument to reject a perfectly valid piece of > code that doesn't rely on obscure knowledge nor weird side effects. > > I'll update this in v2 to make it look more acceptable in any case. > In this particular case, you can easily show that the maximum length of the string would be less than the format plus maximum length of interface name. 
Why not: char path[sizeof(temp) + IFNAMSIZ]; which keeps the flexibility but also can be evaluated at compile time. Upleveling. You need to understand that open source software is a collaborative effort. And like doing improvisational theatre, the best answer to any feedback is yes unless there is a technical reason otherwise. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil ` (2 preceding siblings ...) 2017-12-18 18:34 ` Stephen Hemminger @ 2017-12-18 23:59 ` Stephen Hemminger 2017-12-19 10:01 ` Adrien Mazarguil 2017-12-19 1:54 ` Ferruh Yigit 4 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2017-12-18 23:59 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev On Mon, 18 Dec 2017 17:46:23 +0100 Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > +static int > +ether_addr_from_str(struct ether_addr *eth_addr, const char *str) > +{ > + static const uint8_t conv[0x100] = { > + ['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83, > + ['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87, > + ['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b, > + ['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f, > + ['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d, > + ['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40, > + ['\0'] = 0x60, > + }; > + uint64_t addr = 0; > + uint64_t buf = 0; > + unsigned int i = 0; > + unsigned int n = 0; > + uint8_t tmp; > + > + do { > + tmp = conv[(int)*(str++)]; Cast to int will cause out of bounds reference on non-ascii strings. The parser will get confused by: 001:aa:bb:cc:dd:ee:ff or invalid strings. Why not use sscanf which would be safer in this case. /** * Parse 48bits Ethernet address in pattern xx:xx:xx:xx:xx:xx. * * @param eth_addr * A pointer to a ether_addr structure. * @param str * A pointer to string contains the formatted MAC address. 
* @return * 0 if the address is valid * -EINVAL if address is not formatted properly */ static inline int ether_parse_addr(struct ether_addr *eth_addr, const char *str) { int n; n = sscanf(str, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &eth_addr->addr_bytes[0], &eth_addr->addr_bytes[1], &eth_addr->addr_bytes[2], &eth_addr->addr_bytes[3], &eth_addr->addr_bytes[4], &eth_addr->addr_bytes[5]); return (n == ETHER_ADDR_LEN) ? 0 : -EINVAL; } > + if (!tmp) > + return -EINVAL; > + if (tmp & 0x40) { > + i += (i & 1) + (!i << 1); > + addr = (addr << (i << 2)) | buf; > + n += i; > + buf = 0; > + i = 0; > + } else { > + buf = (buf << 4) | (tmp & 0xf); > + ++i; > + } > + } while (!(tmp & 0x20)); > + if (n > 12) > + return -EINVAL; > + i = RTE_DIM(eth_addr->addr_bytes); > + while (i) { > + eth_addr->addr_bytes[--i] = addr & 0xff; > + addr >>= 8; > + } > + return 0; > +} > + ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 23:59 ` Stephen Hemminger @ 2017-12-19 10:01 ` Adrien Mazarguil 2017-12-19 15:37 ` Stephen Hemminger 0 siblings, 1 reply; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-19 10:01 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, dev On Mon, Dec 18, 2017 at 03:59:46PM -0800, Stephen Hemminger wrote: > On Mon, 18 Dec 2017 17:46:23 +0100 > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > > +static int > > +ether_addr_from_str(struct ether_addr *eth_addr, const char *str) > > +{ > > + static const uint8_t conv[0x100] = { > > + ['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83, > > + ['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87, > > + ['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b, > > + ['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f, > > + ['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d, > > + ['E'] = 0x8e, ['F'] = 0x8f, [':'] = 0x40, ['-'] = 0x40, > > + ['\0'] = 0x60, > > + }; > > + uint64_t addr = 0; > > + uint64_t buf = 0; > > + unsigned int i = 0; > > + unsigned int n = 0; > > + uint8_t tmp; > > + > > + do { > > + tmp = conv[(int)*(str++)]; > > Cast to int will cause out of bounds reference on non-ascii strings. > The parser will get confused by: > 001:aa:bb:cc:dd:ee:ff or invalid strings. Nice catch! I added the (int) cast to shut up a GCC complaint about using char as index type. The proper fix taking care of integer conversion and array bounds safety check should read: tmp = conv[*str++ & 0xffu]; > Why not use sscanf which would be safer in this case. Right, this is indeed the obvious implementation, however not only the fixed MAC-48 format is not the most convenient to use for user input (somewhat like forcing them to enter fully expanded IPv6 addresses every time), sscanf() also ignores leading white spaces and successfully parses weird expressions like " -42: 0x66: 0af: 0: 44:-6", which I think is a problem. 
> /** > * Parse 48bits Ethernet address in pattern xx:xx:xx:xx:xx:xx. > * > * @param eth_addr > * A pointer to a ether_addr structure. > * @param str > * A pointer to string contains the formatted MAC address. > * @return > * 0 if the address is valid > * -EINVAL if address is not formatted properly > */ > static inline int > ether_parse_addr(struct ether_addr *eth_addr, const char *str) > { > int n; > > n = sscanf(str, > "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", > &eth_addr->addr_bytes[0], > &eth_addr->addr_bytes[1], > &eth_addr->addr_bytes[2], > &eth_addr->addr_bytes[3], > &eth_addr->addr_bytes[4], > &eth_addr->addr_bytes[5]); > return (n == ETHER_ADDR_LEN) ? 0 : -EINVAL; > } > > > + if (!tmp) > > + return -EINVAL; > > + if (tmp & 0x40) { > > + i += (i & 1) + (!i << 1); > > + addr = (addr << (i << 2)) | buf; > > + n += i; > > + buf = 0; > > + i = 0; > > + } else { > > + buf = (buf << 4) | (tmp & 0xf); > > + ++i; > > + } > > + } while (!(tmp & 0x20)); > > + if (n > 12) > > + return -EINVAL; > > + i = RTE_DIM(eth_addr->addr_bytes); > > + while (i) { > > + eth_addr->addr_bytes[--i] = addr & 0xff; > > + addr >>= 8; > > + } > > + return 0; > > +} > > + -- Adrien Mazarguil 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
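The indexing fix proposed in this reply (masking the character with 0xffu before using it as a table index) can be illustrated in isolation. When plain char is signed, a byte such as 0xe9 converts to a negative int, so conv[(int)c] would read before the start of the array; masking keeps the index within 0..255. The table below is a reduced, hypothetical version that only knows hexadecimal digits, and hex_digit_value() is not a function from the patch.

```c
#include <stdint.h>

/*
 * Reduced lookup table (illustrative): hexadecimal digits map to
 * 0x80 | value, every other byte maps to 0.
 */
static const uint8_t hex_conv[0x100] = {
	['0'] = 0x80, ['1'] = 0x81, ['2'] = 0x82, ['3'] = 0x83,
	['4'] = 0x84, ['5'] = 0x85, ['6'] = 0x86, ['7'] = 0x87,
	['8'] = 0x88, ['9'] = 0x89, ['a'] = 0x8a, ['b'] = 0x8b,
	['c'] = 0x8c, ['d'] = 0x8d, ['e'] = 0x8e, ['f'] = 0x8f,
	['A'] = 0x8a, ['B'] = 0x8b, ['C'] = 0x8c, ['D'] = 0x8d,
	['E'] = 0x8e, ['F'] = 0x8f,
};

/*
 * Hypothetical helper: return the value of a hexadecimal digit, or -1
 * for any other byte. The "& 0xffu" mask converts the possibly
 * negative char to an index that always fits the 256-entry table.
 */
static int
hex_digit_value(char c)
{
	uint8_t tmp = hex_conv[c & 0xffu];

	return tmp ? (tmp & 0xf) : -1;
}
```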
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 10:01 ` Adrien Mazarguil @ 2017-12-19 15:37 ` Stephen Hemminger 0 siblings, 0 replies; 112+ messages in thread From: Stephen Hemminger @ 2017-12-19 15:37 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev On Tue, 19 Dec 2017 11:01:55 +0100 Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > Why not use sscanf which would be safer in this case. > > Right, this is indeed the obvious implementation, however not only the fixed > MAC-48 format is not the most convenient to use for user input (somewhat > like forcing them to enter fully expanded IPv6 addresses every time), > sscanf() also ignores leading white spaces and successfully parses weird > expressions like " -42: 0x66: 0af: 0: 44:-6", which I think is a > problem. There is a standard for ethernet representation, that is all you need to accept. The only simplifications are optional leading zeros 02 vs 2 and upper and lower case a-f. Don't overthink this. The FreeBSD version of ether_aton_r is: struct ether_addr * ether_aton_r(const char *a, struct ether_addr *e) { int i; unsigned int o0, o1, o2, o3, o4, o5; i = sscanf(a, "%x:%x:%x:%x:%x:%x", &o0, &o1, &o2, &o3, &o4, &o5); if (i != 6) return (NULL); e->octet[0]=o0; e->octet[1]=o1; e->octet[2]=o2; e->octet[3]=o3; e->octet[4]=o4; e->octet[5]=o5; return (e); } ^ permalink raw reply [flat|nested] 112+ messages in thread
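One possible middle ground between the two positions in this exchange is sketched below, as an assumption of this note rather than anything proposed on the list. It keeps sscanf() but first rejects every character other than hexadecimal digits and ':' (which rules out the leading white space, signs and "0x" prefixes mentioned earlier), limits each octet to two digits with "%2x", and uses "%n" to detect trailing input. The function name mac_from_str_strict() is hypothetical and is neither a DPDK nor a FreeBSD API.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical strict parser: accepts six ':'-separated octets of one
 * or two hexadecimal digits each and nothing else.
 * Returns 0 on success, -1 on malformed input.
 */
static int
mac_from_str_strict(const char *str, uint8_t out[6])
{
	unsigned int o[6];
	int end = 0;
	int i;

	/* Only hexadecimal digits and ':' may appear in the input. */
	if (str[strspn(str, "0123456789abcdefABCDEF:")] != '\0')
		return -1;
	/* "%2x" reads at most two digits; "%n" records where parsing stopped. */
	if (sscanf(str, "%2x:%2x:%2x:%2x:%2x:%2x%n",
		   &o[0], &o[1], &o[2], &o[3], &o[4], &o[5], &end) != 6)
		return -1;
	/* Reject trailing characters such as a seventh octet. */
	if (str[end] != '\0')
		return -1;
	for (i = 0; i != 6; ++i)
		out[i] = (uint8_t)o[i];
	return 0;
}
```

Under these rules "2:0:0:0:0:1" and "02:00:00:00:00:01" are both accepted, while "001:aa:bb:cc:dd:ee:ff" and the white-space example quoted above are rejected. Shortened forms other than dropped leading zeros are not supported.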
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil ` (3 preceding siblings ...) 2017-12-18 23:59 ` Stephen Hemminger @ 2017-12-19 1:54 ` Ferruh Yigit 2017-12-19 15:06 ` Adrien Mazarguil 4 siblings, 1 reply; 112+ messages in thread From: Ferruh Yigit @ 2017-12-19 1:54 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: dev, Stephen Hemminger On 12/18/2017 8:46 AM, Adrien Mazarguil wrote: > As described in more details in the attached documentation (see patch > contents), this virtual device driver manages NetVSC interfaces in virtual > machines hosted by Hyper-V/Azure platforms. > > This driver does not manage traffic nor Ethernet devices directly; it acts > as a thin configuration layer that automatically instantiates and controls > fail-safe PMD instances combining tap and PCI sub-devices, so that each > NetVSC interface is exposed as a single consolidated port to DPDK > applications. > > PCI sub-devices being hot-pluggable (e.g. during VM migration), > applications automatically benefit from increased throughput when present > and automatic fallback on NetVSC otherwise without interruption thanks to > fail-safe's hot-plug handling. > > Once initialized, the sole job of the hyperv driver is to regularly scan > for PCI devices to associate with NetVSC interfaces and feed their > addresses to corresponding fail-safe instances. > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> <...> > + RTE_ETH_FOREACH_DEV(port_id) { <..> > + ret = rte_eal_hotplug_remove(bus->name, dev->name); <..> > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); Overall why this logic implemented as network PMD? Yes technically you can implement *anything* as PMD :), but should we? This code does eal level work (scans bus, add/remove devices), and for control path, and not a generic solution either (specific to netvsc and failsafe). 
Only device argument part of a PMD seems used, rest is unrelated to being a PMD. Scans netvsc changes in background and reflects them into failsafe PMD... Why this is implemented as PMD, not another entity, like bus driver perhaps? Or indeed why this in DPDK instead of being in application? <...> ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 1:54 ` Ferruh Yigit @ 2017-12-19 15:06 ` Adrien Mazarguil 2017-12-19 20:44 ` Ferruh Yigit 0 siblings, 1 reply; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-19 15:06 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Stephen Hemminger On Mon, Dec 18, 2017 at 05:54:45PM -0800, Ferruh Yigit wrote: > On 12/18/2017 8:46 AM, Adrien Mazarguil wrote: > > As described in more details in the attached documentation (see patch > > contents), this virtual device driver manages NetVSC interfaces in virtual > > machines hosted by Hyper-V/Azure platforms. > > > > This driver does not manage traffic nor Ethernet devices directly; it acts > > as a thin configuration layer that automatically instantiates and controls > > fail-safe PMD instances combining tap and PCI sub-devices, so that each > > NetVSC interface is exposed as a single consolidated port to DPDK > > applications. > > > > PCI sub-devices being hot-pluggable (e.g. during VM migration), > > applications automatically benefit from increased throughput when present > > and automatic fallback on NetVSC otherwise without interruption thanks to > > fail-safe's hot-plug handling. > > > > Once initialized, the sole job of the hyperv driver is to regularly scan > > for PCI devices to associate with NetVSC interfaces and feed their > > addresses to corresponding fail-safe instances. > > > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > <...> > > > + RTE_ETH_FOREACH_DEV(port_id) { > <..> > > + ret = rte_eal_hotplug_remove(bus->name, dev->name); > <..> > > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > > Overall why this logic implemented as network PMD? > Yes technically you can implement *anything* as PMD :), but should we? > > This code does eal level work (scans bus, add/remove devices), and for control > path, and not a generic solution either (specific to netvsc and failsafe). 
> > Only device argument part of a PMD seems used, rest is unrelated to being a PMD. > Scans netvsc changes in background and reflects them into failsafe PMD... > > Why this is implemented as PMD, not another entity, like bus driver perhaps? > > Or indeed why this in DPDK instead of being in application? I'll address that last question first: the point of this driver is enabling existing applications to run within a Hyper-V environment unmodified, because they'd otherwise need to manage two driver instances correctly on their own in addition to hot-plug events during VM migration. Some kind of driver generating a front end to what otherwise appears as two distinct ethdev to applications is therefore necessary. Currently without it, users have to manually configure failsafe properly for each NetVSC interface on their system. Besides the inconvenience, it's not even a possibility with DPDK applications that don't rely on EAL command-line arguments. As such it's more correctly defined as a "platform" driver rather than a true PMD. It leaves VF device handling to their respective PMDs while automatically managing the platform-specific part itself. There's no simpler alternative when running in blacklist mode (i.e. not specifying any device parameters on the command line). Regarding its presence in drivers/net rather than drivers/bus, the end result from an application standpoint is that each instance exposes a single ethdev, even if not its own (failsafe's). Busses don't do that. It also allows passing arguments to individual devices through --vdev if needed. You're right about putting device detection at the bus level though, and I think there's work in progress to do just that, this driver will be updated to benefit from it once applied. In the meantime, the code as submitted works fine with the current DPDK code base and addresses an existing use case for which there is no solution at this point. 
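For illustration only, the manual configuration mentioned above could look like the following sketch. It assumes the fail-safe PMD's dev() sub-device syntax and the tap PMD's remote= parameter as described in their respective guides (worth checking against the DPDK release in use); build_failsafe_vdev(), its parameters and the instance naming scheme are hypothetical and not part of the submitted patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the --vdev argument a user would otherwise
 * write by hand to pair one VF with one NetVSC netdevice.
 *
 * Assumed syntax: fail-safe "dev(...)" sub-devices and tap "remote=".
 * Returns the string length on success, -1 if buf is too small.
 */
static int
build_failsafe_vdev(char *buf, size_t len, unsigned int id,
		    const char *vf_pci_addr, const char *netvsc_iface)
{
	int ret = snprintf(buf, len,
			   "net_failsafe%u,dev(%s),dev(net_tap%u,remote=%s)",
			   id, vf_pci_addr, id, netvsc_iface);

	/* snprintf() reports truncation by returning >= len. */
	return (ret < 0 || (size_t)ret >= len) ? -1 : ret;
}
```

With id 0, VF address 0002:00:02.0 and NetVSC interface eth1, this yields net_failsafe0,dev(0002:00:02.0),dev(net_tap0,remote=eth1). One such argument is needed per NetVSC interface, and it has to be revised whenever the VF address changes, which is the chore this driver is meant to take over.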
-- Adrien Mazarguil 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 15:06 ` Adrien Mazarguil @ 2017-12-19 20:44 ` Ferruh Yigit 2017-12-20 14:13 ` Thomas Monjalon 2017-12-21 16:19 ` Adrien Mazarguil 0 siblings, 2 replies; 112+ messages in thread From: Ferruh Yigit @ 2017-12-19 20:44 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: dev, Stephen Hemminger On 12/19/2017 7:06 AM, Adrien Mazarguil wrote: > On Mon, Dec 18, 2017 at 05:54:45PM -0800, Ferruh Yigit wrote: >> On 12/18/2017 8:46 AM, Adrien Mazarguil wrote: >>> As described in more details in the attached documentation (see patch >>> contents), this virtual device driver manages NetVSC interfaces in virtual >>> machines hosted by Hyper-V/Azure platforms. >>> >>> This driver does not manage traffic nor Ethernet devices directly; it acts >>> as a thin configuration layer that automatically instantiates and controls >>> fail-safe PMD instances combining tap and PCI sub-devices, so that each >>> NetVSC interface is exposed as a single consolidated port to DPDK >>> applications. >>> >>> PCI sub-devices being hot-pluggable (e.g. during VM migration), >>> applications automatically benefit from increased throughput when present >>> and automatic fallback on NetVSC otherwise without interruption thanks to >>> fail-safe's hot-plug handling. >>> >>> Once initialized, the sole job of the hyperv driver is to regularly scan >>> for PCI devices to associate with NetVSC interfaces and feed their >>> addresses to corresponding fail-safe instances. >>> >>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> >> >> <...> >> >>> + RTE_ETH_FOREACH_DEV(port_id) { >> <..> >>> + ret = rte_eal_hotplug_remove(bus->name, dev->name); >> <..> >>> + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); >> >> Overall why this logic implemented as network PMD? >> Yes technically you can implement *anything* as PMD :), but should we? 
>> >> This code does eal level work (scans bus, add/remove devices), and for control >> path, and not a generic solution either (specific to netvsc and failsafe). >> >> Only device argument part of a PMD seems used, rest is unrelated to being a PMD. >> Scans netvsc changes in background and reflects them into failsafe PMD... >> >> Why this is implemented as PMD, not another entity, like bus driver perhaps? >> >> Or indeed why this in DPDK instead of being in application? > > I'll address that last question first: the point of this driver is enabling > existing applications to run within a Hyper-V environment unmodified, > because they'd otherwise need to manage two driver instances correctly on > their own in addition to hot-plug events during VM migration. > > Some kind of driver generating a front end to what otherwise appears as two > distinct ethdev to applications is therefore necessary. > > Currently without it, users have to manually configure failsafe properly for > each NetVSC interface on their system. Besides the inconvenience, it's not > even a possibility with DPDK applications that don't rely on EAL > command-line arguments. > > As such it's more correctly defined as a "platform" driver rather than a > true PMD. It leaves VF device handling to their respective PMDs while > automatically managing the platform-specific part itself. There's no simpler > alternative when running in blacklist mode (i.e. not specifying any device > parameters on the command line). > > Regarding its presence in drivers/net rather than drivers/bus, the end > result from an application standpoint is that each instance exposes a single > ethdev, even if not its own (failsafe's). Busses don't do that. It also > allows passing arguments to individual devices through --vdev if needed. > > You're right about putting device detection at the bus level though, and I > think there's work in progress to do just that, this driver will be updated > to benefit from it once applied. 
In the meantime, the code as submitted
> works fine with the current DPDK code base and addresses an existing use
> case for which there is no solution at this point.

This may be working but this looks like a hack to me.

If we need a platform driver why not properly work on it. If we need to improve
eal hotplug, this is a good motivation to improve it.

And if this logic needs to be in application let it be, your argument is to not
change the existing application but this logic may lead implementing many
unrelated things as PMD to not change application, what is the line here.

What is the work in progress, exact list, that will replace this solution? If
this hackish solution will prevent that real work, I am against this solution.
Is there a way to ensure this will be a temporary solution and that real work
will happen?

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 20:44 ` Ferruh Yigit @ 2017-12-20 14:13 ` Thomas Monjalon 2017-12-21 16:19 ` Adrien Mazarguil 1 sibling, 0 replies; 112+ messages in thread From: Thomas Monjalon @ 2017-12-20 14:13 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil; +Cc: dev, Stephen Hemminger 19/12/2017 21:44, Ferruh Yigit: > On 12/19/2017 7:06 AM, Adrien Mazarguil wrote: > > On Mon, Dec 18, 2017 at 05:54:45PM -0800, Ferruh Yigit wrote: > >> On 12/18/2017 8:46 AM, Adrien Mazarguil wrote: > >>> As described in more details in the attached documentation (see patch > >>> contents), this virtual device driver manages NetVSC interfaces in virtual > >>> machines hosted by Hyper-V/Azure platforms. > >>> > >>> This driver does not manage traffic nor Ethernet devices directly; it acts > >>> as a thin configuration layer that automatically instantiates and controls > >>> fail-safe PMD instances combining tap and PCI sub-devices, so that each > >>> NetVSC interface is exposed as a single consolidated port to DPDK > >>> applications. > >>> > >>> PCI sub-devices being hot-pluggable (e.g. during VM migration), > >>> applications automatically benefit from increased throughput when present > >>> and automatic fallback on NetVSC otherwise without interruption thanks to > >>> fail-safe's hot-plug handling. > >>> > >>> Once initialized, the sole job of the hyperv driver is to regularly scan > >>> for PCI devices to associate with NetVSC interfaces and feed their > >>> addresses to corresponding fail-safe instances. > >>> > >>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > >> > >> <...> > >> > >>> + RTE_ETH_FOREACH_DEV(port_id) { > >> <..> > >>> + ret = rte_eal_hotplug_remove(bus->name, dev->name); > >> <..> > >>> + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > >> > >> Overall why this logic implemented as network PMD? 
> >> Yes technically you can implement *anything* as PMD :), but should we? > >> > >> This code does eal level work (scans bus, add/remove devices), and for control > >> path, and not a generic solution either (specific to netvsc and failsafe). > >> > >> Only device argument part of a PMD seems used, rest is unrelated to being a PMD. > >> Scans netvsc changes in background and reflects them into failsafe PMD... > >> > >> Why this is implemented as PMD, not another entity, like bus driver perhaps? > >> > >> Or indeed why this in DPDK instead of being in application? > > > > I'll address that last question first: the point of this driver is enabling > > existing applications to run within a Hyper-V environment unmodified, > > because they'd otherwise need to manage two driver instances correctly on > > their own in addition to hot-plug events during VM migration. > > > > Some kind of driver generating a front end to what otherwise appears as two > > distinct ethdev to applications is therefore necessary. > > > > Currently without it, users have to manually configure failsafe properly for > > each NetVSC interface on their system. Besides the inconvenience, it's not > > even a possibility with DPDK applications that don't rely on EAL > > command-line arguments. > > > > As such it's more correctly defined as a "platform" driver rather than a > > true PMD. It leaves VF device handling to their respective PMDs while > > automatically managing the platform-specific part itself. There's no simpler > > alternative when running in blacklist mode (i.e. not specifying any device > > parameters on the command line). > > > > Regarding its presence in drivers/net rather than drivers/bus, the end > > result from an application standpoint is that each instance exposes a single > > ethdev, even if not its own (failsafe's). Busses don't do that. It also > > allows passing arguments to individual devices through --vdev if needed. 
> > > > You're right about putting device detection at the bus level though, and I > > think there's work in progress to do just that, this driver will be updated > > to benefit from it once applied. In the meantime, the code as submitted > > works fine with the current DPDK code base and addresses an existing use > > case for which there is no solution at this point. > > This may be working but this looks like a hack to me. > > If we need a platform driver why not properly work on it. If we need to improve > eal hotplug, this is a good motivation to improve it. I agree this code looks to be a platform driver. It is the first one of this kind. Usually, things are managed either in a device driver, a bus driver, or in EAL. I also agree that hotplug should be managed in EAL and bus drivers. > And if this logic needs to be in application let it be, your argument is to not > change the existing application but this logic may lead implementing many > unrelated things as PMD to not change application, what is the line here. The line is hardware management. The application should not have to implement device-specific or platform-specific code. The same application should be able to work on any platform. > What is the work in progress, exact list, that will replace this solution? If > this hackish solution will prevent that real work, I am against this solution. > Is there a way to ensure this will be a temporary solution and that real work > will happen? I think we should explicitly mark this code as temporary, or use the EXPERIMENTAL tag. It should motivate us to implement what is needed to completely remove this code later. About the work in progress: - When hotplug will be fully supported in EAL and bus drivers, the scan part of this platform driver should be removed. - When ethdev probe notifications will be integrated, it may also clean a part of this code. - We may also think how the future port ownership can improve the behaviour of this driver. 
- NetVSC is currently supported by the TAP PMD, but it may be replaced by
  a new NetVSC PMD (the VMBUS driver has already been sent).
- We should also continue the work on the configuration file.
  Such user configuration may help with platform behaviours.

In conclusion, there are a lot of improvements in progress,
and I am really happy to see Hyper-V supported in DPDK.
I think this driver must be only a step towards first-class support,
like KVM/Qemu/vhost/virtio.
As there is no API implied here, I am OK to progress step by step.

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality 2017-12-19 20:44 ` Ferruh Yigit 2017-12-20 14:13 ` Thomas Monjalon @ 2017-12-21 16:19 ` Adrien Mazarguil 1 sibling, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-21 16:19 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Stephen Hemminger, Thomas Monjalon Disclaimer: I agree with Thomas's suggestions in his reply [1] to your message, I'm replying below as well to provide more details of my own and clarify the motivations behind this approach a bit more. On Tue, Dec 19, 2017 at 12:44:35PM -0800, Ferruh Yigit wrote: > On 12/19/2017 7:06 AM, Adrien Mazarguil wrote: > > On Mon, Dec 18, 2017 at 05:54:45PM -0800, Ferruh Yigit wrote: > >> On 12/18/2017 8:46 AM, Adrien Mazarguil wrote: > >>> As described in more details in the attached documentation (see patch > >>> contents), this virtual device driver manages NetVSC interfaces in virtual > >>> machines hosted by Hyper-V/Azure platforms. > >>> > >>> This driver does not manage traffic nor Ethernet devices directly; it acts > >>> as a thin configuration layer that automatically instantiates and controls > >>> fail-safe PMD instances combining tap and PCI sub-devices, so that each > >>> NetVSC interface is exposed as a single consolidated port to DPDK > >>> applications. > >>> > >>> PCI sub-devices being hot-pluggable (e.g. during VM migration), > >>> applications automatically benefit from increased throughput when present > >>> and automatic fallback on NetVSC otherwise without interruption thanks to > >>> fail-safe's hot-plug handling. > >>> > >>> Once initialized, the sole job of the hyperv driver is to regularly scan > >>> for PCI devices to associate with NetVSC interfaces and feed their > >>> addresses to corresponding fail-safe instances. 
> >>> > >>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > >> > >> <...> > >> > >>> + RTE_ETH_FOREACH_DEV(port_id) { > >> <..> > >>> + ret = rte_eal_hotplug_remove(bus->name, dev->name); > >> <..> > >>> + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > >> > >> Overall why this logic implemented as network PMD? > >> Yes technically you can implement *anything* as PMD :), but should we? > >> > >> This code does eal level work (scans bus, add/remove devices), and for control > >> path, and not a generic solution either (specific to netvsc and failsafe). > >> > >> Only device argument part of a PMD seems used, rest is unrelated to being a PMD. > >> Scans netvsc changes in background and reflects them into failsafe PMD... > >> > >> Why this is implemented as PMD, not another entity, like bus driver perhaps? > >> > >> Or indeed why this in DPDK instead of being in application? > > > > I'll address that last question first: the point of this driver is enabling > > existing applications to run within a Hyper-V environment unmodified, > > because they'd otherwise need to manage two driver instances correctly on > > their own in addition to hot-plug events during VM migration. > > > > Some kind of driver generating a front end to what otherwise appears as two > > distinct ethdev to applications is therefore necessary. > > > > Currently without it, users have to manually configure failsafe properly for > > each NetVSC interface on their system. Besides the inconvenience, it's not > > even a possibility with DPDK applications that don't rely on EAL > > command-line arguments. > > > > As such it's more correctly defined as a "platform" driver rather than a > > true PMD. It leaves VF device handling to their respective PMDs while > > automatically managing the platform-specific part itself. There's no simpler > > alternative when running in blacklist mode (i.e. not specifying any device > > parameters on the command line). 
> >
> > Regarding its presence in drivers/net rather than drivers/bus, the end
> > result from an application standpoint is that each instance exposes a single
> > ethdev, even if not its own (failsafe's). Busses don't do that. It also
> > allows passing arguments to individual devices through --vdev if needed.
> >
> > You're right about putting device detection at the bus level though, and I
> > think there's work in progress to do just that, this driver will be updated
> > to benefit from it once applied. In the meantime, the code as submitted
> > works fine with the current DPDK code base and addresses an existing use
> > case for which there is no solution at this point.
>
> This may be working but this looks like a hack to me.
>
> If we need a platform driver why not properly work on it. If we need to improve
> eal hotplug, this is a good motivation to improve it.

Hotplug surely can be improved but I don't think that alone will be enough for
what this driver does. Here's how things are sequenced as currently
implemented:

1. DPDK application starts.

2. EAL scans for PCI devices, ethdev ports are created for relevant ones.

3. hyperv vdev scans the system for appropriate NetVSC netdevices,
   instantiates failsafe PMD accordingly to create ethdev ports for each of
   them.

   At this stage, rte_eal_hotplug_remove() is also called on physical
   devices found in 2. that will be given to failsafe (see 4.), since
   they're not supposed to be seen or owned by the application (keep in mind
   this happens on Hyper-V platforms only).

4. From this point on, application can use the remaining ports normally.

5. A PCI device gets plugged in, kernel recognizes it and creates a
   netdevice for it.

6. hyperv's timer callback detects the new netdevice, if its properties
   match NetVSC's then it proceeds to tell failsafe its location.

7. failsafe probes the given address on the appropriate bus to instantiate
   another hidden ethdev out of it and primarily uses that device for TX
   until it gets unplugged. Meanwhile, RX is still performed on both
   underlying devices.

Let's now assume hot-plug is perfectly implemented in DPDK along with
Gaetan's netdevice bus [2] (or equivalent) with hotplug properties as well:

1. DPDK application starts.

2. EAL scans for PCI devices, ethdev ports are created for relevant ones.

3. EAL scans for net_bus devices, ethdev ports are created for relevant
   ones.

4. The piece of code formerly known as the hyperv driver looks at detected
   net_bus devices, finds relevant ones with NetVSC properties and promptly
   kicks them out through rte_eal_hotplug_remove() (or equivalent) so that
   the application doesn't get a chance to "see" them. It then instantiates
   fail-safe PMD like before, with fail-safe re-discovering devices as its
   own.

5. From this point on, application can use the remaining ports normally.

6. A PCI device gets plugged in, kernel recognizes it and creates a
   netdevice for it.

7. EAL's net_bus hotplug handler kicks in, automatically creates a new
   ethdev port out of it (note: device properties such as MAC addresses are
   not known before the associated PMD is initialized and an ethdev
   created).

8. The piece of code formerly known as the hyperv driver that happens to
   also be listening for hotplug events sees that new ethdev port; if its
   properties match NetVSC's then it proceeds to hide it before telling
   failsafe its location.

9. failsafe probes the given address on the appropriate bus to instantiate
   another hidden ethdev out of it and primarily uses that device for TX
   until it gets unplugged. Meanwhile, RX is still performed on both
   underlying devices.

Hotplug basically removes the timer callback and some of the probing code. I
agree it's perfectly fine to update this PMD once hotplug is implemented
that way.

Now what about the rest? Without a driver there's no way to orchestrate all
the above. A separate layer between applications and PMDs is necessary for
that; the handover of ethdev ports to failsafe is mandatory.

> And if this logic needs to be in application let it be, your argument is to not
> change the existing application but this logic may lead implementing many
> unrelated things as PMD to not change application, what is the line here.

Well, for this particular case I don't think many applications want to
retrieve multicast and some other traffic out of one ethdev and the rest
from another only when the latter is present.

This complexity must be handled by the framework, not by applications, which
ideally are not supposed to know much about the environment they're running
in. For this reason, even a specific API is out of the question.

> What is the work in progress, exact list, that will replace this solution? If
> this hackish solution will prevent that real work, I am against this solution.
> Is there a way to ensure this will be a temporary solution and that real work
> will happen?

I think Thomas answers this question [1], I'll just add that the current
approach was developed and submitted in a way that doesn't have any impact
on public APIs precisely to avoid conflicts with other work on EAL in the
meantime.

If the hotplug subsystem evolves, this driver will catch up, particularly
since it's small and shouldn't be too complex to adapt. I volunteer for that
work once APIs are ready in any case; failing that, the experimental tag
(I'll add it for v2) means its pure and simple removal.

I'd like your opinion on the current approach to determine the next steps:

- Do you agree with the fact hotplug and platform-related functionality are
  two separate problems, that the approach to implement the former doesn't
  address the latter?

- About implementing the latter in DPDK as a kind of platform driver so that
  applications don't need to be modified?

- If you had to choose between drivers/bus and drivers/net for it? (keep in
  mind the ability to provide per-device options would be great)

[1] http://dpdk.org/ml/archives/dev/2017-December/084558.html
[2] http://dpdk.org/ml/archives/dev/2017-June/067546.html

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply [flat|nested] 112+ messages in thread
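As a rough illustration of the matching step in the sequences above ("if its properties match NetVSC's"), here is a minimal sketch that assumes the match boils down to the shared MAC address mentioned in the cover letter. struct netdev and find_paired_vf() are hypothetical names made up for this example; the actual driver gathers this information from the system and may check more than the MAC address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified view of a kernel netdevice. */
struct netdev {
	const char *name;
	unsigned char mac[MAC_LEN];
	bool is_netvsc; /* true when implemented on top of VMBUS */
};

/*
 * Return the first non-NetVSC netdevice sharing the MAC address of the
 * given NetVSC interface, or NULL when no VF is currently plugged in.
 */
static const struct netdev *
find_paired_vf(const struct netdev *netvsc,
	       const struct netdev *devs, size_t n)
{
	size_t i;

	for (i = 0; i != n; ++i) {
		if (devs[i].is_netvsc)
			continue; /* never pair NetVSC with itself */
		if (!memcmp(devs[i].mac, netvsc->mac, MAC_LEN))
			return &devs[i];
	}
	return NULL;
}
```

A periodic scan would call this for each NetVSC interface and, on a match, hand the matching device's location over to the corresponding fail-safe instance; a NULL result corresponds to the fallback case where only NetVSC carries traffic.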
* [dpdk-dev] [PATCH v1 3/3] net/hyperv: add "force" parameter
  2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 0/3] " Adrien Mazarguil
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 1/3] net/hyperv: introduce MS Hyper-V platform driver Adrien Mazarguil
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 2/3] net/hyperv: implement core functionality Adrien Mazarguil
@ 2017-12-18 16:46   ` Adrien Mazarguil
  2017-12-18 18:23   ` [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger
  2017-12-22 18:01   ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil
  4 siblings, 0 replies; 112+ messages in thread
From: Adrien Mazarguil @ 2017-12-18 16:46 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Stephen Hemminger

This parameter allows specifying any non-NetVSC interface to use with tap
sub-devices for development purposes.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 doc/guides/nics/hyperv.rst  |  5 +++++
 drivers/net/hyperv/hyperv.c | 26 +++++++++++++++++++------
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/doc/guides/nics/hyperv.rst b/doc/guides/nics/hyperv.rst
index 8f7a8b153..9b5220919 100644
--- a/doc/guides/nics/hyperv.rst
+++ b/doc/guides/nics/hyperv.rst
@@ -110,5 +110,10 @@ The following device parameters are supported:
   Same as ``iface`` except a suitable NetVSC interface is located using its
   MAC address.
 
+- ``force`` [int]
+
+  If nonzero, forces the use of specified interfaces even if not detected as
+  NetVSC.
+
 Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to
 all NetVSC interfaces found on the system.
diff --git a/drivers/net/hyperv/hyperv.c b/drivers/net/hyperv/hyperv.c
index bad224be9..d9d9bbcd5 100644
--- a/drivers/net/hyperv/hyperv.c
+++ b/drivers/net/hyperv/hyperv.c
@@ -62,6 +62,7 @@
 #define HYPERV_DRIVER net_hyperv
 #define HYPERV_ARG_IFACE "iface"
 #define HYPERV_ARG_MAC "mac"
+#define HYPERV_ARG_FORCE "force"
 #define HYPERV_PROBE_MS 1000
 
 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}"
@@ -504,6 +505,9 @@ hyperv_alarm(void *arg)
  * - struct rte_kvargs *kvargs:
  *   Device arguments provided to current driver instance.
  *
+ * - int force:
+ *   Accept specified interface even if not detected as NetVSC.
+ *
  * - unsigned int specified:
  *   Number of specific netdevices provided as device arguments.
  *
@@ -521,6 +525,7 @@ hyperv_netvsc_probe(const struct if_nameindex *iface,
 {
 	const char *name = va_arg(ap, const char *);
 	struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *);
+	int force = va_arg(ap, int);
 	unsigned int specified = va_arg(ap, unsigned int);
 	unsigned int *matched = va_arg(ap, unsigned int *);
 	unsigned int i;
@@ -567,9 +572,11 @@ hyperv_netvsc_probe(const struct if_nameindex *iface,
 	if (!hyperv_iface_is_netvsc(iface)) {
 		if (!specified)
 			return 0;
-		WARN("interface \"%s\" (index %u) is not NetVSC, skipping",
-		     iface->if_name, iface->if_index);
-		return 0;
+		WARN("interface \"%s\" (index %u) is not NetVSC, %s",
+		     iface->if_name, iface->if_index,
+		     force ? "using anyway (forced)" : "skipping");
+		if (!force)
+			return 0;
 	}
 	/* Create interface context. */
 	ctx = calloc(1, sizeof(*ctx));
@@ -700,6 +707,7 @@ hyperv_vdev_probe(struct rte_vdev_device *dev)
 	static const char *const hyperv_arg[] = {
 		HYPERV_ARG_IFACE,
 		HYPERV_ARG_MAC,
+		HYPERV_ARG_FORCE,
 		NULL,
 	};
 	const char *name = rte_vdev_device_name(dev);
@@ -708,6 +716,7 @@ hyperv_vdev_probe(struct rte_vdev_device *dev)
 						     hyperv_arg);
 	unsigned int specified = 0;
 	unsigned int matched = 0;
+	int force = 0;
 	unsigned int i;
 	int ret;
 
@@ -719,13 +728,15 @@ hyperv_vdev_probe(struct rte_vdev_device *dev)
 	for (i = 0; i != kvargs->count; ++i) {
 		const struct rte_kvargs_pair *pair = &kvargs->pairs[i];
 
-		if (!strcmp(pair->key, HYPERV_ARG_IFACE) ||
-		    !strcmp(pair->key, HYPERV_ARG_MAC))
+		if (!strcmp(pair->key, HYPERV_ARG_FORCE))
+			force = !!atoi(pair->value);
+		else if (!strcmp(pair->key, HYPERV_ARG_IFACE) ||
+			 !strcmp(pair->key, HYPERV_ARG_MAC))
 			++specified;
 	}
 	rte_eal_alarm_cancel(hyperv_alarm, NULL);
 	/* Gather interfaces. */
-	ret = hyperv_foreach_iface(hyperv_netvsc_probe, name, kvargs,
+	ret = hyperv_foreach_iface(hyperv_netvsc_probe, name, kvargs, force,
 				   specified, &matched);
 	if (ret < 0)
 		goto error;
@@ -784,4 +795,5 @@ RTE_PMD_REGISTER_VDEV(HYPERV_DRIVER, hyperv_vdev);
 RTE_PMD_REGISTER_ALIAS(HYPERV_DRIVER, eth_hyperv);
 RTE_PMD_REGISTER_PARAM_STRING(net_hyperv,
 			      HYPERV_ARG_IFACE "=<string> "
-			      HYPERV_ARG_MAC "=<string>");
+			      HYPERV_ARG_MAC "=<string> "
+			      HYPERV_ARG_FORCE "=<int>");
-- 
2.11.0

^ permalink raw reply [flat|nested] 112+ messages in thread
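A side note on the conversion used in the patch above: !!atoi(pair->value) silently maps non-numeric input such as "force=yes" to 0, so a mistyped value disables the option without any warning. A stricter variant could be sketched as follows; parse_force() is a hypothetical helper written for this note, not something taken from the patch or from rte_kvargs.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stricter replacement for !!atoi(value).
 * Returns 1 for any nonzero integer, 0 for zero and -1 when the value
 * is missing, out of range or not entirely an integer.
 */
static int
parse_force(const char *value)
{
	char *end;
	long v;

	if (value == NULL || *value == '\0')
		return -1;
	errno = 0;
	v = strtol(value, &end, 0); /* base 0: decimal, octal or hex */
	if (errno != 0 || *end != '\0')
		return -1;
	return v != 0;
}
```

Whether rejecting such input is desirable for a development-only parameter is a judgment call; the point is only that the caller can then tell "force=0" apart from a typo.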
* Re: [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms
  2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 0/3] " Adrien Mazarguil
                     ` (2 preceding siblings ...)
  2017-12-18 16:46   ` [dpdk-dev] [PATCH v1 3/3] net/hyperv: add "force" parameter Adrien Mazarguil
@ 2017-12-18 18:23   ` Stephen Hemminger
  2017-12-18 20:13     ` Thomas Monjalon
  2017-12-18 20:21     ` Adrien Mazarguil
  2017-12-22 18:01   ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil
  4 siblings, 2 replies; 112+ messages in thread
From: Stephen Hemminger @ 2017-12-18 18:23 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev

On Mon, 18 Dec 2017 17:46:19 +0100
Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote:

> Virtual machines hosted by Hyper-V/Azure platforms are fitted with
> simplified virtual network devices named NetVSC that are used for fast
> communication between VM to VM, VM to hypervisor, and the outside.
>
> They appear as standard system netdevices to user-land applications, the
> main difference being they are implemented on top of VMBUS [1] instead of
> emulated PCI devices.
>
> While this reads like a case for a standard DPDK PMD, there is more to it.
>
> To accelerate outside communication, NetVSC devices as they appear in a VM
> can be paired with physical SR-IOV virtual function (VF) devices owned by
> that same VM [2]. Both netdevices share the same MAC address in that case.
>
> When paired, egress and most of the ingress traffic flow through the VF
> device, while part of it (e.g. multicasts, hypervisor control data) still
> flows through NetVSC. Moreover VF devices are not retained and disappear
> during VM migration; from a VM standpoint, they can be hot-plugged anytime
> with NetVSC acting as a fallback.
>
> Running DPDK applications in such a context involves driving VF devices
> using their dedicated PMDs in a vendor-independent fashion (to benefit from
> maximum performance without writing dedicated code) while simultaneously
> listening to NetVSC and handling the related hot-plug events.
>
> This new virtual PMD (referred to as "hyperv" from this point on)
> automatically coordinates the Hyper-V/Azure-specific management part
> described above by relying on vendor-specific, failsafe and tap PMDs to
> expose a single consolidated Ethernet device usable directly by existing
> applications.
>
>          .------------------.
>          | DPDK application |
>          `--------+---------'
>                   |
>            .------+------.
>            | DPDK ethdev |
>            `------+------'       Control
>                   |                 |
>      .------------+------------.    v    .------------.
>      |       failsafe PMD      +---------+ hyperv PMD |
>      `--+-------------------+--'         `------------'
>         |                   |
>         |          .........|.........
>         |          :        |        :
>    .----+----.     :   .----+----.   :
>    | tap PMD |     :   | any PMD |   :
>    `----+----'     :   `----+----'   : <-- Hot-pluggable
>         |          :        |        :
>  .------+-------.  :  .-----+-----.  :
>  | NetVSC-based |  :  | SR-IOV VF |  :
>  |   netdevice  |  :  |   device  |  :
>  `--------------'  :  `-----------'  :
>                    :.................:
>
> Note this diagram differs from that of the original RFC [3], with hyperv no
> longer acting as a data plane layer.
>
> This initial version of the driver only works in whitelist mode. Users have
> to provide the --vdev net_hyperv EAL option at least once to trigger it.
>
> Subsequent work will add support for blacklist mode based on automatic
> detection of the host environment.
>
> [1] http://dpdk.org/ml/archives/dev/2017-January/054165.html
> [2] https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v
> [3] http://dpdk.org/ml/archives/dev/2017-November/082339.html
>
> Adrien Mazarguil (3):
>   net/hyperv: introduce MS Hyper-V platform driver
>   net/hyperv: implement core functionality
>   net/hyperv: add "force" parameter
>
>  MAINTAINERS                                   |   6 +
>  config/common_base                            |   6 +
>  config/common_linuxapp                        |   1 +
>  doc/guides/nics/features/hyperv.ini           |  12 +
>  doc/guides/nics/hyperv.rst                    | 119 +++
>  doc/guides/nics/index.rst                     |   1 +
>  drivers/net/Makefile                          |   1 +
>  drivers/net/hyperv/Makefile                   |  58 ++
>  drivers/net/hyperv/hyperv.c                   | 799 +++++++++++++++++++++
>  drivers/net/hyperv/rte_pmd_hyperv_version.map |   4 +
>  mk/rte.app.mk                                 |   1 +
>  11 files changed, 1008 insertions(+)
>  create mode 100644 doc/guides/nics/features/hyperv.ini
>  create mode 100644 doc/guides/nics/hyperv.rst
>  create mode 100644 drivers/net/hyperv/Makefile
>  create mode 100644 drivers/net/hyperv/hyperv.c
>  create mode 100644 drivers/net/hyperv/rte_pmd_hyperv_version.map
>

Please don't call this drivers/net/hyperv/
that name conflicts with the real netvsc PMD that I am working on.

Maybe vdev-netvsc?

^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms 2017-12-18 18:23 ` [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger @ 2017-12-18 20:13 ` Thomas Monjalon 2017-12-19 0:40 ` Stephen Hemminger 2017-12-18 20:21 ` Adrien Mazarguil 1 sibling, 1 reply; 112+ messages in thread From: Thomas Monjalon @ 2017-12-18 20:13 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Adrien Mazarguil, Ferruh Yigit 18/12/2017 19:23, Stephen Hemminger: > Please don't call this drivers/net/hyperv/ > that name conflicts with the real netvsc PMD that I am working on. > > Maybe vdev-netvsc? I expect your PMD to be in drivers/net/netvsc/ Why is it conflicting with drivers/net/hyperv/ ? ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms 2017-12-18 20:13 ` Thomas Monjalon @ 2017-12-19 0:40 ` Stephen Hemminger 0 siblings, 0 replies; 112+ messages in thread From: Stephen Hemminger @ 2017-12-19 0:40 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, Adrien Mazarguil, Ferruh Yigit On Mon, Dec 18, 2017 at 12:13 PM, Thomas Monjalon <thomas@monjalon.net> wrote: > 18/12/2017 19:23, Stephen Hemminger: > > Please don't call this drivers/net/hyperv/ > > that name conflicts with the real netvsc PMD that I am working on. > > > > Maybe vdev-netvsc? > > I expect your PMD to be in drivers/net/netvsc/ > Why is it conflicting with drivers/net/hyperv/ ? The naming is a bit confusing, and I am willing to change it since it is not upstream yet. The code is mostly based on the BSD driver, which doesn't call itself netvsc. Instead the BSD driver uses hn_ as a prefix for most visible data and functions. I have been trying to name it netvsc to avoid confusion with the kernel driver. Like any name, it is completely irrelevant to functionality. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms 2017-12-18 18:23 ` [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger 2017-12-18 20:13 ` Thomas Monjalon @ 2017-12-18 20:21 ` Adrien Mazarguil 1 sibling, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-18 20:21 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, dev On Mon, Dec 18, 2017 at 10:23:04AM -0800, Stephen Hemminger wrote: > On Mon, 18 Dec 2017 17:46:19 +0100 > Adrien Mazarguil <adrien.mazarguil@6wind.com> wrote: > > > Virtual machines hosted by Hyper-V/Azure platforms are fitted with > > simplified virtual network devices named NetVSC that are used for fast > > communication between VM to VM, VM to hypervisor, and the outside. > > > > They appear as standard system netdevices to user-land applications, the > > main difference being they are implemented on top of VMBUS [1] instead of > > emulated PCI devices. > > > > While this reads like a case for a standard DPDK PMD, there is more to it. > > > > To accelerate outside communication, NetVSC devices as they appear in a VM > > can be paired with physical SR-IOV virtual function (VF) devices owned by > > that same VM [2]. Both netdevices share the same MAC address in that case. > > > > When paired, egress and most of the ingress traffic flow through the VF > > device, while part of it (e.g. multicasts, hypervisor control data) still > > flows through NetVSC. Moreover VF devices are not retained and disappear > > during VM migration; from a VM standpoint, they can be hot-plugged anytime > > with NetVSC acting as a fallback. > > > > Running DPDK applications in such a context involves driving VF devices > > using their dedicated PMDs in a vendor-independent fashion (to benefit from > > maximum performance without writing dedicated code) while simultaneously > > listening to NetVSC and handling the related hot-plug events. 
> > > > This new virtual PMD (referred to as "hyperv" from this point on) > > automatically coordinates the Hyper-V/Azure-specific management part > > described above by relying on vendor-specific, failsafe and tap PMDs to > > expose a single consolidated Ethernet device usable directly by existing > > applications. > > > > .------------------. > > | DPDK application | > > `--------+---------' > > | > > .------+------. > > | DPDK ethdev | > > `------+------' Control > > | | > > .------------+------------. v .------------. > > | failsafe PMD +---------+ hyperv PMD | > > `--+-------------------+--' `------------' > > | | > > | .........|......... > > | : | : > > .----+----. : .----+----. : > > | tap PMD | : | any PMD | : > > `----+----' : `----+----' : <-- Hot-pluggable > > | : | : > > .------+-------. : .-----+-----. : > > | NetVSC-based | : | SR-IOV VF | : > > | netdevice | : | device | : > > `--------------' : `-----------' : > > :.................: > > > > Note this diagram differs from that of the original RFC [3], with hyperv no > > longer acting as a data plane layer. > > > > This initial version of the driver only works in whitelist mode. Users have > > to provide the --vdev net_hyperv EAL option at least once to trigger it. > > > > Subsequent work will add support for blacklist mode based on automatic > > detection of the host environment. 
> > > > [1] http://dpdk.org/ml/archives/dev/2017-January/054165.html > > [2] https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v > > [3] http://dpdk.org/ml/archives/dev/2017-November/082339.html > > > > Adrien Mazarguil (3): > > net/hyperv: introduce MS Hyper-V platform driver > > net/hyperv: implement core functionality > > net/hyperv: add "force" parameter > > > > MAINTAINERS | 6 + > > config/common_base | 6 + > > config/common_linuxapp | 1 + > > doc/guides/nics/features/hyperv.ini | 12 + > > doc/guides/nics/hyperv.rst | 119 +++ > > doc/guides/nics/index.rst | 1 + > > drivers/net/Makefile | 1 + > > drivers/net/hyperv/Makefile | 58 ++ > > drivers/net/hyperv/hyperv.c | 799 +++++++++++++++++++++ > > drivers/net/hyperv/rte_pmd_hyperv_version.map | 4 + > > mk/rte.app.mk | 1 + > > 11 files changed, 1008 insertions(+) > > create mode 100644 doc/guides/nics/features/hyperv.ini > > create mode 100644 doc/guides/nics/hyperv.rst > > create mode 100644 drivers/net/hyperv/Makefile > > create mode 100644 drivers/net/hyperv/hyperv.c > > create mode 100644 drivers/net/hyperv/rte_pmd_hyperv_version.map > > > > Please don't call this drivers/net/hyperv/ > that name conflicts with the real netvsc PMD that I am working on. > > Maybe vdev-netvsc? No problem with that, if vdev-netvsc is good for you, I can update it in v2 if needed. I'm just curious, I was under the impression both drivers would remain kind of complementary pending various API updates, in which case wouldn't it make sense to use "netvsc" as the better name for the NetVSC PMD? ("hyperv" being more a use case than a true PMD) Otherwise I also don't mind overwriting the current "hyperv" PMD code base with yours as soon as it's ready, this will most likely make it redundant anyway. -- Adrien Mazarguil 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms 2017-12-18 16:46 ` [dpdk-dev] [PATCH v1 0/3] " Adrien Mazarguil ` (3 preceding siblings ...) 2017-12-18 18:23 ` [dpdk-dev] [PATCH v1 0/3] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger @ 2017-12-22 18:01 ` Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 1/5] net/failsafe: fix invalid free Adrien Mazarguil ` (6 more replies) 4 siblings, 7 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-22 18:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Stephen Hemminger Virtual machines hosted by Hyper-V/Azure platforms are fitted with simplified virtual network devices named NetVSC that are used for fast communication between VM to VM, VM to hypervisor, and the outside. They appear as standard system netdevices to user-land applications, the main difference being they are implemented on top of VMBUS [1] instead of emulated PCI devices. While this reads like a case for a standard DPDK PMD, there is more to it. To accelerate outside communication, NetVSC devices as they appear in a VM can be paired with physical SR-IOV virtual function (VF) devices owned by that same VM [2]. Both netdevices share the same MAC address in that case. When paired, egress and most of the ingress traffic flow through the VF device, while part of it (e.g. multicasts, hypervisor control data) still flows through NetVSC. Moreover VF devices are not retained and disappear during VM migration; from a VM standpoint, they can be hot-plugged anytime with NetVSC acting as a fallback. Running DPDK applications in such a context involves driving VF devices using their dedicated PMDs in a vendor-independent fashion (to benefit from maximum performance without writing dedicated code) while simultaneously listening to NetVSC and handling the related hot-plug events. 
This new virtual PMD (referred to as "vdev_netvsc" from this point on) automatically coordinates the Hyper-V/Azure-specific management part described above by relying on vendor-specific, failsafe and tap PMDs to expose a single consolidated Ethernet device usable directly by existing applications. .------------------. | DPDK application | `--------+---------' | .------+------. | DPDK ethdev | `------+------' Control | | .------------+------------. v .-----------------. | failsafe PMD +---------+ vdev_netvsc PMD | `--+-------------------+--' `-----------------' | | | .........|......... | : | : .----+----. : .----+----. : | tap PMD | : | any PMD | : `----+----' : `----+----' : <-- Hot-pluggable | : | : .------+-------. : .-----+-----. : | NetVSC-based | : | SR-IOV VF | : | netdevice | : | device | : `--------------' : `-----------' : :.................: Note this diagram differs from that of the original RFC [3], with vdev_netvsc no longer acting as a data plane layer. This initial version of the driver only works in whitelist mode. Users have to provide the --vdev net_vdev_netvsc EAL option at least once to trigger it. Subsequent work will add support for blacklist mode based on automatic detection of the host environment. [1] http://dpdk.org/ml/archives/dev/2017-January/054165.html [2] https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v [3] http://dpdk.org/ml/archives/dev/2017-November/082339.html v2 changes: - Renamed driver from "hyperv" to "vdev_netvsc". This change covers documentation and symbols prefix. - Driver is now tagged EXPERIMENTAL. - Replaced ether_addr_from_str() with a basic sscanf() call. - Removed debugging code (memset() poisoning). - Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments. - Removed hyperv_basename(). - Discarded unused variables through __rte_unused. - Added separate but necessary free() bugfix for failsafe PMD. - Added file descriptor input support to failsafe PMD. 
- Replaced temporary bash execution; failsafe now reads device definitions directly through a pipe without an intermediate bash one-liner. - Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG(). - Added dynamic log type (pmd.vdev_netvsc). - Modified initialization code to probe devices immediately during startup. - Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more appropriate than "ret >= sizeof(foo) - 1"). Adrien Mazarguil (5): net/failsafe: fix invalid free net/failsafe: add "fd" parameter net/vdev_netvsc: introduce Hyper-V platform driver net/vdev_netvsc: implement core functionality net/vdev_netvsc: add "force" parameter MAINTAINERS | 6 + config/common_base | 5 + config/common_linuxapp | 1 + doc/guides/nics/fail_safe.rst | 9 + doc/guides/nics/features/vdev_netvsc.ini | 12 + doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 116 +++ drivers/net/Makefile | 1 + drivers/net/failsafe/failsafe_args.c | 88 ++- drivers/net/failsafe/failsafe_private.h | 3 + drivers/net/vdev_netvsc/Makefile | 58 ++ .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 722 +++++++++++++++++++ mk/rte.app.mk | 1 + 14 files changed, 1025 insertions(+), 2 deletions(-) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
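For readers following the last item of the v2 change list, here is a minimal sketch of the snprintf() return value check it refers to. The helper name format_sysfs_path() and the path layout are assumptions made for illustration only; they are not taken from the series. snprintf() returns the length the complete string would have had, excluding the terminating NUL, so output was truncated if and only if the return value is greater than or equal to the buffer size.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (illustration only, not part of the series):
 * build "/sys/class/net/<iface>/<leaf>" into buf.
 *
 * Returns 0 on success, -1 on output error or truncation.
 */
static int
format_sysfs_path(char *buf, size_t size, const char *iface, const char *leaf)
{
	int ret = snprintf(buf, size, "/sys/class/net/%s/%s", iface, leaf);

	if (ret < 0)
		return -1; /* output error */
	/*
	 * ret does not count the terminating NUL: ret == size - 1 is an
	 * exact fit, while ret >= size means at least one character was
	 * dropped.
	 */
	if ((size_t)ret >= size)
		return -1;
	return 0;
}
```

Checking against "sizeof(foo) - 1" instead would reject a string that fits the buffer exactly.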
* [dpdk-dev] [PATCH v2 1/5] net/failsafe: fix invalid free 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil @ 2017-12-22 18:01 ` Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 2/5] net/failsafe: add "fd" parameter Adrien Mazarguil ` (5 subsequent siblings) 6 siblings, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-22 18:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, stable, Gaetan Rivet rte_free() is not supposed to work with pointers returned by calloc(). Fixes: a0194d828100 ("net/failsafe: add flexible device definition") Cc: stable@dpdk.org Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> --- drivers/net/failsafe/failsafe_args.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index cfc83e365..ec63ac972 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -407,7 +407,7 @@ failsafe_args_free(struct rte_eth_dev *dev) uint8_t i; FOREACH_SUBDEV(sdev, i, dev) { - rte_free(sdev->cmdline); + free(sdev->cmdline); sdev->cmdline = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v2 2/5] net/failsafe: add "fd" parameter 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 1/5] net/failsafe: fix invalid free Adrien Mazarguil @ 2017-12-22 18:01 ` Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 3/5] net/vdev_netvsc: introduce Hyper-V platform driver Adrien Mazarguil ` (4 subsequent siblings) 6 siblings, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-22 18:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Gaetan Rivet This parameter enables applications to provide device definitions through an arbitrary file descriptor number. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 9 +++ drivers/net/failsafe/failsafe_args.c | 86 +++++++++++++++++++++++++++- drivers/net/failsafe/failsafe_private.h | 3 + 3 files changed, 97 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index c4e3d2e8d..5b1b47e56 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -106,6 +106,15 @@ Fail-safe command line parameters All commas within the ``shell command`` are replaced by spaces before executing the command. This helps using scripts to specify devices. +- **fd(<file descriptor number>)** parameter + + This parameter reads a device definition from an arbitrary file descriptor + number in ``<iface>`` format as described above. + + The file descriptor is read in non-blocking mode and is never closed in + order to take only the last line into account (unlike ``exec()``) at every + probe attempt. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index ec63ac972..7a8605174 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -31,7 +31,11 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> #include <string.h> +#include <unistd.h> #include <errno.h> #include <rte_debug.h> @@ -161,6 +165,73 @@ fs_execute_cmd(struct sub_device *sdev, char *cmdline) } static int +fs_read_fd(struct sub_device *sdev, char *fd_str) +{ + FILE *fp = NULL; + int fd = -1; + /* store possible newline as well */ + char output[DEVARGS_MAXLEN + 1]; + int err = -ENODEV; + int ret; + + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); + if (sdev->fd_str == NULL) { + sdev->fd_str = strdup(fd_str); + if (sdev->fd_str == NULL) { + ERROR("Command line allocation failed"); + return -ENOMEM; + } + } + errno = 0; + fd = strtol(fd_str, &fd_str, 0); + if (errno || *fd_str || fd < 0) { + ERROR("Parsing FD number failed"); + goto error; + } + /* Fiddle with copy of file descriptor */ + fd = dup(fd); + if (fd == -1) + goto error; + ret = fcntl(fd, F_GETFL); + if (ret == -1) + goto error; + ret = fcntl(fd, F_SETFL, ret | O_NONBLOCK); + if (ret == -1) + goto error; + fp = fdopen(fd, "r"); + if (!fp) + goto error; + fd = -1; + /* Only take the last line into account */ + ret = 0; + while (fgets(output, sizeof(output), fp)) + ++ret; + if (feof(fp)) { + if (!ret) + goto error; + } else if (ferror(fp)) { + if (errno != EAGAIN || !ret) + goto error; + } else if (!ret) { + goto error; + } + /* Line must end with a newline character */ + fs_sanitize_cmdline(output); + if (output[0] == '\0') + goto error; + ret = fs_parse_device(sdev, output); + if (ret) + ERROR("Parsing device '%s' failed", output); + err = ret; +error: + if (fp) + 
fclose(fp); + if (fd != -1) + close(fd); + return err; +} + +static int fs_parse_device_param(struct rte_eth_dev *dev, const char *param, uint8_t head) { @@ -202,6 +273,14 @@ fs_parse_device_param(struct rte_eth_dev *dev, const char *param, } if (ret) goto free_args; + } else if (strncmp(param, "fd", 2) == 0) { + ret = fs_read_fd(sdev, args); + if (ret == -ENODEV) { + DEBUG("Reading device info from FD failed"); + ret = 0; + } + if (ret) + goto free_args; } else { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; @@ -409,6 +488,8 @@ failsafe_args_free(struct rte_eth_dev *dev) FOREACH_SUBDEV(sdev, i, dev) { free(sdev->cmdline); sdev->cmdline = NULL; + free(sdev->fd_str); + sdev->fd_str = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; } @@ -424,7 +505,8 @@ fs_count_device(struct rte_eth_dev *dev, const char *param, param[b] != '\0') b++; if (strncmp(param, "dev", b) != 0 && - strncmp(param, "exec", b) != 0) { + strncmp(param, "exec", b) != 0 && + strncmp(param, "fd", b) != 0) { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; } @@ -463,6 +545,8 @@ failsafe_args_parse_subs(struct rte_eth_dev *dev) continue; if (sdev->cmdline) ret = fs_execute_cmd(sdev, sdev->cmdline); + else if (sdev->fd_str) + ret = fs_read_fd(sdev, sdev->fd_str); else ret = fs_parse_sub_device(sdev); if (ret == 0) diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index d81cc3ca6..a0d36751f 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -48,6 +48,7 @@ #define PMD_FAILSAFE_PARAM_STRING \ "dev(<ifc>)," \ "exec(<shell command>)," \ + "fd(<fd number>)," \ "mac=mac_addr," \ "hotplug_poll=u64" \ "" @@ -111,6 +112,8 @@ struct sub_device { struct fs_stats stats_snapshot; /* Some device are defined as a command line */ char *cmdline; + /* Others are retrieved through a file descriptor */ + char *fd_str; /* fail-safe device backreference */ struct 
rte_eth_dev *fs_dev; /* flag calling for recollection */ -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v2 3/5] net/vdev_netvsc: introduce Hyper-V platform driver 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 1/5] net/failsafe: fix invalid free Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 2/5] net/failsafe: add "fd" parameter Adrien Mazarguil @ 2017-12-22 18:01 ` Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 4/5] net/vdev_netvsc: implement core functionality Adrien Mazarguil ` (3 subsequent siblings) 6 siblings, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-22 18:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Stephen Hemminger This patch lays the groundwork for this driver (draft documentation, copyright notices, code base skeleton and build system hooks). While it can be successfully compiled and invoked, it's an empty shell at this stage. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> --- MAINTAINERS | 6 + config/common_base | 5 + config/common_linuxapp | 1 + doc/guides/nics/features/vdev_netvsc.ini | 12 ++ doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 46 +++++++ drivers/net/Makefile | 1 + drivers/net/vdev_netvsc/Makefile | 54 ++++++++ .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 132 +++++++++++++++++++ mk/rte.app.mk | 1 + 11 files changed, 263 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 5a63b40c2..2b61c93aa 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -451,6 +451,12 @@ F: drivers/net/mrvl/ F: doc/guides/nics/mrvl.rst F: doc/guides/nics/features/mrvl.ini +Microsoft vdev_netvsc - EXPERIMENTAL +M: Adrien Mazarguil <adrien.mazarguil@6wind.com> +F: drivers/net/vdev_netvsc/ +F: doc/guides/nics/vdev_netvsc.rst +F: doc/guides/nics/features/vdev_netvsc.ini + Netcope szedata2 M: Matej Vido <vido@cesnet.cz> F: drivers/net/szedata2/ diff --git a/config/common_base b/config/common_base index b8ee8f91c..ef904dfd5 100644 --- a/config/common_base +++ 
b/config/common_base @@ -280,6 +280,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG=n CONFIG_RTE_LIBRTE_MRVL_PMD=n # +# Compile virtual device driver for NetVSC on Hyper-V/Azure +# +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n + +# # Compile burst-oriented Broadcom BNXT PMD driver # CONFIG_RTE_LIBRTE_BNXT_PMD=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 74c7d64ec..e04326224 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y diff --git a/doc/guides/nics/features/vdev_netvsc.ini b/doc/guides/nics/features/vdev_netvsc.ini new file mode 100644 index 000000000..cfc5cb93e --- /dev/null +++ b/doc/guides/nics/features/vdev_netvsc.ini @@ -0,0 +1,12 @@ +; +; Supported features of the 'vdev_netvsc' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 23babe933..566604671 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -64,6 +64,7 @@ Network Interface Controller Drivers szedata2 tap thunderx + vdev_netvsc virtio vhost vmxnet3 diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst new file mode 100644 index 000000000..be31b6597 --- /dev/null +++ b/doc/guides/nics/vdev_netvsc.rst @@ -0,0 +1,46 @@ +.. BSD LICENSE + Copyright 2017 6WIND S.A. + Copyright 2017 Mellanox + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions + are met: + + * Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + * Neither the name of 6WIND S.A. nor the names of its + contributors may be used to endorse or promote products derived + from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +VDEV_NETVSC poll mode driver +============================ + +The VDEV_NETVSC PMD (librte_pmd_vdev_netvsc) provides support for NetVSC +interfaces and associated SR-IOV virtual function (VF) devices found in +Linux virtual machines running on Microsoft Hyper-V_ (including Azure) +platforms. + +.. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v + +Build options +------------- + +- ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) + + Toggle compilation of this driver. 
diff --git a/drivers/net/Makefile b/drivers/net/Makefile index ef09b4e16..dc41ed11e 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -66,6 +66,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += sfc DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx +DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile new file mode 100644 index 000000000..e53050fe1 --- /dev/null +++ b/drivers/net/vdev_netvsc/Makefile @@ -0,0 +1,54 @@ +# BSD LICENSE +# +# Copyright 2017 6WIND S.A. +# Copyright 2017 Mellanox +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of 6WIND S.A. nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# Properties of the generated library. +LIB = librte_pmd_vdev_netvsc.a +LIBABIVER := 1 +EXPORT_MAP := rte_pmd_vdev_netvsc_version.map + +# Additional compilation flags. +CFLAGS += -O3 +CFLAGS += -g +CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += $(WERROR_FLAGS) + +# Dependencies. +LDLIBS += -lrte_bus_vdev +LDLIBS += -lrte_eal +LDLIBS += -lrte_ethdev +LDLIBS += -lrte_kvargs + +# Source files. +SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map new file mode 100644 index 000000000..179140fb8 --- /dev/null +++ b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map @@ -0,0 +1,4 @@ +DPDK_18.02 { + + local: *; +}; diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c new file mode 100644 index 000000000..3b73482da --- /dev/null +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -0,0 +1,132 @@ +/*- + * BSD LICENSE + * + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of 6WIND S.A. nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <stddef.h> + +#include <rte_bus_vdev.h> +#include <rte_common.h> +#include <rte_config.h> +#include <rte_kvargs.h> +#include <rte_log.h> + +#define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_ARG_IFACE "iface" +#define VDEV_NETVSC_ARG_MAC "mac" + +#define PMD_DRV_LOG(level, ...) \ + rte_log(RTE_LOG_ ## level, \ + vdev_netvsc_logtype, \ + RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ + RTE_FMT_TAIL(__VA_ARGS__,))) + +/** Driver-specific log messages type. */ +static int vdev_netvsc_logtype; + +/** Number of PMD instances relying on context list. */ +static unsigned int vdev_netvsc_ctx_inst; + +/** + * Probe NetVSC interfaces. + * + * @param dev + * Virtual device context for PMD instance. 
+ * + * @return + * Always 0, even in case of errors. + */ +static int +vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) +{ + static const char *const vdev_netvsc_arg[] = { + VDEV_NETVSC_ARG_IFACE, + VDEV_NETVSC_ARG_MAC, + NULL, + }; + const char *name = rte_vdev_device_name(dev); + const char *args = rte_vdev_device_args(dev); + struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", + vdev_netvsc_arg); + + PMD_DRV_LOG(DEBUG, + "invoked as \"%s\", using arguments \"%s\"", + name, args); + if (!kvargs) { + PMD_DRV_LOG(ERR, "cannot parse arguments list"); + goto error; + } +error: + if (kvargs) + rte_kvargs_free(kvargs); + ++vdev_netvsc_ctx_inst; + return 0; +} + +/** + * Remove PMD instance. + * + * @param dev + * Virtual device context for PMD instance. + * + * @return + * Always 0. + */ +static int +vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) +{ + --vdev_netvsc_ctx_inst; + return 0; +} + +/** Virtual device descriptor. */ +static struct rte_vdev_driver vdev_netvsc_vdev = { + .probe = vdev_netvsc_vdev_probe, + .remove = vdev_netvsc_vdev_remove, +}; + +RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev); +RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); +RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, + VDEV_NETVSC_ARG_IFACE "=<string> " + VDEV_NETVSC_ARG_MAC "=<string>"); + +/** Initialize driver log type. 
*/ +static void +vdev_netvsc_init_log(void) +{ + vdev_netvsc_logtype = rte_log_register("pmd.vdev_netvsc"); + if (vdev_netvsc_logtype >= 0) + rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); +} + +RTE_INIT(vdev_netvsc_init_log); diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 6a6a7452e..3ae521228 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -156,6 +156,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += -lrte_pmd_sfc_efx _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += -lrte_pmd_szedata2 -lsze2 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += -lrte_pmd_tap _LDLIBS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += -lrte_pmd_thunderx_nicvf +_LDLIBS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += -lrte_pmd_vdev_netvsc _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
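The PMD_DRV_LOG() macro in the patch above splits its variadic arguments into a format string and the remaining parameters so that a driver prefix and a trailing newline can be glued onto the format at compile time. A minimal standalone sketch of that trick follows. FMT(), FMT_HEAD(), FMT_TAIL(), DRV_PREFIX, DRV_LOG_FMT() and format_probe_message() are hypothetical names, assumed to mirror how RTE_FMT() and friends behave, and the output goes to a buffer through snprintf() instead of rte_log() so the example does not need DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for RTE_FMT(), RTE_FMT_HEAD() and RTE_FMT_TAIL(). */
#define FMT(fmt, ...) fmt "%.0s", __VA_ARGS__ ""
#define FMT_HEAD(fmt, ...) fmt
#define FMT_TAIL(fmt, ...) __VA_ARGS__

/* Hypothetical prefix; the patch derives it from VDEV_NETVSC_DRIVER. */
#define DRV_PREFIX "net_vdev_netvsc: "

/*
 * Prepend the prefix and append "\n" to the caller's format string.
 * The extra "%.0s" conversion consumes the "" argument appended by FMT(),
 * which keeps the expansion valid when no parameter follows the format.
 */
#define DRV_LOG_FMT(buf, size, ...) \
	snprintf((buf), (size), \
		 FMT(DRV_PREFIX FMT_HEAD(__VA_ARGS__,) "\n", \
		     FMT_TAIL(__VA_ARGS__,)))

/* Usage example: format the "invoked as" message for a given name. */
static int
format_probe_message(char *buf, size_t size, const char *name)
{
	assert(buf != NULL && name != NULL);
	return DRV_LOG_FMT(buf, size, "invoked as \"%s\"", name);
}
```

The trailing comma in FMT_HEAD(__VA_ARGS__,) and FMT_TAIL(__VA_ARGS__,) guarantees that the variadic part is never missing, which is what allows the same macro to accept a bare format string.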
* [dpdk-dev] [PATCH v2 4/5] net/vdev_netvsc: implement core functionality 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil ` (2 preceding siblings ...) 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 3/5] net/vdev_netvsc: introduce Hyper-V platform driver Adrien Mazarguil @ 2017-12-22 18:01 ` Adrien Mazarguil 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 5/5] net/vdev_netvsc: add "force" parameter Adrien Mazarguil ` (2 subsequent siblings) 6 siblings, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-22 18:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Stephen Hemminger As described in more detail in the attached documentation (see patch contents), this virtual device driver manages NetVSC interfaces in virtual machines hosted by Hyper-V/Azure platforms. This driver manages neither traffic nor Ethernet devices directly; it acts as a thin configuration layer that automatically instantiates and controls fail-safe PMD instances combining tap and PCI sub-devices, so that each NetVSC interface is exposed as a single consolidated port to DPDK applications. Since PCI sub-devices are hot-pluggable (e.g. during VM migration), applications automatically benefit from increased throughput when they are present and fall back on NetVSC otherwise, without interruption, thanks to fail-safe's hot-plug handling. Once initialized, the sole job of the vdev_netvsc driver is to regularly scan for PCI devices to associate with NetVSC interfaces and feed their addresses to the corresponding fail-safe instances.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> --- doc/guides/nics/vdev_netvsc.rst | 65 ++++ drivers/net/vdev_netvsc/Makefile | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 581 ++++++++++++++++++++++++++++- 3 files changed, 649 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index be31b6597..73a63e552 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -38,9 +38,74 @@ platforms. .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v +Implementation details +---------------------- + +Each instance of this driver effectively needs to drive two devices: the +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from +this point on) counterpart sharing the same MAC address. + +Physical devices are part of the host system and cannot be maintained during +VM migration. From a VM standpoint they appear as hot-plug devices that come +and go without prior notice. + +When the physical device is present, egress and most of the ingress traffic +flows through it; only multicasts and other hypervisor control still flow +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. + +To avoid unnecessary code duplication and ensure maximum performance, +handling of physical devices is left to their original PMDs; this virtual +device driver (also known as *vdev*) manages other PMDs as summarized by the +following block diagram:: + + .------------------. + | DPDK application | + `--------+---------' + | + .------+------. + | DPDK ethdev | + `------+------' Control + | | + .------------+------------. v .-----------------. + | failsafe PMD +---------+ vdev_netvsc PMD | + `--+-------------------+--' `-----------------' + | | + | .........|......... + | : | : + .----+----. : .----+----. : + | tap PMD | : | any PMD | : + `----+----' : `----+----' : <-- Hot-pluggable + | : | : + .------+-------. : .-----+-----. 
: + | NetVSC-based | : | SR-IOV VF | : + | netdevice | : | device | : + `--------------' : `-----------' : + :.................: + Build options ------------- - ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) Toggle compilation of this driver. + +Run-time parameters +------------------- + +To invoke this PMD, applications have to explicitly provide the +``--vdev=net_vdev_netvsc`` EAL option. + +The following device parameters are supported: + +- ``iface`` [string] + + Provide a specific NetVSC interface (netdevice) name to attach this PMD + to. Can be provided multiple times for additional instances. + +- ``mac`` [string] + + Same as ``iface`` except a suitable NetVSC interface is located using its + MAC address. + +Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to +all NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile index e53050fe1..3b3fe1c56 100644 --- a/drivers/net/vdev_netvsc/Makefile +++ b/drivers/net/vdev_netvsc/Makefile @@ -40,6 +40,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += -D_XOPEN_SOURCE=600 +CFLAGS += -D_BSD_SOURCE +CFLAGS += -D_DEFAULT_SOURCE CFLAGS += $(WERROR_FLAGS) # Dependencies. @@ -47,6 +50,7 @@ LDLIBS += -lrte_bus_vdev LDLIBS += -lrte_eal LDLIBS += -lrte_ethdev LDLIBS += -lrte_kvargs +LDLIBS += -lrte_net # Source files. SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 3b73482da..738196e75 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -31,17 +31,41 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ +#include <errno.h> +#include <fcntl.h> +#include <inttypes.h> +#include <linux/sockios.h> +#include <net/if.h> +#include <netinet/ip.h> +#include <stdarg.h> #include <stddef.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdio.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/queue.h> +#include <sys/socket.h> +#include <unistd.h> +#include <rte_alarm.h> +#include <rte_bus.h> #include <rte_bus_vdev.h> #include <rte_common.h> #include <rte_config.h> +#include <rte_dev.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_PROBE_MS 1000 + +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" #define PMD_DRV_LOG(level, ...) \ rte_log(RTE_LOG_ ## level, \ @@ -53,12 +77,527 @@ /** Driver-specific log messages type. */ static int vdev_netvsc_logtype; +/** Context structure for a vdev_netvsc instance. */ +struct vdev_netvsc_ctx { + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ + unsigned int id; /**< ID used to generate unique names. */ + char name[64]; /**< Unique name for vdev_netvsc instance. */ + char devname[64]; /**< Fail-safe PMD instance name. */ + char devargs[256]; /**< Fail-safe PMD instance device arguments. */ + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ + unsigned int if_index; /**< NetVSC netdevice index. */ + struct ether_addr if_addr; /**< NetVSC MAC address. */ + int pipe[2]; /**< Communication pipe with fail-safe instance. */ + char yield[256]; /**< Current device string used with fail-safe. */ +}; + +/** Context list is common to all PMD instances. */ +static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); + +/** Number of entries in context list. 
*/ +static unsigned int vdev_netvsc_ctx_count; + /** Number of PMD instances relying on context list. */ static unsigned int vdev_netvsc_ctx_inst; /** + * Destroy a vdev_netvsc context instance. + * + * @param ctx + * Context to destroy. + */ +static void +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) +{ + if (ctx->pipe[0] != -1) + close(ctx->pipe[0]); + if (ctx->pipe[1] != -1) + close(ctx->pipe[1]); + free(ctx); +} + +/** + * Iterate over system network interfaces. + * + * This function runs a given callback function for each netdevice found on + * the system. + * + * @param func + * Callback function pointer. List traversal is aborted when this function + * returns a nonzero value. + * @param ... + * Variable parameter list passed as @p va_list to @p func. + * + * @return + * 0 when the entire list is traversed successfully, a negative error code + * in case of failure, or the nonzero value returned by @p func when list + * traversal is aborted. + */ +static int +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap), ...)
+{ + struct if_nameindex *iface = if_nameindex(); + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + unsigned int i; + int ret = 0; + + if (!iface) { + ret = -ENOBUFS; + PMD_DRV_LOG(ERR, "cannot retrieve system network interfaces"); + goto error; + } + if (s == -1) { + ret = -errno; + PMD_DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno)); + goto error; + } + for (i = 0; iface[i].if_name; ++i) { + struct ifreq req; + struct ether_addr eth_addr; + va_list ap; + + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { + PMD_DRV_LOG(WARNING, + "cannot retrieve information about" + " interface \"%s\": %s", + req.ifr_name, rte_strerror(errno)); + continue; + } + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, + RTE_DIM(eth_addr.addr_bytes)); + va_start(ap, func); + ret = func(&iface[i], &eth_addr, ap); + va_end(ap); + if (ret) + break; + } +error: + if (s != -1) + close(s); + if (iface) + if_freenameindex(iface); + return ret; +} + +/** + * Determine if a network interface is NetVSC. + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * + * @return + * A nonzero value when interface is detected as NetVSC. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) +{ + static const char temp[] = "/sys/class/net/%s/device/class_id"; + char path[sizeof(temp) + IF_NAMESIZE]; + FILE *f; + int ret; + int len = 0; + + ret = snprintf(path, sizeof(path), temp, iface->if_name); + if (ret == -1 || (size_t)ret >= sizeof(path)) { + rte_errno = ENOBUFS; + return 0; + } + f = fopen(path, "r"); + if (!f) { + rte_errno = errno; + return 0; + } + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); + if (ret == EOF) + rte_errno = errno; + ret = len == (int)strlen(NETVSC_CLASS_ID); + fclose(f); + return ret; +} + +/** + * Retrieve network interface data from sysfs symbolic link.
+ * + * @param[out] buf + * Output data buffer. + * @param size + * Output buffer size. + * @param[in] if_name + * Netdevice name. + * @param[in] relpath + * Symbolic link path relative to netdevice sysfs entry. + * + * @return + * 0 on success, a negative error code otherwise. + */ +static int +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, + const char *relpath) +{ + int ret; + + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); + if (ret == -1 || (size_t)ret >= size) + return -ENOBUFS; + ret = readlink(buf, buf, size); + if (ret == -1) + return -errno; + if ((size_t)ret >= size - 1) + return -ENOBUFS; + buf[ret] = '\0'; + return 0; +} + +/** + * Probe a network interface to associate with vdev_netvsc context. + * + * This function determines if the network device matches the properties of + * the NetVSC interface associated with the vdev_netvsc context and + * communicates its bus address to the fail-safe PMD instance if so. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. + * @param ap + * Variable arguments list comprising: + * + * - struct vdev_netvsc_ctx *ctx: + * Context to associate network interface with. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_device_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; + const char *addr; + size_t len; + int ret; + + /* Skip non-matching or unwanted NetVSC interfaces. 
*/ + if (ctx->if_index == iface->if_index) { + if (!strcmp(ctx->if_name, iface->if_name)) + return 0; + PMD_DRV_LOG(DEBUG, + "NetVSC interface \"%s\" (index %u) renamed \"%s\"", + ctx->if_name, ctx->if_index, iface->if_name); + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + return 0; + } + if (vdev_netvsc_iface_is_netvsc(iface)) + return 0; + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) + return 0; + /* Look for associated PCI device. */ + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device/subsystem"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + if (strcmp(addr, "pci")) + return 0; + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + len = strlen(addr); + if (!len) + return 0; + /* Send PCI device argument to fail-safe PMD instance. */ + if (strcmp(addr, ctx->yield)) + PMD_DRV_LOG(DEBUG, + "associating PCI device \"%s\" with NetVSC" + " interface \"%s\" (index %u)", + addr, ctx->if_name, ctx->if_index); + memmove(buf, addr, len + 1); + addr = buf; + buf[len] = '\n'; + ret = write(ctx->pipe[1], addr, len + 1); + buf[len] = '\0'; + if (ret == -1) { + if (errno == EINTR || errno == EAGAIN) + return 1; + PMD_DRV_LOG(WARNING, + "cannot associate PCI device name \"%s\" with" + " interface \"%s\": %s", + addr, ctx->if_name, rte_strerror(errno)); + return 1; + } + if ((size_t)ret != len + 1) { + /* + * Attempt to override previous partial write, no need to + * recover if that fails. + */ + ret = write(ctx->pipe[1], "\n", 1); + (void)ret; + return 1; + } + fsync(ctx->pipe[1]); + memcpy(ctx->yield, addr, len + 1); + return 1; +} + +/** + * Alarm callback that regularly probes system network interfaces. + * + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as + * long as a vdev_netvsc context instance exists. + * + * @param arg + * Ignored.
+ */ +static void +vdev_netvsc_alarm(__rte_unused void *arg) +{ + struct vdev_netvsc_ctx *ctx; + int ret; + + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + if (ret) + break; + } + if (!vdev_netvsc_ctx_count) + return; + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + PMD_DRV_LOG(ERR, "unable to reschedule alarm callback: %s", + rte_strerror(-ret)); + } +} + +/** + * Probe a NetVSC interface to generate a vdev_netvsc context from. + * + * This function instantiates vdev_netvsc contexts either for all NetVSC + * devices found on the system or only a subset provided as device + * arguments. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. + * @param ap + * Variable arguments list comprising: + * + * - const char *name: + * Name associated with current driver instance. + * + * - struct rte_kvargs *kvargs: + * Device arguments provided to current driver instance. + * + * - unsigned int specified: + * Number of specific netdevices provided as device arguments. + * + * - unsigned int *matched: + * The number of specified netdevices matched by this function. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + const char *name = va_arg(ap, const char *); + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + unsigned int specified = va_arg(ap, unsigned int); + unsigned int *matched = va_arg(ap, unsigned int *); + unsigned int i; + struct vdev_netvsc_ctx *ctx; + uint16_t port_id; + int ret; + + /* Probe all interfaces when none are specified. 
*/ + if (specified) { + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { + if (!strcmp(pair->value, iface->if_name)) + break; + } else if (!strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) { + struct ether_addr tmp; + + if (sscanf(pair->value, + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":" + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, + &tmp.addr_bytes[0], + &tmp.addr_bytes[1], + &tmp.addr_bytes[2], + &tmp.addr_bytes[3], + &tmp.addr_bytes[4], + &tmp.addr_bytes[5]) != 6) { + PMD_DRV_LOG(ERR, + "invalid MAC address format" + " \"%s\"", + pair->value); + return -EINVAL; + } + if (is_same_ether_addr(eth_addr, &tmp)) + break; + } + } + if (i == kvargs->count) + return 0; + ++(*matched); + } + /* Weed out interfaces already handled. */ + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) + if (ctx->if_index == iface->if_index) + break; + if (ctx) { + if (!specified) + return 0; + PMD_DRV_LOG(WARNING, + "interface \"%s\" (index %u) is already handled," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + if (!vdev_netvsc_iface_is_netvsc(iface)) { + if (!specified) + return 0; + PMD_DRV_LOG(WARNING, + "interface \"%s\" (index %u) is not NetVSC," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + /* Create interface context.
*/ + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + ret = -errno; + PMD_DRV_LOG(ERR, + "cannot allocate context for interface \"%s\": %s", + iface->if_name, rte_strerror(errno)); + goto error; + } + ctx->id = vdev_netvsc_ctx_count; + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + ctx->if_index = iface->if_index; + ctx->if_addr = *eth_addr; + ctx->pipe[0] = -1; + ctx->pipe[1] = -1; + ctx->yield[0] = '\0'; + if (pipe(ctx->pipe) == -1) { + ret = -errno; + PMD_DRV_LOG(ERR, + "cannot allocate control pipe for interface" + " \"%s\": %s", + ctx->if_name, rte_strerror(errno)); + goto error; + } + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { + int flf = fcntl(ctx->pipe[i], F_GETFL); + + if (flf != -1 && + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) + continue; + ret = -errno; + PMD_DRV_LOG(ERR, + "cannot toggle non-blocking flag on control file" + " descriptor #%u (%d): %s", + i, ctx->pipe[i], rte_strerror(errno)); + goto error; + } + /* Generate virtual device name and arguments. */ + i = 0; + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", + name, ctx->id); + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) + ++i; + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", + ctx->name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) + ++i; + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), + "fd(%d),dev(net_tap_%s,remote=%s)", + ctx->pipe[0], ctx->name, ctx->if_name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) + ++i; + if (i) { + ret = -ENOBUFS; + PMD_DRV_LOG(ERR, + "generated virtual device name or argument list" + " too long for interface \"%s\"", + ctx->if_name); + goto error; + } + /* + * Remove any competing rte_eth_dev entries sharing the same MAC + * address, fail-safe instances created by this PMD will handle them + * as sub-devices later. 
+ */ + RTE_ETH_FOREACH_DEV(port_id) { + struct rte_device *dev = rte_eth_devices[port_id].device; + struct rte_bus *bus = rte_bus_find_by_device(dev); + struct ether_addr tmp; + + rte_eth_macaddr_get(port_id, &tmp); + if (!is_same_ether_addr(eth_addr, &tmp)) + continue; + PMD_DRV_LOG(WARNING, + "removing device \"%s\" with identical MAC address" + " to re-create it as a fail-safe sub-device", + dev->name); + if (!bus) + ret = -EINVAL; + else + ret = rte_eal_hotplug_remove(bus->name, dev->name); + if (ret < 0) { + PMD_DRV_LOG(ERR, "unable to remove device \"%s\": %s", + dev->name, rte_strerror(-ret)); + goto error; + } + } + /* Request virtual device generation. */ + PMD_DRV_LOG(DEBUG, + "generating virtual device \"%s\" with arguments \"%s\"", + ctx->devname, ctx->devargs); + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); + if (ret) + goto error; + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); + ++vdev_netvsc_ctx_count; + PMD_DRV_LOG(DEBUG, + "added NetVSC interface \"%s\" to context list", + ctx->if_name); + return 0; +error: + if (ctx) + vdev_netvsc_ctx_destroy(ctx); + return ret; +} + +/** * Probe NetVSC interfaces. * + * This function probes system netdevices according to the specified device + * arguments and starts a periodic alarm callback to notify the resulting + * fail-safe PMD instances of their sub-devices whereabouts. + * * @param dev * Virtual device context for PMD instance. * @@ -77,6 +616,10 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) const char *args = rte_vdev_device_args(dev); struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", vdev_netvsc_arg); + unsigned int specified = 0; + unsigned int matched = 0; + unsigned int i; + int ret; PMD_DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", @@ -85,6 +628,30 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) PMD_DRV_LOG(ERR, "cannot parse arguments list"); goto error; } + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + ++specified; + } + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + /* Gather interfaces. */ + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, + specified, &matched); + if (ret < 0) + goto error; + if (matched < specified) + PMD_DRV_LOG(WARNING, + "some of the specified parameters did not match" + " recognized network interfaces"); + /* Probe interfaces immediately. */ + vdev_netvsc_alarm(NULL); + if (ret < 0) { + PMD_DRV_LOG(ERR, "unable to schedule alarm callback: %s", + rte_strerror(-ret)); + goto error; + } error: if (kvargs) rte_kvargs_free(kvargs); @@ -95,6 +662,9 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) /** * Remove PMD instance. * + * The alarm callback and underlying vdev_netvsc context instances are only + * destroyed after the last PMD instance is removed. + * * @param dev * Virtual device context for PMD instance. * @@ -104,7 +674,16 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) static int vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) { - --vdev_netvsc_ctx_inst; + if (--vdev_netvsc_ctx_inst) + return 0; + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { + struct vdev_netvsc_ctx *ctx = LIST_FIRST(&vdev_netvsc_ctx_list); + + LIST_REMOVE(ctx, entry); + --vdev_netvsc_ctx_count; + vdev_netvsc_ctx_destroy(ctx); + } return 0; } -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
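The ``mac`` parameter handling in the patch above comes down to parsing a colon-separated string with sscanf() and comparing six bytes against the address reported by the kernel. A minimal standalone sketch of those two steps follows. struct mac_addr, mac_parse() and mac_equal() are hypothetical names used for illustration only; the driver itself relies on DPDK's struct ether_addr and is_same_ether_addr().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's struct ether_addr. */
struct mac_addr {
	uint8_t bytes[6];
};

/*
 * Parse "aa:bb:cc:dd:ee:ff" into out.
 * Return 0 on success, -1 when fewer than six fields could be converted.
 * As with any plain sscanf() call, characters following the sixth field
 * are not rejected.
 */
static int
mac_parse(const char *str, struct mac_addr *out)
{
	assert(str != NULL && out != NULL);
	if (sscanf(str,
		   "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":"
		   "%" SCNx8 ":%" SCNx8 ":%" SCNx8,
		   &out->bytes[0], &out->bytes[1], &out->bytes[2],
		   &out->bytes[3], &out->bytes[4], &out->bytes[5]) != 6)
		return -1;
	return 0;
}

/* Return nonzero when both addresses are identical. */
static int
mac_equal(const struct mac_addr *a, const struct mac_addr *b)
{
	return memcmp(a->bytes, b->bytes, sizeof(a->bytes)) == 0;
}
```

An interface is selected by a ``mac`` argument when mac_parse() succeeds on the argument value and mac_equal() reports a match with the interface's hardware address.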
* [dpdk-dev] [PATCH v2 5/5] net/vdev_netvsc: add "force" parameter 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil ` (3 preceding siblings ...) 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 4/5] net/vdev_netvsc: implement core functionality Adrien Mazarguil @ 2017-12-22 18:01 ` Adrien Mazarguil 2017-12-23 2:06 ` [dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad 6 siblings, 0 replies; 112+ messages in thread From: Adrien Mazarguil @ 2017-12-22 18:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: dev, Stephen Hemminger This parameter allows specifying any non-NetVSC interface to use with tap sub-devices for development purposes. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> --- doc/guides/nics/vdev_netvsc.rst | 5 +++++ drivers/net/vdev_netvsc/vdev_netvsc.c | 27 +++++++++++++++++++-------- 2 files changed, 24 insertions(+), 8 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index 73a63e552..a0417b5ef 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -107,5 +107,10 @@ The following device parameters are supported: Same as ``iface`` except a suitable NetVSC interface is located using its MAC address. +- ``force`` [int] + + If nonzero, forces the use of specified interfaces even if not detected as + NetVSC. + Not specifying either ``iface`` or ``mac`` makes this PMD attach itself to all NetVSC interfaces found on the system. 
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 738196e75..5e426adc0 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -63,6 +63,7 @@ #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_ARG_FORCE "force" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -405,6 +406,9 @@ vdev_netvsc_alarm(__rte_unused void *arg) * - struct rte_kvargs *kvargs: * Device arguments provided to current driver instance. * + * - int force: + * Accept specified interface even if not detected as NetVSC. + * * - unsigned int specified: * Number of specific netdevices provided as device arguments. * @@ -422,6 +426,7 @@ vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, { const char *name = va_arg(ap, const char *); struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + int force = va_arg(ap, int); unsigned int specified = va_arg(ap, unsigned int); unsigned int *matched = va_arg(ap, unsigned int *); unsigned int i; @@ -480,10 +485,11 @@ vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, if (!specified) return 0; PMD_DRV_LOG(WARNING, - "interface \"%s\" (index %u) is not NetVSC," - " skipping", - iface->if_name, iface->if_index); - return 0; + "interface \"%s\" (index %u) is not NetVSC, %s", + iface->if_name, iface->if_index, + force ? "using anyway (forced)" : "skipping"); + if (!force) + return 0; } /* Create interface context. 
*/ ctx = calloc(1, sizeof(*ctx)); @@ -610,6 +616,7 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) static const char *const vdev_netvsc_arg[] = { VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, + VDEV_NETVSC_ARG_FORCE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -618,6 +625,7 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) vdev_netvsc_arg); unsigned int specified = 0; unsigned int matched = 0; + int force = 0; unsigned int i; int ret; @@ -631,14 +639,16 @@ vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) for (i = 0; i != kvargs->count; ++i) { const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; - if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || - !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) + force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, - specified, &matched); + force, specified, &matched); if (ret < 0) goto error; if (matched < specified) @@ -697,7 +707,8 @@ RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev); RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " - VDEV_NETVSC_ARG_MAC "=<string>"); + VDEV_NETVSC_ARG_MAC "=<string> " + VDEV_NETVSC_ARG_FORCE "=<int>"); /** Initialize driver log type. */ static void -- 2.11.0 ^ permalink raw reply [flat|nested] 112+ messages in thread
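The effect of the ``force`` parameter described above can be summarized as a small decision function: any nonzero integer enables it, and it only matters for interfaces that were explicitly requested through ``iface`` or ``mac``. The sketch below is a simplified restatement under that reading of the patch. parse_force() and accept_iface() are hypothetical helper names that do not exist in the driver, which performs these checks inline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Normalize the "force" argument value to 0 or 1, as !!atoi() does. */
static int
parse_force(const char *value)
{
	assert(value != NULL);
	return !!atoi(value);
}

/*
 * Decide whether an interface gets a context.
 *
 * is_netvsc: nonzero when the interface was detected as NetVSC.
 * specified: number of iface/mac arguments given (0 means "probe all").
 * force:     normalized "force" argument.
 */
static int
accept_iface(int is_netvsc, unsigned int specified, int force)
{
	if (is_netvsc)
		return 1;
	/* Non-NetVSC interfaces are silently ignored in "probe all" mode. */
	if (!specified)
		return 0;
	/* Explicitly requested but not NetVSC: only accepted when forced. */
	return force;
}
```

Since atoi() returns 0 for non-numeric input, a value such as ``force=yes`` leaves the option disabled.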
* Re: [dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil ` (4 preceding siblings ...) 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 5/5] net/vdev_netvsc: add "force" parameter Adrien Mazarguil @ 2017-12-23 2:06 ` Stephen Hemminger 2017-12-23 14:28 ` Thomas Monjalon 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad 6 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2017-12-23 2:06 UTC (permalink / raw) To: Adrien Mazarguil; +Cc: Ferruh Yigit, dev Why does this need to be a PMD? Maybe we need some platform infrastructure? My definition of PMD is it can send and receive On Dec 22, 2017 10:01, "Adrien Mazarguil" <adrien.mazarguil@6wind.com> wrote: > Virtual machines hosted by Hyper-V/Azure platforms are fitted with > simplified virtual network devices named NetVSC that are used for fast > communication between VM to VM, VM to hypervisor, and the outside. > > They appear as standard system netdevices to user-land applications, the > main difference being they are implemented on top of VMBUS [1] instead of > emulated PCI devices. > > While this reads like a case for a standard DPDK PMD, there is more to it. > > To accelerate outside communication, NetVSC devices as they appear in a VM > can be paired with physical SR-IOV virtual function (VF) devices owned by > that same VM [2]. Both netdevices share the same MAC address in that case. > > When paired, egress and most of the ingress traffic flow through the VF > device, while part of it (e.g. multicasts, hypervisor control data) still > flows through NetVSC. Moreover VF devices are not retained and disappear > during VM migration; from a VM standpoint, they can be hot-plugged anytime > with NetVSC acting as a fallback. 
> > Running DPDK applications in such a context involves driving VF devices > using their dedicated PMDs in a vendor-independent fashion (to benefit from > maximum performance without writing dedicated code) while simultaneously > listening to NetVSC and handling the related hot-plug events. > > This new virtual PMD (referred to as "vdev_netvsc" from this point on) > automatically coordinates the Hyper-V/Azure-specific management part > described above by relying on vendor-specific, failsafe and tap PMDs to > expose a single consolidated Ethernet device usable directly by existing > applications. > > .------------------. > | DPDK application | > `--------+---------' > | > .------+------. > | DPDK ethdev | > `------+------' Control > | | > .------------+------------. v .-----------------. > | failsafe PMD +---------+ vdev_netvsc PMD | > `--+-------------------+--' `-----------------' > | | > | .........|......... > | : | : > .----+----. : .----+----. : > | tap PMD | : | any PMD | : > `----+----' : `----+----' : <-- Hot-pluggable > | : | : > .------+-------. : .-----+-----. : > | NetVSC-based | : | SR-IOV VF | : > | netdevice | : | device | : > `--------------' : `-----------' : > :.................: > > Note this diagram differs from that of the original RFC [3], with > vdev_netvsc no longer acting as a data plane layer. > > This initial version of the driver only works in whitelist mode. Users have > to provide the --vdev net_vdev_netvsc EAL option at least once to trigger > it. > > Subsequent work will add support for blacklist mode based on automatic > detection of the host environment. > > [1] http://dpdk.org/ml/archives/dev/2017-January/054165.html > [2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ > network/overview-of-hyper-v > [3] http://dpdk.org/ml/archives/dev/2017-November/082339.html > > v2 changes: > > - Renamed driver from "hyperv" to "vdev_netvsc". This change covers > documentation and symbols prefix. 
> - Driver is now tagged EXPERIMENTAL. > - Replaced ether_addr_from_str() with a basic sscanf() call. > - Removed debugging code (memset() poisoning). > - Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments. > - Removed hyperv_basename(). > - Discarded unused variables through __rte_unused. > - Added separate but necessary free() bugfix for failsafe PMD. > - Added file descriptor input support to failsafe PMD. > - Replaced temporary bash execution; failsafe now reads device definitions > directly through a pipe without an intermediate bash one-liner. > - Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG(). > - Added dynamic log type (pmd.vdev_netvsc). > - Modified initialization code to probe devices immediately during startup. > - Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is > more > appropriate than "ret >= sizeof(foo) - 1"). > > Adrien Mazarguil (5): > net/failsafe: fix invalid free > net/failsafe: add "fd" parameter > net/vdev_netvsc: introduce Hyper-V platform driver > net/vdev_netvsc: implement core functionality > net/vdev_netvsc: add "force" parameter > > MAINTAINERS | 6 + > config/common_base | 5 + > config/common_linuxapp | 1 + > doc/guides/nics/fail_safe.rst | 9 + > doc/guides/nics/features/vdev_netvsc.ini | 12 + > doc/guides/nics/index.rst | 1 + > doc/guides/nics/vdev_netvsc.rst | 116 +++ > drivers/net/Makefile | 1 + > drivers/net/failsafe/failsafe_args.c | 88 ++- > drivers/net/failsafe/failsafe_private.h | 3 + > drivers/net/vdev_netvsc/Makefile | 58 ++ > .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + > drivers/net/vdev_netvsc/vdev_netvsc.c | 722 +++++++++++++++++++ > mk/rte.app.mk | 1 + > 14 files changed, 1025 insertions(+), 2 deletions(-) > create mode 100644 doc/guides/nics/features/vdev_netvsc.ini > create mode 100644 doc/guides/nics/vdev_netvsc.rst > create mode 100644 drivers/net/vdev_netvsc/Makefile > create mode 100644 drivers/net/vdev_netvsc/rte_ > pmd_vdev_netvsc_version.map > create 
mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c > > -- > 2.11.0 > ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms
  2017-12-23  2:06 ` [dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger
@ 2017-12-23 14:28 ` Thomas Monjalon
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Monjalon @ 2017-12-23 14:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Adrien Mazarguil, Ferruh Yigit

23/12/2017 03:06, Stephen Hemminger:
> Why does this need to be a PMD?

It needs to be a driver on top of buses.

> Maybe we need some platform infrastructure?

What would be such infrastructure?
A new driver type? Something like drivers/platform/?
I am not sure it is required for this driver given it is most probably
only a temporary driver waiting for the NetVSC PMD and a full hotplug
support in DPDK internals.
I think we should create such new infrastructure only when we are sure
it is needed permanently for some drivers.

> My definition of PMD is it can send and receive

It is the definition of an ethdev driver, yes.

^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver for Hyper-V/Azure platforms 2017-12-22 18:01 ` [dpdk-dev] [PATCH v2 0/5] " Adrien Mazarguil ` (5 preceding siblings ...) 2017-12-23 2:06 ` [dpdk-dev] [PATCH v2 0/5] Introduce virtual PMD for Hyper-V/Azure platforms Stephen Hemminger @ 2018-01-09 14:47 ` Matan Azrad 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 1/8] net/failsafe: fix invalid free Matan Azrad ` (8 more replies) 6 siblings, 9 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen Virtual machines hosted by Hyper-V/Azure platforms are fitted with simplified virtual network devices named NetVSC that are used for fast communication between VM to VM, VM to hypervisor, and the outside. They appear as standard system netdevices to user-land applications, the main difference being they are implemented on top of VMBUS instead of emulated PCI devices. While this reads like a case for a standard DPDK PMD, there is more to it. To accelerate outside communication, NetVSC devices as they appear in a VM can be paired with physical SR-IOV virtual function (VF) devices owned by that same VM. Both netdevices share the same MAC address in that case. When paired, egress and most of the ingress traffic flow through the VF device, while part of it (e.g. multicasts, hypervisor control data) still flows through NetVSC. Moreover VF devices are not retained and disappear during VM migration; from a VM standpoint, they can be hot-plugged anytime with NetVSC acting as a fallback. Running DPDK applications in such a context involves driving VF devices using their dedicated PMDs in a vendor-independent fashion (to benefit from maximum performance without writing dedicated code) while simultaneously listening to NetVSC and handling the related hot-plug events. 
This new virtual driver (referred to as "vdev_netvsc" from this point on) automatically coordinates the Hyper-V/Azure-specific management part described above by relying on vendor-specific, failsafe and tap PMDs to expose a single consolidated Ethernet device usable directly by existing applications. .------------------. | DPDK application | `--------+---------' | .------+------. | DPDK ethdev | `------+------' Control | | .------------+------------. v .--------------------. | failsafe PMD +---------+ vdev_netvsc driver | `--+-------------------+--' `--------------------' | | | .........|......... | : | : .----+----. : .----+----. : | tap PMD | : | any PMD | : `----+----' : `----+----' : <-- Hot-pluggable | : | : .------+-------. : .-----+-----. : | NetVSC-based | : | SR-IOV VF | : | netdevice | : | device | : `--------------' : `-----------' : :.................: v2 changes(Adrien): - Renamed driver from "hyperv" to "vdev_netvsc". This change covers documentation and symbols prefix. - Driver is now tagged EXPERIMENTAL. - Replaced ether_addr_from_str() with a basic sscanf() call. - Removed debugging code (memset() poisoning). - Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments. - Removed hyperv_basename(). - Discarded unused variables through __rte_unused. - Added separate but necessary free() bugfix for failsafe PMD. - Added file descriptor input support to failsafe PMD. - Replaced temporary bash execution; failsafe now reads device definitions directly through a pipe without an intermediate bash one-liner. - Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG(). - Added dynamic log type (pmd.vdev_netvsc). - Modified initialization code to probe devices immediately during startup. - Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more appropriate than "ret >= sizeof(foo) - 1"). v3 changes(Matan): - Fixed clang compilation in V2. - Removed hotplug remove code from the new driver. 
- Supported probed sub-devices getting in fail-safe. - Added automatic probing for HyperV VM systems. - Added option to ignore the automatic probing. - Skiped routed NetVSC devices probing. - Adjusted documentation and semantics. - Replaced maintainer. Adrien Mazarguil (2): net/failsafe: fix invalid free net/failsafe: add "fd" parameter Matan Azrad (6): net/failsafe: support probed sub-devices getting net/vdev_netvsc: introduce Hyper-V platform driver net/vdev_netvsc: implement core functionality net/vdev_netvsc: skip routed netvsc probing net/vdev_netvsc: add "force" parameter net/vdev_netvsc: add automatic probing MAINTAINERS | 6 + config/common_base | 5 + config/common_linuxapp | 1 + doc/guides/nics/fail_safe.rst | 14 + doc/guides/nics/features/vdev_netvsc.ini | 12 + doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 100 +++ drivers/net/Makefile | 1 + drivers/net/failsafe/failsafe_args.c | 88 ++- drivers/net/failsafe/failsafe_eal.c | 60 +- drivers/net/failsafe/failsafe_private.h | 3 + drivers/net/vdev_netvsc/Makefile | 31 + .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 746 +++++++++++++++++++++ mk/rte.app.mk | 1 + 15 files changed, 1051 insertions(+), 22 deletions(-) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
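As a side note on the v2 changelog entry about snprintf() return value checks, the following is a minimal sketch of why "ret >= sizeof(foo)" is the right truncation test. The helper name format_checked() and the "<iface>/device" string are hypothetical and only serve the illustration; they are not taken from the driver.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the driver): format "<iface>/device"
 * into buf and report whether the result fits.
 * Returns 0 on success, -1 on encoding error or truncation.
 */
static int
format_checked(char *buf, size_t size, const char *iface)
{
	int ret = snprintf(buf, size, "%s/device", iface);

	/*
	 * snprintf() returns the length the complete string would have,
	 * excluding the terminating NUL. A result of size - 1 still fits
	 * exactly, so truncation only starts at ret >= size.
	 */
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```

With a 12-byte buffer, an 11-character result is accepted while a 12-character one is reported as truncated; a "ret >= sizeof(foo) - 1" check would reject the former although it fits.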
* [dpdk-dev] [PATCH v3 1/8] net/failsafe: fix invalid free 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-16 10:24 ` Gaëtan Rivet 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter Matan Azrad ` (7 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil, stable, Gaetan Rivet From: Adrien Mazarguil <adrien.mazarguil@6wind.com> rte_free() is not supposed to work with pointers returned by calloc(). Fixes: a0194d828100 ("net/failsafe: add flexible device definition") Cc: stable@dpdk.org Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> --- drivers/net/failsafe/failsafe_args.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index cfc83e3..ec63ac9 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -407,7 +407,7 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t i; FOREACH_SUBDEV(sdev, i, dev) { - rte_free(sdev->cmdline); + free(sdev->cmdline); sdev->cmdline = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/8] net/failsafe: fix invalid free 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 1/8] net/failsafe: fix invalid free Matan Azrad @ 2018-01-16 10:24 ` Gaëtan Rivet 0 siblings, 0 replies; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 10:24 UTC (permalink / raw) To: Matan Azrad Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen, Adrien Mazarguil, stable Hi Matan, On Tue, Jan 09, 2018 at 02:47:26PM +0000, Matan Azrad wrote: > From: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > rte_free() is not supposed to work with pointers returned by calloc(). > > Fixes: a0194d828100 ("net/failsafe: add flexible device definition") > Cc: stable@dpdk.org > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> > --- > drivers/net/failsafe/failsafe_args.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c > index cfc83e3..ec63ac9 100644 > --- a/drivers/net/failsafe/failsafe_args.c > +++ b/drivers/net/failsafe/failsafe_args.c > @@ -407,7 +407,7 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > uint8_t i; > > FOREACH_SUBDEV(sdev, i, dev) { > - rte_free(sdev->cmdline); > + free(sdev->cmdline); > sdev->cmdline = NULL; > free(sdev->devargs.args); > sdev->devargs.args = NULL; > -- > 1.8.3.1 > -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 1/8] net/failsafe: fix invalid free Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-16 10:54 ` Gaëtan Rivet 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting Matan Azrad ` (6 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil, Gaetan Rivet From: Adrien Mazarguil <adrien.mazarguil@6wind.com> This parameter enables applications to provide device definitions through an arbitrary file descriptor number. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 9 ++++ drivers/net/failsafe/failsafe_args.c | 86 ++++++++++++++++++++++++++++++++- drivers/net/failsafe/failsafe_private.h | 3 ++ 3 files changed, 97 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index c4e3d2e..5b1b47e 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -106,6 +106,15 @@ Fail-safe command line parameters All commas within the ``shell command`` are replaced by spaces before executing the command. This helps using scripts to specify devices. +- **fd(<file descriptor number>)** parameter + + This parameter reads a device definition from an arbitrary file descriptor + number in ``<iface>`` format as described above. + + The file descriptor is read in non-blocking mode and is never closed in + order to take only the last line into account (unlike ``exec()``) at every + probe attempt. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index ec63ac9..7a86051 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -31,7 +31,11 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> #include <string.h> +#include <unistd.h> #include <errno.h> #include <rte_debug.h> @@ -161,6 +165,73 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } static int +fs_read_fd(struct sub_device *sdev, char *fd_str) +{ + FILE *fp = NULL; + int fd = -1; + /* store possible newline as well */ + char output[DEVARGS_MAXLEN + 1]; + int err = -ENODEV; + int ret; + + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); + if (sdev->fd_str == NULL) { + sdev->fd_str = strdup(fd_str); + if (sdev->fd_str == NULL) { + ERROR("Command line allocation failed"); + return -ENOMEM; + } + } + errno = 0; + fd = strtol(fd_str, &fd_str, 0); + if (errno || *fd_str || fd < 0) { + ERROR("Parsing FD number failed"); + goto error; + } + /* Fiddle with copy of file descriptor */ + fd = dup(fd); + if (fd == -1) + goto error; + ret = fcntl(fd, F_GETFL); + if (ret == -1) + goto error; + ret = fcntl(fd, F_SETFL, fd | O_NONBLOCK); + if (ret == -1) + goto error; + fp = fdopen(fd, "r"); + if (!fp) + goto error; + fd = -1; + /* Only take the last line into account */ + ret = 0; + while (fgets(output, sizeof(output), fp)) + ++ret; + if (feof(fp)) { + if (!ret) + goto error; + } else if (ferror(fp)) { + if (errno != EAGAIN || !ret) + goto error; + } else if (!ret) { + goto error; + } + /* Line must end with a newline character */ + fs_sanitize_cmdline(output); + if (output[0] == '\0') + goto error; + ret = fs_parse_device(sdev, output); + if (ret) + ERROR("Parsing device '%s' failed", output); + err = ret; +error: + if 
(fp) + fclose(fp); + if (fd != -1) + close(fd); + return err; +} + +static int fs_parse_device_param(struct rte_eth_dev *dev, const char *param, uint8_t head) { @@ -202,6 +273,14 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } if (ret) goto free_args; + } else if (strncmp(param, "fd", 2) == 0) { + ret = fs_read_fd(sdev, args); + if (ret == -ENODEV) { + DEBUG("Reading device info from FD failed"); + ret = 0; + } + if (ret) + goto free_args; } else { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; @@ -409,6 +488,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, FOREACH_SUBDEV(sdev, i, dev) { free(sdev->cmdline); sdev->cmdline = NULL; + free(sdev->fd_str); + sdev->fd_str = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; } @@ -424,7 +505,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, param[b] != '\0') b++; if (strncmp(param, "dev", b) != 0 && - strncmp(param, "exec", b) != 0) { + strncmp(param, "exec", b) != 0 && + strncmp(param, "fd", b) != 0) { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; } @@ -463,6 +545,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, continue; if (sdev->cmdline) ret = fs_execute_cmd(sdev, sdev->cmdline); + else if (sdev->fd_str) + ret = fs_read_fd(sdev, sdev->fd_str); else ret = fs_parse_sub_device(sdev); if (ret == 0) diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index d81cc3c..a0d3675 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -48,6 +48,7 @@ #define PMD_FAILSAFE_PARAM_STRING \ "dev(<ifc>)," \ "exec(<shell command>)," \ + "fd(<fd number>)," \ "mac=mac_addr," \ "hotplug_poll=u64" \ "" @@ -111,6 +112,8 @@ struct sub_device { struct fs_stats stats_snapshot; /* Some device are defined as a command line */ char *cmdline; + /* Others are retrieved through a file descriptor */ + char 
*fd_str; /* fail-safe device backreference */ struct rte_eth_dev *fs_dev; /* flag calling for recollection */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter Matan Azrad @ 2018-01-16 10:54 ` Gaëtan Rivet 2018-01-16 11:19 ` Gaëtan Rivet 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 10:54 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen, Adrien Mazarguil Hi Matan, Adrien, On Tue, Jan 09, 2018 at 02:47:27PM +0000, Matan Azrad wrote: > From: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > This parameter enables applications to provide device definitions through > an arbitrary file descriptor number. Ok on the principle, <snip> > @@ -161,6 +165,73 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > } > > static int > +fs_read_fd(struct sub_device *sdev, char *fd_str) > +{ > + FILE *fp = NULL; > + int fd = -1; > + /* store possible newline as well */ > + char output[DEVARGS_MAXLEN + 1]; > + int err = -ENODEV; > + int ret; ret is used as flag holder, line counter and then error reporting. err should be the only variable used for reading errors from function and reporting it. It would be clearer to use descriptive names, such as "oflags" and "nl" or "lcount". I don't really care about one additional variable in this function, for the sake of expressiveness.
> + > + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); > + if (sdev->fd_str == NULL) { > + sdev->fd_str = strdup(fd_str); > + if (sdev->fd_str == NULL) { > + ERROR("Command line allocation failed"); > + return -ENOMEM; > + } > + } > + errno = 0; > + fd = strtol(fd_str, &fd_str, 0); > + if (errno || *fd_str || fd < 0) { > + ERROR("Parsing FD number failed"); > + goto error; > + } > + /* Fiddle with copy of file descriptor */ > + fd = dup(fd); > + if (fd == -1) > + goto error; > + ret = fcntl(fd, F_GETFL); oflags = fcntl(...); > + if (ret == -1) > + goto error; > + ret = fcntl(fd, F_SETFL, fd | O_NONBLOCK); err = fcntl(fd, F_SETFL, oflags | O_NONBLOCK); Using (fd | O_NONBLOCK) is probably a mistake. > + if (ret == -1) > + goto error; > + fp = fdopen(fd, "r"); > + if (!fp) > + goto error; > + fd = -1; > + /* Only take the last line into account */ > + ret = 0; > + while (fgets(output, sizeof(output), fp)) > + ++ret; lcount = 0; while (fgets(output, sizeof(output), fp)) ++lcount; > + if (feof(fp)) { > + if (!ret) > + goto error; > + } else if (ferror(fp)) { > + if (errno != EAGAIN || !ret) > + goto error; > + } else if (!ret) { > + goto error; > + } These branches seems needlessly complicated: if (lcount == 0) goto error; else if (ferror(fp) && errno != EAGAIN) goto error; > + /* Line must end with a newline character */ > + fs_sanitize_cmdline(output); > + if (output[0] == '\0') > + goto error; > + ret = fs_parse_device(sdev, output); > + if (ret) > + ERROR("Parsing device '%s' failed", output); > + err = ret; no need to use ret instead of err here? err = fs_parse_device(sdev, output); if (err) ERROR("Parsing device '%s' failed", output); Thus allowing to remove the "ret" variable completely. 
> +error: > + if (fp) > + fclose(fp); > + if (fd != -1) > + close(fd); > + return err; > +} > + > +static int > fs_parse_device_param(struct rte_eth_dev *dev, const char *param, > uint8_t head) > { > @@ -202,6 +273,14 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > } > if (ret) > goto free_args; > + } else if (strncmp(param, "fd", 2) == 0) { How about strncmp(param, "fd(", 3) == 0 here? I think I made a mistake for dev and exec device types, no reason at this point to reiterate for fd as well. > + ret = fs_read_fd(sdev, args); > + if (ret == -ENODEV) { > + DEBUG("Reading device info from FD failed"); > + ret = 0; > + } > + if (ret) > + goto free_args; > } else { > ERROR("Unrecognized device type: %.*s", (int)b, param); > return -EINVAL; > @@ -409,6 +488,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > FOREACH_SUBDEV(sdev, i, dev) { > free(sdev->cmdline); > sdev->cmdline = NULL; > + free(sdev->fd_str); > + sdev->fd_str = NULL; > free(sdev->devargs.args); > sdev->devargs.args = NULL; > } > @@ -424,7 +505,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > param[b] != '\0') > b++; > if (strncmp(param, "dev", b) != 0 && > - strncmp(param, "exec", b) != 0) { > + strncmp(param, "exec", b) != 0 && > + strncmp(param, "fd", b) != 0) { If the strncmp above is modified, this one should be as well for consistency. -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter 2018-01-16 10:54 ` Gaëtan Rivet @ 2018-01-16 11:19 ` Gaëtan Rivet 2018-01-16 16:17 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 11:19 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen, Adrien Mazarguil Hi again, made a mistake in reviewing, see below. On Tue, Jan 16, 2018 at 11:54:43AM +0100, Gaëtan Rivet wrote: > Hi Matam, Adrien, > > On Tue, Jan 09, 2018 at 02:47:27PM +0000, Matan Azrad wrote: > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > > > This parameter enables applications to provide device definitions through > > an arbitrary file descriptor number. > > Ok on the principle, > > <snip> > > > @@ -161,6 +165,73 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > > } > > > > static int > > +fs_read_fd(struct sub_device *sdev, char *fd_str) > > +{ > > + FILE *fp = NULL; > > + int fd = -1; > > + /* store possible newline as well */ > > + char output[DEVARGS_MAXLEN + 1]; > > + int err = -ENODEV; > > + int ret; > > ret is used as flag older, line counter and then error reporting. > err should be the only variable used for reading errors from function > and reporting it. > > It would be clearer to use descriptive names, such as "oflags" and "nl" > or "lcount". I don't really care about one additional variable in this > function, for the sake of expressiveness. 
> > > + > > + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); > > + if (sdev->fd_str == NULL) { > > + sdev->fd_str = strdup(fd_str); > > + if (sdev->fd_str == NULL) { > > + ERROR("Command line allocation failed"); > > + return -ENOMEM; > > + } > > + } > > + errno = 0; > > + fd = strtol(fd_str, &fd_str, 0); > > + if (errno || *fd_str || fd < 0) { > > + ERROR("Parsing FD number failed"); > > + goto error; > > + } > > + /* Fiddle with copy of file descriptor */ > > + fd = dup(fd); > > + if (fd == -1) > > + goto error; > > + ret = fcntl(fd, F_GETFL); > > oflags = fcntl(...); > > > + if (ret == -1) > > + goto error; > > + ret = fcntl(fd, F_SETFL, fd | O_NONBLOCK); > > err = fcntl(fd, F_SETFL, oflags | O_NONBLOCK); > Using (fd | O_NONBLOCK) is probably a mistake. > This is sneaky. err is -ENODEV and would change to -1 on error, losing some meaning. > > + if (ret == -1) > > + goto error; > > + fp = fdopen(fd, "r"); > > + if (!fp) > > + goto error; > > + fd = -1; > > + /* Only take the last line into account */ > > + ret = 0; > > + while (fgets(output, sizeof(output), fp)) > > + ++ret; > > lcount = 0; > while (fgets(output, sizeof(output), fp)) > ++lcount; > > > > + if (feof(fp)) { > > + if (!ret) > > + goto error; > > + } else if (ferror(fp)) { > > + if (errno != EAGAIN || !ret) > > + goto error; > > + } else if (!ret) { > > + goto error; > > + } > > These branches seems needlessly complicated: > > if (lcount == 0) > goto error; > else if (ferror(fp) && errno != EAGAIN) > goto error; > Here err would have been set to 0 previously with the fcntl call, meaning that jumping to error would return 0 as well. I know Adrien wanted to avoid the usual ugly if (error) { err = -ENODEV; goto error; } But this kind of sneakiness is not easy to parse and maintain. If someone adds a new path of error later, this kind of subtlety *will* be lost. So between ugliness and maintainability, I choose maintainability (being the maintainer, of course). 
-- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter 2018-01-16 11:19 ` Gaëtan Rivet @ 2018-01-16 16:17 ` Matan Azrad 0 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-16 16:17 UTC (permalink / raw) To: Gaëtan Rivet Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen, Adrien Mazarguil Hi Gaetan OK for all, will change it. From: Gaëtan Rivet, Tuesday, January 16, 2018 1:19 PM > Hi again, > > made a mistake in reviewing, see below. > > On Tue, Jan 16, 2018 at 11:54:43AM +0100, Gaëtan Rivet wrote: > > Hi Matam, Adrien, > > > > On Tue, Jan 09, 2018 at 02:47:27PM +0000, Matan Azrad wrote: > > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > > > > > This parameter enables applications to provide device definitions > > > through an arbitrary file descriptor number. > > > > Ok on the principle, > > > > <snip> > > > > > @@ -161,6 +165,73 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, > > > const char *params, } > > > > > > static int > > > +fs_read_fd(struct sub_device *sdev, char *fd_str) { > > > + FILE *fp = NULL; > > > + int fd = -1; > > > + /* store possible newline as well */ > > > + char output[DEVARGS_MAXLEN + 1]; > > > + int err = -ENODEV; > > > + int ret; > > > > ret is used as flag older, line counter and then error reporting. > > err should be the only variable used for reading errors from function > > and reporting it. > > > > It would be clearer to use descriptive names, such as "oflags" and "nl" > > or "lcount". I don't really care about one additional variable in this > > function, for the sake of expressiveness. 
> > > > > + > > > + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); > > > + if (sdev->fd_str == NULL) { > > > + sdev->fd_str = strdup(fd_str); > > > + if (sdev->fd_str == NULL) { > > > + ERROR("Command line allocation failed"); > > > + return -ENOMEM; > > > + } > > > + } > > > + errno = 0; > > > + fd = strtol(fd_str, &fd_str, 0); > > > + if (errno || *fd_str || fd < 0) { > > > + ERROR("Parsing FD number failed"); > > > + goto error; > > > + } > > > + /* Fiddle with copy of file descriptor */ > > > + fd = dup(fd); > > > + if (fd == -1) > > > + goto error; > > > + ret = fcntl(fd, F_GETFL); > > > > oflags = fcntl(...); > > > > > + if (ret == -1) > > > + goto error; > > > + ret = fcntl(fd, F_SETFL, fd | O_NONBLOCK); > > > > err = fcntl(fd, F_SETFL, oflags | O_NONBLOCK); Using (fd | O_NONBLOCK) > > is probably a mistake. > > > > This is sneaky. err is -ENODEV and would change to -1 on error, losing some > meaning. > > > > + if (ret == -1) > > > + goto error; > > > + fp = fdopen(fd, "r"); > > > + if (!fp) > > > + goto error; > > > + fd = -1; > > > + /* Only take the last line into account */ > > > + ret = 0; > > > + while (fgets(output, sizeof(output), fp)) > > > + ++ret; > > > > lcount = 0; > > while (fgets(output, sizeof(output), fp)) > > ++lcount; > > > > > > > + if (feof(fp)) { > > > + if (!ret) > > > + goto error; > > > + } else if (ferror(fp)) { > > > + if (errno != EAGAIN || !ret) > > > + goto error; > > > + } else if (!ret) { > > > + goto error; > > > + } > > > > These branches seems needlessly complicated: > > > > if (lcount == 0) > > goto error; > > else if (ferror(fp) && errno != EAGAIN) > > goto error; > > > > Here err would have been set to 0 previously with the fcntl call, meaning that > jumping to error would return 0 as well. > > I know Adrien wanted to avoid the usual ugly > > if (error) { > err = -ENODEV; > goto error; > } > > But this kind of sneakiness is not easy to parse and maintain. 
If someone > adds a new path of error later, this kind of subtlety *will* be lost. > > So between ugliness and maintainability, I choose maintainability (being the > maintainer, of course). > > -- > Gaëtan Rivet > 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
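The remark about matching "fd(" instead of "fd" in the parameter parser can be illustrated with a short sketch. The two helper names and the sample strings are hypothetical; only the strncmp() calls mirror what is discussed in this thread.

```c
#include <string.h>

/* Check as posted in the patch: only the first two characters count. */
static int
is_fd_loose(const char *param)
{
	return strncmp(param, "fd", 2) == 0;
}

/* Variant suggested in the review: the opening parenthesis must match too. */
static int
is_fd_strict(const char *param)
{
	return strncmp(param, "fd(", 3) == 0;
}
```

Both helpers accept a string such as "fd(3)", but only the loose one also accepts any other word starting with "fd", for example "fdesc(3)".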
* [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 1/8] net/failsafe: fix invalid free Matan Azrad 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 2/8] net/failsafe: add "fd" parameter Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-16 11:09 ` Gaëtan Rivet 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad ` (5 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Gaetan Rivet Previous fail-safe code didn't support getting probed sub-devices and failed when it tried to probe them. Skip fail-safe sub-device probing when it already was probed. Signed-off-by: Matan Azrad <matan@mellanox.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 5 ++++ drivers/net/failsafe/failsafe_eal.c | 60 ++++++++++++++++++++++++------------- 2 files changed, 45 insertions(+), 20 deletions(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -115,6 +115,11 @@ Fail-safe command line parameters order to take only the last line into account (unlike ``exec()``) at every probe attempt. +.. note:: + + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the device + as is, which means that EAL device options are taken in this case. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c index 19d26f5..7bc7453 100644 --- a/drivers/net/failsafe/failsafe_eal.c +++ b/drivers/net/failsafe/failsafe_eal.c @@ -36,39 +36,59 @@ #include "failsafe_private.h" static int +fs_get_port_by_device_name(const char *name, uint16_t *port_id) +{ + uint16_t pid; + size_t len; + + if (name == NULL) { + DEBUG("Null pointer is specified\n"); + return -EINVAL; + } + len = strlen(name); + RTE_ETH_FOREACH_DEV(pid) { + if (!strncmp(name, rte_eth_devices[pid].device->name, len)) { + *port_id = pid; + return 0; + } + } + return -ENODEV; +} + +static int fs_bus_init(struct rte_eth_dev *dev) { struct sub_device *sdev; struct rte_devargs *da; uint8_t i; - uint16_t j; + uint16_t pid; int ret; FOREACH_SUBDEV(sdev, i, dev) { if (sdev->state != DEV_PARSED) continue; da = &sdev->devargs; - ret = rte_eal_hotplug_add(da->bus->name, - da->name, - da->args); - if (ret) { - ERROR("sub_device %d probe failed %s%s%s", i, - rte_errno ? "(" : "", - rte_errno ? strerror(rte_errno) : "", - rte_errno ? ")" : ""); - continue; - } - RTE_ETH_FOREACH_DEV(j) { - if (strcmp(rte_eth_devices[j].device->name, - da->name) == 0) { - ETH(sdev) = &rte_eth_devices[j]; - break; + if (fs_get_port_by_device_name(da->name, &pid) != 0) { + ret = rte_eal_hotplug_add(da->bus->name, + da->name, + da->args); + if (ret) { + ERROR("sub_device %d probe failed %s%s%s", i, + rte_errno ? "(" : "", + rte_errno ? strerror(rte_errno) : "", + rte_errno ? ")" : ""); + continue; } + if (fs_get_port_by_device_name(da->name, &pid) != 0) { + ERROR("sub_device %d init went wrong", i); + return -ENODEV; + } + } else { + /* Take control of device probed by EAL options. 
*/ + DEBUG("Taking control of a probed sub device" + " %d named %s", i, da->name); } - if (ETH(sdev) == NULL) { - ERROR("sub_device %d init went wrong", i); - return -ENODEV; - } + ETH(sdev) = &rte_eth_devices[pid]; SUB_ID(sdev) = i; sdev->fs_dev = dev; sdev->dev = ETH(sdev)->device; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
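As a rough illustration of the lookup this patch adds, here is a standalone sketch that runs outside DPDK. It assumes a plain array of names (`mock_dev_names`, hypothetical) standing in for `rte_eth_devices[pid].device->name`. The `exact` flag is also not part of the patch. It is only there to contrast the patch's `strncmp()` over `strlen(name)` bytes, which is a prefix match, with a full `strcmp()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for rte_eth_devices[pid].device->name. */
static const char *const mock_dev_names[] = {
	"0000:00:02.0",
	"net_tap_vsc0",
	"net_tap_vsc1",
};

#define MOCK_NB_DEVS (sizeof(mock_dev_names) / sizeof(mock_dev_names[0]))

/*
 * Same shape as fs_get_port_by_device_name() in the patch.
 * exact == 0: compare only strlen(name) bytes (prefix match, as in the patch).
 * exact != 0: the whole string must match.
 */
static int
mock_port_by_device_name(const char *name, uint16_t *port_id, int exact)
{
	uint16_t pid;
	size_t len;

	if (name == NULL || port_id == NULL)
		return -EINVAL;
	len = strlen(name);
	for (pid = 0; pid < MOCK_NB_DEVS; pid++) {
		const char *dev = mock_dev_names[pid];
		int match = exact ? strcmp(name, dev) == 0
				  : strncmp(name, dev, len) == 0;

		if (match) {
			*port_id = pid;
			return 0;
		}
	}
	return -ENODEV;
}
```

With the prefix comparison, a lookup for "net_tap" resolves to the first device whose name merely starts with that string, whereas the exact comparison reports -ENODEV. Whether that matters depends on the names fail-safe is actually given.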
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting Matan Azrad @ 2018-01-16 11:09 ` Gaëtan Rivet 2018-01-16 12:27 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 11:09 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Matan, I'm not fond of the commit title. How about: [PATCH v3 3/8] net/failsafe: add probed etherdev capture ? On Tue, Jan 09, 2018 at 02:47:28PM +0000, Matan Azrad wrote: > Previous fail-safe code didn't support getting probed sub-devices and > failed when it tried to probe them. > > Skip fail-safe sub-device probing when it already was probed. > > Signed-off-by: Matan Azrad <matan@mellanox.com> > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > --- > doc/guides/nics/fail_safe.rst | 5 ++++ > drivers/net/failsafe/failsafe_eal.c | 60 ++++++++++++++++++++++++------------- > 2 files changed, 45 insertions(+), 20 deletions(-) > > diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst > index 5b1b47e..b89e53b 100644 > --- a/doc/guides/nics/fail_safe.rst > +++ b/doc/guides/nics/fail_safe.rst > @@ -115,6 +115,11 @@ Fail-safe command line parameters > order to take only the last line into account (unlike ``exec()``) at every > probe attempt. > > +.. note:: > + > + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the device > + as is, which means that EAL device options are taken in this case. 
> + > - **mac** parameter [MAC address] > > This parameter allows the user to set a default MAC address to the fail-safe > diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c > index 19d26f5..7bc7453 100644 > --- a/drivers/net/failsafe/failsafe_eal.c > +++ b/drivers/net/failsafe/failsafe_eal.c > @@ -36,39 +36,59 @@ > #include "failsafe_private.h" > > static int > +fs_get_port_by_device_name(const char *name, uint16_t *port_id) The naming convention for the failsafe driver is namespace_object_sub-object_action() With an ordering of objects by their scope (std, rte, failsafe, file). Also, "get" as an action is not descriptive enough. static int fs_ethdev_capture(const char *name, uint16_t *port_id); > +{ > + uint16_t pid; > + size_t len; > + > + if (name == NULL) { > + DEBUG("Null pointer is specified\n"); > + return -EINVAL; > + } > + len = strlen(name); > + RTE_ETH_FOREACH_DEV(pid) { > + if (!strncmp(name, rte_eth_devices[pid].device->name, len)) { > + *port_id = pid; > + return 0; > + } > + } > + return -ENODEV; > +} > + > +static int > fs_bus_init(struct rte_eth_dev *dev) > { > struct sub_device *sdev; > struct rte_devargs *da; > uint8_t i; > - uint16_t j; > + uint16_t pid; > int ret; > > FOREACH_SUBDEV(sdev, i, dev) { > if (sdev->state != DEV_PARSED) > continue; > da = &sdev->devargs; > - ret = rte_eal_hotplug_add(da->bus->name, > - da->name, > - da->args); > - if (ret) { > - ERROR("sub_device %d probe failed %s%s%s", i, > - rte_errno ? "(" : "", > - rte_errno ? strerror(rte_errno) : "", > - rte_errno ? ")" : ""); > - continue; > - } > - RTE_ETH_FOREACH_DEV(j) { > - if (strcmp(rte_eth_devices[j].device->name, > - da->name) == 0) { > - ETH(sdev) = &rte_eth_devices[j]; > - break; > + if (fs_get_port_by_device_name(da->name, &pid) != 0) { > + ret = rte_eal_hotplug_add(da->bus->name, > + da->name, > + da->args); > + if (ret) { > + ERROR("sub_device %d probe failed %s%s%s", i, > + rte_errno ? "(" : "", > + rte_errno ? 
strerror(rte_errno) : "", > + rte_errno ? ")" : ""); > + continue; > } > + if (fs_get_port_by_device_name(da->name, &pid) != 0) { > + ERROR("sub_device %d init went wrong", i); > + return -ENODEV; > + } > + } else { > + /* Take control of device probed by EAL options. */ > + DEBUG("Taking control of a probed sub device" > + " %d named %s", i, da->name); In this case, the devargs of the probed device must be copied into the sub-device definition and removed from the EAL using the proper rte_devargs API. Note that there is no rte_devargs copy function. You can use rte_eal_devargs_parse instead, "parsing" the original devargs again into the sub-device one. This is necessary to comply with internal rte_devargs requirements (da->args being malloc-ed at the moment, though that may evolve). The rte_eal_devargs_parse function is not easy to use right now: you will have to build a devargs string (using snprintf) and submit it. I proposed a change to it for this release that would have simplified your implementation, but it will not make it into 18.02. -- Gaëtan Rivet 6WIND
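The "build a devargs string with snprintf" step can be sketched on its own, without calling into DPDK. The helper name `mock_devargs_str_build` is hypothetical, and the "name,args" layout is an assumption about what the parse function accepts. Check it against the EAL version in use before relying on it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "<name>" or "<name>,<args>" into buf (assumed devargs layout).
 * Returns 0 on success, -EINVAL on bad input and -ENAMETOOLONG when the
 * result does not fit in buf.
 */
static int
mock_devargs_str_build(char *buf, size_t size, const char *name,
		       const char *args)
{
	int ret;

	if (buf == NULL || size == 0 || name == NULL)
		return -EINVAL;
	if (args != NULL && args[0] != '\0')
		ret = snprintf(buf, size, "%s,%s", name, args);
	else
		ret = snprintf(buf, size, "%s", name);
	if (ret < 0)
		return -EINVAL;
	/* snprintf() returns the length it wanted to write. */
	if ((size_t)ret >= size)
		return -ENAMETOOLONG;
	return 0;
}
```

The resulting buffer would then be handed to the devargs parser together with the sub-device's own struct rte_devargs, so that da->args ends up allocated the way the EAL expects.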
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 11:09 ` Gaëtan Rivet @ 2018-01-16 12:27 ` Matan Azrad 2018-01-16 14:40 ` Gaëtan Rivet 0 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-16 12:27 UTC (permalink / raw) To: Gaëtan Rivet; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Gaetan From: Gaëtan Rivet, Tuesday, January 16, 2018 1:09 PM > Hi Matan, > > I'n not fond of the commit title, how about: > > [PATCH v3 3/8] net/failsafe: add probed etherdev capture > > ? > OK, no problem. > On Tue, Jan 09, 2018 at 02:47:28PM +0000, Matan Azrad wrote: > > Previous fail-safe code didn't support getting probed sub-devices and > > failed when it tried to probe them. > > > > Skip fail-safe sub-device probing when it already was probed. > > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > > --- > > doc/guides/nics/fail_safe.rst | 5 ++++ > > drivers/net/failsafe/failsafe_eal.c | 60 > > ++++++++++++++++++++++++------------- > > 2 files changed, 45 insertions(+), 20 deletions(-) > > > > diff --git a/doc/guides/nics/fail_safe.rst > > b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 > > --- a/doc/guides/nics/fail_safe.rst > > +++ b/doc/guides/nics/fail_safe.rst > > @@ -115,6 +115,11 @@ Fail-safe command line parameters > > order to take only the last line into account (unlike ``exec()``) at every > > probe attempt. > > > > +.. note:: > > + > > + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the > device > > + as is, which means that EAL device options are taken in this case. 
> > + > > - **mac** parameter [MAC address] > > > > This parameter allows the user to set a default MAC address to the > > fail-safe diff --git a/drivers/net/failsafe/failsafe_eal.c > > b/drivers/net/failsafe/failsafe_eal.c > > index 19d26f5..7bc7453 100644 > > --- a/drivers/net/failsafe/failsafe_eal.c > > +++ b/drivers/net/failsafe/failsafe_eal.c > > @@ -36,39 +36,59 @@ > > #include "failsafe_private.h" > > > > static int > > +fs_get_port_by_device_name(const char *name, uint16_t *port_id) > > The naming convention for the failsafe driver is > > namespace_object_sub-object_action() > OK. > With an ordering of objects by their scope (std, rte, failsafe, file). > Also, "get" as an action is not descriptive enough. > Isn't "get by device name" descriptive? > static int > fs_ethdev_capture(const char *name, uint16_t *port_id); > This misses the main reason we need this function instead of rte_eth_dev_get_port_by_name: we want to find the device by its device name, not by its ethdev name. What about fs_port_capture_by_device_name? Maybe comparing against device->devargs->name would be better. What do you think? 
> > +{ > > + uint16_t pid; > > + size_t len; > > + > > + if (name == NULL) { > > + DEBUG("Null pointer is specified\n"); > > + return -EINVAL; > > + } > > + len = strlen(name); > > + RTE_ETH_FOREACH_DEV(pid) { > > + if (!strncmp(name, rte_eth_devices[pid].device->name, > len)) { > > + *port_id = pid; > > + return 0; > > + } > > + } > > + return -ENODEV; > > +} > > + > > +static int > > fs_bus_init(struct rte_eth_dev *dev) > > { > > struct sub_device *sdev; > > struct rte_devargs *da; > > uint8_t i; > > - uint16_t j; > > + uint16_t pid; > > int ret; > > > > FOREACH_SUBDEV(sdev, i, dev) { > > if (sdev->state != DEV_PARSED) > > continue; > > da = &sdev->devargs; > > - ret = rte_eal_hotplug_add(da->bus->name, > > - da->name, > > - da->args); > > - if (ret) { > > - ERROR("sub_device %d probe failed %s%s%s", i, > > - rte_errno ? "(" : "", > > - rte_errno ? strerror(rte_errno) : "", > > - rte_errno ? ")" : ""); > > - continue; > > - } > > - RTE_ETH_FOREACH_DEV(j) { > > - if (strcmp(rte_eth_devices[j].device->name, > > - da->name) == 0) { > > - ETH(sdev) = &rte_eth_devices[j]; > > - break; > > + if (fs_get_port_by_device_name(da->name, &pid) != 0) { > > + ret = rte_eal_hotplug_add(da->bus->name, > > + da->name, > > + da->args); > > + if (ret) { > > + ERROR("sub_device %d probe failed > %s%s%s", i, > > + rte_errno ? "(" : "", > > + rte_errno ? strerror(rte_errno) : "", > > + rte_errno ? ")" : ""); > > + continue; > > } > > + if (fs_get_port_by_device_name(da->name, &pid) > != 0) { > > + ERROR("sub_device %d init went wrong", i); > > + return -ENODEV; > > + } > > + } else { > > + /* Take control of device probed by EAL options. */ > > + DEBUG("Taking control of a probed sub device" > > + " %d named %s", i, da->name); > > In this case, the devargs of the probed device must be copied within the sub- > device definition and removed from the EAL using the proper rte_devargs > API. > > Note that there is no rte_devargs copy function. 
You can use > rte_devargs_parse instead, "parsing" again the original devargs into the sub- > device one. It is necessary for complying with internal rte_devargs > requirements (da->args being malloc-ed, at the moment, but may evolve). > > The rte_eal_devargs_parse function is not easy enough to use right now, > you will have to build a devargs string (using snprintf) and submit it. > I proposed a change this release for it but it will not make it for 18.02, that > would have simplified your implementation. > Got it. You're right, we need to remove the devargs created at the fail-safe parse level. What do you think about checking for this at the parse level and avoiding the creation of new devargs? And also doing the copy at the parse level (the same method we use at the probe level)? > -- > Gaëtan Rivet > 6WIND
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 12:27 ` Matan Azrad @ 2018-01-16 14:40 ` Gaëtan Rivet 2018-01-16 16:15 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 14:40 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen On Tue, Jan 16, 2018 at 12:27:57PM +0000, Matan Azrad wrote: > Hi Gaetan > > From: Gaëtan Rivet, Tuesday, January 16, 2018 1:09 PM > > Hi Matan, > > > > I'n not fond of the commit title, how about: > > > > [PATCH v3 3/8] net/failsafe: add probed etherdev capture > > > > ? > > > OK, no problem. > > > On Tue, Jan 09, 2018 at 02:47:28PM +0000, Matan Azrad wrote: > > > Previous fail-safe code didn't support getting probed sub-devices and > > > failed when it tried to probe them. > > > > > > Skip fail-safe sub-device probing when it already was probed. > > > > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > > > --- > > > doc/guides/nics/fail_safe.rst | 5 ++++ > > > drivers/net/failsafe/failsafe_eal.c | 60 > > > ++++++++++++++++++++++++------------- > > > 2 files changed, 45 insertions(+), 20 deletions(-) > > > > > > diff --git a/doc/guides/nics/fail_safe.rst > > > b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 > > > --- a/doc/guides/nics/fail_safe.rst > > > +++ b/doc/guides/nics/fail_safe.rst > > > @@ -115,6 +115,11 @@ Fail-safe command line parameters > > > order to take only the last line into account (unlike ``exec()``) at every > > > probe attempt. > > > > > > +.. note:: > > > + > > > + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the > > device > > > + as is, which means that EAL device options are taken in this case. 
> > > + > > > - **mac** parameter [MAC address] > > > > > > This parameter allows the user to set a default MAC address to the > > > fail-safe diff --git a/drivers/net/failsafe/failsafe_eal.c > > > b/drivers/net/failsafe/failsafe_eal.c > > > index 19d26f5..7bc7453 100644 > > > --- a/drivers/net/failsafe/failsafe_eal.c > > > +++ b/drivers/net/failsafe/failsafe_eal.c > > > @@ -36,39 +36,59 @@ > > > #include "failsafe_private.h" > > > > > > static int > > > +fs_get_port_by_device_name(const char *name, uint16_t *port_id) > > > > The naming convention for the failsafe driver is > > > > namespace_object_sub-object_action() > > > OK. > > With an ordering of objects by their scope (std, rte, failsafe, file). > > Also, "get" as an action is not descriptive enough. > > > Isn't "get by device name" descriptive? The end goal is capturing a device that we know we are interested in. The device name being used for matching is an implementation detail, which should be abstracted away in a sub-function. Putting it in the name defeats the purpose of using another function. > > static int > > fs_ethdev_capture(const char *name, uint16_t *port_id); > > > You miss here the main reason why we need this function instead of using rte_eth_dev_get_port_by_name. > The reason we need this function is because we want to find the device by the device name and not ethdev name. > What's about fs_port_capture_by_device_name? You are getting a port_id that is only valid for the rte_eth_devices array, by using the ethdev iterator. You are only looking for an ethdev. So it doesn't really matter whether you are using the ethdev name or the device name; in the end you are capturing an ethdev --> fs_ethdev_capture seems good to me. Now, I guess you will say that the user would need to know that they have to provide a device name matching what is written in device->name. The issue here is that your function is a leaky abstraction, forcing this kind of consideration on its user. 
So I'd go further and ask you to change the `const char *name` parameter to a `const struct rte_devargs *da`.

> Maybe comparing it to device->devargs->name is better, What do you think?
>

You are touching on a pretty contentious subject here :) .

Identifying devices is not currently a well-defined function in DPDK. Some devices (actually, only one model: ConnectX-3) expose several ports on the same PCI slot. But even ignoring this glaring problem...

As it is, the device->name for PCI will match the name given in the devargs, so functionally this should not change anything.

Furthermore, you will have devices probed without any devargs. The fail-safe would thus be unable to capture non-blacklisted devices when the PCI bus is in blacklist mode.

These non-blacklisted devices will actually have a full PCI name (DomBDF format), so a simple match with the one passed in your fail-safe devargs will fail, e.g.:

    # A physical port exists at 0000:00:02.0
    testpmd --vdev="net_failsafe,dev(00:02.0)" -- -i

This would fail to capture the device 0000:00:02.0, as this is the name that the PCI bus gives to this device in the absence of a user-given name.

In 18.05 or 18.08 there should be an EAL function able to identify a device given a specific ID string (very close to an rte_devargs). Currently, this API does not exist.

You can hack your way around this for the moment, IF you really, really want: parse your devargs, get the bus, use the bus->parse() function to get a binary device representation, and compare byte by byte the binary representations given by your devargs and by the device->name.

But this is a hack, and a pretty ugly one at that: you have no way of knowing the size taken by this binary representation, so you can restrict yourself to the vdev and PCI buses for the moment and take the larger of an rte_vdev_driver pointer and an rte_pci_addr....
{
	union {
		struct rte_vdev_driver *drv;
		struct rte_pci_addr pci_addr;
	} bindev1, bindev2;

	memset(&bindev1, 0, sizeof(bindev1));
	memset(&bindev2, 0, sizeof(bindev2));
	rte_eal_devargs_parse(device->name, da1);
	rte_eal_devargs_parse(your_devstr, da2);
	RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") ||
		   da1->bus == rte_bus_find_by_name("vdev"));
	RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") ||
		   da2->bus == rte_bus_find_by_name("vdev"));
	/* Each name is parsed by its own bus. */
	da1->bus->parse(da1->name, &bindev1);
	da2->bus->parse(da2->name, &bindev2);
	if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) {
		/* found the device */
	} else {
		/* not found */
	}
}

So, really, really ugly. Anyway.

<snip> > > > + /* Take control of device probed by EAL options. */ > > > + DEBUG("Taking control of a probed sub device" > > > + " %d named %s", i, da->name); > > > > In this case, the devargs of the probed device must be copied within the sub- > > device definition and removed from the EAL using the proper rte_devargs > > API. > > > > Note that there is no rte_devargs copy function. You can use > > rte_devargs_parse instead, "parsing" again the original devargs into the sub- > > device one. It is necessary for complying with internal rte_devargs > > requirements (da->args being malloc-ed, at the moment, but may evolve). > > > > The rte_eal_devargs_parse function is not easy enough to use right now, > > you will have to build a devargs string (using snprintf) and submit it. > > I proposed a change this release for it but it will not make it for 18.02, that > > would have simplified your implementation. > > > > Got you. You right we need to remove the created devargs in fail-safe parse level. > What do you think about checking it in the parse level and avoid the new devargs creation? > Also to do the copy in parse level(same method as we are doing in probe level)? > Not sure I follow here, but the new rte_devargs is part of the sub-device (it is not a pointer, but allocated alongside the sub_device).
So keep everything here, it is the right place to deal with these things. -- Gaëtan Rivet 6WIND
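The name mismatch discussed in this reply ("00:02.0" in the fail-safe devargs versus "0000:00:02.0" as named by the PCI bus) can be shown with a small standalone sketch. It does not use DPDK. `struct mock_pci_addr` and both helpers are hypothetical, and the parsing only loosely imitates what a bus parse callback would do. Real code would go through the bus, as described above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical binary form of a PCI address (domain:bus:devid.function). */
struct mock_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

/* Accept "DDDD:BB:DD.F" (DomBDF) or "BB:DD.F" (BDF, domain 0 implied). */
static int
mock_pci_addr_parse(const char *str, struct mock_pci_addr *addr)
{
	unsigned int dom = 0, bus = 0, dev = 0, fn = 0;
	char tail;

	if (str == NULL || addr == NULL)
		return -1;
	/* The trailing %c rejects any character left after the address. */
	if (sscanf(str, "%x:%x:%x.%x%c", &dom, &bus, &dev, &fn, &tail) != 4) {
		dom = 0;
		if (sscanf(str, "%x:%x.%x%c", &bus, &dev, &fn, &tail) != 3)
			return -1;
	}
	if (dom > 0xffff || bus > 0xff || dev > 0x1f || fn > 0x7)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->domain = dom;
	addr->bus = (uint8_t)bus;
	addr->devid = (uint8_t)dev;
	addr->function = (uint8_t)fn;
	return 0;
}

/* Return 1 when both names designate the same PCI address, 0 otherwise. */
static int
mock_pci_name_equal(const char *a, const char *b)
{
	struct mock_pci_addr pa, pb;

	if (mock_pci_addr_parse(a, &pa) != 0 ||
	    mock_pci_addr_parse(b, &pb) != 0)
		return 0;
	return pa.domain == pb.domain && pa.bus == pb.bus &&
	       pa.devid == pb.devid && pa.function == pb.function;
}
```

A plain string comparison treats "00:02.0" and "0000:00:02.0" as different devices, while comparing the parsed form treats them as the same one. That is the gap the binary-representation hack is meant to close.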
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 14:40 ` Gaëtan Rivet @ 2018-01-16 16:15 ` Matan Azrad 2018-01-16 16:54 ` Gaëtan Rivet 0 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-16 16:15 UTC (permalink / raw) To: Gaëtan Rivet; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Gaetan From: Gaëtan Rivet, Tuesday, January 16, 2018 4:41 PM > On Tue, Jan 16, 2018 at 12:27:57PM +0000, Matan Azrad wrote: > > Hi Gaetan > > > > From: Gaëtan Rivet, Tuesday, January 16, 2018 1:09 PM > > > Hi Matan, > > > > > > I'n not fond of the commit title, how about: > > > > > > [PATCH v3 3/8] net/failsafe: add probed etherdev capture > > > > > > ? > > > > > OK, no problem. > > > > > On Tue, Jan 09, 2018 at 02:47:28PM +0000, Matan Azrad wrote: > > > > Previous fail-safe code didn't support getting probed sub-devices > > > > and failed when it tried to probe them. > > > > > > > > Skip fail-safe sub-device probing when it already was probed. > > > > > > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > > > > --- > > > > doc/guides/nics/fail_safe.rst | 5 ++++ > > > > drivers/net/failsafe/failsafe_eal.c | 60 > > > > ++++++++++++++++++++++++------------- > > > > 2 files changed, 45 insertions(+), 20 deletions(-) > > > > > > > > diff --git a/doc/guides/nics/fail_safe.rst > > > > b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 > > > > --- a/doc/guides/nics/fail_safe.rst > > > > +++ b/doc/guides/nics/fail_safe.rst > > > > @@ -115,6 +115,11 @@ Fail-safe command line parameters > > > > order to take only the last line into account (unlike ``exec()``) at every > > > > probe attempt. > > > > > > > > +.. note:: > > > > + > > > > + In case of whitelist sub-device probed by EAL, fail-safe PMD > > > > + will take the > > > device > > > > + as is, which means that EAL device options are taken in this case. 
> > > > + > > > > - **mac** parameter [MAC address] > > > > > > > > This parameter allows the user to set a default MAC address to > > > > the fail-safe diff --git a/drivers/net/failsafe/failsafe_eal.c > > > > b/drivers/net/failsafe/failsafe_eal.c > > > > index 19d26f5..7bc7453 100644 > > > > --- a/drivers/net/failsafe/failsafe_eal.c > > > > +++ b/drivers/net/failsafe/failsafe_eal.c > > > > @@ -36,39 +36,59 @@ > > > > #include "failsafe_private.h" > > > > > > > > static int > > > > +fs_get_port_by_device_name(const char *name, uint16_t *port_id) > > > > > > The naming convention for the failsafe driver is > > > > > > namespace_object_sub-object_action() > > > > > OK. > > > With an ordering of objects by their scope (std, rte, failsafe, file). > > > Also, "get" as an action is not descriptive enough. > > > > > Isn't "get by device name" descriptive? > > The endgame is capturing a device that we know we are interested in. > The device name being used for matching is an implementation detail, which > should be abstracted by using a sub-function. > > Putting this in the name defeat the reason for using another function. > > > > static int > > > fs_ethdev_capture(const char *name, uint16_t *port_id); > > > > > You miss here the main reason why we need this function instead of using > rte_eth_dev_get_port_by_name. > > The reason we need this function is because we want to find the device by > the device name and not ethdev name. > > What's about fs_port_capture_by_device_name? > > You are getting a port_id that is only valid for the rte_eth_devices array, by > using the ethdev iterator. You are only looking for an ethdev. > > So it doesn't really matter whether you are using the ethdev name or the > device name, in the end you are capturing an ethdev > --> fs_ethdev_capture seems good for me. > I don't think so, this function doesn't take(capture) the device, just gets its ethdev port id using the device name. 
The function which actually captures the device is the fs_bus_init. So maybe even the "capture" name looks problematic here. The main idea of this function is just to get the port_id. > Now, I guess you will say that the user would need to know that they have to > provide a device name that would be written in device->name. The issue > here is that you have a leaky abstraction for your function, forcing this kind of > consideration on your function user. > > So I'd go further and will ask you to change the `const char *name` to a `const > rte_devargs *da` in the parameters. > > > Maybe comparing it to device->devargs->name is better, What do you > think? > > > > You are touching at a pretty contentious subject here :) . > > Identifying devices is not currently a well-defined function in DPDK. > Some ports (actually, only one model: ConnectX-3) will have several ports > using the same PCI slot. But even ignoring this glaring problem... > > As it is, the device->name for PCI will match the name given as a devargs, so > functionally this should not change anything. > > Furthermore, you will have devices probed without any devargs. The fail- > safe would thus be unable to capture non-blacklisted devices when the PCI > bus is in blacklist mode. > > These not-blacklisted devices actually will have a full-PCI name (DomBDF > format), so a simple match with the one passed in your fail-safe devargs will > fail, ex: > > # A physical port exists at 0000:00:02.0 > testpmd --vdev="net_failsafe,dev(00:02.0)" -- -i > > Would fail to capture the device 0000:00:02.0, as this is the name that the PCI > bus would give to this device, in the absence of a user-given name. > > In 18.05, or 18.08 there should be an EAL function that would be able to > identify a device given a specific ID string (very close to an rte_devargs). > Currently, this API does not exist. 
> > You can hack your way around this for the moment, IF you really, really > want: parse your devargs, get the bus, use the bus->parse() function to get a > binary device representation, and compare bytes per bytes the binary > representation given by your devargs and by the device->name. > > But this is a hack, and a pretty ugly one at that: you have no way of knowing > the size taken by this binary representation, so you can restrict yourself to > the vdev and PCI bus for the moment and take the larger of an > rte_vdev_driver pointer and an rte_pci_addr.... > > { > union { > rte_vdev_driver *drv; > struct rte_pci_addr pci_addr; > } bindev1, bindev2; > memset(&bindev1, 0, sizeof(bindev1)); > memset(&bindev2, 0, sizeof(bindev2)); > rte_eal_devargs_parse(device->name, da1); > rte_eal_devargs_parse(your_devstr, da2); > RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") || > da1->bus == rte_bus_find_by_name("vdev")); > RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") || > da2->bus == rte_bus_find_by_name("vdev")); > da1->bus->parse(da1->name, &bindev1); > da1->bus->parse(da2->name, &bindev2); > if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) { > /* found the device */ > } else { > /* not found */ > } > } > > So, really, really ugly. Anyway. > Yes, ugly :) Thanks for this update! Will keep the comparison by device->name. > <snip> > > > > > + /* Take control of device probed by EAL options. */ > > > > + DEBUG("Taking control of a probed sub device" > > > > + " %d named %s", i, da->name); > > > > > > In this case, the devargs of the probed device must be copied within > > > the sub- device definition and removed from the EAL using the proper > > > rte_devargs API. > > > > > > Note that there is no rte_devargs copy function. You can use > > > rte_devargs_parse instead, "parsing" again the original devargs into > > > the sub- device one. 
It is necessary for complying with internal > > > rte_devargs requirements (da->args being malloc-ed, at the moment, > but may evolve). > > > > > > The rte_eal_devargs_parse function is not easy enough to use right > > > now, you will have to build a devargs string (using snprintf) and submit it. > > > I proposed a change this release for it but it will not make it for > > > 18.02, that would have simplified your implementation. > > > > > > > Got you. You right we need to remove the created devargs in fail-safe > parse level. > > What do you think about checking it in the parse level and avoid the new > devargs creation? > > Also to do the copy in parse level(same method as we are doing in probe > level)? > > > > Not sure I follow here, but the new rte_devargs is part of the sub-device (it is > not a pointer, but allocated alongside the sub_device). > > So keep everything here, it is the right place to deal with these things. > But it would avoid the double parsing and also keep the same method: if the device is already parsed, copy its devargs and continue; if the device is already probed, copy the device pointer and continue. I think this is the right way to handle it, no? Why deal with parse-level work at the probe level? Just keep all the parse work at the parse level and the probe work at the probe level. Thanks, Matan. > -- > Gaëtan Rivet > 6WIND
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 16:15 ` Matan Azrad @ 2018-01-16 16:54 ` Gaëtan Rivet 2018-01-16 17:20 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 16:54 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen On Tue, Jan 16, 2018 at 04:15:36PM +0000, Matan Azrad wrote: > Hi Gaetan > > From: Gaëtan Rivet, Tuesday, January 16, 2018 4:41 PM > > On Tue, Jan 16, 2018 at 12:27:57PM +0000, Matan Azrad wrote: > > > Hi Gaetan > > > > > > From: Gaëtan Rivet, Tuesday, January 16, 2018 1:09 PM > > > > Hi Matan, > > > > > > > > I'n not fond of the commit title, how about: > > > > > > > > [PATCH v3 3/8] net/failsafe: add probed etherdev capture > > > > > > > > ? > > > > > > > OK, no problem. > > > > > > > On Tue, Jan 09, 2018 at 02:47:28PM +0000, Matan Azrad wrote: > > > > > Previous fail-safe code didn't support getting probed sub-devices > > > > > and failed when it tried to probe them. > > > > > > > > > > Skip fail-safe sub-device probing when it already was probed. > > > > > > > > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > > > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > > > > > --- > > > > > doc/guides/nics/fail_safe.rst | 5 ++++ > > > > > drivers/net/failsafe/failsafe_eal.c | 60 > > > > > ++++++++++++++++++++++++------------- > > > > > 2 files changed, 45 insertions(+), 20 deletions(-) > > > > > > > > > > diff --git a/doc/guides/nics/fail_safe.rst > > > > > b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 > > > > > --- a/doc/guides/nics/fail_safe.rst > > > > > +++ b/doc/guides/nics/fail_safe.rst > > > > > @@ -115,6 +115,11 @@ Fail-safe command line parameters > > > > > order to take only the last line into account (unlike ``exec()``) at every > > > > > probe attempt. > > > > > > > > > > +.. 
note:: > > > > > + > > > > > + In case of whitelist sub-device probed by EAL, fail-safe PMD > > > > > + will take the > > > > device > > > > > + as is, which means that EAL device options are taken in this case. > > > > > + > > > > > - **mac** parameter [MAC address] > > > > > > > > > > This parameter allows the user to set a default MAC address to > > > > > the fail-safe diff --git a/drivers/net/failsafe/failsafe_eal.c > > > > > b/drivers/net/failsafe/failsafe_eal.c > > > > > index 19d26f5..7bc7453 100644 > > > > > --- a/drivers/net/failsafe/failsafe_eal.c > > > > > +++ b/drivers/net/failsafe/failsafe_eal.c > > > > > @@ -36,39 +36,59 @@ > > > > > #include "failsafe_private.h" > > > > > > > > > > static int > > > > > +fs_get_port_by_device_name(const char *name, uint16_t *port_id) > > > > > > > > The naming convention for the failsafe driver is > > > > > > > > namespace_object_sub-object_action() > > > > > > > OK. > > > > With an ordering of objects by their scope (std, rte, failsafe, file). > > > > Also, "get" as an action is not descriptive enough. > > > > > > > Isn't "get by device name" descriptive? > > > > The endgame is capturing a device that we know we are interested in. > > The device name being used for matching is an implementation detail, which > > should be abstracted by using a sub-function. > > > > Putting this in the name defeat the reason for using another function. > > > > > > static int > > > > fs_ethdev_capture(const char *name, uint16_t *port_id); > > > > > > > You miss here the main reason why we need this function instead of using > > rte_eth_dev_get_port_by_name. > > > The reason we need this function is because we want to find the device by > > the device name and not ethdev name. > > > What's about fs_port_capture_by_device_name? > > > > You are getting a port_id that is only valid for the rte_eth_devices array, by > > using the ethdev iterator. You are only looking for an ethdev. 
> > > > So it doesn't really matter whether you are using the ethdev name or the > > device name, in the end you are capturing an ethdev > > --> fs_ethdev_capture seems good for me. > > > > I don't think so, this function doesn't take(capture) the device, just gets its ethdev port id using the device name. > The function which actually captures the device is the fs_bus_init. > So maybe even the "capture" name looks problematic here. > The main idea of this function is just to get the port_id. > Right :) . Call it fs_ethdev_portid_get() or fs_ethdev_find() then. > > Now, I guess you will say that the user would need to know that they have to > > provide a device name that would be written in device->name. The issue > > here is that you have a leaky abstraction for your function, forcing this kind of > > consideration on your function user. > > > > So I'd go further and will ask you to change the `const char *name` to a `const > > rte_devargs *da` in the parameters. > > > > > Maybe comparing it to device->devargs->name is better, What do you > > think? > > > > > > > You are touching at a pretty contentious subject here :) . > > > > Identifying devices is not currently a well-defined function in DPDK. > > Some ports (actually, only one model: ConnectX-3) will have several ports > > using the same PCI slot. But even ignoring this glaring problem... > > > > As it is, the device->name for PCI will match the name given as a devargs, so > > functionally this should not change anything. > > > > Furthermore, you will have devices probed without any devargs. The fail- > > safe would thus be unable to capture non-blacklisted devices when the PCI > > bus is in blacklist mode. 
> > > > These not-blacklisted devices actually will have a full-PCI name (DomBDF > > format), so a simple match with the one passed in your fail-safe devargs will > > fail, ex: > > > > # A physical port exists at 0000:00:02.0 > > testpmd --vdev="net_failsafe,dev(00:02.0)" -- -i > > > > Would fail to capture the device 0000:00:02.0, as this is the name that the PCI > > bus would give to this device, in the absence of a user-given name. > > > > In 18.05, or 18.08 there should be an EAL function that would be able to > > identify a device given a specific ID string (very close to an rte_devargs). > > Currently, this API does not exist. > > > > You can hack your way around this for the moment, IF you really, really > > want: parse your devargs, get the bus, use the bus->parse() function to get a > > binary device representation, and compare byte by byte the binary > > representation given by your devargs and by the device->name. > > > > But this is a hack, and a pretty ugly one at that: you have no way of knowing > > the size taken by this binary representation, so you can restrict yourself to > > the vdev and PCI bus for the moment and take the larger of an > > rte_vdev_driver pointer and an rte_pci_addr.... > > > > { > > union { > > struct rte_vdev_driver *drv; > > struct rte_pci_addr pci_addr; > > } bindev1, bindev2; > > memset(&bindev1, 0, sizeof(bindev1)); > > memset(&bindev2, 0, sizeof(bindev2)); > > rte_eal_devargs_parse(device->name, da1); > > rte_eal_devargs_parse(your_devstr, da2); > > RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") || > > da1->bus == rte_bus_find_by_name("vdev")); > > RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") || > > da2->bus == rte_bus_find_by_name("vdev")); > > da1->bus->parse(da1->name, &bindev1); > > da2->bus->parse(da2->name, &bindev2); > > if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) { > > /* found the device */ > > } else { > > /* not found */ > > } > > } > > > > So, really, really ugly. Anyway.
> > > Yes, ugly :) Thanks for this update! > Will keep the comparison by device->name. > Well as explained, above, the comparison by device->name only works with whitelisted devices. So either implement something broken right now that you will need to update in 18.05, or implement it properly in 18.05 from the get go. > > <snip> > > > > > > > + /* Take control of device probed by EAL options. */ > > > > > + DEBUG("Taking control of a probed sub device" > > > > > + " %d named %s", i, da->name); > > > > > > > > In this case, the devargs of the probed device must be copied within > > > > the sub- device definition and removed from the EAL using the proper > > > > rte_devargs API. > > > > > > > > Note that there is no rte_devargs copy function. You can use > > > > rte_devargs_parse instead, "parsing" again the original devargs into > > > > the sub- device one. It is necessary for complying with internal > > > > rte_devargs requirements (da->args being malloc-ed, at the moment, > > but may evolve). > > > > > > > > The rte_eal_devargs_parse function is not easy enough to use right > > > > now, you will have to build a devargs string (using snprintf) and submit it. > > > > I proposed a change this release for it but it will not make it for > > > > 18.02, that would have simplified your implementation. > > > > > > > > > > Got you. You right we need to remove the created devargs in fail-safe > > parse level. > > > What do you think about checking it in the parse level and avoid the new > > devargs creation? > > > Also to do the copy in parse level(same method as we are doing in probe > > level)? > > > > > > > Not sure I follow here, but the new rte_devargs is part of the sub-device (it is > > not a pointer, but allocated alongside the sub_device). > > > > So keep everything here, it is the right place to deal with these things. > > > But it will prevent the double parsing and also saves the method: > If the device already parsed - copy its devargs and continue. 
> If the device already probed - copy the device pointer and continue. > > I think this is the right dealing, no? > Why to deal with parse level in probe level? Just keep all the parse work to parse level and the probe work to probe level. After re-reading, I think we misunderstood each other. You cannot remove the rte_devargs created during parsing: it is allocated alongside the sub_device structure. You must only remove the rte_devargs allocated by the EAL (using rte_eal_devargs_remove()). Before removing it, you must copy its content into the local sub_device rte_devargs structure. I only proposed a way to do this copy that would not deal with rte_devargs internals, as it is bound to evolve rather soon. Otherwise, no, I do not want to complicate the parsing operations, they are already too complicated and too critical. Better to keep it all here. -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
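The copy-before-remove order described in the message above can be sketched in plain C. This is only an illustration under stated assumptions: struct sketch_devargs and the two sketch_* helpers are hypothetical stand-ins, not the real struct rte_devargs nor any DPDK call, and the "bus:name,args" string layout is assumed for the example. It shows the idea of rebuilding a devargs string with snprintf and parsing it again, so the local copy owns its own heap-allocated args and stays valid once the original entry is released.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a devargs entry; NOT the real struct rte_devargs. */
struct sketch_devargs {
	char bus[16];
	char name[64];
	char *args; /* heap-allocated, owned by this entry */
};

/* Parse "bus:name[,args]" into da. Returns 0 on success, -1 on error. */
static int
sketch_devargs_parse(const char *str, struct sketch_devargs *da)
{
	const char *colon = strchr(str, ':');
	const char *comma;
	size_t bus_len, name_len, args_len;

	if (colon == NULL)
		return -1;
	comma = strchr(colon + 1, ',');
	bus_len = (size_t)(colon - str);
	name_len = comma ? (size_t)(comma - colon - 1) : strlen(colon + 1);
	if (bus_len == 0 || bus_len >= sizeof(da->bus) ||
	    name_len == 0 || name_len >= sizeof(da->name))
		return -1;
	args_len = comma ? strlen(comma + 1) : 0;
	da->args = malloc(args_len + 1);
	if (da->args == NULL)
		return -1;
	memcpy(da->bus, str, bus_len);
	da->bus[bus_len] = '\0';
	memcpy(da->name, colon + 1, name_len);
	da->name[name_len] = '\0';
	/* Copies the terminating NUL as well. */
	memcpy(da->args, comma ? comma + 1 : "", args_len + 1);
	return 0;
}

/*
 * Copy src into dst without touching src internals: rebuild the textual
 * form, then parse it again so dst gets its own allocation for args.
 */
static int
sketch_devargs_copy(const struct sketch_devargs *src,
		    struct sketch_devargs *dst)
{
	char buf[256];
	int n;

	assert(src != NULL && dst != NULL);
	n = snprintf(buf, sizeof(buf), "%s:%s,%s", src->bus, src->name,
		     src->args ? src->args : "");
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;
	return sketch_devargs_parse(buf, dst);
}
```

With this shape, the caller copies first and only then releases the original entry; in the real driver that release would be the rte_eal_devargs_remove() call named in the thread, which the sketch replaces with a plain free().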
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 16:54 ` Gaëtan Rivet @ 2018-01-16 17:20 ` Matan Azrad 2018-01-16 22:31 ` Gaëtan Rivet 0 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-16 17:20 UTC (permalink / raw) To: Gaëtan Rivet; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Gaetan From: Gaëtan Rivet, Tuesday, January 16, 2018 6:54 PM > On Tue, Jan 16, 2018 at 04:15:36PM +0000, Matan Azrad wrote: > > Hi Gaetan > > > > From: Gaëtan Rivet, Tuesday, January 16, 2018 4:41 PM > > > On Tue, Jan 16, 2018 at 12:27:57PM +0000, Matan Azrad wrote: > > > > Hi Gaetan > > > > > > > > From: Gaëtan Rivet, Tuesday, January 16, 2018 1:09 PM > > > > > Hi Matan, > > > > > > > > > > I'm not fond of the commit title, how about: > > > > > > > > > > [PATCH v3 3/8] net/failsafe: add probed etherdev capture > > > > > > > > > > ? > > > > > > > > > OK, no problem. > > > > > > > > > On Tue, Jan 09, 2018 at 02:47:28PM +0000, Matan Azrad wrote: > > > > > > Previous fail-safe code didn't support getting probed > > > > > > sub-devices and failed when it tried to probe them. > > > > > > > > > > > > Skip fail-safe sub-device probing when it already was probed. > > > > > > > > > > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > > > > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > > > > > > --- > > > > > > doc/guides/nics/fail_safe.rst | 5 ++++ > > > > > > drivers/net/failsafe/failsafe_eal.c | 60 > > > > > > ++++++++++++++++++++++++------------- > > > > > > 2 files changed, 45 insertions(+), 20 deletions(-) > > > > > > > > > > > > diff --git a/doc/guides/nics/fail_safe.rst > > > > > > b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 > > > > > > --- a/doc/guides/nics/fail_safe.rst > > > > > > +++ b/doc/guides/nics/fail_safe.rst > > > > > > @@ -115,6 +115,11 @@ Fail-safe command line parameters > > > > > > order to take only the last line into account (unlike ``exec()``) at > every > > > > > > probe attempt.
> > > > > > > > > > > > +.. note:: > > > > > > + > > > > > > + In case of whitelist sub-device probed by EAL, fail-safe > > > > > > + PMD will take the > > > > > device > > > > > > + as is, which means that EAL device options are taken in this case. > > > > > > + > > > > > > - **mac** parameter [MAC address] > > > > > > > > > > > > This parameter allows the user to set a default MAC address > > > > > > to the fail-safe diff --git > > > > > > a/drivers/net/failsafe/failsafe_eal.c > > > > > > b/drivers/net/failsafe/failsafe_eal.c > > > > > > index 19d26f5..7bc7453 100644 > > > > > > --- a/drivers/net/failsafe/failsafe_eal.c > > > > > > +++ b/drivers/net/failsafe/failsafe_eal.c > > > > > > @@ -36,39 +36,59 @@ > > > > > > #include "failsafe_private.h" > > > > > > > > > > > > static int > > > > > > +fs_get_port_by_device_name(const char *name, uint16_t > > > > > > +*port_id) > > > > > > > > > > The naming convention for the failsafe driver is > > > > > > > > > > namespace_object_sub-object_action() > > > > > > > > > OK. > > > > > With an ordering of objects by their scope (std, rte, failsafe, file). > > > > > Also, "get" as an action is not descriptive enough. > > > > > > > > > Isn't "get by device name" descriptive? > > > > > > The endgame is capturing a device that we know we are interested in. > > > The device name being used for matching is an implementation detail, > > > which should be abstracted by using a sub-function. > > > > > > Putting this in the name defeat the reason for using another function. > > > > > > > > static int > > > > > fs_ethdev_capture(const char *name, uint16_t *port_id); > > > > > > > > > You miss here the main reason why we need this function instead of > > > > using > > > rte_eth_dev_get_port_by_name. > > > > The reason we need this function is because we want to find the > > > > device by > > > the device name and not ethdev name. > > > > What's about fs_port_capture_by_device_name? 
> > > > > > You are getting a port_id that is only valid for the rte_eth_devices > > > array, by using the ethdev iterator. You are only looking for an ethdev. > > > > > > So it doesn't really matter whether you are using the ethdev name or > > > the device name, in the end you are capturing an ethdev > > > --> fs_ethdev_capture seems good for me. > > > > > > > I don't think so, this function doesn't take(capture) the device, just gets its > ethdev port id using the device name. > > The function which actually captures the device is the fs_bus_init. > > So maybe even the "capture" name looks problematic here. > > The main idea of this function is just to get the port_id. > > > > Right :) . Call it fs_ethdev_portid_get() or fs_ethdev_find() then. > Sure, agree with the first one. > > > Now, I guess you will say that the user would need to know that they > > > have to provide a device name that would be written in device->name. > > > The issue here is that you have a leaky abstraction for your > > > function, forcing this kind of consideration on your function user. > > > > > > So I'd go further and will ask you to change the `const char *name` > > > to a `const rte_devargs *da` in the parameters. > > > > > > > Maybe comparing it to device->devargs->name is better, What do you > > > think? > > > > > > > > > > You are touching at a pretty contentious subject here :) . > > > > > > Identifying devices is not currently a well-defined function in DPDK. > > > Some ports (actually, only one model: ConnectX-3) will have several > > > ports using the same PCI slot. But even ignoring this glaring problem... > > > > > > As it is, the device->name for PCI will match the name given as a > > > devargs, so functionally this should not change anything. > > > > > > Furthermore, you will have devices probed without any devargs. The > > > fail- safe would thus be unable to capture non-blacklisted devices > > > when the PCI bus is in blacklist mode. 
> > > > > > These not-blacklisted devices actually will have a full-PCI name > > > (DomBDF format), so a simple match with the one passed in your > > > fail-safe devargs will fail, ex: > > > > > > # A physical port exists at 0000:00:02.0 > > > testpmd --vdev="net_failsafe,dev(00:02.0)" -- -i > > > > > > Would fail to capture the device 0000:00:02.0, as this is the name > > > that the PCI bus would give to this device, in the absence of a user-given > name. > > > > > > In 18.05, or 18.08 there should be an EAL function that would be > > > able to identify a device given a specific ID string (very close to an > rte_devargs). > > > Currently, this API does not exist. > > > > > > You can hack your way around this for the moment, IF you really, > > > really > > > want: parse your devargs, get the bus, use the bus->parse() function > > > to get a binary device representation, and compare bytes per bytes > > > the binary representation given by your devargs and by the device- > >name. > > > > > > But this is a hack, and a pretty ugly one at that: you have no way > > > of knowing the size taken by this binary representation, so you can > > > restrict yourself to the vdev and PCI bus for the moment and take > > > the larger of an rte_vdev_driver pointer and an rte_pci_addr.... 
> > > > > > { > > > union { > > > struct rte_vdev_driver *drv; > > > struct rte_pci_addr pci_addr; > > > } bindev1, bindev2; > > > memset(&bindev1, 0, sizeof(bindev1)); > > > memset(&bindev2, 0, sizeof(bindev2)); > > > rte_eal_devargs_parse(device->name, da1); > > > rte_eal_devargs_parse(your_devstr, da2); > > > RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") || > > > da1->bus == rte_bus_find_by_name("vdev")); > > > RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") || > > > da2->bus == rte_bus_find_by_name("vdev")); > > > da1->bus->parse(da1->name, &bindev1); > > > da2->bus->parse(da2->name, &bindev2); > > > if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) { > > > /* found the device */ > > > } else { > > > /* not found */ > > > } > > > } > > > > > > So, really, really ugly. Anyway. > > > > > Yes, ugly :) Thanks for this update! > > Will keep the comparison by device->name. > > > > Well as explained, above, the comparison by device->name only works with > whitelisted devices. > > So either implement something broken right now that you will need to > update in 18.05, or implement it properly in 18.05 from the get go. > For the current needs it is enough. We can also say that it is the user responsibility to pass to failsafe the same names and same args as he passes for EAL(or default EAL names). I think I emphasized it in documentation. > > > <snip> > > > > > > > > > + /* Take control of device probed by EAL > options. */ > > > > > > + DEBUG("Taking control of a probed sub > device" > > > > > > + " %d named %s", i, da->name); > > > > > > > > > > In this case, the devargs of the probed device must be copied > > > > > within the sub- device definition and removed from the EAL using > > > > > the proper rte_devargs API. > > > > > > > > > > Note that there is no rte_devargs copy function. You can use > > > > > rte_devargs_parse instead, "parsing" again the original devargs > > > > > into the sub- device one.
It is necessary for complying with > > > > > internal rte_devargs requirements (da->args being malloc-ed, at > > > > > the moment, > > > but may evolve). > > > > > > > > > > The rte_eal_devargs_parse function is not easy enough to use > > > > > right now, you will have to build a devargs string (using snprintf) and > submit it. > > > > > I proposed a change this release for it but it will not make it > > > > > for 18.02, that would have simplified your implementation. > > > > > > > > > > > > > Got you. You right we need to remove the created devargs in > > > > fail-safe > > > parse level. > > > > What do you think about checking it in the parse level and avoid > > > > the new > > > devargs creation? > > > > Also to do the copy in parse level(same method as we are doing in > > > > probe > > > level)? > > > > > > > > > > Not sure I follow here, but the new rte_devargs is part of the > > > sub-device (it is not a pointer, but allocated alongside the sub_device). > > > > > > So keep everything here, it is the right place to deal with these things. > > > > > But it will prevent the double parsing and also saves the method: > > If the device already parsed - copy its devargs and continue. > > If the device already probed - copy the device pointer and continue. > > > > I think this is the right dealing, no? > > Why to deal with parse level in probe level? Just keep all the parse work to > parse level and the probe work to probe level. > > After re-reading, I think we misunderstood each other. > You cannot remove the rte_devargs created during parsing: it is allocated > alongside the sub_device structure. > > You must only remove the rte_devargs allocated by the EAL (using > rte_eal_devargs_remove()). > Sure. > Before removing it, you must copy its content in the local sub_device > rte_devargs structure. I only proposed a way to do this copy that would not > deal with rte_devargs internals, as it is bound to evolve rather soon. > Yes. 
> Otherwise, no, I do not want to complicate the parsing operations, they are > already too complicated and too critical. Better to keep it all here. I think fs_parse_device function is not complicated and it is the natural place for devargs games. For me this is the right place for the copy & remove devargs. Are you insisting to put all in fs_bus_init? > -- > Gaëtan Rivet > 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 17:20 ` Matan Azrad @ 2018-01-16 22:31 ` Gaëtan Rivet 2018-01-17 8:40 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-16 22:31 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Matan, On Tue, Jan 16, 2018 at 05:20:27PM +0000, Matan Azrad wrote: > Hi Gaetan > <snip> > > > > In 18.05, or 18.08 there should be an EAL function that would be > > > > able to identify a device given a specific ID string (very close to an > > rte_devargs). > > > > Currently, this API does not exist. > > > > > > > > You can hack your way around this for the moment, IF you really, > > > > really > > > > want: parse your devargs, get the bus, use the bus->parse() function > > > > to get a binary device representation, and compare bytes per bytes > > > > the binary representation given by your devargs and by the device- > > >name. > > > > > > > > But this is a hack, and a pretty ugly one at that: you have no way > > > > of knowing the size taken by this binary representation, so you can > > > > restrict yourself to the vdev and PCI bus for the moment and take > > > > the larger of an rte_vdev_driver pointer and an rte_pci_addr.... 
> > > > > > > > { > > > > union { > > > > struct rte_vdev_driver *drv; > > > > struct rte_pci_addr pci_addr; > > > > } bindev1, bindev2; > > > > memset(&bindev1, 0, sizeof(bindev1)); > > > > memset(&bindev2, 0, sizeof(bindev2)); > > > > rte_eal_devargs_parse(device->name, da1); > > > > rte_eal_devargs_parse(your_devstr, da2); > > > > RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") || > > > > da1->bus == rte_bus_find_by_name("vdev")); > > > > RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") || > > > > da2->bus == rte_bus_find_by_name("vdev")); > > > > da1->bus->parse(da1->name, &bindev1); > > > > da2->bus->parse(da2->name, &bindev2); > > > > if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) { > > > > /* found the device */ > > > > } else { > > > > /* not found */ > > > > } > > > > } > > > > > > > > So, really, really ugly. Anyway. > > > > > > > Yes, ugly :) Thanks for this update! > > > Will keep the comparison by device->name. > > > > > > > Well as explained, above, the comparison by device->name only works with > > whitelisted devices. > > > > > So either implement something broken right now that you will need to > > update in 18.05, or implement it properly in 18.05 from the get go. > > > For the current needs it is enough. > We can also say that it is the user responsibility to pass to failsafe the same names and same args as he passes for EAL(or default EAL names). > I think I emphasized it in documentation. > Okay, as you wish. Just be aware of this limitation. I think this functionality is good and useful, but it needs to be made clean. The proper function should be available soon, then this implementation should be cleaned up. > > > > <snip> > > > > > > > > > > > + /* Take control of device probed by EAL > > options.
*/ > > > > > > > + DEBUG("Taking control of a probed sub > > device" > > > > > > > + " %d named %s", i, da->name); > > > > > > > > > > > > In this case, the devargs of the probed device must be copied > > > > > > within the sub- device definition and removed from the EAL using > > > > > > the proper rte_devargs API. > > > > > > > > > > > > Note that there is no rte_devargs copy function. You can use > > > > > > rte_devargs_parse instead, "parsing" again the original devargs > > > > > > into the sub- device one. It is necessary for complying with > > > > > > internal rte_devargs requirements (da->args being malloc-ed, at > > > > > > the moment, > > > > but may evolve). > > > > > > > > > > > > The rte_eal_devargs_parse function is not easy enough to use > > > > > > right now, you will have to build a devargs string (using snprintf) and > > submit it. > > > > > > I proposed a change this release for it but it will not make it > > > > > > for 18.02, that would have simplified your implementation. > > > > > > > > > > > > > > > > Got you. You right we need to remove the created devargs in > > > > > fail-safe > > > > parse level. > > > > > What do you think about checking it in the parse level and avoid > > > > > the new > > > > devargs creation? > > > > > Also to do the copy in parse level(same method as we are doing in > > > > > probe > > > > level)? > > > > > > > > > > > > > Not sure I follow here, but the new rte_devargs is part of the > > > > sub-device (it is not a pointer, but allocated alongside the sub_device). > > > > > > > > So keep everything here, it is the right place to deal with these things. > > > > > > > But it will prevent the double parsing and also saves the method: > > > If the device already parsed - copy its devargs and continue. > > > If the device already probed - copy the device pointer and continue. > > > > > > I think this is the right dealing, no? > > > Why to deal with parse level in probe level? 
Just keep all the parse work to > > parse level and the probe work to probe level. > > > > After re-reading, I think we misunderstood each other. > > You cannot remove the rte_devargs created during parsing: it is allocated > > alongside the sub_device structure. > > > > You must only remove the rte_devargs allocated by the EAL (using > > rte_eal_devargs_remove()). > > > > Sure. > > > Before removing it, you must copy its content in the local sub_device > > rte_devargs structure. I only proposed a way to do this copy that would not > > deal with rte_devargs internals, as it is bound to evolve rather soon. > > > Yes. > > > Otherwise, no, I do not want to complicate the parsing operations, they are > > already too complicated and too critical. Better to keep it all here. > > I think fs_parse_device function is not complicated and it is the natural place for devargs games. > For me this is the right place for the copy & remove devargs. > Are you insisting to put all in fs_bus_init? You would have to put fs_ethdev_portid_find in failsafe_args, which is mixing layers. Sorry but yes, please keep all these changes in this file. Thanks, -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
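The limitation acknowledged in this exchange (a short name such as 00:02.0 given in the fail-safe devargs versus the 0000:00:02.0 name assigned by the PCI bus) can be illustrated with a small standalone C sketch. It is not what the patch does and not a DPDK API: both sketch_* helpers are hypothetical, and they cover PCI addresses only, by normalizing each name to the full DomBDF form before comparing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a DPDK function): rewrite a PCI address given as
 * "BB:DD.F" or "DDDD:BB:DD.F" into the full "DDDD:BB:DD.F" form.
 * Returns 0 on success, -1 if the input is not a PCI address or does not fit.
 */
static int
sketch_pci_name_normalize(const char *in, char *out, size_t len)
{
	unsigned int dom = 0, bus = 0, dev = 0, fn = 0;
	char trail;
	int n;

	assert(in != NULL && out != NULL);
	/* Try the full form first; a trailing character makes it a mismatch. */
	if (sscanf(in, "%x:%x:%x.%x%c", &dom, &bus, &dev, &fn, &trail) != 4) {
		/* Fall back to the short form, with an implicit domain of 0. */
		dom = 0;
		if (sscanf(in, "%x:%x.%x%c", &bus, &dev, &fn, &trail) != 3)
			return -1;
	}
	if (dom > 0xffff || bus > 0xff || dev > 0x1f || fn > 7)
		return -1;
	n = snprintf(out, len, "%04x:%02x:%02x.%x", dom, bus, dev, fn);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return 1 when both names designate the same PCI address, 0 otherwise. */
static int
sketch_pci_name_equal(const char *a, const char *b)
{
	char na[16], nb[16];

	if (sketch_pci_name_normalize(a, na, sizeof(na)) != 0 ||
	    sketch_pci_name_normalize(b, nb, sizeof(nb)) != 0)
		return 0;
	return strcmp(na, nb) == 0;
}
```

A plain strcmp() of "00:02.0" against "0000:00:02.0" reports a mismatch, while the normalized comparison treats them as the same device. Names that are not PCI addresses (vdev names, for instance) are simply rejected here, which is why a generic EAL-level identification function, as mentioned in the thread, would be the cleaner long-term answer.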
* Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting 2018-01-16 22:31 ` Gaëtan Rivet @ 2018-01-17 8:40 ` Matan Azrad 0 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-17 8:40 UTC (permalink / raw) To: Gaëtan Rivet; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Gaetan From: Gaëtan Rivet, Wednesday, January 17, 2018 12:31 AM > Hi Matan, > > On Tue, Jan 16, 2018 at 05:20:27PM +0000, Matan Azrad wrote: > > Hi Gaetan > > > > <snip> > > > > > > In 18.05, or 18.08 there should be an EAL function that would be > > > > > able to identify a device given a specific ID string (very close > > > > > to an > > > rte_devargs). > > > > > Currently, this API does not exist. > > > > > > > > > > You can hack your way around this for the moment, IF you really, > > > > > really > > > > > want: parse your devargs, get the bus, use the bus->parse() > > > > > function to get a binary device representation, and compare > > > > > bytes per bytes the binary representation given by your devargs > > > > > and by the device- > > > >name. > > > > > > > > > > But this is a hack, and a pretty ugly one at that: you have no > > > > > way of knowing the size taken by this binary representation, so > > > > > you can restrict yourself to the vdev and PCI bus for the moment > > > > > and take the larger of an rte_vdev_driver pointer and an > rte_pci_addr.... 
> > > > > > > > > > { > > > > > union { > > > > > struct rte_vdev_driver *drv; > > > > > struct rte_pci_addr pci_addr; > > > > > } bindev1, bindev2; > > > > > memset(&bindev1, 0, sizeof(bindev1)); > > > > > memset(&bindev2, 0, sizeof(bindev2)); > > > > > rte_eal_devargs_parse(device->name, da1); > > > > > rte_eal_devargs_parse(your_devstr, da2); > > > > > RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") || > > > > > da1->bus == rte_bus_find_by_name("vdev")); > > > > > RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") || > > > > > da2->bus == rte_bus_find_by_name("vdev")); > > > > > da1->bus->parse(da1->name, &bindev1); > > > > > da2->bus->parse(da2->name, &bindev2); > > > > > if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) { > > > > > /* found the device */ > > > > > } else { > > > > > /* not found */ > > > > > } > > > > > } > > > > > > > > > > So, really, really ugly. Anyway. > > > > > > > > > Yes, ugly :) Thanks for this update! > > > > Will keep the comparison by device->name. > > > > > > > > > > Well as explained, above, the comparison by device->name only works > > > with whitelisted devices. > > > > > > > > So either implement something broken right now that you will need to > > > update in 18.05, or implement it properly in 18.05 from the get go. > > > > > For the current needs it is enough. > > We can also say that it is the user responsibility to pass to failsafe the same > names and same args as he passes for EAL(or default EAL names). > > I think I emphasized it in documentation. > > > > Okay, as you wish. Just be aware of this limitation. > > I think this functionality is good and useful, but it needs to be made clean. > The proper function should be available soon, then this implementation > should be cleaned up. Sure. > > > > > > <snip> > > > > > > > > > > > > > + /* Take control of device probed by EAL > > > options.
*/ > > > > > > > > + DEBUG("Taking control of a probed sub > > > device" > > > > > > > > + " %d named %s", i, da->name); > > > > > > > > > > > > > > In this case, the devargs of the probed device must be > > > > > > > copied within the sub- device definition and removed from > > > > > > > the EAL using the proper rte_devargs API. > > > > > > > > > > > > > > Note that there is no rte_devargs copy function. You can use > > > > > > > rte_devargs_parse instead, "parsing" again the original > > > > > > > devargs into the sub- device one. It is necessary for > > > > > > > complying with internal rte_devargs requirements (da->args > > > > > > > being malloc-ed, at the moment, > > > > > but may evolve). > > > > > > > > > > > > > > The rte_eal_devargs_parse function is not easy enough to use > > > > > > > right now, you will have to build a devargs string (using > > > > > > > snprintf) and > > > submit it. > > > > > > > I proposed a change this release for it but it will not make > > > > > > > it for 18.02, that would have simplified your implementation. > > > > > > > > > > > > > > > > > > > Got you. You right we need to remove the created devargs in > > > > > > fail-safe > > > > > parse level. > > > > > > What do you think about checking it in the parse level and > > > > > > avoid the new > > > > > devargs creation? > > > > > > Also to do the copy in parse level(same method as we are doing > > > > > > in probe > > > > > level)? > > > > > > > > > > > > > > > > Not sure I follow here, but the new rte_devargs is part of the > > > > > sub-device (it is not a pointer, but allocated alongside the > sub_device). > > > > > > > > > > So keep everything here, it is the right place to deal with these things. > > > > > > > > > But it will prevent the double parsing and also saves the method: > > > > If the device already parsed - copy its devargs and continue. > > > > If the device already probed - copy the device pointer and continue. 
> > > > > > > > I think this is the right dealing, no? > > > > Why to deal with parse level in probe level? Just keep all the > > > > parse work to > > > parse level and the probe work to probe level. > > > > > > After re-reading, I think we misunderstood each other. > > > You cannot remove the rte_devargs created during parsing: it is > > > allocated alongside the sub_device structure. > > > > > > You must only remove the rte_devargs allocated by the EAL (using > > > rte_eal_devargs_remove()). > > > > > > > Sure. > > > > > Before removing it, you must copy its content in the local > > > sub_device rte_devargs structure. I only proposed a way to do this > > > copy that would not deal with rte_devargs internals, as it is bound to > evolve rather soon. > > > > > Yes. > > > > > Otherwise, no, I do not want to complicate the parsing operations, > > > they are already too complicated and too critical. Better to keep it all > here. > > > > I think fs_parse_device function is not complicated and it is the natural > place for devargs games. > > For me this is the right place for the copy & remove devargs. > > Are you insisting to put all in fs_bus_init? > > You would have to put fs_ethdev_portid_find in failsafe_args, which is > mixing layers. Sorry but yes, please keep all these changes in this file. > OK, Thanks man! > Thanks, > -- > Gaëtan Rivet > 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v3 4/8] net/vdev_netvsc: introduce Hyper-V platform driver 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad ` (2 preceding siblings ...) 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality Matan Azrad ` (4 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This patch lays the groundwork for this driver (draft documentation, copyright notices, code base skeleton and build system hooks). While it can be successfully compiled and invoked, it's an empty shell at this stage. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- MAINTAINERS | 6 ++ config/common_base | 5 ++ config/common_linuxapp | 1 + doc/guides/nics/features/vdev_netvsc.ini | 12 +++ doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 20 +++++ drivers/net/Makefile | 1 + drivers/net/vdev_netvsc/Makefile | 27 ++++++ .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 99 ++++++++++++++++++++++ mk/rte.app.mk | 1 + 11 files changed, 177 insertions(+) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c diff --git a/MAINTAINERS b/MAINTAINERS index f0baeb4..07be8cb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -451,6 +451,12 @@ F: drivers/net/mrvl/ F: doc/guides/nics/mrvl.rst F: doc/guides/nics/features/mrvl.ini +Microsoft vdev-netvsc - EXPERIMENTAL +M: Matan Azrad <matan@mellanox.com> 
+F: drivers/net/vdev_netvsc/ +F: doc/guides/nics/vdev_netvsc.rst +F: doc/guides/nics/features/vdev_netvsc.ini + Netcope szedata2 M: Matej Vido <vido@cesnet.cz> F: drivers/net/szedata2/ diff --git a/config/common_base b/config/common_base index e74febe..1c6629e 100644 --- a/config/common_base +++ b/config/common_base @@ -281,6 +281,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG=n CONFIG_RTE_LIBRTE_MRVL_PMD=n # +# Compile virtual device driver for NetVSC on Hyper-V/Azure +# +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n + +# # Compile burst-oriented Broadcom BNXT PMD driver # CONFIG_RTE_LIBRTE_BNXT_PMD=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 74c7d64..e043262 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y diff --git a/doc/guides/nics/features/vdev_netvsc.ini b/doc/guides/nics/features/vdev_netvsc.ini new file mode 100644 index 0000000..cfc5cb9 --- /dev/null +++ b/doc/guides/nics/features/vdev_netvsc.ini @@ -0,0 +1,12 @@ +; +; Supported features of the 'vdev_netvsc' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 23babe9..5666046 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -64,6 +64,7 @@ Network Interface Controller Drivers szedata2 tap thunderx + vdev_netvsc virtio vhost vmxnet3 diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst new file mode 100644 index 0000000..a952908 --- /dev/null +++ b/doc/guides/nics/vdev_netvsc.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright 2017 6WIND S.A.
+ Copyright 2017 Mellanox Technologies, Ltd. + +VDEV_NETVSC driver +================== + +The VDEV_NETVSC driver (librte_pmd_vdev_netvsc) provides support for NetVSC +interfaces and associated SR-IOV virtual function (VF) devices found in +Linux virtual machines running on Microsoft Hyper-V_ (including Azure) +platforms. + +.. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v + +Build options +------------- + +- ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) + + Toggle compilation of this driver. diff --git a/drivers/net/Makefile b/drivers/net/Makefile index ef09b4e..dc41ed1 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -66,6 +66,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += sfc DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx +DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile new file mode 100644 index 0000000..2fb059d --- /dev/null +++ b/drivers/net/vdev_netvsc/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 2017 6WIND S.A. +# Copyright 2017 Mellanox Technologies, Ltd. + +include $(RTE_SDK)/mk/rte.vars.mk + +# Properties of the generated library. +LIB = librte_pmd_vdev_netvsc.a +LIBABIVER := 1 +EXPORT_MAP := rte_pmd_vdev_netvsc_version.map + +# Additional compilation flags. +CFLAGS += -O3 +CFLAGS += -g +CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += $(WERROR_FLAGS) + +# Dependencies. +LDLIBS += -lrte_bus_vdev +LDLIBS += -lrte_eal +LDLIBS += -lrte_ethdev +LDLIBS += -lrte_kvargs + +# Source files. 
+SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map new file mode 100644 index 0000000..179140f --- /dev/null +++ b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map @@ -0,0 +1,4 @@ +DPDK_18.02 { + + local: *; +}; diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c new file mode 100644 index 0000000..e895b32 --- /dev/null +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox Technologies, Ltd. + */ + +#include <stddef.h> + +#include <rte_bus_vdev.h> +#include <rte_common.h> +#include <rte_config.h> +#include <rte_kvargs.h> +#include <rte_log.h> + +#define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_ARG_IFACE "iface" +#define VDEV_NETVSC_ARG_MAC "mac" + +#define DRV_LOG(level, ...) \ + rte_log(RTE_LOG_ ## level, \ + vdev_netvsc_logtype, \ + RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ + RTE_FMT_TAIL(__VA_ARGS__,))) + +/** Driver-specific log messages type. */ +static int vdev_netvsc_logtype; + +/** Number of driver instances relying on context list. */ +static unsigned int vdev_netvsc_ctx_inst; + +/** + * Probe NetVSC interfaces. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0, even in case of errors. + */ +static int +vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) +{ + static const char *const vdev_netvsc_arg[] = { + VDEV_NETVSC_ARG_IFACE, + VDEV_NETVSC_ARG_MAC, + NULL, + }; + const char *name = rte_vdev_device_name(dev); + const char *args = rte_vdev_device_args(dev); + struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", + vdev_netvsc_arg); + + DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); + if (!kvargs) { + DRV_LOG(ERR, "cannot parse arguments list"); + goto error; + } +error: + if (kvargs) + rte_kvargs_free(kvargs); + ++vdev_netvsc_ctx_inst; + return 0; +} + +/** + * Remove driver instance. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0. + */ +static int +vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) +{ + --vdev_netvsc_ctx_inst; + return 0; +} + +/** Virtual device descriptor. */ +static struct rte_vdev_driver vdev_netvsc_vdev = { + .probe = vdev_netvsc_vdev_probe, + .remove = vdev_netvsc_vdev_remove, +}; + +RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev); +RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); +RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, + VDEV_NETVSC_ARG_IFACE "=<string> " + VDEV_NETVSC_ARG_MAC "=<string>"); + +/** Initialize driver log type. */ +RTE_INIT(vdev_netvsc_init_log) +{ + vdev_netvsc_logtype = rte_log_register("pmd.vdev_netvsc"); + if (vdev_netvsc_logtype >= 0) + rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); +} diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 6a6a745..3ae5212 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -156,6 +156,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += -lrte_pmd_sfc_efx _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += -lrte_pmd_szedata2 -lsze2 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += -lrte_pmd_tap _LDLIBS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += -lrte_pmd_thunderx_nicvf +_LDLIBS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += -lrte_pmd_vdev_netvsc _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
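The "mac" device argument registered above is declared only as a plain string. As a minimal sketch of how such a string could be validated and turned into six bytes, assuming the usual colon-separated hexadecimal notation, the snippet below uses standard C only. The name example_parse_mac() is hypothetical and is not part of this series; the follow-up patch does its own parsing with a similar sscanf() format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): parse a MAC address written
 * as "aa:bb:cc:dd:ee:ff" into six bytes.
 *
 * Returns 0 on success, -1 when the string is not made of exactly six
 * colon-separated hexadecimal fields of at most two characters each.
 */
static int
example_parse_mac(const char *str, uint8_t out[6])
{
	uint8_t tmp[6];
	char trail;
	int n;

	assert(str != NULL && out != NULL);
	/*
	 * The field width of 2 keeps a value such as "100" from being
	 * accepted and truncated. The final %c only converts when something
	 * follows the sixth field, so n == 7 means trailing characters
	 * (including a trailing newline) and is rejected.
	 */
	n = sscanf(str,
		   "%2" SCNx8 ":%2" SCNx8 ":%2" SCNx8 ":"
		   "%2" SCNx8 ":%2" SCNx8 ":%2" SCNx8 "%c",
		   &tmp[0], &tmp[1], &tmp[2],
		   &tmp[3], &tmp[4], &tmp[5], &trail);
	if (n != 6)
		return -1;
	/* Only touch the caller's buffer once the whole string is valid. */
	memcpy(out, tmp, sizeof(tmp));
	return 0;
}
```

Parsing into a temporary buffer first means a rejected string leaves the caller's address untouched, which keeps error handling in the caller simple.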
* [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad ` (3 preceding siblings ...) 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-09 18:49 ` Stephen Hemminger 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad ` (3 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil As described in more details in the attached documentation (see patch contents), this virtual device driver manages NetVSC interfaces in virtual machines hosted by Hyper-V/Azure platforms. This driver does not manage traffic nor Ethernet devices directly; it acts as a thin configuration layer that automatically instantiates and controls fail-safe PMD instances combining tap and PCI sub-devices, so that each NetVSC interface is exposed as a single consolidated port to DPDK applications. PCI sub-devices being hot-pluggable (e.g. during VM migration), applications automatically benefit from increased throughput when present and automatic fallback on NetVSC otherwise without interruption thanks to fail-safe's hot-plug handling. Once initialized, the sole job of the vdev_netvsc driver is to regularly scan for PCI devices to associate with NetVSC interfaces and feed their addresses to corresponding fail-safe instances. 
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 70 +++++ drivers/net/vdev_netvsc/Makefile | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 544 +++++++++++++++++++++++++++++++++- 3 files changed, 617 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index a952908..fde1fb8 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -12,9 +12,79 @@ platforms. .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v +Implementation details +---------------------- + +Each instance of this driver effectively needs to drive two devices: the +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from +this point on) counterpart sharing the same MAC address. + +Physical devices are part of the host system and cannot be maintained during +VM migration. From a VM standpoint they appear as hot-plug devices that come +and go without prior notice. + +When the physical device is present, egress and most of the ingress traffic +flows through it; only multicasts and other hypervisor control still flow +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. + +To avoid unnecessary code duplication and ensure maximum performance, +handling of physical devices is left to their original PMDs; this virtual +device driver (also known as *vdev*) manages other PMDs as summarized by the +following block diagram:: + + .------------------. + | DPDK application | + `--------+---------' + | + .------+------. + | DPDK ethdev | + `------+------' Control + | | + .------------+------------. v .--------------------. + | failsafe PMD +---------+ vdev_netvsc driver | + `--+-------------------+--' `--------------------' + | | + | .........|......... + | : | : + .----+----. : .----+----. 
: + | tap PMD | : | any PMD | : + `----+----' : `----+----' : <-- Hot-pluggable + | : | : + .------+-------. : .-----+-----. : + | NetVSC-based | : | SR-IOV VF | : + | netdevice | : | device | : + `--------------' : `-----------' : + :.................: + + +This driver implementation may be temporary and should be improved or removed +either when hot-plug will be fully supported in EAL and bus drivers or when +a new NetVSC driver will be integrated. + Build options ------------- - ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) Toggle compilation of this driver. + +Run-time parameters +------------------- + +To invoke this driver, applications have to explicitly provide the +``--vdev=net_vdev_netvsc`` EAL option. + +The following device parameters are supported: + +- ``iface`` [string] + + Provide a specific NetVSC interface (netdevice) name to attach this driver + to. Can be provided multiple times for additional instances. + +- ``mac`` [string] + + Same as ``iface`` except a suitable NetVSC interface is located using its + MAC address. + +Not specifying either ``iface`` or ``mac`` makes this driver attach itself to +all NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile index 2fb059d..f2b2ac5 100644 --- a/drivers/net/vdev_netvsc/Makefile +++ b/drivers/net/vdev_netvsc/Makefile @@ -13,6 +13,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += -D_XOPEN_SOURCE=600 +CFLAGS += -D_BSD_SOURCE +CFLAGS += -D_DEFAULT_SOURCE CFLAGS += $(WERROR_FLAGS) # Dependencies. @@ -20,6 +23,7 @@ LDLIBS += -lrte_bus_vdev LDLIBS += -lrte_eal LDLIBS += -lrte_ethdev LDLIBS += -lrte_kvargs +LDLIBS += -lrte_net # Source files. 
SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index e895b32..3d8895b 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -3,17 +3,41 @@ * Copyright 2017 Mellanox Technologies, Ltd. */ +#include <errno.h> +#include <fcntl.h> +#include <inttypes.h> +#include <linux/sockios.h> +#include <net/if.h> +#include <netinet/ip.h> +#include <stdarg.h> #include <stddef.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdio.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/queue.h> +#include <sys/socket.h> +#include <unistd.h> +#include <rte_alarm.h> +#include <rte_bus.h> #include <rte_bus_vdev.h> #include <rte_common.h> #include <rte_config.h> +#include <rte_dev.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_PROBE_MS 1000 + +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" #define DRV_LOG(level, ...) \ rte_log(RTE_LOG_ ## level, \ @@ -25,12 +49,490 @@ /** Driver-specific log messages type. */ static int vdev_netvsc_logtype; +/** Context structure for a vdev_netvsc instance. */ +struct vdev_netvsc_ctx { + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ + unsigned int id; /**< ID used to generate unique names. */ + char name[64]; /**< Unique name for vdev_netvsc instance. */ + char devname[64]; /**< Fail-safe PMD instance name. */ + char devargs[256]; /**< Fail-safe PMD instance device arguments. */ + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ + unsigned int if_index; /**< NetVSC netdevice index. */ + struct ether_addr if_addr; /**< NetVSC MAC address. */ + int pipe[2]; /**< Communication pipe with fail-safe instance. 
*/ + char yield[256]; /**< Current device string used with fail-safe. */ +}; + +/** Context list is common to all driver instances. */ +static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); + +/** Number of entries in context list. */ +static unsigned int vdev_netvsc_ctx_count; + /** Number of driver instances relying on context list. */ static unsigned int vdev_netvsc_ctx_inst; /** + * Destroy a vdev_netvsc context instance. + * + * @param ctx + * Context to destroy. + */ +static void +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) +{ + if (ctx->pipe[0] != -1) + close(ctx->pipe[0]); + if (ctx->pipe[1] != -1) + close(ctx->pipe[1]); + free(ctx); +} + +/** + * Iterate over system network interfaces. + * + * This function runs a given callback function for each netdevice found on + * the system. + * + * @param func + * Callback function pointer. List traversal is aborted when this function + * returns a nonzero value. + * @param ... + * Variable parameter list passed as @p va_list to @p func. + * + * @return + * 0 when the entire list is traversed successfully, a negative error code + * in case or failure, or the nonzero value returned by @p func when list + * traversal is aborted. + */ +static int +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap), ...) 
+{ + struct if_nameindex *iface = if_nameindex(); + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + unsigned int i; + int ret = 0; + + if (!iface) { + ret = -ENOBUFS; + DRV_LOG(ERR, "cannot retrieve system network interfaces"); + goto error; + } + if (s == -1) { + ret = -errno; + DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno)); + goto error; + } + for (i = 0; iface[i].if_name; ++i) { + struct ifreq req; + struct ether_addr eth_addr; + va_list ap; + + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { + DRV_LOG(WARNING, "cannot retrieve information about" + " interface \"%s\": %s", + req.ifr_name, rte_strerror(errno)); + continue; + } + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, + RTE_DIM(eth_addr.addr_bytes)); + va_start(ap, func); + ret = func(&iface[i], &eth_addr, ap); + va_end(ap); + if (ret) + break; + } +error: + if (s != -1) + close(s); + if (iface) + if_freenameindex(iface); + return ret; +} + +/** + * Determine if a network interface is NetVSC. + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * + * @return + * A nonzero value when interface is detected as NetVSC. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) +{ + static const char temp[] = "/sys/class/net/%s/device/class_id"; + char path[sizeof(temp) + IF_NAMESIZE]; + FILE *f; + int ret; + int len = 0; + + ret = snprintf(path, sizeof(path), temp, iface->if_name); + if (ret == -1 || (size_t)ret >= sizeof(path)) { + rte_errno = ENOBUFS; + return 0; + } + f = fopen(path, "r"); + if (!f) { + rte_errno = errno; + return 0; + } + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); + if (ret == EOF) + rte_errno = errno; + ret = len == (int)strlen(NETVSC_CLASS_ID); + fclose(f); + return ret; +} + +/** + * Retrieve network interface data from sysfs symbolic link. + * + * @param[out] buf + * Output data buffer. 
+ * @param size + * Output buffer size. + * @param[in] if_name + * Netdevice name. + * @param[in] relpath + * Symbolic link path relative to netdevice sysfs entry. + * + * @return + * 0 on success, a negative error code otherwise. + */ +static int +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, + const char *relpath) +{ + int ret; + + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); + if (ret == -1 || (size_t)ret >= size) + return -ENOBUFS; + ret = readlink(buf, buf, size); + if (ret == -1) + return -errno; + if ((size_t)ret >= size - 1) + return -ENOBUFS; + buf[ret] = '\0'; + return 0; +} + +/** + * Probe a network interface to associate with vdev_netvsc context. + * + * This function determines if the network device matches the properties of + * the NetVSC interface associated with the vdev_netvsc context and + * communicates its bus address to the fail-safe PMD instance if so. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. + * @param ap + * Variable arguments list comprising: + * + * - struct vdev_netvsc_ctx *ctx: + * Context to associate network interface with. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_device_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; + const char *addr; + size_t len; + int ret; + + /* Skip non-matching or unwanted NetVSC interfaces. 
*/ + if (ctx->if_index == iface->if_index) { + if (!strcmp(ctx->if_name, iface->if_name)) + return 0; + DRV_LOG(DEBUG, + "NetVSC interface \"%s\" (index %u) renamed \"%s\"", + ctx->if_name, ctx->if_index, iface->if_name); + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + return 0; + } + if (vdev_netvsc_iface_is_netvsc(iface)) + return 0; + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) + return 0; + /* Look for associated PCI device. */ + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device/subsystem"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + if (strcmp(addr, "pci")) + return 0; + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + len = strlen(addr); + if (!len) + return 0; + /* Send PCI device argument to fail-safe PMD instance. */ + if (strcmp(addr, ctx->yield)) + DRV_LOG(DEBUG, "associating PCI device \"%s\" with NetVSC" + " interface \"%s\" (index %u)", addr, ctx->if_name, + ctx->if_index); + memmove(buf, addr, len + 1); + addr = buf; + buf[len] = '\n'; + ret = write(ctx->pipe[1], addr, len + 1); + buf[len] = '\0'; + if (ret == -1) { + if (errno == EINTR || errno == EAGAIN) + return 1; + DRV_LOG(WARNING, "cannot associate PCI device name \"%s\" with" + " interface \"%s\": %s", addr, ctx->if_name, + rte_strerror(errno)); + return 1; + } + if ((size_t)ret != len + 1) { + /* + * Attempt to override previous partial write, no need to + * recover if that fails. + */ + ret = write(ctx->pipe[1], "\n", 1); + (void)ret; + return 1; + } + fsync(ctx->pipe[1]); + memcpy(ctx->yield, addr, len + 1); + return 1; +} + +/** + * Alarm callback that regularly probes system network interfaces. + * + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as + * long as an vdev_netvsc context instance exists. + * + * @param arg + * Ignored. 
+ */ +static void +vdev_netvsc_alarm(__rte_unused void *arg) +{ + struct vdev_netvsc_ctx *ctx; + int ret; + + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + if (ret) + break; + } + if (!vdev_netvsc_ctx_count) + return; + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", + rte_strerror(-ret)); + } +} + +/** + * Probe a NetVSC interface to generate a vdev_netvsc context from. + * + * This function instantiates vdev_netvsc contexts either for all NetVSC + * devices found on the system or only a subset provided as device + * arguments. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. + * @param ap + * Variable arguments list comprising: + * + * - const char *name: + * Name associated with current driver instance. + * + * - struct rte_kvargs *kvargs: + * Device arguments provided to current driver instance. + * + * - unsigned int specified: + * Number of specific netdevices provided as device arguments. + * + * - unsigned int *matched: + * The number of specified netdevices matched by this function. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + const char *name = va_arg(ap, const char *); + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + unsigned int specified = va_arg(ap, unsigned int); + unsigned int *matched = va_arg(ap, unsigned int *); + unsigned int i; + struct vdev_netvsc_ctx *ctx; + int ret; + + /* Probe all interfaces when none are specified. 
*/ + if (specified) { + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { + if (!strcmp(pair->value, iface->if_name)) + break; + } else if (!strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) { + struct ether_addr tmp; + + if (sscanf(pair->value, + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":" + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, + &tmp.addr_bytes[0], + &tmp.addr_bytes[1], + &tmp.addr_bytes[2], + &tmp.addr_bytes[3], + &tmp.addr_bytes[4], + &tmp.addr_bytes[5]) != 6) { + DRV_LOG(ERR, + "invalid MAC address format" + " \"%s\"", + pair->value); + return -EINVAL; + } + if (is_same_ether_addr(eth_addr, &tmp)) + break; + } + } + if (i == kvargs->count) + return 0; + ++(*matched); + } + /* Weed out interfaces already handled. */ + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) + if (ctx->if_index == iface->if_index) + break; + if (ctx) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is already handled," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + if (!vdev_netvsc_iface_is_netvsc(iface)) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is not NetVSC," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + /* Create interface context. 
*/ + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + ret = -errno; + DRV_LOG(ERR, "cannot allocate context for interface \"%s\": %s", + iface->if_name, rte_strerror(errno)); + goto error; + } + ctx->id = vdev_netvsc_ctx_count; + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + ctx->if_index = iface->if_index; + ctx->if_addr = *eth_addr; + ctx->pipe[0] = -1; + ctx->pipe[1] = -1; + ctx->yield[0] = '\0'; + if (pipe(ctx->pipe) == -1) { + ret = -errno; + DRV_LOG(ERR, + "cannot allocate control pipe for interface \"%s\": %s", + ctx->if_name, rte_strerror(errno)); + goto error; + } + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { + int flf = fcntl(ctx->pipe[i], F_GETFL); + + if (flf != -1 && + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) + continue; + ret = -errno; + DRV_LOG(ERR, "cannot toggle non-blocking flag on control file" + " descriptor #%u (%d): %s", i, ctx->pipe[i], + rte_strerror(errno)); + goto error; + } + /* Generate virtual device name and arguments. */ + i = 0; + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", + name, ctx->id); + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) + ++i; + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", + ctx->name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) + ++i; + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), + "fd(%d),dev(net_tap_%s,remote=%s)", + ctx->pipe[0], ctx->name, ctx->if_name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) + ++i; + if (i) { + ret = -ENOBUFS; + DRV_LOG(ERR, "generated virtual device name or argument list" + " too long for interface \"%s\"", ctx->if_name); + goto error; + } + /* Request virtual device generation. 
*/ + DRV_LOG(DEBUG, "generating virtual device \"%s\" with arguments \"%s\"", + ctx->devname, ctx->devargs); + vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); + if (ret) + goto error; + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); + ++vdev_netvsc_ctx_count; + DRV_LOG(DEBUG, "added NetVSC interface \"%s\" to context list", + ctx->if_name); + return 0; +error: + if (ctx) + vdev_netvsc_ctx_destroy(ctx); + return ret; +} + +/** * Probe NetVSC interfaces. * + * This function probes system netdevices according to the specified device + * arguments and starts a periodic alarm callback to notify the resulting + * fail-safe PMD instances of their sub-devices whereabouts. + * * @param dev * Virtual device context for driver instance. * @@ -49,12 +551,40 @@ const char *args = rte_vdev_device_args(dev); struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", vdev_netvsc_arg); + unsigned int specified = 0; + unsigned int matched = 0; + unsigned int i; + int ret; DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); if (!kvargs) { DRV_LOG(ERR, "cannot parse arguments list"); goto error; } + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + ++specified; + } + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + /* Gather interfaces. 
*/ + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, + specified, &matched); + if (ret < 0) + goto error; + if (matched < specified) + DRV_LOG(WARNING, + "some of the specified parameters did not match" + " recognized network interfaces"); + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to schedule alarm callback: %s", + rte_strerror(-ret)); + goto error; + } error: if (kvargs) rte_kvargs_free(kvargs); @@ -65,6 +595,9 @@ /** * Remove driver instance. * + * The alarm callback and underlying vdev_netvsc context instances are only + * destroyed after the last PMD instance is removed. + * * @param dev * Virtual device context for driver instance. * @@ -74,7 +607,16 @@ static int vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) { - --vdev_netvsc_ctx_inst; + if (--vdev_netvsc_ctx_inst) + return 0; + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { + struct vdev_netvsc_ctx *ctx = LIST_FIRST(&vdev_netvsc_ctx_list); + + LIST_REMOVE(ctx, entry); + --vdev_netvsc_ctx_count; + vdev_netvsc_ctx_destroy(ctx); + } return 0; } -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
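The NetVSC detection in this patch relies on a somewhat unusual idiom: the class ID is placed in the fscanf() format as literal text and followed by "%n", so the stored character count only reaches the full length when every literal character matched. A minimal standalone sketch of that idiom is given below. It assumes the caller already opened the class_id stream; the names example_stream_is_netvsc() and EXAMPLE_NETVSC_CLASS_ID are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same GUID string as NETVSC_CLASS_ID in the patch, renamed for the sketch. */
#define EXAMPLE_NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}"

/*
 * Hypothetical helper (not part of the patch): return 1 when the stream
 * starts with the NetVSC class ID, 0 otherwise.
 */
static int
example_stream_is_netvsc(FILE *f)
{
	int len = 0;

	assert(f != NULL);
	/*
	 * Literal characters in a scanf format must match the input one by
	 * one. Scanning stops at the first mismatch, in which case %n is
	 * never processed and len keeps its initial value of 0.
	 */
	if (fscanf(f, EXAMPLE_NETVSC_CLASS_ID "%n", &len) == EOF)
		return 0; /* Empty stream or read error. */
	return len == (int)strlen(EXAMPLE_NETVSC_CLASS_ID);
}
```

Note that this is a prefix match: anything following the class ID in the stream is ignored, which is harmless for a sysfs attribute that holds a single value but would be too permissive for free-form input.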
* Re: [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality Matan Azrad @ 2018-01-09 18:49 ` Stephen Hemminger 2018-01-10 15:02 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2018-01-09 18:49 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Adrien Mazarguil On Tue, 9 Jan 2018 14:47:30 +0000 Matan Azrad <matan@mellanox.com> wrote: > As described in more details in the attached documentation (see patch > contents), this virtual device driver manages NetVSC interfaces in virtual > machines hosted by Hyper-V/Azure platforms. > > This driver does not manage traffic nor Ethernet devices directly; it acts > as a thin configuration layer that automatically instantiates and controls > fail-safe PMD instances combining tap and PCI sub-devices, so that each > NetVSC interface is exposed as a single consolidated port to DPDK > applications. > > PCI sub-devices being hot-pluggable (e.g. during VM migration), > applications automatically benefit from increased throughput when present > and automatic fallback on NetVSC otherwise without interruption thanks to > fail-safe's hot-plug handling. > > Once initialized, the sole job of the vdev_netvsc driver is to regularly > scan for PCI devices to associate with NetVSC interfaces and feed their > addresses to corresponding fail-safe instances. > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > Signed-off-by: Matan Azrad <matan@mellanox.com> There is also the issue of how rescind is handled, but that may be more complex than you want to deal with now. Host may rescind PCI devices for other reasons than migration. For example, if host needs to do live upgrade of PF device driver on host (or firmware); then it will rescind VF device from all guests and then restore it after upgrade. 
> diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile > index 2fb059d..f2b2ac5 100644 > --- a/drivers/net/vdev_netvsc/Makefile > +++ b/drivers/net/vdev_netvsc/Makefile > @@ -13,6 +13,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map > CFLAGS += -O3 > CFLAGS += -g > CFLAGS += -std=c11 -pedantic -Wall -Wextra > +CFLAGS += -D_XOPEN_SOURCE=600 > +CFLAGS += -D_BSD_SOURCE > +CFLAGS += -D_DEFAULT_SOURCE These are kind of a nuisance, can't it just use same CFLAGS as other code? > # Source files. > SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c > diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c > index e895b32..3d8895b 100644 > --- a/drivers/net/vdev_netvsc/vdev_netvsc.c > +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c > @@ -3,17 +3,41 @@ > * Copyright 2017 Mellanox Technologies, Ltd. > > #define VDEV_NETVSC_DRIVER net_vdev_netvsc > #define VDEV_NETVSC_ARG_IFACE "iface" > #define VDEV_NETVSC_ARG_MAC "mac" > +#define VDEV_NETVSC_PROBE_MS 1000 > + > +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > > #define DRV_LOG(level, ...) \ > rte_log(RTE_LOG_ ## level, \ > @@ -25,12 +49,490 @@ > /** Driver-specific log messages type. */ > static int vdev_netvsc_logtype; > > +/** Context structure for a vdev_netvsc instance. */ > +struct vdev_netvsc_ctx { > + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ > + unsigned int id; /**< ID used to generate unique names. */ > + char name[64]; /**< Unique name for vdev_netvsc instance. */ > + char devname[64]; /**< Fail-safe PMD instance name. */ > + char devargs[256]; /**< Fail-safe PMD instance device arguments. */ > + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ > + unsigned int if_index; /**< NetVSC netdevice index. */ > + struct ether_addr if_addr; /**< NetVSC MAC address. */ > + int pipe[2]; /**< Communication pipe with fail-safe instance. */ > + char yield[256]; /**< Current device string used with fail-safe. 
*/ > +}; Please align comments. > +/** Context list is common to all driver instances. */ > +static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = > + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); > + > +/** Number of entries in context list. */ > +static unsigned int vdev_netvsc_ctx_count; > + > /** Number of driver instances relying on context list. */ > static unsigned int vdev_netvsc_ctx_inst; > > /** > + * Destroy a vdev_netvsc context instance. > + * > + * @param ctx > + * Context to destroy. > + */ > +static void > +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) > +{ > + if (ctx->pipe[0] != -1) > + close(ctx->pipe[0]); > + if (ctx->pipe[1] != -1) > + close(ctx->pipe[1]); > + free(ctx); > +} > + > +/** > + * Iterate over system network interfaces. > + * > + * This function runs a given callback function for each netdevice found on > + * the system. > + * > + * @param func > + * Callback function pointer. List traversal is aborted when this function > + * returns a nonzero value. > + * @param ... > + * Variable parameter list passed as @p va_list to @p func. > + * > + * @return > + * 0 when the entire list is traversed successfully, a negative error code > + * in case or failure, or the nonzero value returned by @p func when list > + * traversal is aborted. > + */ > +static int > +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap), ...) 
> +{ > + struct if_nameindex *iface = if_nameindex(); > + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); > + unsigned int i; > + int ret = 0; > + > + if (!iface) { > + ret = -ENOBUFS; > + DRV_LOG(ERR, "cannot retrieve system network interfaces"); > + goto error; > + } > + if (s == -1) { > + ret = -errno; > + DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno)); > + goto error; > + } > + for (i = 0; iface[i].if_name; ++i) { > + struct ifreq req; > + struct ether_addr eth_addr; > + va_list ap; > + > + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); > + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { > + DRV_LOG(WARNING, "cannot retrieve information about" > + " interface \"%s\": %s", > + req.ifr_name, rte_strerror(errno)); > + continue; > + } Skip non-ethernet interfaces where addr length != 6 > + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, > + RTE_DIM(eth_addr.addr_bytes)); > + va_start(ap, func); > + ret = func(&iface[i], &eth_addr, ap); > + va_end(ap); > + if (ret) > + break; > + } > +error: > + if (s != -1) > + close(s); > + if (iface) > + if_freenameindex(iface); > + return ret; > +} > + > +/** > + * Determine if a network interface is NetVSC. > + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * > + * @return > + * A nonzero value when interface is detected as NetVSC. In case of error, > + * rte_errno is updated and 0 returned. 
> + */ > +static int > +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) > +{ > + static const char temp[] = "/sys/class/net/%s/device/class_id"; > + char path[sizeof(temp) + IF_NAMESIZE]; > + FILE *f; > + int ret; > + int len = 0; > + > + ret = snprintf(path, sizeof(path), temp, iface->if_name); > + if (ret == -1 || (size_t)ret >= sizeof(path)) { > + rte_errno = ENOBUFS; > + return 0; > + } > + f = fopen(path, "r"); > + if (!f) { > + rte_errno = errno; > + return 0; > + } > + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); This is different way to compare uuid, maybe use fgets() and uuid_compare? > + if (ret == EOF) > + rte_errno = errno; > + ret = len == (int)strlen(NETVSC_CLASS_ID); > + fclose(f); > + return ret; > +} > + > +/** > + * Retrieve network interface data from sysfs symbolic link. > + * > + * @param[out] buf > + * Output data buffer. > + * @param size > + * Output buffer size. > + * @param[in] if_name > + * Netdevice name. > + * @param[in] relpath > + * Symbolic link path relative to netdevice sysfs entry. > + * > + * @return > + * 0 on success, a negative error code otherwise. > + */ > +static int > +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, > + const char *relpath) > +{ > + int ret; > + > + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); > + if (ret == -1 || (size_t)ret >= size) > + return -ENOBUFS; > + ret = readlink(buf, buf, size); > + if (ret == -1) > + return -errno; > + if ((size_t)ret >= size - 1) > + return -ENOBUFS; > + buf[ret] = '\0'; > + return 0; > +} You might find it easier to look at directory. /sys/bus/vmbus/drivers/hv_netvsc/ > + > +/** > + * Probe a network interface to associate with vdev_netvsc context. > + * > + * This function determines if the network device matches the properties of > + * the NetVSC interface associated with the vdev_netvsc context and > + * communicates its bus address to the fail-safe PMD instance if so. 
> + * > + * It is normally used with vdev_netvsc_foreach_iface(). > + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * @param[in] eth_addr > + * MAC address associated with @p iface. > + * @param ap > + * Variable arguments list comprising: > + * > + * - struct vdev_netvsc_ctx *ctx: > + * Context to associate network interface with. > + * > + * @return > + * A nonzero value when interface matches, 0 otherwise or in case of > + * error. > + */ > +static int > +vdev_netvsc_device_probe(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap) > +{ > + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); > + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; > + const char *addr; > + size_t len; > + int ret; > + > + /* Skip non-matching or unwanted NetVSC interfaces. */ > + if (ctx->if_index == iface->if_index) { > + if (!strcmp(ctx->if_name, iface->if_name)) > + return 0; > + DRV_LOG(DEBUG, > + "NetVSC interface \"%s\" (index %u) renamed \"%s\"", > + ctx->if_name, ctx->if_index, iface->if_name); > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > + return 0; > + } > + if (vdev_netvsc_iface_is_netvsc(iface)) > + return 0; > + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) > + return 0; > + /* Look for associated PCI device. */ > + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, > + "device/subsystem"); > + if (ret) > + return 0; > + addr = strrchr(buf, '/'); > + addr = addr ? addr + 1 : buf; > + if (strcmp(addr, "pci")) > + return 0; > + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, > + "device"); > + if (ret) > + return 0; > + addr = strrchr(buf, '/'); > + addr = addr ? addr + 1 : buf; > + len = strlen(addr); > + if (!len) > + return 0; > + /* Send PCI device argument to fail-safe PMD instance. 
*/ > + if (strcmp(addr, ctx->yield)) > + DRV_LOG(DEBUG, "associating PCI device \"%s\" with NetVSC" > + " interface \"%s\" (index %u)", addr, ctx->if_name, > + ctx->if_index); > + memmove(buf, addr, len + 1); > + addr = buf; > + buf[len] = '\n'; > + ret = write(ctx->pipe[1], addr, len + 1); > + buf[len] = '\0'; > + if (ret == -1) { > + if (errno == EINTR || errno == EAGAIN) > + return 1; > + DRV_LOG(WARNING, "cannot associate PCI device name \"%s\" with" > + " interface \"%s\": %s", addr, ctx->if_name, > + rte_strerror(errno)); > + return 1; > + } > + if ((size_t)ret != len + 1) { > + /* > + * Attempt to override previous partial write, no need to > + * recover if that fails. > + */ > + ret = write(ctx->pipe[1], "\n", 1); > + (void)ret; > + return 1; > + } > + fsync(ctx->pipe[1]); > + memcpy(ctx->yield, addr, len + 1); > + return 1; > +} > + > +/** > + * Alarm callback that regularly probes system network interfaces. > + * > + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as > + * long as an vdev_netvsc context instance exists. > + * > + * @param arg > + * Ignored. > + */ > +static void > +vdev_netvsc_alarm(__rte_unused void *arg) > +{ > + struct vdev_netvsc_ctx *ctx; > + int ret; > + > + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { > + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); > + if (ret) > + break; > + } > + if (!vdev_netvsc_ctx_count) > + return; > + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, > + vdev_netvsc_alarm, NULL); > + if (ret < 0) { > + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", > + rte_strerror(-ret)); > + } > +} Why not use netlink uevent? > +/** > + * Probe a NetVSC interface to generate a vdev_netvsc context from. > + * > + * This function instantiates vdev_netvsc contexts either for all NetVSC > + * devices found on the system or only a subset provided as device > + * arguments. > + * > + * It is normally used with vdev_netvsc_foreach_iface(). 
> + * > + * @param[in] iface > + * Pointer to netdevice description structure (name and index). > + * @param[in] eth_addr > + * MAC address associated with @p iface. > + * @param ap > + * Variable arguments list comprising: > + * > + * - const char *name: > + * Name associated with current driver instance. > + * > + * - struct rte_kvargs *kvargs: > + * Device arguments provided to current driver instance. > + * > + * - unsigned int specified: > + * Number of specific netdevices provided as device arguments. > + * > + * - unsigned int *matched: > + * The number of specified netdevices matched by this function. > + * > + * @return > + * A nonzero value when interface matches, 0 otherwise or in case of > + * error. > + */ > +static int > +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, > + const struct ether_addr *eth_addr, > + va_list ap) > +{ > + const char *name = va_arg(ap, const char *); > + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); > + unsigned int specified = va_arg(ap, unsigned int); > + unsigned int *matched = va_arg(ap, unsigned int *); > + unsigned int i; > + struct vdev_netvsc_ctx *ctx; > + int ret; > + > + /* Probe all interfaces when none are specified. 
*/ > + if (specified) { > + for (i = 0; i != kvargs->count; ++i) { > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > + > + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { > + if (!strcmp(pair->value, iface->if_name)) > + break; > + } else if (!strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) { > + struct ether_addr tmp; > + > + if (sscanf(pair->value, > + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":" > + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, > + &tmp.addr_bytes[0], > + &tmp.addr_bytes[1], > + &tmp.addr_bytes[2], > + &tmp.addr_bytes[3], > + &tmp.addr_bytes[4], > + &tmp.addr_bytes[5]) != 6) { > + DRV_LOG(ERR, > + "invalid MAC address format" > + " \"%s\"", > + pair->value); > + return -EINVAL; > + } > + if (is_same_ether_addr(eth_addr, &tmp)) > + break; > + } > + } > + if (i == kvargs->count) > + return 0; > + ++(*matched); > + } > + /* Weed out interfaces already handled. */ > + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) > + if (ctx->if_index == iface->if_index) > + break; > + if (ctx) { > + if (!specified) > + return 0; > + DRV_LOG(WARNING, > + "interface \"%s\" (index %u) is already handled," > + " skipping", > + iface->if_name, iface->if_index); > + return 0; > + } > + if (!vdev_netvsc_iface_is_netvsc(iface)) { > + if (!specified) > + return 0; > + DRV_LOG(WARNING, > + "interface \"%s\" (index %u) is not NetVSC," > + " skipping", > + iface->if_name, iface->if_index); > + return 0; > + } > + /* Create interface context. 
*/ > + ctx = calloc(1, sizeof(*ctx)); > + if (!ctx) { > + ret = -errno; > + DRV_LOG(ERR, "cannot allocate context for interface \"%s\": %s", > + iface->if_name, rte_strerror(errno)); > + goto error; > + } > + ctx->id = vdev_netvsc_ctx_count; > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > + ctx->if_index = iface->if_index; > + ctx->if_addr = *eth_addr; > + ctx->pipe[0] = -1; > + ctx->pipe[1] = -1; > + ctx->yield[0] = '\0'; > + if (pipe(ctx->pipe) == -1) { > + ret = -errno; > + DRV_LOG(ERR, > + "cannot allocate control pipe for interface \"%s\": %s", > + ctx->if_name, rte_strerror(errno)); > + goto error; > + } > + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { > + int flf = fcntl(ctx->pipe[i], F_GETFL); > + > + if (flf != -1 && > + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) > + continue; > + ret = -errno; > + DRV_LOG(ERR, "cannot toggle non-blocking flag on control file" > + " descriptor #%u (%d): %s", i, ctx->pipe[i], > + rte_strerror(errno)); > + goto error; > + } > + /* Generate virtual device name and arguments. */ > + i = 0; > + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", > + name, ctx->id); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) > + ++i; > + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", > + ctx->name); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) > + ++i; > + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), > + "fd(%d),dev(net_tap_%s,remote=%s)", > + ctx->pipe[0], ctx->name, ctx->if_name); > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) > + ++i; > + if (i) { > + ret = -ENOBUFS; > + DRV_LOG(ERR, "generated virtual device name or argument list" > + " too long for interface \"%s\"", ctx->if_name); > + goto error; > + } > + /* Request virtual device generation. 
*/ > + DRV_LOG(DEBUG, "generating virtual device \"%s\" with arguments \"%s\"", > + ctx->devname, ctx->devargs); > + vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > + if (ret) > + goto error; > + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); > + ++vdev_netvsc_ctx_count; > + DRV_LOG(DEBUG, "added NetVSC interface \"%s\" to context list", > + ctx->if_name); > + return 0; > +error: > + if (ctx) > + vdev_netvsc_ctx_destroy(ctx); > + return ret; > +} > + > +/** > * Probe NetVSC interfaces. > * > + * This function probes system netdevices according to the specified device > + * arguments and starts a periodic alarm callback to notify the resulting > + * fail-safe PMD instances of their sub-devices whereabouts. > + * > * @param dev > * Virtual device context for driver instance. > * > @@ -49,12 +551,40 @@ > const char *args = rte_vdev_device_args(dev); > struct rte_kvargs *kvargs = rte_kvargs_parse(args ? args : "", > vdev_netvsc_arg); > + unsigned int specified = 0; > + unsigned int matched = 0; > + unsigned int i; > + int ret; > > DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); > if (!kvargs) { > DRV_LOG(ERR, "cannot parse arguments list"); > goto error; > } > + for (i = 0; i != kvargs->count; ++i) { > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > + > + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || > + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) > + ++specified; > + } > + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); > + /* Gather interfaces. 
*/ > + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, > + specified, &matched); > + if (ret < 0) > + goto error; > + if (matched < specified) > + DRV_LOG(WARNING, > + "some of the specified parameters did not match" > + " recognized network interfaces"); > + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, > + vdev_netvsc_alarm, NULL); > + if (ret < 0) { > + DRV_LOG(ERR, "unable to schedule alarm callback: %s", > + rte_strerror(-ret)); > + goto error; > + } > error: > if (kvargs) > rte_kvargs_free(kvargs); > @@ -65,6 +595,9 @@ > /** > * Remove driver instance. > * > + * The alarm callback and underlying vdev_netvsc context instances are only > + * destroyed after the last PMD instance is removed. > + * > * @param dev > * Virtual device context for driver instance. > * > @@ -74,7 +607,16 @@ > static int > vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) > { > - --vdev_netvsc_ctx_inst; > + if (--vdev_netvsc_ctx_inst) > + return 0; > + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); > + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { > + struct vdev_netvsc_ctx *ctx = LIST_FIRST(&vdev_netvsc_ctx_list); > + > + LIST_REMOVE(ctx, entry); > + --vdev_netvsc_ctx_count; > + vdev_netvsc_ctx_destroy(ctx); > + } > return 0; > } >
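One of the review comments above asks to skip non-Ethernet interfaces in the SIOCGIFHWADDR loop. A minimal sketch of such a check follows, assuming the Linux behavior where the sa_family field of the returned hardware address carries the ARPHRD_* device type. The helper name hwaddr_is_ethernet() is hypothetical and not part of the patch.

```c
#include <sys/socket.h>   /* struct sockaddr */
#include <net/if_arp.h>   /* ARPHRD_ETHER */

/*
 * Hypothetical helper (not in the patch): return 1 when the hardware
 * address filled in by SIOCGIFHWADDR belongs to an Ethernet device,
 * i.e. one whose address is 6 bytes long, and 0 otherwise.
 */
static int
hwaddr_is_ethernet(const struct sockaddr *hwaddr)
{
	return hwaddr->sa_family == ARPHRD_ETHER;
}
```

In the interface loop this could be applied to &req.ifr_hwaddr right after the ioctl() succeeds, with a "continue" when it returns 0, so that loopback or tunnel devices never reach the memcpy() into struct ether_addr.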
* Re: [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality 2018-01-09 18:49 ` Stephen Hemminger @ 2018-01-10 15:02 ` Matan Azrad 2018-01-17 16:51 ` Thomas Monjalon From: Matan Azrad @ 2018-01-10 15:02 UTC To: Stephen Hemminger; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Adrien Mazarguil Hi Stephen, thank you for this quick review; please see some comments below. From: Stephen Hemminger, Tuesday, January 9, 2018 8:49 PM > On Tue, 9 Jan 2018 14:47:30 +0000 > Matan Azrad <matan@mellanox.com> wrote: > > > As described in more details in the attached documentation (see patch > > contents), this virtual device driver manages NetVSC interfaces in > > virtual machines hosted by Hyper-V/Azure platforms. > > > > This driver does not manage traffic nor Ethernet devices directly; it > > acts as a thin configuration layer that automatically instantiates and > > controls fail-safe PMD instances combining tap and PCI sub-devices, so > > that each NetVSC interface is exposed as a single consolidated port to > > DPDK applications. > > > > PCI sub-devices being hot-pluggable (e.g. during VM migration), > > applications automatically benefit from increased throughput when > > present and automatic fallback on NetVSC otherwise without > > interruption thanks to fail-safe's hot-plug handling. > > > > Once initialized, the sole job of the vdev_netvsc driver is to > > regularly scan for PCI devices to associate with NetVSC interfaces and > > feed their addresses to corresponding fail-safe instances. > > > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > There is also the issue of how rescind is handled, but that may be more > complex than you want to deal with now. Host may rescind PCI devices for > other reasons than migration. 
For example, if host needs to do live upgrade > of PF device driver on host (or firmware); then it will rescind VF device from > all guests and then restore it after upgrade. > > > diff --git a/drivers/net/vdev_netvsc/Makefile > > b/drivers/net/vdev_netvsc/Makefile > > index 2fb059d..f2b2ac5 100644 > > --- a/drivers/net/vdev_netvsc/Makefile > > +++ b/drivers/net/vdev_netvsc/Makefile > > @@ -13,6 +13,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map > > CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall > > -Wextra > > +CFLAGS += -D_XOPEN_SOURCE=600 > > +CFLAGS += -D_BSD_SOURCE > > +CFLAGS += -D_DEFAULT_SOURCE > > > These are kind of a nuisance, can't it just use same CFLAGS as other code? > Will check. > > # Source files. > > SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c diff > --git > > a/drivers/net/vdev_netvsc/vdev_netvsc.c > > b/drivers/net/vdev_netvsc/vdev_netvsc.c > > index e895b32..3d8895b 100644 > > --- a/drivers/net/vdev_netvsc/vdev_netvsc.c > > +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c > > @@ -3,17 +3,41 @@ > > * Copyright 2017 Mellanox Technologies, Ltd. > > > > > #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define > > VDEV_NETVSC_ARG_IFACE "iface" > > #define VDEV_NETVSC_ARG_MAC "mac" > > +#define VDEV_NETVSC_PROBE_MS 1000 > > + > > +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > > > > #define DRV_LOG(level, ...) \ > > rte_log(RTE_LOG_ ## level, \ > > @@ -25,12 +49,490 @@ > > /** Driver-specific log messages type. */ static int > > vdev_netvsc_logtype; > > > > +/** Context structure for a vdev_netvsc instance. */ struct > > +vdev_netvsc_ctx { > > + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ > > + unsigned int id; /**< ID used to generate unique names. */ > > + char name[64]; /**< Unique name for vdev_netvsc instance. */ > > + char devname[64]; /**< Fail-safe PMD instance name. */ > > + char devargs[256]; /**< Fail-safe PMD instance device arguments. 
*/ > > + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ > > + unsigned int if_index; /**< NetVSC netdevice index. */ > > + struct ether_addr if_addr; /**< NetVSC MAC address. */ > > + int pipe[2]; /**< Communication pipe with fail-safe instance. */ > > + char yield[256]; /**< Current device string used with fail-safe. */ > > +}; > > Please align comments. > Sure. > > +/** Context list is common to all driver instances. */ static > > +LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = > > + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); > > + > > +/** Number of entries in context list. */ static unsigned int > > +vdev_netvsc_ctx_count; > > + > > /** Number of driver instances relying on context list. */ static > > unsigned int vdev_netvsc_ctx_inst; > > > > /** > > + * Destroy a vdev_netvsc context instance. > > + * > > + * @param ctx > > + * Context to destroy. > > + */ > > +static void > > +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) { > > + if (ctx->pipe[0] != -1) > > + close(ctx->pipe[0]); > > + if (ctx->pipe[1] != -1) > > + close(ctx->pipe[1]); > > + free(ctx); > > +} > > + > > +/** > > + * Iterate over system network interfaces. > > + * > > + * This function runs a given callback function for each netdevice > > +found on > > + * the system. > > + * > > + * @param func > > + * Callback function pointer. List traversal is aborted when this function > > + * returns a nonzero value. > > + * @param ... > > + * Variable parameter list passed as @p va_list to @p func. > > + * > > + * @return > > + * 0 when the entire list is traversed successfully, a negative error code > > + * in case or failure, or the nonzero value returned by @p func when list > > + * traversal is aborted. > > + */ > > +static int > > +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap), ...) 
> > +{ > > + struct if_nameindex *iface = if_nameindex(); > > + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); > > + unsigned int i; > > + int ret = 0; > > + > > + if (!iface) { > > + ret = -ENOBUFS; > > + DRV_LOG(ERR, "cannot retrieve system network > interfaces"); > > + goto error; > > + } > > + if (s == -1) { > > + ret = -errno; > > + DRV_LOG(ERR, "cannot open socket: %s", > rte_strerror(errno)); > > + goto error; > > + } > > + for (i = 0; iface[i].if_name; ++i) { > > + struct ifreq req; > > + struct ether_addr eth_addr; > > + va_list ap; > > + > > + strncpy(req.ifr_name, iface[i].if_name, > sizeof(req.ifr_name)); > > + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { > > + DRV_LOG(WARNING, "cannot retrieve information > about" > > + " interface \"%s\": %s", > > + req.ifr_name, rte_strerror(errno)); > > + continue; > > + } > > Skip non-ethernet interfaces where addr length != 6 > Will check. > > + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, > > + RTE_DIM(eth_addr.addr_bytes)); > > + va_start(ap, func); > > + ret = func(&iface[i], &eth_addr, ap); > > + va_end(ap); > > + if (ret) > > + break; > > + } > > +error: > > + if (s != -1) > > + close(s); > > + if (iface) > > + if_freenameindex(iface); > > + return ret; > > +} > > + > > +/** > > + * Determine if a network interface is NetVSC. > > + * > > + * @param[in] iface > > + * Pointer to netdevice description structure (name and index). > > + * > > + * @return > > + * A nonzero value when interface is detected as NetVSC. In case of > error, > > + * rte_errno is updated and 0 returned. 
> > + */ > > +static int > > +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) { > > + static const char temp[] = "/sys/class/net/%s/device/class_id"; > > + char path[sizeof(temp) + IF_NAMESIZE]; > > + FILE *f; > > + int ret; > > + int len = 0; > > + > > + ret = snprintf(path, sizeof(path), temp, iface->if_name); > > + if (ret == -1 || (size_t)ret >= sizeof(path)) { > > + rte_errno = ENOBUFS; > > + return 0; > > + } > > + f = fopen(path, "r"); > > + if (!f) { > > + rte_errno = errno; > > + return 0; > > + } > > + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); > This is different way to compare uuid, maybe use fgets() and uuid_compare? > It is different, and neat. I don't see a reason to replace it. > > + if (ret == EOF) > > + rte_errno = errno; > > + ret = len == (int)strlen(NETVSC_CLASS_ID); > > + fclose(f); > > + return ret; > > +} > > + > > +/** > > + * Retrieve network interface data from sysfs symbolic link. > > + * > > + * @param[out] buf > > + * Output data buffer. > > + * @param size > > + * Output buffer size. > > + * @param[in] if_name > > + * Netdevice name. > > + * @param[in] relpath > > + * Symbolic link path relative to netdevice sysfs entry. > > + * > > + * @return > > + * 0 on success, a negative error code otherwise. > > + */ > > +static int > > +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, > > + const char *relpath) > > +{ > > + int ret; > > + > > + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); > > + if (ret == -1 || (size_t)ret >= size) > > + return -ENOBUFS; > > + ret = readlink(buf, buf, size); > > + if (ret == -1) > > + return -errno; > > + if ((size_t)ret >= size - 1) > > + return -ENOBUFS; > > + buf[ret] = '\0'; > > + return 0; > > +} > > You might find it easier to look at directory. 
> /sys/bus/vmbus/drivers/hv_netvsc/ > This driver allows using a regular netdevice instead of NetVSC (as described in the doc) for debug purposes, even on a machine that is not a Hyper-V VM, so relying on that directory doesn't make sense here.
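For comparison with the fscanf() approach discussed above, here is a rough sketch of the fgets()-based alternative raised in the review. It is only an illustration under stated assumptions: the helper name class_id_file_is_netvsc() is hypothetical, it takes the sysfs file path directly rather than an interface name, and it uses a plain strcmp() on the textual class ID instead of uuid_compare(). Unlike the fscanf() version, which accepts any file content starting with the class ID, this one requires the first line to match exactly.

```c
#include <stdio.h>
#include <string.h>

#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}"

/*
 * Hypothetical helper (not in the patch): return 1 when the first line
 * of the file at @p path is exactly the NetVSC class ID, 0 otherwise
 * (including when the file cannot be opened or read).
 */
static int
class_id_file_is_netvsc(const char *path)
{
	char line[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return 0;
	}
	fclose(f);
	/* Drop the trailing newline, if any, before comparing. */
	line[strcspn(line, "\n")] = '\0';
	return !strcmp(line, NETVSC_CLASS_ID);
}
```

Both variants answer the same question; the fscanf() form is shorter, while the fgets() form makes the buffer size and the exact-match rule explicit.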
> > + > > +/** > > + * Probe a network interface to associate with vdev_netvsc context. > > + * > > + * This function determines if the network device matches the > > +properties of > > + * the NetVSC interface associated with the vdev_netvsc context and > > + * communicates its bus address to the fail-safe PMD instance if so. > > + * > > + * It is normally used with vdev_netvsc_foreach_iface(). > > + * > > + * @param[in] iface > > + * Pointer to netdevice description structure (name and index). > > + * @param[in] eth_addr > > + * MAC address associated with @p iface. > > + * @param ap > > + * Variable arguments list comprising: > > + * > > + * - struct vdev_netvsc_ctx *ctx: > > + * Context to associate network interface with. > > + * > > + * @return > > + * A nonzero value when interface matches, 0 otherwise or in case of > > + * error. > > + */ > > +static int > > +vdev_netvsc_device_probe(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap) > > +{ > > + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); > > + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; > > + const char *addr; > > + size_t len; > > + int ret; > > + > > + /* Skip non-matching or unwanted NetVSC interfaces. */ > > + if (ctx->if_index == iface->if_index) { > > + if (!strcmp(ctx->if_name, iface->if_name)) > > + return 0; > > + DRV_LOG(DEBUG, > > + "NetVSC interface \"%s\" (index %u) renamed > \"%s\"", > > + ctx->if_name, ctx->if_index, iface->if_name); > > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx- > >if_name)); > > + return 0; > > + } > > + if (vdev_netvsc_iface_is_netvsc(iface)) > > + return 0; > > + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) > > + return 0; > > + /* Look for associated PCI device. */ > > + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, > > + "device/subsystem"); > > + if (ret) > > + return 0; > > + addr = strrchr(buf, '/'); > > + addr = addr ? 
addr + 1 : buf; > > + if (strcmp(addr, "pci")) > > + return 0; > > + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, > > + "device"); > > + if (ret) > > + return 0; > > + addr = strrchr(buf, '/'); > > + addr = addr ? addr + 1 : buf; > > + len = strlen(addr); > > + if (!len) > > + return 0; > > + /* Send PCI device argument to fail-safe PMD instance. */ > > + if (strcmp(addr, ctx->yield)) > > + DRV_LOG(DEBUG, "associating PCI device \"%s\" with > NetVSC" > > + " interface \"%s\" (index %u)", addr, ctx->if_name, > > + ctx->if_index); > > + memmove(buf, addr, len + 1); > > + addr = buf; > > + buf[len] = '\n'; > > + ret = write(ctx->pipe[1], addr, len + 1); > > + buf[len] = '\0'; > > + if (ret == -1) { > > + if (errno == EINTR || errno == EAGAIN) > > + return 1; > > + DRV_LOG(WARNING, "cannot associate PCI device name > \"%s\" with" > > + " interface \"%s\": %s", addr, ctx->if_name, > > + rte_strerror(errno)); > > + return 1; > > + } > > + if ((size_t)ret != len + 1) { > > + /* > > + * Attempt to override previous partial write, no need to > > + * recover if that fails. > > + */ > > + ret = write(ctx->pipe[1], "\n", 1); > > + (void)ret; > > + return 1; > > + } > > + fsync(ctx->pipe[1]); > > + memcpy(ctx->yield, addr, len + 1); > > + return 1; > > +} > > + > > +/** > > + * Alarm callback that regularly probes system network interfaces. > > + * > > + * This callback runs at a frequency determined by > > +VDEV_NETVSC_PROBE_MS as > > + * long as an vdev_netvsc context instance exists. > > + * > > + * @param arg > > + * Ignored. 
> > + */ > > +static void > > +vdev_netvsc_alarm(__rte_unused void *arg) { > > + struct vdev_netvsc_ctx *ctx; > > + int ret; > > + > > + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { > > + ret = > vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); > > + if (ret) > > + break; > > + } > > + if (!vdev_netvsc_ctx_count) > > + return; > > + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, > > + vdev_netvsc_alarm, NULL); > > + if (ret < 0) { > > + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", > > + rte_strerror(-ret)); > > + } > > +} > > Why not use netlink uevent? As described in the doc, we can improve the hot-plug mechanism (here and in fail-safe) once the EAL hot-plug work is done. So maybe in the next release we will change it to use uevents through EAL hot-plug. > > > +/** > > + * Probe a NetVSC interface to generate a vdev_netvsc context from. > > + * > > + * This function instantiates vdev_netvsc contexts either for all > > +NetVSC > > + * devices found on the system or only a subset provided as device > > + * arguments. > > + * > > + * It is normally used with vdev_netvsc_foreach_iface(). > > + * > > + * @param[in] iface > > + * Pointer to netdevice description structure (name and index). > > + * @param[in] eth_addr > > + * MAC address associated with @p iface. > > + * @param ap > > + * Variable arguments list comprising: > > + * > > + * - const char *name: > > + * Name associated with current driver instance. > > + * > > + * - struct rte_kvargs *kvargs: > > + * Device arguments provided to current driver instance. > > + * > > + * - unsigned int specified: > > + * Number of specific netdevices provided as device arguments. > > + * > > + * - unsigned int *matched: > > + * The number of specified netdevices matched by this function. > > + * > > + * @return > > + * A nonzero value when interface matches, 0 otherwise or in case of > > + * error. 
> > + */ > > +static int > > +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, > > + const struct ether_addr *eth_addr, > > + va_list ap) > > +{ > > + const char *name = va_arg(ap, const char *); > > + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); > > + unsigned int specified = va_arg(ap, unsigned int); > > + unsigned int *matched = va_arg(ap, unsigned int *); > > + unsigned int i; > > + struct vdev_netvsc_ctx *ctx; > > + int ret; > > + > > + /* Probe all interfaces when none are specified. */ > > + if (specified) { > > + for (i = 0; i != kvargs->count; ++i) { > > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > > + > > + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { > > + if (!strcmp(pair->value, iface->if_name)) > > + break; > > + } else if (!strcmp(pair->key, > VDEV_NETVSC_ARG_MAC)) { > > + struct ether_addr tmp; > > + > > + if (sscanf(pair->value, > > + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 > ":" > > + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, > > + &tmp.addr_bytes[0], > > + &tmp.addr_bytes[1], > > + &tmp.addr_bytes[2], > > + &tmp.addr_bytes[3], > > + &tmp.addr_bytes[4], > > + &tmp.addr_bytes[5]) != 6) { > > + DRV_LOG(ERR, > > + "invalid MAC address format" > > + " \"%s\"", > > + pair->value); > > + return -EINVAL; > > + } > > + if (is_same_ether_addr(eth_addr, &tmp)) > > + break; > > + } > > + } > > + if (i == kvargs->count) > > + return 0; > > + ++(*matched); > > + } > > + /* Weed out interfaces already handled. 
*/ > > + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) > > + if (ctx->if_index == iface->if_index) > > + break; > > + if (ctx) { > > + if (!specified) > > + return 0; > > + DRV_LOG(WARNING, > > + "interface \"%s\" (index %u) is already handled," > > + " skipping", > > + iface->if_name, iface->if_index); > > + return 0; > > + } > > + if (!vdev_netvsc_iface_is_netvsc(iface)) { > > + if (!specified) > > + return 0; > > + DRV_LOG(WARNING, > > + "interface \"%s\" (index %u) is not NetVSC," > > + " skipping", > > + iface->if_name, iface->if_index); > > + return 0; > > + } > > + /* Create interface context. */ > > + ctx = calloc(1, sizeof(*ctx)); > > + if (!ctx) { > > + ret = -errno; > > + DRV_LOG(ERR, "cannot allocate context for interface \"%s\": > %s", > > + iface->if_name, rte_strerror(errno)); > > + goto error; > > + } > > + ctx->id = vdev_netvsc_ctx_count; > > + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); > > + ctx->if_index = iface->if_index; > > + ctx->if_addr = *eth_addr; > > + ctx->pipe[0] = -1; > > + ctx->pipe[1] = -1; > > + ctx->yield[0] = '\0'; > > + if (pipe(ctx->pipe) == -1) { > > + ret = -errno; > > + DRV_LOG(ERR, > > + "cannot allocate control pipe for interface \"%s\": > %s", > > + ctx->if_name, rte_strerror(errno)); > > + goto error; > > + } > > + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { > > + int flf = fcntl(ctx->pipe[i], F_GETFL); > > + > > + if (flf != -1 && > > + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) > > + continue; > > + ret = -errno; > > + DRV_LOG(ERR, "cannot toggle non-blocking flag on control > file" > > + " descriptor #%u (%d): %s", i, ctx->pipe[i], > > + rte_strerror(errno)); > > + goto error; > > + } > > + /* Generate virtual device name and arguments. 
*/ > > + i = 0; > > + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", > > + name, ctx->id); > > + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) > > + ++i; > > + ret = snprintf(ctx->devname, sizeof(ctx->devname), > "net_failsafe_%s", > > + ctx->name); > > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) > > + ++i; > > + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), > > + "fd(%d),dev(net_tap_%s,remote=%s)", > > + ctx->pipe[0], ctx->name, ctx->if_name); > > + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) > > + ++i; > > + if (i) { > > + ret = -ENOBUFS; > > + DRV_LOG(ERR, "generated virtual device name or argument > list" > > + " too long for interface \"%s\"", ctx->if_name); > > + goto error; > > + } > > + /* Request virtual device generation. */ > > + DRV_LOG(DEBUG, "generating virtual device \"%s\" with arguments > \"%s\"", > > + ctx->devname, ctx->devargs); > > + vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); > > + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); > > + if (ret) > > + goto error; > > + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); > > + ++vdev_netvsc_ctx_count; > > + DRV_LOG(DEBUG, "added NetVSC interface \"%s\" to context list", > > + ctx->if_name); > > + return 0; > > +error: > > + if (ctx) > > + vdev_netvsc_ctx_destroy(ctx); > > + return ret; > > +} > > + > > +/** > > * Probe NetVSC interfaces. > > * > > + * This function probes system netdevices according to the specified > > + device > > + * arguments and starts a periodic alarm callback to notify the > > + resulting > > + * fail-safe PMD instances of their sub-devices whereabouts. > > + * > > * @param dev > > * Virtual device context for driver instance. > > * > > @@ -49,12 +551,40 @@ > > const char *args = rte_vdev_device_args(dev); > > struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", > > vdev_netvsc_arg); > > + unsigned int specified = 0; > > + unsigned int matched = 0; > > + unsigned int i; > > + int ret; > > > > DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", > name, args); > > if (!kvargs) { > > DRV_LOG(ERR, "cannot parse arguments list"); > > goto error; > > } > > + for (i = 0; i != kvargs->count; ++i) { > > + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; > > + > > + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || > > + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) > > + ++specified; > > + } > > + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); > > + /* Gather interfaces. */ > > + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, > name, kvargs, > > + specified, &matched); > > + if (ret < 0) > > + goto error; > > + if (matched < specified) > > + DRV_LOG(WARNING, > > + "some of the specified parameters did not match" > > + " recognized network interfaces"); > > + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, > > + vdev_netvsc_alarm, NULL); > > + if (ret < 0) { > > + DRV_LOG(ERR, "unable to schedule alarm callback: %s", > > + rte_strerror(-ret)); > > + goto error; > > + } > > error: > > if (kvargs) > > rte_kvargs_free(kvargs); > > @@ -65,6 +595,9 @@ > > /** > > * Remove driver instance. > > * > > + * The alarm callback and underlying vdev_netvsc context instances > > + are only > > + * destroyed after the last PMD instance is removed. > > + * > > * @param dev > > * Virtual device context for driver instance. 
> > * > > @@ -74,7 +607,16 @@ > > static int > > vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) > { > > - --vdev_netvsc_ctx_inst; > > + if (--vdev_netvsc_ctx_inst) > > + return 0; > > + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); > > + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { > > + struct vdev_netvsc_ctx *ctx = > LIST_FIRST(&vdev_netvsc_ctx_list); > > + > > + LIST_REMOVE(ctx, entry); > > + --vdev_netvsc_ctx_count; > > + vdev_netvsc_ctx_destroy(ctx); > > + } > > return 0; > > } > > ^ permalink raw reply [flat|nested] 112+ messages in thread
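The `mac` argument handling in the probe function quoted above relies on `sscanf()` with `SCNx8` conversions. A minimal standalone sketch of that parsing step follows. The helper name `parse_mac()` is hypothetical and not part of the patch, and the trailing `%n` check that rejects junk after the sixth byte is an assumption added here; the posted code accepts such input.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): parse "xx:xx:xx:xx:xx:xx"
 * into six bytes. Returns 0 on success, -1 on malformed input.
 */
static int
parse_mac(const char *str, uint8_t mac[6])
{
	uint8_t tmp[6];
	int len = 0;

	/* Same conversion chain as the patch, plus %n to record the offset. */
	if (sscanf(str,
		   "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":"
		   "%" SCNx8 ":%" SCNx8 ":%" SCNx8 "%n",
		   &tmp[0], &tmp[1], &tmp[2],
		   &tmp[3], &tmp[4], &tmp[5], &len) != 6)
		return -1;
	/* Assumption: reject trailing characters, unlike the posted code. */
	if (str[len] != '\0')
		return -1;
	memcpy(mac, tmp, sizeof(tmp));
	return 0;
}
```

A caller would compare the resulting bytes against the interface address, as the patch does with `is_same_ether_addr()`.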
* Re: [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality
  2018-01-10 15:02 ` Matan Azrad
@ 2018-01-17 16:51 ` Thomas Monjalon
  0 siblings, 0 replies; 112+ messages in thread
From: Thomas Monjalon @ 2018-01-17 16:51 UTC (permalink / raw)
  To: Matan Azrad, Stephen Hemminger, Ferruh Yigit; +Cc: dev, Adrien Mazarguil

10/01/2018 16:02, Matan Azrad:
> From: Stephen Hemminger, Tuesday, January 9, 2018 8:49 PM
> > On Tue, 9 Jan 2018 14:47:30 +0000
> > Matan Azrad <matan@mellanox.com> wrote:
> > > +	ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000,
> > > +				vdev_netvsc_alarm, NULL);
> > > +	if (ret < 0) {
> > > +		DRV_LOG(ERR, "unable to reschedule alarm callback: %s",
> > > +			rte_strerror(-ret));
> > > +	}
> > > +}
> >
> > Why not use netlink uevent?
>
> As described in doc, we can improve the hotplug mechanism (here and in fail-safe) after EAL hotplug work will be done.
> So, maybe in next release we will change it to use uevent by EAL hotplug.

I don't see any progress here for one week.
Yes it is a temporary solution waiting for hotplug event callback in EAL.
Hopefully it will be possible to do such improvements in 18.05.

Am I missing something else?
Or can it be applied to next-net?
* [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad ` (4 preceding siblings ...) 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-09 18:51 ` Stephen Hemminger 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad ` (2 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Raslan Darawsheh NetVSC netdevices which are already routed should not be probed because they are used for management purposes by the HyperV. prevent routed netvsc devices probing. Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 2 +- drivers/net/vdev_netvsc/vdev_netvsc.c | 46 +++++++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -87,4 +87,4 @@ The following device parameters are supported: MAC address. Not specifying either ``iface`` or ``mac`` makes this driver attach itself to -all NetVSC interfaces found on the system. +all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 3d8895b..4295b92 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -38,6 +38,7 @@ #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ @@ -192,6 +193,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = } /** + * Determine if a network interface has a route. + * + * @param[in] name + * Network device name. + * + * @return + * A nonzero value when interface has an route. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +vdev_netvsc_has_route(const char *name) +{ + FILE *fp; + int ret = 0; + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; + char *netdev; + + fp = fopen("/proc/net/route", "r"); + if (!fp) { + rte_errno = errno; + return 0; + } + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { + netdev = strtok(route, "\t"); + if (strcmp(netdev, name) == 0) { + ret = 1; + break; + } + /* Move file pointer to the next line. */ + while (strchr(route, '\n') == NULL && + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) + ; + } + fclose(fp); + return ret; +} + +/** * Retrieve network interface data from sysfs symbolic link. * * @param[out] buf @@ -453,6 +492,13 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = iface->if_name, iface->if_index); return 0; } + /* Routed NetVSC should not be probed. */ + if (vdev_netvsc_has_route(iface->if_name)) { + DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", + iface->if_name, iface->if_index); + if (!specified) + return 0; + } /* Create interface context. */ ctx = calloc(1, sizeof(*ctx)); if (!ctx) { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad @ 2018-01-09 18:51 ` Stephen Hemminger 2018-01-10 15:07 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2018-01-09 18:51 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Raslan Darawsheh On Tue, 9 Jan 2018 14:47:31 +0000 Matan Azrad <matan@mellanox.com> wrote: > NetVSC netdevices which are already routed should not be probed because > they are used for management purposes by the HyperV. > > prevent routed netvsc devices probing. > > Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> > Signed-off-by: Matan Azrad <matan@mellanox.com> > --- > doc/guides/nics/vdev_netvsc.rst | 2 +- > drivers/net/vdev_netvsc/vdev_netvsc.c | 46 +++++++++++++++++++++++++++++++++++ > 2 files changed, 47 insertions(+), 1 deletion(-) > > diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst > index fde1fb8..f779862 100644 > --- a/doc/guides/nics/vdev_netvsc.rst > +++ b/doc/guides/nics/vdev_netvsc.rst > @@ -87,4 +87,4 @@ The following device parameters are supported: > MAC address. > > Not specifying either ``iface`` or ``mac`` makes this driver attach itself to > -all NetVSC interfaces found on the system. > +all unrouted NetVSC interfaces found on the system. > diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c > index 3d8895b..4295b92 100644 > --- a/drivers/net/vdev_netvsc/vdev_netvsc.c > +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c > @@ -38,6 +38,7 @@ > #define VDEV_NETVSC_PROBE_MS 1000 > > #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 > > #define DRV_LOG(level, ...) 
\ > rte_log(RTE_LOG_ ## level, \ > @@ -192,6 +193,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = > } > > /** > + * Determine if a network interface has a route. > + * > + * @param[in] name > + * Network device name. > + * > + * @return > + * A nonzero value when interface has an route. In case of error, > + * rte_errno is updated and 0 returned. > + */ > +static int > +vdev_netvsc_has_route(const char *name) > +{ > + FILE *fp; > + int ret = 0; > + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; > + char *netdev; > + > + fp = fopen("/proc/net/route", "r"); > + if (!fp) { > + rte_errno = errno; > + return 0; > + } > + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { > + netdev = strtok(route, "\t"); > + if (strcmp(netdev, name) == 0) { > + ret = 1; > + break; > + } > + /* Move file pointer to the next line. */ > + while (strchr(route, '\n') == NULL && > + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) > + ; > + } > + fclose(fp); > + return ret; > +} In many ways /proc/net/route is legacy intervace. And system may have 1 M routes. Maybe there is faster way to do this with netlink by looking to see if there is an address associated with the interface. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-09 18:51 ` Stephen Hemminger @ 2018-01-10 15:07 ` Matan Azrad 2018-01-10 16:43 ` Stephen Hemminger 0 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-10 15:07 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Raslan Darawsheh Hi Stephan From: Stephen Hemminger, Tuesday, January 9, 2018 8:51 PM > To: Matan Azrad <matan@mellanox.com> > Cc: Ferruh Yigit <ferruh.yigit@intel.com>; Thomas Monjalon > <thomas@monjalon.net>; dev@dpdk.org; Raslan Darawsheh > <rasland@mellanox.com> > Subject: Re: [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing > > On Tue, 9 Jan 2018 14:47:31 +0000 > Matan Azrad <matan@mellanox.com> wrote: > > > NetVSC netdevices which are already routed should not be probed > > because they are used for management purposes by the HyperV. > > > > prevent routed netvsc devices probing. > > > > Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > --- > > doc/guides/nics/vdev_netvsc.rst | 2 +- > > drivers/net/vdev_netvsc/vdev_netvsc.c | 46 > > +++++++++++++++++++++++++++++++++++ > > 2 files changed, 47 insertions(+), 1 deletion(-) > > > > diff --git a/doc/guides/nics/vdev_netvsc.rst > > b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 > > --- a/doc/guides/nics/vdev_netvsc.rst > > +++ b/doc/guides/nics/vdev_netvsc.rst > > @@ -87,4 +87,4 @@ The following device parameters are supported: > > MAC address. > > > > Not specifying either ``iface`` or ``mac`` makes this driver attach > > itself to -all NetVSC interfaces found on the system. > > +all unrouted NetVSC interfaces found on the system. 
> > diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c > > b/drivers/net/vdev_netvsc/vdev_netvsc.c > > index 3d8895b..4295b92 100644 > > --- a/drivers/net/vdev_netvsc/vdev_netvsc.c > > +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c > > @@ -38,6 +38,7 @@ > > #define VDEV_NETVSC_PROBE_MS 1000 > > > > #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > > +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 > > > > #define DRV_LOG(level, ...) \ > > rte_log(RTE_LOG_ ## level, \ > > @@ -192,6 +193,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) > > vdev_netvsc_ctx_list = } > > > > /** > > + * Determine if a network interface has a route. > > + * > > + * @param[in] name > > + * Network device name. > > + * > > + * @return > > + * A nonzero value when interface has an route. In case of error, > > + * rte_errno is updated and 0 returned. > > + */ > > +static int > > +vdev_netvsc_has_route(const char *name) { > > + FILE *fp; > > + int ret = 0; > > + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; > > + char *netdev; > > + > > + fp = fopen("/proc/net/route", "r"); > > + if (!fp) { > > + rte_errno = errno; > > + return 0; > > + } > > + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { > > + netdev = strtok(route, "\t"); > > + if (strcmp(netdev, name) == 0) { > > + ret = 1; > > + break; > > + } > > + /* Move file pointer to the next line. */ > > + while (strchr(route, '\n') == NULL && > > + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != > NULL) > > + ; > > + } > > + fclose(fp); > > + return ret; > > +} > > In many ways /proc/net/route is legacy intervace. > And system may have 1 M routes. > > Maybe there is faster way to do this with netlink by looking to see if there is > an address associated with the interface. Actually this is control path, we don't care about performance very much. But I can get other idea here, Do you have suggestion? Thanks! ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-10 15:07 ` Matan Azrad @ 2018-01-10 16:43 ` Stephen Hemminger 2018-01-11 9:00 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2018-01-10 16:43 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Raslan Darawsheh On Wed, 10 Jan 2018 15:07:14 +0000 Matan Azrad <matan@mellanox.com> wrote: > Hi Stephan > > From: Stephen Hemminger, Tuesday, January 9, 2018 8:51 PM > > To: Matan Azrad <matan@mellanox.com> > > Cc: Ferruh Yigit <ferruh.yigit@intel.com>; Thomas Monjalon > > <thomas@monjalon.net>; dev@dpdk.org; Raslan Darawsheh > > <rasland@mellanox.com> > > Subject: Re: [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing > > > > On Tue, 9 Jan 2018 14:47:31 +0000 > > Matan Azrad <matan@mellanox.com> wrote: > > > > > NetVSC netdevices which are already routed should not be probed > > > because they are used for management purposes by the HyperV. > > > > > > prevent routed netvsc devices probing. > > > > > > Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > --- > > > doc/guides/nics/vdev_netvsc.rst | 2 +- > > > drivers/net/vdev_netvsc/vdev_netvsc.c | 46 > > > +++++++++++++++++++++++++++++++++++ > > > 2 files changed, 47 insertions(+), 1 deletion(-) > > > > > > diff --git a/doc/guides/nics/vdev_netvsc.rst > > > b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 > > > --- a/doc/guides/nics/vdev_netvsc.rst > > > +++ b/doc/guides/nics/vdev_netvsc.rst > > > @@ -87,4 +87,4 @@ The following device parameters are supported: > > > MAC address. > > > > > > Not specifying either ``iface`` or ``mac`` makes this driver attach > > > itself to -all NetVSC interfaces found on the system. > > > +all unrouted NetVSC interfaces found on the system. 
> > > diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c > > > b/drivers/net/vdev_netvsc/vdev_netvsc.c > > > index 3d8895b..4295b92 100644 > > > --- a/drivers/net/vdev_netvsc/vdev_netvsc.c > > > +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c > > > @@ -38,6 +38,7 @@ > > > #define VDEV_NETVSC_PROBE_MS 1000 > > > > > > #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > > > +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 > > > > > > #define DRV_LOG(level, ...) \ > > > rte_log(RTE_LOG_ ## level, \ > > > @@ -192,6 +193,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) > > > vdev_netvsc_ctx_list = } > > > > > > /** > > > + * Determine if a network interface has a route. > > > + * > > > + * @param[in] name > > > + * Network device name. > > > + * > > > + * @return > > > + * A nonzero value when interface has an route. In case of error, > > > + * rte_errno is updated and 0 returned. > > > + */ > > > +static int > > > +vdev_netvsc_has_route(const char *name) { > > > + FILE *fp; > > > + int ret = 0; > > > + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; > > > + char *netdev; > > > + > > > + fp = fopen("/proc/net/route", "r"); > > > + if (!fp) { > > > + rte_errno = errno; > > > + return 0; > > > + } > > > + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { > > > + netdev = strtok(route, "\t"); > > > + if (strcmp(netdev, name) == 0) { > > > + ret = 1; > > > + break; > > > + } > > > + /* Move file pointer to the next line. */ > > > + while (strchr(route, '\n') == NULL && > > > + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != > > NULL) > > > + ; > > > + } > > > + fclose(fp); > > > + return ret; > > > +} > > > > In many ways /proc/net/route is legacy intervace. > > And system may have 1 M routes. > > > > Maybe there is faster way to do this with netlink by looking to see if there is > > an address associated with the interface. > > Actually this is control path, we don't care about performance very much. > But I can get other idea here, Do you have suggestion? > > Thanks! 
>

Use netlink (or ioctl) to get the interface address.
If the interface has an IPv4 or IPv6 address (not link-local), then skip it.
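The check suggested above (skip any interface that already carries a non-link-local IPv4 or IPv6 address) could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses `getifaddrs(3)` for brevity rather than a raw netlink or ioctl request, and the helper names `addr_is_routable()` and `iface_has_address()` are hypothetical, not taken from the patch series.

```c
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Return 1 when sa is an IPv4 or IPv6 address that is not link-local. */
static int
addr_is_routable(const struct sockaddr *sa)
{
	if (!sa)
		return 0;
	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *in = (const struct sockaddr_in *)sa;
		uint32_t a = ntohl(in->sin_addr.s_addr);

		/* 169.254.0.0/16 is the IPv4 link-local range. */
		return (a & 0xffff0000u) != 0xa9fe0000u;
	}
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *in6 =
			(const struct sockaddr_in6 *)sa;

		/* fe80::/10 is the IPv6 link-local range. */
		return !IN6_IS_ADDR_LINKLOCAL(&in6->sin6_addr);
	}
	return 0;
}

/*
 * Hypothetical helper: 1 if the named interface has such an address,
 * 0 if it does not, -1 on error (errno set by getifaddrs()).
 */
static int
iface_has_address(const char *name)
{
	struct ifaddrs *list;
	struct ifaddrs *ifa;
	int ret = 0;

	if (getifaddrs(&list) == -1)
		return -1;
	for (ifa = list; ifa; ifa = ifa->ifa_next) {
		if (strcmp(ifa->ifa_name, name))
			continue;
		if (addr_is_routable(ifa->ifa_addr)) {
			ret = 1;
			break;
		}
	}
	freeifaddrs(list);
	return ret;
}
```

With a helper of this kind, the probe callback would return early for an interface when `iface_has_address(iface->if_name)` is positive, instead of scanning the routing table.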
* Re: [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-10 16:43 ` Stephen Hemminger @ 2018-01-11 9:00 ` Matan Azrad 2018-01-17 16:59 ` Thomas Monjalon 0 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-11 9:00 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Raslan Darawsheh Hi Stephan From: Stephen Hemminger, Wednesday, January 10, 2018 6:44 PM > On Wed, 10 Jan 2018 15:07:14 +0000 > Matan Azrad <matan@mellanox.com> wrote: > > > Hi Stephan > > > > From: Stephen Hemminger, Tuesday, January 9, 2018 8:51 PM > > > To: Matan Azrad <matan@mellanox.com> > > > Cc: Ferruh Yigit <ferruh.yigit@intel.com>; Thomas Monjalon > > > <thomas@monjalon.net>; dev@dpdk.org; Raslan Darawsheh > > > <rasland@mellanox.com> > > > Subject: Re: [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc > > > probing > > > > > > On Tue, 9 Jan 2018 14:47:31 +0000 > > > Matan Azrad <matan@mellanox.com> wrote: > > > > > > > NetVSC netdevices which are already routed should not be probed > > > > because they are used for management purposes by the HyperV. > > > > > > > > prevent routed netvsc devices probing. > > > > > > > > Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> > > > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > > > --- > > > > doc/guides/nics/vdev_netvsc.rst | 2 +- > > > > drivers/net/vdev_netvsc/vdev_netvsc.c | 46 > > > > +++++++++++++++++++++++++++++++++++ > > > > 2 files changed, 47 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/doc/guides/nics/vdev_netvsc.rst > > > > b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 > > > > --- a/doc/guides/nics/vdev_netvsc.rst > > > > +++ b/doc/guides/nics/vdev_netvsc.rst > > > > @@ -87,4 +87,4 @@ The following device parameters are supported: > > > > MAC address. > > > > > > > > Not specifying either ``iface`` or ``mac`` makes this driver > > > > attach itself to -all NetVSC interfaces found on the system. 
> > > > +all unrouted NetVSC interfaces found on the system. > > > > diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c > > > > b/drivers/net/vdev_netvsc/vdev_netvsc.c > > > > index 3d8895b..4295b92 100644 > > > > --- a/drivers/net/vdev_netvsc/vdev_netvsc.c > > > > +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c > > > > @@ -38,6 +38,7 @@ > > > > #define VDEV_NETVSC_PROBE_MS 1000 > > > > > > > > #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" > > > > +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 > > > > > > > > #define DRV_LOG(level, ...) \ > > > > rte_log(RTE_LOG_ ## level, \ > > > > @@ -192,6 +193,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) > > > > vdev_netvsc_ctx_list = } > > > > > > > > /** > > > > + * Determine if a network interface has a route. > > > > + * > > > > + * @param[in] name > > > > + * Network device name. > > > > + * > > > > + * @return > > > > + * A nonzero value when interface has an route. In case of error, > > > > + * rte_errno is updated and 0 returned. > > > > + */ > > > > +static int > > > > +vdev_netvsc_has_route(const char *name) { > > > > + FILE *fp; > > > > + int ret = 0; > > > > + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; > > > > + char *netdev; > > > > + > > > > + fp = fopen("/proc/net/route", "r"); > > > > + if (!fp) { > > > > + rte_errno = errno; > > > > + return 0; > > > > + } > > > > + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { > > > > + netdev = strtok(route, "\t"); > > > > + if (strcmp(netdev, name) == 0) { > > > > + ret = 1; > > > > + break; > > > > + } > > > > + /* Move file pointer to the next line. */ > > > > + while (strchr(route, '\n') == NULL && > > > > + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != > > > NULL) > > > > + ; > > > > + } > > > > + fclose(fp); > > > > + return ret; > > > > +} > > > > > > In many ways /proc/net/route is legacy intervace. > > > And system may have 1 M routes. 
> > >
> > > Maybe there is faster way to do this with netlink by looking to see
> > > if there is an address associated with the interface.
> >
> > Actually this is control path, we don't care about performance very much.
> > But I can get other idea here, Do you have suggestion?
> >
> > Thanks!
> >
>
> Use netlink (or ioctl) to get interface address.
> If interface has an IPv4 or IPv6 (not link local), then skip it.

After investigating a bit, I found that getting IPv6 addresses by ioctl is problematic.
And using netlink for it really isn't worth the effort.
So, I suggest keeping this code simple as is, in spite of the potentially high latency of this function; after all, it is a control path.
* Re: [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-11 9:00 ` Matan Azrad @ 2018-01-17 16:59 ` Thomas Monjalon 0 siblings, 0 replies; 112+ messages in thread From: Thomas Monjalon @ 2018-01-17 16:59 UTC (permalink / raw) To: Matan Azrad, Stephen Hemminger, Ferruh Yigit; +Cc: dev, Raslan Darawsheh 11/01/2018 10:00, Matan Azrad: > From: Stephen Hemminger, Wednesday, January 10, 2018 6:44 PM > > > From: Stephen Hemminger, Tuesday, January 9, 2018 8:51 PM > > > > On Tue, 9 Jan 2018 14:47:31 +0000 > > > > Matan Azrad <matan@mellanox.com> wrote: > > > > > +static int > > > > > +vdev_netvsc_has_route(const char *name) { > > > > > + FILE *fp; > > > > > + int ret = 0; > > > > > + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; > > > > > + char *netdev; > > > > > + > > > > > + fp = fopen("/proc/net/route", "r"); > > > > > + if (!fp) { > > > > > + rte_errno = errno; > > > > > + return 0; > > > > > + } > > > > > + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { > > > > > + netdev = strtok(route, "\t"); > > > > > + if (strcmp(netdev, name) == 0) { > > > > > + ret = 1; > > > > > + break; > > > > > + } > > > > > + /* Move file pointer to the next line. */ > > > > > + while (strchr(route, '\n') == NULL && > > > > > + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != > > > > NULL) > > > > > + ; > > > > > + } > > > > > + fclose(fp); > > > > > + return ret; > > > > > +} > > > > > > > > In many ways /proc/net/route is legacy intervace. > > > > And system may have 1 M routes. > > > > > > > > Maybe there is faster way to do this with netlink by looking to see > > > > if there is an address associated with the interface. > > > > > > Actually this is control path, we don't care about performance very much. > > > But I can get other idea here, Do you have suggestion? > > > > > > Thanks! > > > > > > > Use netlink (or ioctl) to get interface address. > > If interface has an IPv4 or IPv6 (not link local), then skip it. 
>
> After investigating a bit, I found that getting IPv6 addresses by ioctl is problematic.
> And using netlink for it really isn't worth the effort.
> So, I suggest keeping this code simple as is, in spite of the potentially high latency of this function; after all, it is a control path.

No more comments?
So are we OK with this solution for now?
If we see a real performance issue, I guess it can be fixed later.
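Since the thread settles on keeping the `/proc/net/route` scan, here is a standalone sketch of that loop for reference. One thing worth double-checking in the posted version: `strtok()` terminates the buffer at the first tab, so the following `strchr(route, '\n')` appears unable to see the newline of a fully read line, and the "move to next line" loop would then consume the next line without comparing it. The sketch records whether the line was complete before cutting it. The function name `route_table_has_iface()`, the `ROUTE_LINE_SIZE` macro and the `FILE *` parameter (used so the logic can be exercised without `/proc`) are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define ROUTE_LINE_SIZE 300

/*
 * Hypothetical helper: return 1 if any line of the route table read from
 * fp starts with the interface name followed by a tab, 0 otherwise.
 */
static int
route_table_has_iface(FILE *fp, const char *name)
{
	char line[ROUTE_LINE_SIZE];

	while (fgets(line, sizeof(line), fp) != NULL) {
		/* Check for the newline before the buffer is truncated. */
		int complete = strchr(line, '\n') != NULL;
		size_t len = strcspn(line, "\t\n");

		/* Keep only the first field (interface name). */
		line[len] = '\0';
		if (strcmp(line, name) == 0)
			return 1;
		/* Discard the remainder of an over-long line, if any. */
		while (!complete && fgets(line, sizeof(line), fp) != NULL)
			complete = strchr(line, '\n') != NULL;
	}
	return 0;
}
```

In the driver this would be called on the stream returned by `fopen("/proc/net/route", "r")`, with the same error handling as the patch.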
* [dpdk-dev] [PATCH v3 7/8] net/vdev_netvsc: add "force" parameter 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad ` (5 preceding siblings ...) 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 8/8] net/vdev_netvsc: add automatic probing Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This parameter allows specifying any non-NetVSC interface or routed NetVSC interfaces to use with tap sub-devices for development purposes. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 5 +++++ drivers/net/vdev_netvsc/vdev_netvsc.c | 30 +++++++++++++++++++----------- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index f779862..3c26990 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -86,5 +86,10 @@ The following device parameters are supported: Same as ``iface`` except a suitable NetVSC interface is located using its MAC address. +- ``force`` [int] + + If nonzero, forces the use of specified interfaces even if not detected as + NetVSC or detected as routed NETVSC. + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. 
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 4295b92..301f9b6 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -35,6 +35,7 @@ #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_ARG_FORCE "force" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -413,6 +414,9 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = * - struct rte_kvargs *kvargs: * Device arguments provided to current driver instance. * + * - int force: + * Accept specified interface even if not detected as NetVSC. + * * - unsigned int specified: * Number of specific netdevices provided as device arguments. * @@ -430,6 +434,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = { const char *name = va_arg(ap, const char *); struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + int force = va_arg(ap, int); unsigned int specified = va_arg(ap, unsigned int); unsigned int *matched = va_arg(ap, unsigned int *); unsigned int i; @@ -484,20 +489,18 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = return 0; } if (!vdev_netvsc_iface_is_netvsc(iface)) { - if (!specified) + if (!specified || !force) return 0; DRV_LOG(WARNING, - "interface \"%s\" (index %u) is not NetVSC," - " skipping", + "using non-NetVSC interface \"%s\" (index %u)", iface->if_name, iface->if_index); - return 0; } /* Routed NetVSC should not be probed. */ if (vdev_netvsc_has_route(iface->if_name)) { - DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", - iface->if_name, iface->if_index); - if (!specified) + if (!specified || !force) return 0; + DRV_LOG(WARNING, "using routed NetVSC interface \"%s\"" + " (index %u)", iface->if_name, iface->if_index); } /* Create interface context. 
*/ ctx = calloc(1, sizeof(*ctx)); @@ -591,6 +594,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = static const char *const vdev_netvsc_arg[] = { VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, + VDEV_NETVSC_ARG_FORCE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -599,6 +603,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = vdev_netvsc_arg); unsigned int specified = 0; unsigned int matched = 0; + int force = 0; unsigned int i; int ret; @@ -610,14 +615,16 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = for (i = 0; i != kvargs->count; ++i) { const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; - if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || - !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) + force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, - specified, &matched); + force, specified, &matched); if (ret < 0) goto error; if (matched < specified) @@ -676,7 +683,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " - VDEV_NETVSC_ARG_MAC "=<string>"); + VDEV_NETVSC_ARG_MAC "=<string> " + VDEV_NETVSC_ARG_FORCE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
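The effect of the ``force`` parameter on the two checks in this patch can be condensed into a single predicate. The sketch below is a hypothetical restatement for illustration only (the name `iface_is_accepted()` does not exist in the driver); it assumes, as in the probe callback, that `specified` being nonzero means the interface already matched an ``iface`` or ``mac`` argument.

```c
#include <stdbool.h>

/*
 * Hypothetical condensed form of the non-NetVSC and routed checks:
 * returns true when the probe callback would go on to create a context.
 */
static bool
iface_is_accepted(bool is_netvsc, bool is_routed, bool specified, bool force)
{
	/* A regular, unrouted NetVSC interface is always accepted. */
	if (is_netvsc && !is_routed)
		return true;
	/* Anything else needs an explicit iface/mac match plus force. */
	return specified && force;
}
```

For example, a routed NetVSC interface named through ``iface`` is only used when ``force`` is nonzero, and ``force`` has no effect when neither ``iface`` nor ``mac`` is given.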
* [dpdk-dev] [PATCH v3 8/8] net/vdev_netvsc: add automatic probing 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad ` (6 preceding siblings ...) 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad @ 2018-01-09 14:47 ` Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-09 14:47 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen Using DPDK in Hyper-V VM systems requires vdev_netvsc driver to pair the NetVSC netdev device with the same MAC address PCI device by fail-safe PMD. Add vdev_netvsc custom scan in vdev bus to allow automatic probing in Hyper-V VM systems unless it was already specified by command line. Add "ignore" parameter to disable this auto-detection. Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 9 ++++-- drivers/net/vdev_netvsc/vdev_netvsc.c | 55 +++++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index 3c26990..55d130a 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -71,8 +71,8 @@ Build options Run-time parameters ------------------- -To invoke this driver, applications have to explicitly provide the -``--vdev=net_vdev_netvsc`` EAL option. +This driver is invoked automatically in Hyper-V VM systems unless the user +invoked it by command line using ``--vdev=net_vdev_netvsc`` EAL option. The following device parameters are supported: @@ -91,5 +91,10 @@ The following device parameters are supported: If nonzero, forces the use of specified interfaces even if not detected as NetVSC or detected as routed NETVSC. 
+- ``ignore`` [int] + + If nonzero, ignores the driver running (actually used to disable the + auto-detection in Hyper-V VM). + + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 301f9b6..0897c3d 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -29,13 +29,16 @@ #include <rte_errno.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_hypervisor.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_DRIVER_NAME RTE_STR(VDEV_NETVSC_DRIVER) #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" #define VDEV_NETVSC_ARG_FORCE "force" +#define VDEV_NETVSC_ARG_IGNORE "ignore" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -44,7 +47,7 @@ #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ vdev_netvsc_logtype, \ - RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT(VDEV_NETVSC_DRIVER_NAME ": " \ RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ RTE_FMT_TAIL(__VA_ARGS__,))) @@ -595,6 +598,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, VDEV_NETVSC_ARG_FORCE, + VDEV_NETVSC_ARG_IGNORE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -604,6 +608,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = unsigned int specified = 0; unsigned int matched = 0; int force = 0; + int ignore = 0; unsigned int i; int ret; @@ -617,10 +622,17 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IGNORE)) + ignore = !!atoi(pair->value); else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } + if (ignore) { + if (kvargs) + rte_kvargs_free(kvargs); + return 0; + } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, @@ -684,7 +696,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " VDEV_NETVSC_ARG_MAC "=<string> " - VDEV_NETVSC_ARG_FORCE "=<int>"); + VDEV_NETVSC_ARG_FORCE "=<int> " + VDEV_NETVSC_ARG_IGNORE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) @@ -693,3 +706,41 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (vdev_netvsc_logtype >= 0) rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); } + +/** Compare function for vdev find device operation. 
*/ +static int +vdev_netvsc_cmp_rte_device(const struct rte_device *dev1, + __rte_unused const void *_dev2) +{ + return strcmp(dev1->devargs->name, VDEV_NETVSC_DRIVER_NAME); +} + +/** + * A callback called by vdev bus scan function to ensure this driver probing + * automatically in Hyper-V VM system unless it already exists in the + * devargs list. + */ +static void +vdev_netvsc_scan_callback(__rte_unused void *arg) +{ + struct rte_vdev_device *dev; + struct rte_devargs *devargs; + struct rte_bus *vbus = rte_bus_find_by_name("vdev"); + + TAILQ_FOREACH(devargs, &devargs_list, next) + if (!strcmp(devargs->name, VDEV_NETVSC_DRIVER_NAME)) + return; + dev = (struct rte_vdev_device *)vbus->find_device(NULL, + vdev_netvsc_cmp_rte_device, VDEV_NETVSC_DRIVER_NAME); + if (dev) + return; + if (rte_eal_devargs_add(RTE_DEVTYPE_VIRTUAL, VDEV_NETVSC_DRIVER_NAME)) + DRV_LOG(ERR, "unable to add netvsc devargs."); +} + +/** Initialize the custom scan. */ +RTE_INIT(vdev_netvsc_custom_scan_add) +{ + if (rte_hypervisor_get() == RTE_HYPERVISOR_HYPERV) + rte_vdev_add_custom_scan(vdev_netvsc_scan_callback, NULL); +} -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
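Annotation: the "force" and "ignore" handling in the two patches above reduces to normalizing an integer string to 0 or 1 and counting only "iface"/"mac" as interface selectors. A minimal standalone sketch of that logic follows, assuming plain key/value pairs; struct kv_pair, struct netvsc_opts and parse_opts() are hypothetical names for illustration and are not part of rte_kvargs or the driver.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parsed key/value argument pair. */
struct kv_pair {
	const char *key;
	const char *value;
};

/* Hypothetical result structure, not taken from the driver. */
struct netvsc_opts {
	int force;              /* 1 if "force" is nonzero */
	int ignore;             /* 1 if "ignore" is nonzero */
	unsigned int specified; /* number of "iface"/"mac" selectors */
};

static void
parse_opts(const struct kv_pair *pairs, unsigned int count,
	   struct netvsc_opts *opts)
{
	unsigned int i;

	assert(opts != NULL);
	memset(opts, 0, sizeof(*opts));
	for (i = 0; i != count; ++i) {
		const struct kv_pair *pair = &pairs[i];

		/* "!!" collapses any nonzero integer to 1. */
		if (!strcmp(pair->key, "force"))
			opts->force = !!atoi(pair->value);
		else if (!strcmp(pair->key, "ignore"))
			opts->ignore = !!atoi(pair->value);
		else if (!strcmp(pair->key, "iface") ||
			 !strcmp(pair->key, "mac"))
			++opts->specified;
	}
}
```

With this approach a non-numeric value such as "yes" is treated as 0, because atoi() returns 0 when no conversion can be performed; a stricter parser would have to use strtol() and check the end pointer.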
* [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 0/8] Introduce virtual driver " Matan Azrad ` (7 preceding siblings ...) 2018-01-09 14:47 ` [dpdk-dev] [PATCH v3 8/8] net/vdev_netvsc: add automatic probing Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 1/8] net/failsafe: fix invalid free Matan Azrad ` (8 more replies) 8 siblings, 9 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen Virtual machines hosted by Hyper-V/Azure platforms are fitted with simplified virtual network devices named NetVSC that are used for fast communication between VM to VM, VM to hypervisor, and the outside. They appear as standard system netdevices to user-land applications, the main difference being they are implemented on top of VMBUS instead of emulated PCI devices. While this reads like a case for a standard DPDK PMD, there is more to it. To accelerate outside communication, NetVSC devices as they appear in a VM can be paired with physical SR-IOV virtual function (VF) devices owned by that same VM. Both netdevices share the same MAC address in that case. When paired, egress and most of the ingress traffic flow through the VF device, while part of it (e.g. multicasts, hypervisor control data) still flows through NetVSC. Moreover VF devices are not retained and disappear during VM migration; from a VM standpoint, they can be hot-plugged anytime with NetVSC acting as a fallback. Running DPDK applications in such a context involves driving VF devices using their dedicated PMDs in a vendor-independent fashion (to benefit from maximum performance without writing dedicated code) while simultaneously listening to NetVSC and handling the related hot-plug events. 
This new virtual driver (referred to as "vdev_netvsc" from this point on) automatically coordinates the Hyper-V/Azure-specific management part described above by relying on vendor-specific, failsafe and tap PMDs to expose a single consolidated Ethernet device usable directly by existing applications. .------------------. | DPDK application | `--------+---------' | .------+------. | DPDK ethdev | `------+------' Control | | .------------+------------. v .--------------------. | failsafe PMD +---------+ vdev_netvsc driver | `--+-------------------+--' `--------------------' | | | .........|......... | : | : .----+----. : .----+----. : | tap PMD | : | any PMD | : `----+----' : `----+----' : <-- Hot-pluggable | : | : .------+-------. : .-----+-----. : | NetVSC-based | : | SR-IOV VF | : | netdevice | : | device | : `--------------' : `-----------' : :.................: v2 changes(Adrien): - Renamed driver from "hyperv" to "vdev_netvsc". This change covers documentation and symbols prefix. - Driver is now tagged EXPERIMENTAL. - Replaced ether_addr_from_str() with a basic sscanf() call. - Removed debugging code (memset() poisoning). - Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments. - Removed hyperv_basename(). - Discarded unused variables through __rte_unused. - Added separate but necessary free() bugfix for failsafe PMD. - Added file descriptor input support to failsafe PMD. - Replaced temporary bash execution; failsafe now reads device definitions directly through a pipe without an intermediate bash one-liner. - Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG(). - Added dynamic log type (pmd.vdev_netvsc). - Modified initialization code to probe devices immediately during startup. - Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more appropriate than "ret >= sizeof(foo) - 1"). v3 changes(Matan): - Fixed clang compilation in V2. - Removed hotplug remove code from the new driver. 
- Supported probed sub-devices getting in fail-safe. - Added automatic probing for HyperV VM systems. - Added option to ignore the automatic probing. - Skipped routed NetVSC devices probing. - Adjusted documentation and semantics. - Replaced maintainer. v4 changes(Matan): - Align descriptions of context struct(Stephen suggestion). - Skip non-ethernet devices in netdev loop(Stephen suggestion). - Use different variable names in "add fd parameter"(Gaetan suggestion). - Change name of get port id function in "add automatic probing"(Gaetan suggestion). - Update internal fail-safe devargs in case of probed device(Gaetan suggestion). - Use different commit title instead of "support probed sub-devices getting"(Gaetan suggestion). Adrien Mazarguil (1): net/failsafe: fix invalid free Matan Azrad (7): net/failsafe: add "fd" parameter net/failsafe: add probed etherdev capture net/vdev_netvsc: introduce Hyper-V platform driver net/vdev_netvsc: implement core functionality net/vdev_netvsc: skip routed netvsc probing net/vdev_netvsc: add "force" parameter net/vdev_netvsc: add automatic probing MAINTAINERS | 6 + config/common_base | 5 + config/common_linuxapp | 1 + doc/guides/nics/fail_safe.rst | 14 + doc/guides/nics/features/vdev_netvsc.ini | 12 + doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 100 +++ drivers/net/Makefile | 1 + drivers/net/failsafe/failsafe_args.c | 84 ++- drivers/net/failsafe/failsafe_eal.c | 78 ++- drivers/net/failsafe/failsafe_private.h | 5 + drivers/net/vdev_netvsc/Makefile | 31 + .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 752 +++++++++++++++++++++ mk/rte.app.mk | 1 + 15 files changed, 1071 insertions(+), 24 deletions(-) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 
drivers/net/vdev_netvsc/vdev_netvsc.c -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v4 1/8] net/failsafe: fix invalid free 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 2/8] net/failsafe: add "fd" parameter Matan Azrad ` (7 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil, stable, Gaetan Rivet From: Adrien Mazarguil <adrien.mazarguil@6wind.com> rte_free() is not supposed to work with pointers returned by calloc(). Fixes: a0194d828100 ("net/failsafe: add flexible device definition") Cc: stable@dpdk.org Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> --- drivers/net/failsafe/failsafe_args.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index cfc83e3..ec63ac9 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -407,7 +407,7 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t i; FOREACH_SUBDEV(sdev, i, dev) { - rte_free(sdev->cmdline); + free(sdev->cmdline); sdev->cmdline = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v4 2/8] net/failsafe: add "fd" parameter 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 1/8] net/failsafe: fix invalid free Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 8:51 ` Gaëtan Rivet 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 3/8] net/failsafe: add probed etherdev capture Matan Azrad ` (6 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil, Gaetan Rivet This parameter enables applications to provide device definitions through an arbitrary file descriptor number. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 9 ++++ drivers/net/failsafe/failsafe_args.c | 80 ++++++++++++++++++++++++++++++++- drivers/net/failsafe/failsafe_private.h | 3 ++ 3 files changed, 91 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index c4e3d2e..5b1b47e 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -106,6 +106,15 @@ Fail-safe command line parameters All commas within the ``shell command`` are replaced by spaces before executing the command. This helps using scripts to specify devices. +- **fd(<file descriptor number>)** parameter + + This parameter reads a device definition from an arbitrary file descriptor + number in ``<iface>`` format as described above. + + The file descriptor is read in non-blocking mode and is never closed in + order to take only the last line into account (unlike ``exec()``) at every + probe attempt. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index ec63ac9..db5235b 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -31,7 +31,11 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> #include <string.h> +#include <unistd.h> #include <errno.h> #include <rte_debug.h> @@ -161,6 +165,67 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } static int +fs_read_fd(struct sub_device *sdev, char *fd_str) +{ + FILE *fp = NULL; + int fd = -1; + /* store possible newline as well */ + char output[DEVARGS_MAXLEN + 1]; + int err = -ENODEV; + int oflags; + int lcount; + + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); + if (sdev->fd_str == NULL) { + sdev->fd_str = strdup(fd_str); + if (sdev->fd_str == NULL) { + ERROR("Command line allocation failed"); + return -ENOMEM; + } + } + errno = 0; + fd = strtol(fd_str, &fd_str, 0); + if (errno || *fd_str || fd < 0) { + ERROR("Parsing FD number failed"); + goto error; + } + /* Fiddle with copy of file descriptor */ + fd = dup(fd); + if (fd == -1) + goto error; + oflags = fcntl(fd, F_GETFL); + if (oflags == -1) + goto error; + if (fcntl(fd, F_SETFL, fd | O_NONBLOCK) == -1) + goto error; + fp = fdopen(fd, "r"); + if (!fp) + goto error; + fd = -1; + /* Only take the last line into account */ + lcount = 0; + while (fgets(output, sizeof(output), fp)) + ++lcount; + if (lcount == 0) + goto error; + else if (ferror(fp) && errno != EAGAIN) + goto error; + /* Line must end with a newline character */ + fs_sanitize_cmdline(output); + if (output[0] == '\0') + goto error; + err = fs_parse_device(sdev, output); + if (err) + ERROR("Parsing device '%s' failed", output); +error: + if (fp) + fclose(fp); + if (fd != -1) + close(fd); + return err; +} 
+ +static int fs_parse_device_param(struct rte_eth_dev *dev, const char *param, uint8_t head) { @@ -202,6 +267,14 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } if (ret) goto free_args; + } else if (strncmp(param, "fd(", 3) == 0) { + ret = fs_read_fd(sdev, args); + if (ret == -ENODEV) { + DEBUG("Reading device info from FD failed"); + ret = 0; + } + if (ret) + goto free_args; } else { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; @@ -409,6 +482,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, FOREACH_SUBDEV(sdev, i, dev) { free(sdev->cmdline); sdev->cmdline = NULL; + free(sdev->fd_str); + sdev->fd_str = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; } @@ -424,7 +499,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, param[b] != '\0') b++; if (strncmp(param, "dev", b) != 0 && - strncmp(param, "exec", b) != 0) { + strncmp(param, "exec", b) != 0 && + strncmp(param, "fd(", b) != 0) { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; } @@ -463,6 +539,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, continue; if (sdev->cmdline) ret = fs_execute_cmd(sdev, sdev->cmdline); + else if (sdev->fd_str) + ret = fs_read_fd(sdev, sdev->fd_str); else ret = fs_parse_sub_device(sdev); if (ret == 0) diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index 54b5b91..5e04ffe 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -48,6 +48,7 @@ #define PMD_FAILSAFE_PARAM_STRING \ "dev(<ifc>)," \ "exec(<shell command>)," \ + "fd(<fd number>)," \ "mac=mac_addr," \ "hotplug_poll=u64" \ "" @@ -112,6 +113,8 @@ struct sub_device { struct fs_stats stats_snapshot; /* Some device are defined as a command line */ char *cmdline; + /* Others are retrieved through a file descriptor */ + char *fd_str; /* fail-safe device backreference */ struct rte_eth_dev 
*fs_dev; /* flag calling for recollection */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/8] net/failsafe: add "fd" parameter 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 2/8] net/failsafe: add "fd" parameter Matan Azrad @ 2018-01-18 8:51 ` Gaëtan Rivet 0 siblings, 0 replies; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-18 8:51 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen, Adrien Mazarguil Hi Matan, You forgot to fix the fcntl call, see below, On Thu, Jan 18, 2018 at 08:43:40AM +0000, Matan Azrad wrote: > This parameter enables applications to provide device definitions through > an arbitrary file descriptor number. > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > Signed-off-by: Matan Azrad <matan@mellanox.com> with the relevant fixes: Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> > --- > doc/guides/nics/fail_safe.rst | 9 ++++ > drivers/net/failsafe/failsafe_args.c | 80 ++++++++++++++++++++++++++++++++- > drivers/net/failsafe/failsafe_private.h | 3 ++ > 3 files changed, 91 insertions(+), 1 deletion(-) > > diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst > index c4e3d2e..5b1b47e 100644 > --- a/doc/guides/nics/fail_safe.rst > +++ b/doc/guides/nics/fail_safe.rst > @@ -106,6 +106,15 @@ Fail-safe command line parameters > All commas within the ``shell command`` are replaced by spaces before > executing the command. This helps using scripts to specify devices. > > +- **fd(<file descriptor number>)** parameter > + > + This parameter reads a device definition from an arbitrary file descriptor > + number in ``<iface>`` format as described above. > + > + The file descriptor is read in non-blocking mode and is never closed in > + order to take only the last line into account (unlike ``exec()``) at every > + probe attempt. 
> + > - **mac** parameter [MAC address] > > This parameter allows the user to set a default MAC address to the fail-safe > diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c > index ec63ac9..db5235b 100644 > --- a/drivers/net/failsafe/failsafe_args.c > +++ b/drivers/net/failsafe/failsafe_args.c > @@ -31,7 +31,11 @@ > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > */ > > +#include <fcntl.h> > +#include <stdio.h> > +#include <stdlib.h> > #include <string.h> > +#include <unistd.h> > #include <errno.h> > > #include <rte_debug.h> > @@ -161,6 +165,67 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, > } > > static int > +fs_read_fd(struct sub_device *sdev, char *fd_str) > +{ > + FILE *fp = NULL; > + int fd = -1; > + /* store possible newline as well */ > + char output[DEVARGS_MAXLEN + 1]; > + int err = -ENODEV; > + int oflags; > + int lcount; > + > + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); > + if (sdev->fd_str == NULL) { > + sdev->fd_str = strdup(fd_str); > + if (sdev->fd_str == NULL) { > + ERROR("Command line allocation failed"); > + return -ENOMEM; > + } > + } > + errno = 0; > + fd = strtol(fd_str, &fd_str, 0); > + if (errno || *fd_str || fd < 0) { > + ERROR("Parsing FD number failed"); > + goto error; > + } > + /* Fiddle with copy of file descriptor */ > + fd = dup(fd); > + if (fd == -1) > + goto error; > + oflags = fcntl(fd, F_GETFL); > + if (oflags == -1) > + goto error; > + if (fcntl(fd, F_SETFL, fd | O_NONBLOCK) == -1) fcntl(fd, F_SETFL, oflags | O_NONBLOCK); here > + goto error; > + fp = fdopen(fd, "r"); > + if (!fp) While you're at it, here please use if (fp != NULL) instead. Regards, -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v4 3/8] net/failsafe: add probed etherdev capture 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 1/8] net/failsafe: fix invalid free Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 2/8] net/failsafe: add "fd" parameter Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 9:10 ` Gaëtan Rivet 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad ` (5 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Gaetan Rivet Previous fail-safe code didn't support probed sub-devices capture and failed when it tried to probe them. Skip fail-safe sub-device probing when it already was probed. Signed-off-by: Matan Azrad <matan@mellanox.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 5 +++ drivers/net/failsafe/failsafe_args.c | 2 - drivers/net/failsafe/failsafe_eal.c | 78 ++++++++++++++++++++++++--------- drivers/net/failsafe/failsafe_private.h | 2 + 4 files changed, 65 insertions(+), 22 deletions(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -115,6 +115,11 @@ Fail-safe command line parameters order to take only the last line into account (unlike ``exec()``) at every probe attempt. +.. note:: + + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the device + as is, which means that EAL device options are taken in this case. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index db5235b..daf5ed0 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -45,8 +45,6 @@ #include "failsafe_private.h" -#define DEVARGS_MAXLEN 4096 - /* Callback used when a new device is found in devargs */ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t head); diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c index 19d26f5..33a5adf 100644 --- a/drivers/net/failsafe/failsafe_eal.c +++ b/drivers/net/failsafe/failsafe_eal.c @@ -36,39 +36,77 @@ #include "failsafe_private.h" static int +fs_ethdev_portid_get(const char *name, uint16_t *port_id) +{ + uint16_t pid; + size_t len; + + if (name == NULL) { + DEBUG("Null pointer is specified\n"); + return -EINVAL; + } + len = strlen(name); + RTE_ETH_FOREACH_DEV(pid) { + if (!strncmp(name, rte_eth_devices[pid].device->name, len)) { + *port_id = pid; + return 0; + } + } + return -ENODEV; +} + +static int fs_bus_init(struct rte_eth_dev *dev) { struct sub_device *sdev; struct rte_devargs *da; uint8_t i; - uint16_t j; + uint16_t pid; int ret; FOREACH_SUBDEV(sdev, i, dev) { if (sdev->state != DEV_PARSED) continue; da = &sdev->devargs; - ret = rte_eal_hotplug_add(da->bus->name, - da->name, - da->args); - if (ret) { - ERROR("sub_device %d probe failed %s%s%s", i, - rte_errno ? "(" : "", - rte_errno ? strerror(rte_errno) : "", - rte_errno ? ")" : ""); - continue; - } - RTE_ETH_FOREACH_DEV(j) { - if (strcmp(rte_eth_devices[j].device->name, - da->name) == 0) { - ETH(sdev) = &rte_eth_devices[j]; - break; + if (fs_ethdev_portid_get(da->name, &pid) != 0) { + ret = rte_eal_hotplug_add(da->bus->name, + da->name, + da->args); + if (ret) { + ERROR("sub_device %d probe failed %s%s%s", i, + rte_errno ? "(" : "", + rte_errno ? 
strerror(rte_errno) : "", + rte_errno ? ")" : ""); + continue; } + if (fs_ethdev_portid_get(da->name, &pid) != 0) { + ERROR("sub_device %d init went wrong", i); + return -ENODEV; + } + } else { + char devstr[DEVARGS_MAXLEN] = ""; + struct rte_devargs *probed_da = + rte_eth_devices[pid].device->devargs; + + /* Take control of device probed by EAL options. */ + free(da->args); + memset(da, 0, sizeof(*da)); + if (probed_da != NULL) + snprintf(devstr, sizeof(devstr), "%s,%s", + probed_da->name, probed_da->args); + else + snprintf(devstr, sizeof(devstr), "%s", + rte_eth_devices[pid].device->name); + ret = rte_eal_devargs_parse(devstr, da); + if (ret) { + ERROR("Probed devargs parsing failed with code" + " %d", ret); + return ret; + } + INFO("Taking control of a probed sub device" + " %d named %s", i, da->name); } - if (ETH(sdev) == NULL) { - ERROR("sub_device %d init went wrong", i); - return -ENODEV; - } + ETH(sdev) = &rte_eth_devices[pid]; SUB_ID(sdev) = i; sdev->fs_dev = dev; sdev->dev = ETH(sdev)->device; diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index 5e04ffe..9fcf72e 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -58,6 +58,8 @@ #define FAILSAFE_MAX_ETHPORTS 2 #define FAILSAFE_MAX_ETHADDR 128 +#define DEVARGS_MAXLEN 4096 + /* TYPES */ struct rxq { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
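Annotation: the patch above probes a sub-device only when a lookup by name finds no existing port, and otherwise takes over the port that was already probed. A minimal standalone sketch of that lookup follows, assuming a plain array of names in place of rte_eth_devices[]; lookup_port() is a hypothetical name for illustration. As in fs_ethdev_portid_get(), only strlen(name) bytes are compared, so the requested name also matches any longer device name that starts with it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the ethdev table walk: return the index of
 * the first entry whose name starts with the requested name.
 */
static int
lookup_port(const char *const names[], size_t count, const char *name)
{
	size_t len;
	size_t i;

	assert(names != NULL);
	if (name == NULL)
		return -EINVAL;
	len = strlen(name);
	for (i = 0; i != count; ++i)
		/* Prefix comparison: only the first len bytes are checked. */
		if (!strncmp(name, names[i], len))
			return (int)i;
	return -ENODEV;
}
```

A caller following the patch's flow would hot-plug the device when this returns -ENODEV and look it up again afterwards. If exact matching is wanted, strcmp() would be the stricter choice; whether the prefix behavior is intentional is not stated in this thread.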
* Re: [dpdk-dev] [PATCH v4 3/8] net/failsafe: add probed etherdev capture 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 3/8] net/failsafe: add probed etherdev capture Matan Azrad @ 2018-01-18 9:10 ` Gaëtan Rivet 2018-01-18 9:33 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-18 9:10 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Matan, On Thu, Jan 18, 2018 at 08:43:41AM +0000, Matan Azrad wrote: > Previous fail-safe code didn't support probed sub-devices capture and > failed when it tried to probe them. > > Skip fail-safe sub-device probing when it already was probed. > What happens when app --vdev "net_failsafe0,dev(net_failsafe0)" -- -i ? I guess infinite recursion. > Signed-off-by: Matan Azrad <matan@mellanox.com> > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > --- > doc/guides/nics/fail_safe.rst | 5 +++ > drivers/net/failsafe/failsafe_args.c | 2 - > drivers/net/failsafe/failsafe_eal.c | 78 ++++++++++++++++++++++++--------- > drivers/net/failsafe/failsafe_private.h | 2 + > 4 files changed, 65 insertions(+), 22 deletions(-) > > diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst > index 5b1b47e..b89e53b 100644 > --- a/doc/guides/nics/fail_safe.rst > +++ b/doc/guides/nics/fail_safe.rst > @@ -115,6 +115,11 @@ Fail-safe command line parameters > order to take only the last line into account (unlike ``exec()``) at every > probe attempt. > > +.. note:: > + > + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the device > + as is, which means that EAL device options are taken in this case. > + This note should be right under the "dev()" parameter help I think. If the self-capture is possible and you fix it, you should as well add a line here about the limitation, concerning the PCI blacklist mode and the expected PCI id format? 
Something like: --- 8< --- When trying to use a PCI device automatically probed in blacklist mode, the syntax for the fail-safe must be with the full PCI id: Domain:Bus:Device.Function. See the usage example section. .. ^^^^^^^^^^^^^ Here, an ReST reference .. Would be nice, I don't recall .. the exact syntax. .. In the `Usage example` section: #. Start testpmd, automatically probing the device 84:00.0 and using it with the fail-safe .. code-block:: console $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \ --vdev 'net_failsafe0,dev(0000:84:00.0),dev(net_ring0)' \ -- -i --- >8 --- Ensure that this is working before using this command, I haven't tested it. Regards, -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/8] net/failsafe: add probed etherdev capture 2018-01-18 9:10 ` Gaëtan Rivet @ 2018-01-18 9:33 ` Matan Azrad 0 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 9:33 UTC (permalink / raw) To: Gaëtan Rivet; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen Hi Gaetan From: Gaëtan Rivet, Thursday, January 18, 2018 11:11 AM > To: Matan Azrad <matan@mellanox.com> > Cc: Ferruh Yigit <ferruh.yigit@intel.com>; Thomas Monjalon > <thomas@monjalon.net>; dev@dpdk.org; stephen@networkplumber.org > Subject: Re: [PATCH v4 3/8] net/failsafe: add probed etherdev capture > > Hi Matan, > > On Thu, Jan 18, 2018 at 08:43:41AM +0000, Matan Azrad wrote: > > Previous fail-safe code didn't support probed sub-devices capture and > > failed when it tried to probe them. > > > > Skip fail-safe sub-device probing when it already was probed. > > > > What happens when > > app --vdev "net_failsafe0,dev(net_failsafe0)" -- -i > > ? I guess infinite recursion. > :) interesting ./x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd -n 4 --vdev="net_failsafe0,dev(net_failsafe0)" --vdev="net_vdev_netvsc,ignore=1" -- --burst=118 --mbcache=512 --portmask 0xf -i --nb-cores=11 --rxq=2 --txq=2 --txd=1024 --rxd=1024 EAL: Detected 12 lcore(s) EAL: No free hugepages reported in hugepages-1048576kB EAL: Debug dataplane logs available - lower performance EAL: Probing VFIO support... EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles ! 
EAL: PCI device 0002:00:02.0 on NUMA socket 0 EAL: probe driver: 15b3:1004 net_mlx4 PMD: net_mlx4: PCI information matches, using device "mlx4_0" (VF: true) PMD: net_mlx4: 1 port(s) detected PMD: net_mlx4: port 1 MAC address is 00:15:5d:44:4b:24 PMD: net_failsafe: Initializing Fail-safe PMD for net_failsafe0 PMD: net_failsafe: Creating fail-safe device on NUMA socket 0 PMD: net_failsafe: Taking control of a probed sub device 0 named net_failsafe0 PMD: net_failsafe: MAC address is 00:00:00:00:00:00 Interactive-mode selected testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=327680, size=2176, socket=0 Configuring Port 0 (socket 0) Port 0: 00:15:5D:44:4B:24 Checking link statuses... Done testpmd> Failsafe0 took control of itself (since it is already probed we don't probe it again). > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> > > --- > > doc/guides/nics/fail_safe.rst | 5 +++ > > drivers/net/failsafe/failsafe_args.c | 2 - > > drivers/net/failsafe/failsafe_eal.c | 78 ++++++++++++++++++++++++--- > ------ > > drivers/net/failsafe/failsafe_private.h | 2 + > > 4 files changed, 65 insertions(+), 22 deletions(-) > > > > diff --git a/doc/guides/nics/fail_safe.rst > > b/doc/guides/nics/fail_safe.rst index 5b1b47e..b89e53b 100644 > > --- a/doc/guides/nics/fail_safe.rst > > +++ b/doc/guides/nics/fail_safe.rst > > @@ -115,6 +115,11 @@ Fail-safe command line parameters > > order to take only the last line into account (unlike ``exec()``) at every > > probe attempt. > > > > +.. note:: > > + > > + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the > device > > + as is, which means that EAL device options are taken in this case. > > + > > This note should be right under the "dev()" parameter help I think. > OK. > If the self-capture is possible and you fix it, you should as well add a line here > about the limitation, concerning the PCI blacklist mode and the expected PCI > id format? 
> > Something like: > > --- 8< --- > > When trying to use a PCI device automatically probed in blacklist mode, > the syntax for the fail-safe must be with the full PCI id: > Domain:Bus:Device.Function. See the usage example section. > > .. ^^^^^^^^^^^^^ Here, an ReST reference > .. Would be nice, I don't recall > .. the exact syntax. > .. In the `Usage example` section: > > #. Start testpmd, automatically probing the device 84:00.0 and using it with > the fail-safe > > .. code-block:: console > > $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \ > --vdev 'net_failsafe0,dev(0000:84:00.0),dev(net_ring0)' \ > -- -i > > --- >8 --- > Ok. > Ensure that this is working before using this command, I haven't tested it. > Sure. > Regards, > -- > Gaëtan Rivet > 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
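For illustration only, the full Domain:Bus:Device.Function form requested in the review above can be checked with a few lines of C. This is a minimal sketch and not code from the patch: `is_full_pci_id()` is a hypothetical helper, and it assumes a four-digit hexadecimal domain as in the 0000:84:00.0 example. It only validates the layout, not the device or function ranges.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK): return 1 when `s` has the full
 * PCI address layout Domain:Bus:Device.Function, e.g. "0000:84:00.0",
 * and 0 otherwise (including the short "84:00.0" form).
 */
static int
is_full_pci_id(const char *s)
{
	/* 'x' stands for one hexadecimal digit, other characters are literal. */
	static const char layout[] = "xxxx:xx:xx.x";
	size_t i;

	if (strlen(s) != sizeof(layout) - 1)
		return 0;
	for (i = 0; layout[i] != '\0'; ++i) {
		if (layout[i] == 'x') {
			if (!isxdigit((unsigned char)s[i]))
				return 0;
		} else if (s[i] != layout[i]) {
			return 0;
		}
	}
	return 1;
}
```

With this check, "0000:84:00.0" is accepted while the short "84:00.0" form is rejected.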
* [dpdk-dev] [PATCH v4 4/8] net/vdev_netvsc: introduce Hyper-V platform driver 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (2 preceding siblings ...) 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 3/8] net/failsafe: add probed etherdev capture Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 5/8] net/vdev_netvsc: implement core functionality Matan Azrad ` (4 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This patch lays the groundwork for this driver (draft documentation, copyright notices, code base skeleton and build system hooks). While it can be successfully compiled and invoked, it's an empty shell at this stage. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- MAINTAINERS | 6 ++ config/common_base | 5 ++ config/common_linuxapp | 1 + doc/guides/nics/features/vdev_netvsc.ini | 12 +++ doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 20 +++++ drivers/net/Makefile | 1 + drivers/net/vdev_netvsc/Makefile | 27 ++++++ .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 99 ++++++++++++++++++++++ mk/rte.app.mk | 1 + 11 files changed, 177 insertions(+) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c diff --git a/MAINTAINERS b/MAINTAINERS index af8de4f..97efbb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -462,6 +462,12 @@ F: drivers/net/mrvl/ F: doc/guides/nics/mrvl.rst F: doc/guides/nics/features/mrvl.ini +Microsoft vdev-netvsc - EXPERIMENTAL +M: Matan Azrad 
<matan@mellanox.com> +F: drivers/net/vdev-netvsc/ +F: doc/guides/nics/vdev-netvsc.rst +F: doc/guides/nics/features/vdev-netvsc.ini + Netcope szedata2 M: Matej Vido <vido@cesnet.cz> F: drivers/net/szedata2/ diff --git a/config/common_base b/config/common_base index 90508a8..664ff21 100644 --- a/config/common_base +++ b/config/common_base @@ -279,6 +279,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG_RX=n CONFIG_RTE_LIBRTE_MRVL_PMD=n # +# Compile virtual device driver for NetVSC on Hyper-V/Azure +# +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n + +# # Compile burst-oriented Broadcom BNXT PMD driver # CONFIG_RTE_LIBRTE_BNXT_PMD=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 74c7d64..e043262 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y diff --git a/doc/guides/nics/features/vdev_netvsc.ini b/doc/guides/nics/features/vdev_netvsc.ini new file mode 100644 index 0000000..cfc5cb9 --- /dev/null +++ b/doc/guides/nics/features/vdev_netvsc.ini @@ -0,0 +1,12 @@ +; +; Supported features of the 'vdev_netvsc' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 23babe9..5666046 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -64,6 +64,7 @@ Network Interface Controller Drivers szedata2 tap thunderx + vdev_netvsc virtio vhost vmxnet3 diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst new file mode 100644 index 0000000..a952908 --- /dev/null +++ b/doc/guides/nics/vdev_netvsc.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright 2017 6WIND S.A. 
+ Copyright 2017 Mellanox Technologies, Ltd. + +VDEV_NETVSC driver +================== + +The VDEV_NETVSC driver (librte_pmd_vdev_netvsc) provides support for NetVSC +interfaces and associated SR-IOV virtual function (VF) devices found in +Linux virtual machines running on Microsoft Hyper-V_ (including Azure) +platforms. + +.. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v + +Build options +------------- + +- ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) + + Toggle compilation of this driver. diff --git a/drivers/net/Makefile b/drivers/net/Makefile index c2fd7f5..e112732 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -39,6 +39,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += sfc DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx +DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile new file mode 100644 index 0000000..2fb059d --- /dev/null +++ b/drivers/net/vdev_netvsc/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 2017 6WIND S.A. +# Copyright 2017 Mellanox Technologies, Ltd. + +include $(RTE_SDK)/mk/rte.vars.mk + +# Properties of the generated library. +LIB = librte_pmd_vdev_netvsc.a +LIBABIVER := 1 +EXPORT_MAP := rte_pmd_vdev_netvsc_version.map + +# Additional compilation flags. +CFLAGS += -O3 +CFLAGS += -g +CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += $(WERROR_FLAGS) + +# Dependencies. +LDLIBS += -lrte_bus_vdev +LDLIBS += -lrte_eal +LDLIBS += -lrte_ethdev +LDLIBS += -lrte_kvargs + +# Source files. 
+SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map new file mode 100644 index 0000000..179140f --- /dev/null +++ b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map @@ -0,0 +1,4 @@ +DPDK_18.02 { + + local: *; +}; diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c new file mode 100644 index 0000000..e895b32 --- /dev/null +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox Technologies, Ltd. + */ + +#include <stddef.h> + +#include <rte_bus_vdev.h> +#include <rte_common.h> +#include <rte_config.h> +#include <rte_kvargs.h> +#include <rte_log.h> + +#define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_ARG_IFACE "iface" +#define VDEV_NETVSC_ARG_MAC "mac" + +#define DRV_LOG(level, ...) \ + rte_log(RTE_LOG_ ## level, \ + vdev_netvsc_logtype, \ + RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ + RTE_FMT_TAIL(__VA_ARGS__,))) + +/** Driver-specific log messages type. */ +static int vdev_netvsc_logtype; + +/** Number of driver instances relying on context list. */ +static unsigned int vdev_netvsc_ctx_inst; + +/** + * Probe NetVSC interfaces. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0, even in case of errors. + */ +static int +vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) +{ + static const char *const vdev_netvsc_arg[] = { + VDEV_NETVSC_ARG_IFACE, + VDEV_NETVSC_ARG_MAC, + NULL, + }; + const char *name = rte_vdev_device_name(dev); + const char *args = rte_vdev_device_args(dev); + struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", + vdev_netvsc_arg); + + DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); + if (!kvargs) { + DRV_LOG(ERR, "cannot parse arguments list"); + goto error; + } +error: + if (kvargs) + rte_kvargs_free(kvargs); + ++vdev_netvsc_ctx_inst; + return 0; +} + +/** + * Remove driver instance. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0. + */ +static int +vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) +{ + --vdev_netvsc_ctx_inst; + return 0; +} + +/** Virtual device descriptor. */ +static struct rte_vdev_driver vdev_netvsc_vdev = { + .probe = vdev_netvsc_vdev_probe, + .remove = vdev_netvsc_vdev_remove, +}; + +RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev); +RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); +RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, + VDEV_NETVSC_ARG_IFACE "=<string> " + VDEV_NETVSC_ARG_MAC "=<string>"); + +/** Initialize driver log type. */ +RTE_INIT(vdev_netvsc_init_log) +{ + vdev_netvsc_logtype = rte_log_register("pmd.vdev_netvsc"); + if (vdev_netvsc_logtype >= 0) + rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); +} diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 78f23c5..2f8af49 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -157,6 +157,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += -lrte_pmd_sfc_efx _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += -lrte_pmd_szedata2 -lsze2 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += -lrte_pmd_tap _LDLIBS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += -lrte_pmd_thunderx_nicvf +_LDLIBS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += -lrte_pmd_vdev_netvsc _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v4 5/8] net/vdev_netvsc: implement core functionality 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (3 preceding siblings ...) 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 18:25 ` Stephen Hemminger 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad ` (3 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil As described in more details in the attached documentation (see patch contents), this virtual device driver manages NetVSC interfaces in virtual machines hosted by Hyper-V/Azure platforms. This driver does not manage traffic nor Ethernet devices directly; it acts as a thin configuration layer that automatically instantiates and controls fail-safe PMD instances combining tap and PCI sub-devices, so that each NetVSC interface is exposed as a single consolidated port to DPDK applications. PCI sub-devices being hot-pluggable (e.g. during VM migration), applications automatically benefit from increased throughput when present and automatic fallback on NetVSC otherwise without interruption thanks to fail-safe's hot-plug handling. Once initialized, the sole job of the vdev_netvsc driver is to regularly scan for PCI devices to associate with NetVSC interfaces and feed their addresses to corresponding fail-safe instances. 
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 70 +++++ drivers/net/vdev_netvsc/Makefile | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 550 +++++++++++++++++++++++++++++++++- 3 files changed, 623 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index a952908..fde1fb8 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -12,9 +12,79 @@ platforms. .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v +Implementation details +---------------------- + +Each instance of this driver effectively needs to drive two devices: the +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from +this point on) counterpart sharing the same MAC address. + +Physical devices are part of the host system and cannot be maintained during +VM migration. From a VM standpoint they appear as hot-plug devices that come +and go without prior notice. + +When the physical device is present, egress and most of the ingress traffic +flows through it; only multicasts and other hypervisor control still flow +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. + +To avoid unnecessary code duplication and ensure maximum performance, +handling of physical devices is left to their original PMDs; this virtual +device driver (also known as *vdev*) manages other PMDs as summarized by the +following block diagram:: + + .------------------. + | DPDK application | + `--------+---------' + | + .------+------. + | DPDK ethdev | + `------+------' Control + | | + .------------+------------. v .--------------------. + | failsafe PMD +---------+ vdev_netvsc driver | + `--+-------------------+--' `--------------------' + | | + | .........|......... + | : | : + .----+----. : .----+----. 
: + | tap PMD | : | any PMD | : + `----+----' : `----+----' : <-- Hot-pluggable + | : | : + .------+-------. : .-----+-----. : + | NetVSC-based | : | SR-IOV VF | : + | netdevice | : | device | : + `--------------' : `-----------' : + :.................: + + +This driver implementation may be temporary and should be improved or removed +either when hot-plug will be fully supported in EAL and bus drivers or when +a new NetVSC driver will be integrated. + Build options ------------- - ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) Toggle compilation of this driver. + +Run-time parameters +------------------- + +To invoke this driver, applications have to explicitly provide the +``--vdev=net_vdev_netvsc`` EAL option. + +The following device parameters are supported: + +- ``iface`` [string] + + Provide a specific NetVSC interface (netdevice) name to attach this driver + to. Can be provided multiple times for additional instances. + +- ``mac`` [string] + + Same as ``iface`` except a suitable NetVSC interface is located using its + MAC address. + +Not specifying either ``iface`` or ``mac`` makes this driver attach itself to +all NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile index 2fb059d..f2b2ac5 100644 --- a/drivers/net/vdev_netvsc/Makefile +++ b/drivers/net/vdev_netvsc/Makefile @@ -13,6 +13,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += -D_XOPEN_SOURCE=600 +CFLAGS += -D_BSD_SOURCE +CFLAGS += -D_DEFAULT_SOURCE CFLAGS += $(WERROR_FLAGS) # Dependencies. @@ -20,6 +23,7 @@ LDLIBS += -lrte_bus_vdev LDLIBS += -lrte_eal LDLIBS += -lrte_ethdev LDLIBS += -lrte_kvargs +LDLIBS += -lrte_net # Source files. 
SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index e895b32..21c3265 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -3,17 +3,42 @@ * Copyright 2017 Mellanox Technologies, Ltd. */ +#include <errno.h> +#include <fcntl.h> +#include <inttypes.h> +#include <linux/sockios.h> +#include <net/if.h> +#include <net/if_arp.h> +#include <netinet/ip.h> +#include <stdarg.h> #include <stddef.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdio.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/queue.h> +#include <sys/socket.h> +#include <unistd.h> +#include <rte_alarm.h> +#include <rte_bus.h> #include <rte_bus_vdev.h> #include <rte_common.h> #include <rte_config.h> +#include <rte_dev.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_PROBE_MS 1000 + +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" #define DRV_LOG(level, ...) \ rte_log(RTE_LOG_ ## level, \ @@ -25,12 +50,495 @@ /** Driver-specific log messages type. */ static int vdev_netvsc_logtype; +/** Context structure for a vdev_netvsc instance. */ +struct vdev_netvsc_ctx { + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ + unsigned int id; /**< Unique ID. */ + char name[64]; /**< Unique name. */ + char devname[64]; /**< Fail-safe instance name. */ + char devargs[256]; /**< Fail-safe device arguments. */ + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ + unsigned int if_index; /**< NetVSC netdevice index. */ + struct ether_addr if_addr; /**< NetVSC MAC address. */ + int pipe[2]; /**< Fail-safe communication pipe. */ + char yield[256]; /**< PCI sub-device arguments. 
*/ +}; + +/** Context list is common to all driver instances. */ +static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); + +/** Number of entries in context list. */ +static unsigned int vdev_netvsc_ctx_count; + /** Number of driver instances relying on context list. */ static unsigned int vdev_netvsc_ctx_inst; /** + * Destroy a vdev_netvsc context instance. + * + * @param ctx + * Context to destroy. + */ +static void +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) +{ + if (ctx->pipe[0] != -1) + close(ctx->pipe[0]); + if (ctx->pipe[1] != -1) + close(ctx->pipe[1]); + free(ctx); +} + +/** + * Iterate over system network interfaces. + * + * This function runs a given callback function for each netdevice found on + * the system. + * + * @param func + * Callback function pointer. List traversal is aborted when this function + * returns a nonzero value. + * @param ... + * Variable parameter list passed as @p va_list to @p func. + * + * @return + * 0 when the entire list is traversed successfully, a negative error code + * in case or failure, or the nonzero value returned by @p func when list + * traversal is aborted. + */ +static int +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap), ...) 
+{ + struct if_nameindex *iface = if_nameindex(); + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + unsigned int i; + int ret = 0; + + if (!iface) { + ret = -ENOBUFS; + DRV_LOG(ERR, "cannot retrieve system network interfaces"); + goto error; + } + if (s == -1) { + ret = -errno; + DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno)); + goto error; + } + for (i = 0; iface[i].if_name; ++i) { + struct ifreq req; + struct ether_addr eth_addr; + va_list ap; + + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { + DRV_LOG(WARNING, "cannot retrieve information about" + " interface \"%s\": %s", + req.ifr_name, rte_strerror(errno)); + continue; + } + if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) { + DRV_LOG(DEBUG, "interface %s is non-ethernet device", + req.ifr_name); + continue; + } + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, + RTE_DIM(eth_addr.addr_bytes)); + va_start(ap, func); + ret = func(&iface[i], &eth_addr, ap); + va_end(ap); + if (ret) + break; + } +error: + if (s != -1) + close(s); + if (iface) + if_freenameindex(iface); + return ret; +} + +/** + * Determine if a network interface is NetVSC. + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * + * @return + * A nonzero value when interface is detected as NetVSC. In case of error, + * rte_errno is updated and 0 returned. 
+ */ +static int +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) +{ + static const char temp[] = "/sys/class/net/%s/device/class_id"; + char path[sizeof(temp) + IF_NAMESIZE]; + FILE *f; + int ret; + int len = 0; + + ret = snprintf(path, sizeof(path), temp, iface->if_name); + if (ret == -1 || (size_t)ret >= sizeof(path)) { + rte_errno = ENOBUFS; + return 0; + } + f = fopen(path, "r"); + if (!f) { + rte_errno = errno; + return 0; + } + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); + if (ret == EOF) + rte_errno = errno; + ret = len == (int)strlen(NETVSC_CLASS_ID); + fclose(f); + return ret; +} + +/** + * Retrieve network interface data from sysfs symbolic link. + * + * @param[out] buf + * Output data buffer. + * @param size + * Output buffer size. + * @param[in] if_name + * Netdevice name. + * @param[in] relpath + * Symbolic link path relative to netdevice sysfs entry. + * + * @return + * 0 on success, a negative error code otherwise. + */ +static int +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, + const char *relpath) +{ + int ret; + + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); + if (ret == -1 || (size_t)ret >= size) + return -ENOBUFS; + ret = readlink(buf, buf, size); + if (ret == -1) + return -errno; + if ((size_t)ret >= size - 1) + return -ENOBUFS; + buf[ret] = '\0'; + return 0; +} + +/** + * Probe a network interface to associate with vdev_netvsc context. + * + * This function determines if the network device matches the properties of + * the NetVSC interface associated with the vdev_netvsc context and + * communicates its bus address to the fail-safe PMD instance if so. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. 
+ * @param ap + * Variable arguments list comprising: + * + * - struct vdev_netvsc_ctx *ctx: + * Context to associate network interface with. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_device_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; + const char *addr; + size_t len; + int ret; + + /* Skip non-matching or unwanted NetVSC interfaces. */ + if (ctx->if_index == iface->if_index) { + if (!strcmp(ctx->if_name, iface->if_name)) + return 0; + DRV_LOG(DEBUG, + "NetVSC interface \"%s\" (index %u) renamed \"%s\"", + ctx->if_name, ctx->if_index, iface->if_name); + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + return 0; + } + if (vdev_netvsc_iface_is_netvsc(iface)) + return 0; + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) + return 0; + /* Look for associated PCI device. */ + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device/subsystem"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + if (strcmp(addr, "pci")) + return 0; + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + len = strlen(addr); + if (!len) + return 0; + /* Send PCI device argument to fail-safe PMD instance. 
*/ + if (strcmp(addr, ctx->yield)) + DRV_LOG(DEBUG, "associating PCI device \"%s\" with NetVSC" + " interface \"%s\" (index %u)", addr, ctx->if_name, + ctx->if_index); + memmove(buf, addr, len + 1); + addr = buf; + buf[len] = '\n'; + ret = write(ctx->pipe[1], addr, len + 1); + buf[len] = '\0'; + if (ret == -1) { + if (errno == EINTR || errno == EAGAIN) + return 1; + DRV_LOG(WARNING, "cannot associate PCI device name \"%s\" with" + " interface \"%s\": %s", addr, ctx->if_name, + rte_strerror(errno)); + return 1; + } + if ((size_t)ret != len + 1) { + /* + * Attempt to override previous partial write, no need to + * recover if that fails. + */ + ret = write(ctx->pipe[1], "\n", 1); + (void)ret; + return 1; + } + fsync(ctx->pipe[1]); + memcpy(ctx->yield, addr, len + 1); + return 1; +} + +/** + * Alarm callback that regularly probes system network interfaces. + * + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as + * long as an vdev_netvsc context instance exists. + * + * @param arg + * Ignored. + */ +static void +vdev_netvsc_alarm(__rte_unused void *arg) +{ + struct vdev_netvsc_ctx *ctx; + int ret; + + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + if (ret) + break; + } + if (!vdev_netvsc_ctx_count) + return; + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", + rte_strerror(-ret)); + } +} + +/** + * Probe a NetVSC interface to generate a vdev_netvsc context from. + * + * This function instantiates vdev_netvsc contexts either for all NetVSC + * devices found on the system or only a subset provided as device + * arguments. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. 
+ * @param ap + * Variable arguments list comprising: + * + * - const char *name: + * Name associated with current driver instance. + * + * - struct rte_kvargs *kvargs: + * Device arguments provided to current driver instance. + * + * - unsigned int specified: + * Number of specific netdevices provided as device arguments. + * + * - unsigned int *matched: + * The number of specified netdevices matched by this function. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + const char *name = va_arg(ap, const char *); + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + unsigned int specified = va_arg(ap, unsigned int); + unsigned int *matched = va_arg(ap, unsigned int *); + unsigned int i; + struct vdev_netvsc_ctx *ctx; + int ret; + + /* Probe all interfaces when none are specified. */ + if (specified) { + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { + if (!strcmp(pair->value, iface->if_name)) + break; + } else if (!strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) { + struct ether_addr tmp; + + if (sscanf(pair->value, + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":" + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, + &tmp.addr_bytes[0], + &tmp.addr_bytes[1], + &tmp.addr_bytes[2], + &tmp.addr_bytes[3], + &tmp.addr_bytes[4], + &tmp.addr_bytes[5]) != 6) { + DRV_LOG(ERR, + "invalid MAC address format" + " \"%s\"", + pair->value); + return -EINVAL; + } + if (is_same_ether_addr(eth_addr, &tmp)) + break; + } + } + if (i == kvargs->count) + return 0; + ++(*matched); + } + /* Weed out interfaces already handled. 
*/ + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) + if (ctx->if_index == iface->if_index) + break; + if (ctx) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is already handled," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + if (!vdev_netvsc_iface_is_netvsc(iface)) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is not NetVSC," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + /* Create interface context. */ + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + ret = -errno; + DRV_LOG(ERR, "cannot allocate context for interface \"%s\": %s", + iface->if_name, rte_strerror(errno)); + goto error; + } + ctx->id = vdev_netvsc_ctx_count; + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + ctx->if_index = iface->if_index; + ctx->if_addr = *eth_addr; + ctx->pipe[0] = -1; + ctx->pipe[1] = -1; + ctx->yield[0] = '\0'; + if (pipe(ctx->pipe) == -1) { + ret = -errno; + DRV_LOG(ERR, + "cannot allocate control pipe for interface \"%s\": %s", + ctx->if_name, rte_strerror(errno)); + goto error; + } + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { + int flf = fcntl(ctx->pipe[i], F_GETFL); + + if (flf != -1 && + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) + continue; + ret = -errno; + DRV_LOG(ERR, "cannot toggle non-blocking flag on control file" + " descriptor #%u (%d): %s", i, ctx->pipe[i], + rte_strerror(errno)); + goto error; + } + /* Generate virtual device name and arguments. 
*/ + i = 0; + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", + name, ctx->id); + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) + ++i; + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", + ctx->name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) + ++i; + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), + "fd(%d),dev(net_tap_%s,remote=%s)", + ctx->pipe[0], ctx->name, ctx->if_name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) + ++i; + if (i) { + ret = -ENOBUFS; + DRV_LOG(ERR, "generated virtual device name or argument list" + " too long for interface \"%s\"", ctx->if_name); + goto error; + } + /* Request virtual device generation. */ + DRV_LOG(DEBUG, "generating virtual device \"%s\" with arguments \"%s\"", + ctx->devname, ctx->devargs); + vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); + if (ret) + goto error; + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); + ++vdev_netvsc_ctx_count; + DRV_LOG(DEBUG, "added NetVSC interface \"%s\" to context list", + ctx->if_name); + return 0; +error: + if (ctx) + vdev_netvsc_ctx_destroy(ctx); + return ret; +} + +/** * Probe NetVSC interfaces. * + * This function probes system netdevices according to the specified device + * arguments and starts a periodic alarm callback to notify the resulting + * fail-safe PMD instances of their sub-devices whereabouts. + * * @param dev * Virtual device context for driver instance. * @@ -49,12 +557,40 @@ const char *args = rte_vdev_device_args(dev); struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", vdev_netvsc_arg); + unsigned int specified = 0; + unsigned int matched = 0; + unsigned int i; + int ret; DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); if (!kvargs) { DRV_LOG(ERR, "cannot parse arguments list"); goto error; } + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + ++specified; + } + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + /* Gather interfaces. */ + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, + specified, &matched); + if (ret < 0) + goto error; + if (matched < specified) + DRV_LOG(WARNING, + "some of the specified parameters did not match" + " recognized network interfaces"); + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to schedule alarm callback: %s", + rte_strerror(-ret)); + goto error; + } error: if (kvargs) rte_kvargs_free(kvargs); @@ -65,6 +601,9 @@ /** * Remove driver instance. * + * The alarm callback and underlying vdev_netvsc context instances are only + * destroyed after the last PMD instance is removed. + * * @param dev * Virtual device context for driver instance. * @@ -74,7 +613,16 @@ static int vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) { - --vdev_netvsc_ctx_inst; + if (--vdev_netvsc_ctx_inst) + return 0; + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { + struct vdev_netvsc_ctx *ctx = LIST_FIRST(&vdev_netvsc_ctx_list); + + LIST_REMOVE(ctx, entry); + --vdev_netvsc_ctx_count; + vdev_netvsc_ctx_destroy(ctx); + } return 0; } -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
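The hand-off described in this patch (a newline-terminated PCI address written to a non-blocking pipe whose read end is given to fail-safe through its fd() argument) can be sketched outside DPDK as follows. This is a simplified illustration under stated assumptions, not the driver code: `set_nonblock()` and `feed_pci_addr()` are hypothetical names, whereas the driver performs these steps inline in vdev_netvsc_netvsc_probe() and vdev_netvsc_device_probe(), and it treats EINTR/EAGAIN differently from the plain error return used here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Put a descriptor in non-blocking mode; 0 on success, -errno otherwise. */
static int
set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: send "<pci_addr>\n" through the write end of a pipe.
 * Returns 0 on success, -errno on write failure, -EIO on a partial write.
 */
static int
feed_pci_addr(int wfd, const char *pci_addr)
{
	char line[64];
	size_t len = strlen(pci_addr);
	ssize_t ret;

	/* Keep room for the trailing newline. */
	if (len + 1 > sizeof(line))
		return -ENOBUFS;
	memcpy(line, pci_addr, len);
	line[len] = '\n';
	ret = write(wfd, line, len + 1);
	if (ret == -1)
		return -errno;
	if ((size_t)ret != len + 1) {
		/* Terminate the partial line so the reader can discard it. */
		ret = write(wfd, "\n", 1);
		(void)ret;
		return -EIO;
	}
	return 0;
}
```

Sending one complete line per address matches the line-oriented input the fail-safe documentation describes for this kind of parameter; the reader side is left out of the sketch.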
* Re: [dpdk-dev] [PATCH v4 5/8] net/vdev_netvsc: implement core functionality 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 5/8] net/vdev_netvsc: implement core functionality Matan Azrad @ 2018-01-18 18:25 ` Stephen Hemminger 2018-01-18 18:28 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2018-01-18 18:25 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Adrien Mazarguil On Thu, 18 Jan 2018 08:43:43 +0000 Matan Azrad <matan@mellanox.com> wrote: > + > +/** > + * Alarm callback that regularly probes system network interfaces. > + * > + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as > + * long as an vdev_netvsc context instance exists. > + * > + * @param arg > + * Ignored. > + */ > +static void > +vdev_netvsc_alarm(__rte_unused void *arg) > +{ > + struct vdev_netvsc_ctx *ctx; > + int ret; > + > + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { > + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); > + if (ret) > + break; > + } > + if (!vdev_netvsc_ctx_count) > + return; > + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, > + vdev_netvsc_alarm, NULL); > + if (ret < 0) { > + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", > + rte_strerror(-ret)); > + } > +} > + Not a fan of polling for network interface changes. Alarms in core code make life difficult for applications. Also, at least on current Azure infrastructure hotplug of netvsc devices is not supported. Can we just wait until proper hotplug API from kernel (ie read netlink uevent) is done? ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v4 5/8] net/vdev_netvsc: implement core functionality 2018-01-18 18:25 ` Stephen Hemminger @ 2018-01-18 18:28 ` Matan Azrad 0 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 18:28 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Adrien Mazarguil From: Stephen Hemminger, Thursday, January 18, 2018 8:26 PM > On Thu, 18 Jan 2018 08:43:43 +0000 > Matan Azrad <matan@mellanox.com> wrote: > > > + > > +/** > > + * Alarm callback that regularly probes system network interfaces. > > + * > > + * This callback runs at a frequency determined by > > +VDEV_NETVSC_PROBE_MS as > > + * long as an vdev_netvsc context instance exists. > > + * > > + * @param arg > > + * Ignored. > > + */ > > +static void > > +vdev_netvsc_alarm(__rte_unused void *arg) { > > + struct vdev_netvsc_ctx *ctx; > > + int ret; > > + > > + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { > > + ret = > vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); > > + if (ret) > > + break; > > + } > > + if (!vdev_netvsc_ctx_count) > > + return; > > + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, > > + vdev_netvsc_alarm, NULL); > > + if (ret < 0) { > > + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", > > + rte_strerror(-ret)); > > + } > > +} > > + > > Not a fan of polling for network interface changes. > Alarms in core code make life difficult for applications. > What is the connection to application? It is netvsc driver alarm. > Also, at least on current Azure infrastructure hotplug of netvsc devices is not > supported. > It detects the PCI device hotplug, no netvsc device. > Can we just wait until proper hotplug API from kernel (ie read netlink > uevent) is done? Why? ^ permalink raw reply [flat|nested] 112+ messages in thread
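For readers comparing the periodic alarm with the "netlink uevent" alternative raised in this exchange, here is a rough sketch of what listening to kernel uevents could look like on Linux. It is not taken from the patch or from any DPDK API; the function names uevent_open() and uevent_is_net() are hypothetical, and the only assumption about the message layout is the usual "ACTION@DEVPATH" header followed by NUL-separated KEY=VALUE strings.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a non-blocking socket receiving kernel uevents.
 * Returns the file descriptor, or -1 on failure.
 */
static int
uevent_open(void)
{
	struct sockaddr_nl addr;
	int fd;

	fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK | SOCK_CLOEXEC,
		    NETLINK_KOBJECT_UEVENT);
	if (fd == -1)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1; /* Group 1 carries kernel-originated uevents. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: return nonzero when a raw uevent datagram refers to
 * a device of the "net" subsystem. The datagram is scanned as a sequence of
 * NUL-terminated strings.
 */
static int
uevent_is_net(const char *buf, size_t len)
{
	static const char key[] = "SUBSYSTEM=net";
	size_t off = 0;

	assert(buf != NULL);
	while (off < len) {
		size_t n = strnlen(buf + off, len - off);

		if (n == sizeof(key) - 1 && !memcmp(buf + off, key, n))
			return 1;
		off += n + 1;
	}
	return 0;
}
```

An event loop would recv() datagrams from the descriptor returned by uevent_open() and run the interface scan only when uevent_is_net() matches, instead of rescanning every VDEV_NETVSC_PROBE_MS.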
* [dpdk-dev] [PATCH v4 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (4 preceding siblings ...) 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 5/8] net/vdev_netvsc: implement core functionality Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 18:26 ` Stephen Hemminger 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad ` (2 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Raslan Darawsheh NetVSC netdevices which are already routed should not be probed because they are used for management purposes by the HyperV. prevent routed netvsc devices probing. Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 2 +- drivers/net/vdev_netvsc/vdev_netvsc.c | 46 +++++++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -87,4 +87,4 @@ The following device parameters are supported: MAC address. Not specifying either ``iface`` or ``mac`` makes this driver attach itself to -all NetVSC interfaces found on the system. +all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 21c3265..0055d0b 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -39,6 +39,7 @@ #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ @@ -198,6 +199,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = } /** + * Determine if a network interface has a route. + * + * @param[in] name + * Network device name. + * + * @return + * A nonzero value when interface has a route. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +vdev_netvsc_has_route(const char *name) +{ + FILE *fp; + int ret = 0; + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; + char *netdev; + + fp = fopen("/proc/net/route", "r"); + if (!fp) { + rte_errno = errno; + return 0; + } + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { + netdev = strtok(route, "\t"); + if (strcmp(netdev, name) == 0) { + ret = 1; + break; + } + /* Move file pointer to the next line. */ + while (strchr(route, '\n') == NULL && + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) + ; + } + fclose(fp); + return ret; +} + +/** * Retrieve network interface data from sysfs symbolic link. * * @param[out] buf @@ -459,6 +498,13 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = iface->if_name, iface->if_index); return 0; } + /* Routed NetVSC should not be probed. */ + if (vdev_netvsc_has_route(iface->if_name)) { + DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", + iface->if_name, iface->if_index); + if (!specified) + return 0; + } /* Create interface context. */ ctx = calloc(1, sizeof(*ctx)); if (!ctx) { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
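The route check in this patch scans the first tab-separated column of each /proc/net/route line for the interface name. The sketch below shows the same scan written against an arbitrary FILE stream so it can be exercised without /proc, and without depending on strtok() returning a token. The function name route_table_has_iface() is hypothetical and the code is an illustration, not the patch's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 when the first column of any line read from
 * fp equals name, 0 otherwise. Lines longer than the local buffer are
 * consumed in several chunks; only the first chunk of a line is compared.
 */
static int
route_table_has_iface(FILE *fp, const char *name)
{
	char line[300];
	int at_line_start = 1;
	int ret = 0;

	assert(fp != NULL && name != NULL);
	while (fgets(line, sizeof(line), fp) != NULL) {
		int complete = strchr(line, '\n') != NULL;

		if (at_line_start) {
			/* The interface name ends at the first tab. */
			size_t n = strcspn(line, "\t\n");

			if (n == strlen(name) && !strncmp(line, name, n)) {
				ret = 1;
				break;
			}
		}
		/* The next chunk starts a new line only if this one ended. */
		at_line_start = complete;
	}
	return ret;
}
```

A caller would open "/proc/net/route", pass the stream and iface->if_name, then close the stream, mirroring what vdev_netvsc_has_route() does.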
* Re: [dpdk-dev] [PATCH v4 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad @ 2018-01-18 18:26 ` Stephen Hemminger 2018-01-18 18:47 ` Thomas Monjalon 0 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2018-01-18 18:26 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Raslan Darawsheh On Thu, 18 Jan 2018 08:43:44 +0000 Matan Azrad <matan@mellanox.com> wrote: > NetVSC netdevices which are already routed should not be probed because > they are used for management purposes by the HyperV. > > prevent routed netvsc devices probing. > > Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> > Signed-off-by: Matan Azrad <matan@mellanox.com> Just checking for interface IPv4 or IPv6 (non-link local) is enough. If device has a L3 address than skip it. No need to read route table which maybe huge in some environments. ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v4 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-18 18:26 ` Stephen Hemminger @ 2018-01-18 18:47 ` Thomas Monjalon 0 siblings, 0 replies; 112+ messages in thread From: Thomas Monjalon @ 2018-01-18 18:47 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Matan Azrad, Ferruh Yigit, dev, Raslan Darawsheh 18/01/2018 19:26, Stephen Hemminger: > On Thu, 18 Jan 2018 08:43:44 +0000 > Matan Azrad <matan@mellanox.com> wrote: > > > NetVSC netdevices which are already routed should not be probed because > > they are used for management purposes by the HyperV. > > > > prevent routed netvsc devices probing. > > > > Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > Just checking for interface IPv4 or IPv6 (non-link local) is enough. > If device has a L3 address than skip it. > > No need to read route table which maybe huge in some environments. Stephen, I think you are in a better position to do this improvement. Can we accept this patch, so you can send a patch on top of it? Such PMD improvement may be integrated in RC2. ^ permalink raw reply [flat|nested] 112+ messages in thread
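The alternative suggested in this sub-thread, skipping an interface that already has an IPv4 address or a non-link-local IPv6 address, could be sketched with getifaddrs() as below. This is only an illustration of the idea, not a proposed patch: the names addr_is_l3() and iface_has_l3_addr() are hypothetical, and treating every IPv4 address (including 169.254.0.0/16) as a Layer 3 address is an assumption.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: nonzero when sa is an IPv4 address or an IPv6
 * address outside the link-local range.
 */
static int
addr_is_l3(const struct sockaddr *sa)
{
	if (sa == NULL)
		return 0;
	if (sa->sa_family == AF_INET)
		return 1;
	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sin6 =
			(const struct sockaddr_in6 *)sa;

		return !IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr);
	}
	return 0;
}

/*
 * Hypothetical helper: nonzero when the named interface carries at least
 * one address accepted by addr_is_l3(). Errors are reported as 0.
 */
static int
iface_has_l3_addr(const char *name)
{
	struct ifaddrs *list;
	struct ifaddrs *ifa;
	int found = 0;

	assert(name != NULL);
	if (getifaddrs(&list) == -1)
		return 0;
	for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
		if (strcmp(ifa->ifa_name, name) != 0)
			continue;
		if (addr_is_l3(ifa->ifa_addr)) {
			found = 1;
			break;
		}
	}
	freeifaddrs(list);
	return found;
}
```

Compared with reading /proc/net/route, the cost of this check grows with the number of configured addresses rather than with the size of the routing table.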
* [dpdk-dev] [PATCH v4 7/8] net/vdev_netvsc: add "force" parameter 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (5 preceding siblings ...) 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 18:27 ` Stephen Hemminger 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 8/8] net/vdev_netvsc: add automatic probing Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This parameter allows specifying any non-NetVSC interface or routed NetVSC interfaces to use with tap sub-devices for development purposes. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 5 +++++ drivers/net/vdev_netvsc/vdev_netvsc.c | 30 +++++++++++++++++++----------- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index f779862..3c26990 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -86,5 +86,10 @@ The following device parameters are supported: Same as ``iface`` except a suitable NetVSC interface is located using its MAC address. +- ``force`` [int] + + If nonzero, forces the use of specified interfaces even if not detected as + NetVSC or detected as routed NETVSC. + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. 
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 0055d0b..2d03033 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -36,6 +36,7 @@ #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_ARG_FORCE "force" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -419,6 +420,9 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = * - struct rte_kvargs *kvargs: * Device arguments provided to current driver instance. * + * - int force: + * Accept specified interface even if not detected as NetVSC. + * * - unsigned int specified: * Number of specific netdevices provided as device arguments. * @@ -436,6 +440,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = { const char *name = va_arg(ap, const char *); struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + int force = va_arg(ap, int); unsigned int specified = va_arg(ap, unsigned int); unsigned int *matched = va_arg(ap, unsigned int *); unsigned int i; @@ -490,20 +495,18 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = return 0; } if (!vdev_netvsc_iface_is_netvsc(iface)) { - if (!specified) + if (!specified || !force) return 0; DRV_LOG(WARNING, - "interface \"%s\" (index %u) is not NetVSC," - " skipping", + "using non-NetVSC interface \"%s\" (index %u)", iface->if_name, iface->if_index); - return 0; } /* Routed NetVSC should not be probed. */ if (vdev_netvsc_has_route(iface->if_name)) { - DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", - iface->if_name, iface->if_index); - if (!specified) + if (!specified || !force) return 0; + DRV_LOG(WARNING, "using routed NetVSC interface \"%s\"" + " (index %u)", iface->if_name, iface->if_index); } /* Create interface context. 
*/ ctx = calloc(1, sizeof(*ctx)); @@ -597,6 +600,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = static const char *const vdev_netvsc_arg[] = { VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, + VDEV_NETVSC_ARG_FORCE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -605,6 +609,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = vdev_netvsc_arg); unsigned int specified = 0; unsigned int matched = 0; + int force = 0; unsigned int i; int ret; @@ -616,14 +621,16 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = for (i = 0; i != kvargs->count; ++i) { const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; - if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || - !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) + force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, - specified, &matched); + force, specified, &matched); if (ret < 0) goto error; if (matched < specified) @@ -682,7 +689,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " - VDEV_NETVSC_ARG_MAC "=<string>"); + VDEV_NETVSC_ARG_MAC "=<string> " + VDEV_NETVSC_ARG_FORCE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
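The patch above converts the "force" value with !!atoi(), which silently maps malformed input such as "yes" to 0. A stricter conversion could look like the following sketch; parse_bool_arg() is a hypothetical helper and is not part of the patch or of the rte_kvargs API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert an integer device argument to 0 or 1.
 * Returns 0 on success and stores the result in *out, or -EINVAL when the
 * value is missing, out of range or followed by trailing characters.
 */
static int
parse_bool_arg(const char *value, int *out)
{
	char *end;
	long v;

	assert(out != NULL);
	if (value == NULL || *value == '\0')
		return -EINVAL;
	errno = 0;
	v = strtol(value, &end, 0);
	if (errno != 0 || *end != '\0')
		return -EINVAL;
	*out = !!v;
	return 0;
}
```

In the argument loop, a failure from such a helper could be logged and treated as a parsing error instead of being read as "force=0".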
* Re: [dpdk-dev] [PATCH v4 7/8] net/vdev_netvsc: add "force" parameter 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad @ 2018-01-18 18:27 ` Stephen Hemminger 2018-01-18 18:30 ` Matan Azrad 0 siblings, 1 reply; 112+ messages in thread From: Stephen Hemminger @ 2018-01-18 18:27 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Adrien Mazarguil On Thu, 18 Jan 2018 08:43:45 +0000 Matan Azrad <matan@mellanox.com> wrote: > This parameter allows specifying any non-NetVSC interface or routed > NetVSC interfaces to use with tap sub-devices for development purposes. > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > Signed-off-by: Matan Azrad <matan@mellanox.com> Might whitelist work for this? ^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [dpdk-dev] [PATCH v4 7/8] net/vdev_netvsc: add "force" parameter 2018-01-18 18:27 ` Stephen Hemminger @ 2018-01-18 18:30 ` Matan Azrad 0 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 18:30 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Ferruh Yigit, Thomas Monjalon, dev, Adrien Mazarguil From: Stephen Hemminger, Thursday, January 18, 2018 8:28 PM > On Thu, 18 Jan 2018 08:43:45 +0000 > Matan Azrad <matan@mellanox.com> wrote: > > > This parameter allows specifying any non-NetVSC interface or routed > > NetVSC interfaces to use with tap sub-devices for development purposes. > > > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> > > Signed-off-by: Matan Azrad <matan@mellanox.com> > > Might whitelist work for this? It is an optional parameter, you don't need to configure any parameter for this driver running. ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v4 8/8] net/vdev_netvsc: add automatic probing 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (6 preceding siblings ...) 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad @ 2018-01-18 8:43 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 8:43 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen Using DPDK in Hyper-V VM systems requires vdev_netvsc driver to pair the NetVSC netdev device with the same MAC address PCI device by fail-safe PMD. Add vdev_netvsc custom scan in vdev bus to allow automatic probing in Hyper-V VM systems unless it was already specified by command line. Add "ignore" parameter to disable this auto-detection. Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 9 ++++-- drivers/net/vdev_netvsc/vdev_netvsc.c | 55 +++++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index 3c26990..55d130a 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -71,8 +71,8 @@ Build options Run-time parameters ------------------- -To invoke this driver, applications have to explicitly provide the -``--vdev=net_vdev_netvsc`` EAL option. +This driver is invoked automatically in Hyper-V VM systems unless the user +invoked it by command line using ``--vdev=net_vdev_netvsc`` EAL option. The following device parameters are supported: @@ -91,5 +91,10 @@ The following device parameters are supported: If nonzero, forces the use of specified interfaces even if not detected as NetVSC or detected as routed NETVSC. 
+- ``ignore`` [int] + + If nonzero, ignores the driver running (actually used to disable the + auto-detection in Hyper-V VM). + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 2d03033..a8a1a7f 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -30,13 +30,16 @@ #include <rte_errno.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_hypervisor.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_DRIVER_NAME RTE_STR(VDEV_NETVSC_DRIVER) #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" #define VDEV_NETVSC_ARG_FORCE "force" +#define VDEV_NETVSC_ARG_IGNORE "ignore" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -45,7 +48,7 @@ #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ vdev_netvsc_logtype, \ - RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT(VDEV_NETVSC_DRIVER_NAME ": " \ RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ RTE_FMT_TAIL(__VA_ARGS__,))) @@ -601,6 +604,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, VDEV_NETVSC_ARG_FORCE, + VDEV_NETVSC_ARG_IGNORE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -610,6 +614,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = unsigned int specified = 0; unsigned int matched = 0; int force = 0; + int ignore = 0; unsigned int i; int ret; @@ -623,10 +628,17 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IGNORE)) + ignore = !!atoi(pair->value); else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } + if (ignore) { + if (kvargs) + rte_kvargs_free(kvargs); + return 0; + } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, @@ -690,7 +702,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " VDEV_NETVSC_ARG_MAC "=<string> " - VDEV_NETVSC_ARG_FORCE "=<int>"); + VDEV_NETVSC_ARG_FORCE "=<int> " + VDEV_NETVSC_ARG_IGNORE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) @@ -699,3 +712,41 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (vdev_netvsc_logtype >= 0) rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); } + +/** Compare function for vdev find device operation. 
*/ +static int +vdev_netvsc_cmp_rte_device(const struct rte_device *dev1, + __rte_unused const void *_dev2) +{ + return strcmp(dev1->devargs->name, VDEV_NETVSC_DRIVER_NAME); +} + +/** + * A callback called by vdev bus scan function to ensure this driver probing + * automatically in Hyper-V VM system unless it already exists in the + * devargs list. + */ +static void +vdev_netvsc_scan_callback(__rte_unused void *arg) +{ + struct rte_vdev_device *dev; + struct rte_devargs *devargs; + struct rte_bus *vbus = rte_bus_find_by_name("vdev"); + + TAILQ_FOREACH(devargs, &devargs_list, next) + if (!strcmp(devargs->name, VDEV_NETVSC_DRIVER_NAME)) + return; + dev = (struct rte_vdev_device *)vbus->find_device(NULL, + vdev_netvsc_cmp_rte_device, VDEV_NETVSC_DRIVER_NAME); + if (dev) + return; + if (rte_eal_devargs_add(RTE_DEVTYPE_VIRTUAL, VDEV_NETVSC_DRIVER_NAME)) + DRV_LOG(ERR, "unable to add netvsc devargs."); +} + +/** Initialize the custom scan. */ +RTE_INIT(vdev_netvsc_custom_scan_add) +{ + if (rte_hypervisor_get() == RTE_HYPERVISOR_HYPERV) + rte_vdev_add_custom_scan(vdev_netvsc_scan_callback, NULL); +} -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (7 preceding siblings ...) 2018-01-18 8:43 ` [dpdk-dev] [PATCH v4 8/8] net/vdev_netvsc: add automatic probing Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 1/8] net/failsafe: fix invalid free Matan Azrad ` (8 more replies) 8 siblings, 9 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen Virtual machines hosted by Hyper-V/Azure platforms are fitted with simplified virtual network devices named NetVSC that are used for fast communication between VM to VM, VM to hypervisor, and the outside. They appear as standard system netdevices to user-land applications, the main difference being they are implemented on top of VMBUS instead of emulated PCI devices. While this reads like a case for a standard DPDK PMD, there is more to it. To accelerate outside communication, NetVSC devices as they appear in a VM can be paired with physical SR-IOV virtual function (VF) devices owned by that same VM. Both netdevices share the same MAC address in that case. When paired, egress and most of the ingress traffic flow through the VF device, while part of it (e.g. multicasts, hypervisor control data) still flows through NetVSC. Moreover VF devices are not retained and disappear during VM migration; from a VM standpoint, they can be hot-plugged anytime with NetVSC acting as a fallback. Running DPDK applications in such a context involves driving VF devices using their dedicated PMDs in a vendor-independent fashion (to benefit from maximum performance without writing dedicated code) while simultaneously listening to NetVSC and handling the related hot-plug events. 
This new virtual driver (referred to as "vdev_netvsc" from this point on) automatically coordinates the Hyper-V/Azure-specific management part described above by relying on vendor-specific, failsafe and tap PMDs to expose a single consolidated Ethernet device usable directly by existing applications. .------------------. | DPDK application | `--------+---------' | .------+------. | DPDK ethdev | `------+------' Control | | .------------+------------. v .--------------------. | failsafe PMD +---------+ vdev_netvsc driver | `--+-------------------+--' `--------------------' | | | .........|......... | : | : .----+----. : .----+----. : | tap PMD | : | any PMD | : `----+----' : `----+----' : <-- Hot-pluggable | : | : .------+-------. : .-----+-----. : | NetVSC-based | : | SR-IOV VF | : | netdevice | : | device | : `--------------' : `-----------' : :.................: v2 changes(Adrien): - Renamed driver from "hyperv" to "vdev_netvsc". This change covers documentation and symbols prefix. - Driver is now tagged EXPERIMENTAL. - Replaced ether_addr_from_str() with a basic sscanf() call. - Removed debugging code (memset() poisoning). - Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments. - Removed hyperv_basename(). - Discarded unused variables through __rte_unused. - Added separate but necessary free() bugfix for failsafe PMD. - Added file descriptor input support to failsafe PMD. - Replaced temporary bash execution; failsafe now reads device definitions directly through a pipe without an intermediate bash one-liner. - Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG(). - Added dynamic log type (pmd.vdev_netvsc). - Modified initialization code to probe devices immediately during startup. - Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more appropriate than "ret >= sizeof(foo) - 1"). v3 changes(Matan): - Fixed clang compilation in V2. - Removed hotplug remove code from the new driver. 
- Supported probed sub-devices getting in fail-safe. - Added automatic probing for HyperV VM systems. - Added option to ignore the automatic probing. - Skipped routed NetVSC devices probing. - Adjusted documentation and semantics. - Replaced maintainer. v4 changes(Matan): - Align descriptions of context struct(Stephen suggestion). - Skip non-ethernet devices in netdev loop(Stephen suggestion). - Use different variable names in "add fd parameter"(Gaetan suggestion). - Change name of get port id function in "add automatic probing"(Gaetan suggestion). - Update internal fail-safe devargs in case of probed device(Gaetan suggestion). - use different commit title instead of "support probed sub-devices getting"(Gaetan suggestion). v5 changes(Matan): - Improve fail-safe documentation as Gaetan suggested. - Fix fcntl parameter. Adrien Mazarguil (1): net/failsafe: fix invalid free Matan Azrad (7): net/failsafe: add "fd" parameter net/failsafe: add probed etherdev capture net/vdev_netvsc: introduce Hyper-V platform driver net/vdev_netvsc: implement core functionality net/vdev_netvsc: skip routed netvsc probing net/vdev_netvsc: add "force" parameter net/vdev_netvsc: add automatic probing MAINTAINERS | 6 + config/common_base | 5 + config/common_linuxapp | 1 + doc/guides/nics/fail_safe.rst | 26 + doc/guides/nics/features/vdev_netvsc.ini | 12 + doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 100 +++ drivers/net/Makefile | 1 + drivers/net/failsafe/failsafe_args.c | 84 ++- drivers/net/failsafe/failsafe_eal.c | 78 ++- drivers/net/failsafe/failsafe_private.h | 5 + drivers/net/vdev_netvsc/Makefile | 31 + .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 752 +++++++++++++++++++++ mk/rte.app.mk | 1 + 15 files changed, 1083 insertions(+), 24 deletions(-) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 
100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v5 1/8] net/failsafe: fix invalid free 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 2/8] net/failsafe: add "fd" parameter Matan Azrad ` (7 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil, stable, Gaetan Rivet From: Adrien Mazarguil <adrien.mazarguil@6wind.com> rte_free() is not supposed to work with pointers returned by calloc(). Fixes: a0194d828100 ("net/failsafe: add flexible device definition") Cc: stable@dpdk.org Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> --- drivers/net/failsafe/failsafe_args.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index cfc83e3..ec63ac9 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -407,7 +407,7 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t i; FOREACH_SUBDEV(sdev, i, dev) { - rte_free(sdev->cmdline); + free(sdev->cmdline); sdev->cmdline = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v5 2/8] net/failsafe: add "fd" parameter 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 1/8] net/failsafe: fix invalid free Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 3/8] net/failsafe: add probed etherdev capture Matan Azrad ` (6 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This parameter enables applications to provide device definitions through an arbitrary file descriptor number. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 9 ++++ drivers/net/failsafe/failsafe_args.c | 80 ++++++++++++++++++++++++++++++++- drivers/net/failsafe/failsafe_private.h | 3 ++ 3 files changed, 91 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index c4e3d2e..5b1b47e 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -106,6 +106,15 @@ Fail-safe command line parameters All commas within the ``shell command`` are replaced by spaces before executing the command. This helps using scripts to specify devices. +- **fd(<file descriptor number>)** parameter + + This parameter reads a device definition from an arbitrary file descriptor + number in ``<iface>`` format as described above. + + The file descriptor is read in non-blocking mode and is never closed in + order to take only the last line into account (unlike ``exec()``) at every + probe attempt. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index ec63ac9..c711da4 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -31,7 +31,11 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> #include <string.h> +#include <unistd.h> #include <errno.h> #include <rte_debug.h> @@ -161,6 +165,67 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } static int +fs_read_fd(struct sub_device *sdev, char *fd_str) +{ + FILE *fp = NULL; + int fd = -1; + /* store possible newline as well */ + char output[DEVARGS_MAXLEN + 1]; + int err = -ENODEV; + int oflags; + int lcount; + + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); + if (sdev->fd_str == NULL) { + sdev->fd_str = strdup(fd_str); + if (sdev->fd_str == NULL) { + ERROR("Command line allocation failed"); + return -ENOMEM; + } + } + errno = 0; + fd = strtol(fd_str, &fd_str, 0); + if (errno || *fd_str || fd < 0) { + ERROR("Parsing FD number failed"); + goto error; + } + /* Fiddle with copy of file descriptor */ + fd = dup(fd); + if (fd == -1) + goto error; + oflags = fcntl(fd, F_GETFL); + if (oflags == -1) + goto error; + if (fcntl(fd, F_SETFL, oflags | O_NONBLOCK) == -1) + goto error; + fp = fdopen(fd, "r"); + if (fp == NULL) + goto error; + fd = -1; + /* Only take the last line into account */ + lcount = 0; + while (fgets(output, sizeof(output), fp)) + ++lcount; + if (lcount == 0) + goto error; + else if (ferror(fp) && errno != EAGAIN) + goto error; + /* Line must end with a newline character */ + fs_sanitize_cmdline(output); + if (output[0] == '\0') + goto error; + err = fs_parse_device(sdev, output); + if (err) + ERROR("Parsing device '%s' failed", output); +error: + if (fp) + fclose(fp); + if (fd != -1) + close(fd); + 
return err; +} + +static int fs_parse_device_param(struct rte_eth_dev *dev, const char *param, uint8_t head) { @@ -202,6 +267,14 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } if (ret) goto free_args; + } else if (strncmp(param, "fd(", 3) == 0) { + ret = fs_read_fd(sdev, args); + if (ret == -ENODEV) { + DEBUG("Reading device info from FD failed"); + ret = 0; + } + if (ret) + goto free_args; } else { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; @@ -409,6 +482,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, FOREACH_SUBDEV(sdev, i, dev) { free(sdev->cmdline); sdev->cmdline = NULL; + free(sdev->fd_str); + sdev->fd_str = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; } @@ -424,7 +499,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, param[b] != '\0') b++; if (strncmp(param, "dev", b) != 0 && - strncmp(param, "exec", b) != 0) { + strncmp(param, "exec", b) != 0 && + strncmp(param, "fd(", b) != 0) { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; } @@ -463,6 +539,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, continue; if (sdev->cmdline) ret = fs_execute_cmd(sdev, sdev->cmdline); + else if (sdev->fd_str) + ret = fs_read_fd(sdev, sdev->fd_str); else ret = fs_parse_sub_device(sdev); if (ret == 0) diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index 54b5b91..5e04ffe 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -48,6 +48,7 @@ #define PMD_FAILSAFE_PARAM_STRING \ "dev(<ifc>)," \ "exec(<shell command>)," \ + "fd(<fd number>)," \ "mac=mac_addr," \ "hotplug_poll=u64" \ "" @@ -112,6 +113,8 @@ struct sub_device { struct fs_stats stats_snapshot; /* Some device are defined as a command line */ char *cmdline; + /* Others are retrieved through a file descriptor */ + char *fd_str; /* fail-safe device backreference */ struct 
rte_eth_dev *fs_dev; /* flag calling for recollection */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v5 3/8] net/failsafe: add probed etherdev capture 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 1/8] net/failsafe: fix invalid free Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 2/8] net/failsafe: add "fd" parameter Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:08 ` Gaëtan Rivet 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad ` (5 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Gaetan Rivet Previous fail-safe code didn't support probed sub-devices capture and failed when it tried to probe them. Skip fail-safe sub-device probing when it already was probed. Signed-off-by: Matan Azrad <matan@mellanox.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 17 +++++++ drivers/net/failsafe/failsafe_args.c | 2 - drivers/net/failsafe/failsafe_eal.c | 78 ++++++++++++++++++++++++--------- drivers/net/failsafe/failsafe_private.h | 2 + 4 files changed, 77 insertions(+), 22 deletions(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index 5b1b47e..3f72b59 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -93,6 +93,14 @@ Fail-safe command line parameters additional sub-device parameters if need be. They will be passed on to the sub-device. +.. note:: + + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the device + as is, which means that EAL device options are taken in this case. + When trying to use a PCI device automatically probed in blacklist mode, + the syntax for the fail-safe must be with the full PCI id: + Domain:Bus:Device.Function. See the usage example section. 
+ - **exec(<shell command>)** parameter This parameter allows the user to provide a command to the fail-safe PMD to @@ -169,6 +177,15 @@ This section shows some example of using **testpmd** with a fail-safe PMD. $RTE_TARGET/build/app/testpmd -c 0xff -n 4 --no-pci \ --vdev='net_failsafe0,exec(echo 84:00.0)' -- -i +#. Start testpmd, automatically probing the device 84:00.0 and using it with + the fail-safe. + + .. code-block:: console + + $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \ + --vdev 'net_failsafe0,dev(0000:84:00.0),dev(net_ring0)' -- -i + + Using the Fail-safe PMD from an application ------------------------------------------- diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index c711da4..583bf05 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -45,8 +45,6 @@ #include "failsafe_private.h" -#define DEVARGS_MAXLEN 4096 - /* Callback used when a new device is found in devargs */ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t head); diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c index 19d26f5..33a5adf 100644 --- a/drivers/net/failsafe/failsafe_eal.c +++ b/drivers/net/failsafe/failsafe_eal.c @@ -36,39 +36,77 @@ #include "failsafe_private.h" static int +fs_ethdev_portid_get(const char *name, uint16_t *port_id) +{ + uint16_t pid; + size_t len; + + if (name == NULL) { + DEBUG("Null pointer is specified\n"); + return -EINVAL; + } + len = strlen(name); + RTE_ETH_FOREACH_DEV(pid) { + if (!strncmp(name, rte_eth_devices[pid].device->name, len)) { + *port_id = pid; + return 0; + } + } + return -ENODEV; +} + +static int fs_bus_init(struct rte_eth_dev *dev) { struct sub_device *sdev; struct rte_devargs *da; uint8_t i; - uint16_t j; + uint16_t pid; int ret; FOREACH_SUBDEV(sdev, i, dev) { if (sdev->state != DEV_PARSED) continue; da = &sdev->devargs; - ret = rte_eal_hotplug_add(da->bus->name, - da->name, - da->args); - if 
(ret) { - ERROR("sub_device %d probe failed %s%s%s", i, - rte_errno ? "(" : "", - rte_errno ? strerror(rte_errno) : "", - rte_errno ? ")" : ""); - continue; - } - RTE_ETH_FOREACH_DEV(j) { - if (strcmp(rte_eth_devices[j].device->name, - da->name) == 0) { - ETH(sdev) = &rte_eth_devices[j]; - break; + if (fs_ethdev_portid_get(da->name, &pid) != 0) { + ret = rte_eal_hotplug_add(da->bus->name, + da->name, + da->args); + if (ret) { + ERROR("sub_device %d probe failed %s%s%s", i, + rte_errno ? "(" : "", + rte_errno ? strerror(rte_errno) : "", + rte_errno ? ")" : ""); + continue; } + if (fs_ethdev_portid_get(da->name, &pid) != 0) { + ERROR("sub_device %d init went wrong", i); + return -ENODEV; + } + } else { + char devstr[DEVARGS_MAXLEN] = ""; + struct rte_devargs *probed_da = + rte_eth_devices[pid].device->devargs; + + /* Take control of device probed by EAL options. */ + free(da->args); + memset(da, 0, sizeof(*da)); + if (probed_da != NULL) + snprintf(devstr, sizeof(devstr), "%s,%s", + probed_da->name, probed_da->args); + else + snprintf(devstr, sizeof(devstr), "%s", + rte_eth_devices[pid].device->name); + ret = rte_eal_devargs_parse(devstr, da); + if (ret) { + ERROR("Probed devargs parsing failed with code" + " %d", ret); + return ret; + } + INFO("Taking control of a probed sub device" + " %d named %s", i, da->name); } - if (ETH(sdev) == NULL) { - ERROR("sub_device %d init went wrong", i); - return -ENODEV; - } + ETH(sdev) = &rte_eth_devices[pid]; SUB_ID(sdev) = i; sdev->fs_dev = dev; sdev->dev = ETH(sdev)->device; diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index 5e04ffe..9fcf72e 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -58,6 +58,8 @@ #define FAILSAFE_MAX_ETHPORTS 2 #define FAILSAFE_MAX_ETHADDR 128 +#define DEVARGS_MAXLEN 4096 + /* TYPES */ struct rxq { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
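The look-up added in patch 3/8 compares only the first strlen(name) characters of each probed device name, so it is a prefix match rather than an exact one. The sketch below reproduces that comparison over a plain array of names. The array and find_probed() are hypothetical stand-ins for rte_eth_devices[] and fs_ethdev_portid_get(), used here only so the behaviour can be checked without DPDK.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for fs_ethdev_portid_get(): return the index of the
 * first entry in names whose beginning matches name, or -1 if none does.
 */
int
find_probed(const char *const *names, size_t count, const char *name)
{
	size_t len;
	size_t i;

	if (name == NULL)
		return -1;
	len = strlen(name);
	/* Same comparison as the patch: only the first len characters. */
	for (i = 0; i < count; ++i)
		if (strncmp(name, names[i], len) == 0)
			return (int)i;
	return -1;
}
```

With a table holding "0000:84:00.0" and "net_ring0", a request for "84:00.0" finds nothing, which is consistent with the documentation note asking for the full Domain:Bus:Device.Function form. A request for "net_ring" would match "net_ring0", since only a prefix is compared.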
* Re: [dpdk-dev] [PATCH v5 3/8] net/failsafe: add probed etherdev capture 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 3/8] net/failsafe: add probed etherdev capture Matan Azrad @ 2018-01-18 10:08 ` Gaëtan Rivet 0 siblings, 0 replies; 112+ messages in thread From: Gaëtan Rivet @ 2018-01-18 10:08 UTC (permalink / raw) To: Matan Azrad; +Cc: Ferruh Yigit, Thomas Monjalon, dev, stephen On Thu, Jan 18, 2018 at 10:01:44AM +0000, Matan Azrad wrote: > Previous fail-safe code didn't support probed sub-devices capture and > failed when it tried to probe them. > > Skip fail-safe sub-device probing when it already was probed. > > Signed-off-by: Matan Azrad <matan@mellanox.com> > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Okay, ignoring the recursive probing. It could be dangerous, with the ownership evolutions and unforeseen side-effects, but device matching will be reworked next release, so this new functionality will be fixed anyway at this point. Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> -- Gaëtan Rivet 6WIND ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v5 4/8] net/vdev_netvsc: introduce Hyper-V platform driver 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (2 preceding siblings ...) 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 3/8] net/failsafe: add probed etherdev capture Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 5/8] net/vdev_netvsc: implement core functionality Matan Azrad ` (4 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This patch lays the groundwork for this driver (draft documentation, copyright notices, code base skeleton and build system hooks). While it can be successfully compiled and invoked, it's an empty shell at this stage. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- MAINTAINERS | 6 ++ config/common_base | 5 ++ config/common_linuxapp | 1 + doc/guides/nics/features/vdev_netvsc.ini | 12 +++ doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 20 +++++ drivers/net/Makefile | 1 + drivers/net/vdev_netvsc/Makefile | 27 ++++++ .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 99 ++++++++++++++++++++++ mk/rte.app.mk | 1 + 11 files changed, 177 insertions(+) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c diff --git a/MAINTAINERS b/MAINTAINERS index af8de4f..97efbb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -462,6 +462,12 @@ F: drivers/net/mrvl/ F: doc/guides/nics/mrvl.rst F: doc/guides/nics/features/mrvl.ini +Microsoft vdev-netvsc - EXPERIMENTAL +M: Matan Azrad 
<matan@mellanox.com> +F: drivers/net/vdev-netvsc/ +F: doc/guides/nics/vdev-netvsc.rst +F: doc/guides/nics/features/vdev-netvsc.ini + Netcope szedata2 M: Matej Vido <vido@cesnet.cz> F: drivers/net/szedata2/ diff --git a/config/common_base b/config/common_base index 90508a8..664ff21 100644 --- a/config/common_base +++ b/config/common_base @@ -279,6 +279,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG_RX=n CONFIG_RTE_LIBRTE_MRVL_PMD=n # +# Compile virtual device driver for NetVSC on Hyper-V/Azure +# +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n + +# # Compile burst-oriented Broadcom BNXT PMD driver # CONFIG_RTE_LIBRTE_BNXT_PMD=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 74c7d64..e043262 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y diff --git a/doc/guides/nics/features/vdev_netvsc.ini b/doc/guides/nics/features/vdev_netvsc.ini new file mode 100644 index 0000000..cfc5cb9 --- /dev/null +++ b/doc/guides/nics/features/vdev_netvsc.ini @@ -0,0 +1,12 @@ +; +; Supported features of the 'vdev_netvsc' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 23babe9..5666046 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -64,6 +64,7 @@ Network Interface Controller Drivers szedata2 tap thunderx + vdev_netvsc virtio vhost vmxnet3 diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst new file mode 100644 index 0000000..a952908 --- /dev/null +++ b/doc/guides/nics/vdev_netvsc.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright 2017 6WIND S.A. 
+ Copyright 2017 Mellanox Technologies, Ltd. + +VDEV_NETVSC driver +================== + +The VDEV_NETVSC driver (librte_pmd_vdev_netvsc) provides support for NetVSC +interfaces and associated SR-IOV virtual function (VF) devices found in +Linux virtual machines running on Microsoft Hyper-V_ (including Azure) +platforms. + +.. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v + +Build options +------------- + +- ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) + + Toggle compilation of this driver. diff --git a/drivers/net/Makefile b/drivers/net/Makefile index c2fd7f5..e112732 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -39,6 +39,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += sfc DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx +DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile new file mode 100644 index 0000000..2fb059d --- /dev/null +++ b/drivers/net/vdev_netvsc/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 2017 6WIND S.A. +# Copyright 2017 Mellanox Technologies, Ltd. + +include $(RTE_SDK)/mk/rte.vars.mk + +# Properties of the generated library. +LIB = librte_pmd_vdev_netvsc.a +LIBABIVER := 1 +EXPORT_MAP := rte_pmd_vdev_netvsc_version.map + +# Additional compilation flags. +CFLAGS += -O3 +CFLAGS += -g +CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += $(WERROR_FLAGS) + +# Dependencies. +LDLIBS += -lrte_bus_vdev +LDLIBS += -lrte_eal +LDLIBS += -lrte_ethdev +LDLIBS += -lrte_kvargs + +# Source files. 
+SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map new file mode 100644 index 0000000..179140f --- /dev/null +++ b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map @@ -0,0 +1,4 @@ +DPDK_18.02 { + + local: *; +}; diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c new file mode 100644 index 0000000..e895b32 --- /dev/null +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox Technologies, Ltd. + */ + +#include <stddef.h> + +#include <rte_bus_vdev.h> +#include <rte_common.h> +#include <rte_config.h> +#include <rte_kvargs.h> +#include <rte_log.h> + +#define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_ARG_IFACE "iface" +#define VDEV_NETVSC_ARG_MAC "mac" + +#define DRV_LOG(level, ...) \ + rte_log(RTE_LOG_ ## level, \ + vdev_netvsc_logtype, \ + RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ + RTE_FMT_TAIL(__VA_ARGS__,))) + +/** Driver-specific log messages type. */ +static int vdev_netvsc_logtype; + +/** Number of driver instances relying on context list. */ +static unsigned int vdev_netvsc_ctx_inst; + +/** + * Probe NetVSC interfaces. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0, even in case of errors. + */ +static int +vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) +{ + static const char *const vdev_netvsc_arg[] = { + VDEV_NETVSC_ARG_IFACE, + VDEV_NETVSC_ARG_MAC, + NULL, + }; + const char *name = rte_vdev_device_name(dev); + const char *args = rte_vdev_device_args(dev); + struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", + vdev_netvsc_arg); + + DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); + if (!kvargs) { + DRV_LOG(ERR, "cannot parse arguments list"); + goto error; + } +error: + if (kvargs) + rte_kvargs_free(kvargs); + ++vdev_netvsc_ctx_inst; + return 0; +} + +/** + * Remove driver instance. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0. + */ +static int +vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) +{ + --vdev_netvsc_ctx_inst; + return 0; +} + +/** Virtual device descriptor. */ +static struct rte_vdev_driver vdev_netvsc_vdev = { + .probe = vdev_netvsc_vdev_probe, + .remove = vdev_netvsc_vdev_remove, +}; + +RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev); +RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); +RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, + VDEV_NETVSC_ARG_IFACE "=<string> " + VDEV_NETVSC_ARG_MAC "=<string>"); + +/** Initialize driver log type. */ +RTE_INIT(vdev_netvsc_init_log) +{ + vdev_netvsc_logtype = rte_log_register("pmd.vdev_netvsc"); + if (vdev_netvsc_logtype >= 0) + rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); +} diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 78f23c5..2f8af49 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -157,6 +157,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += -lrte_pmd_sfc_efx _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += -lrte_pmd_szedata2 -lsze2 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += -lrte_pmd_tap _LDLIBS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += -lrte_pmd_thunderx_nicvf +_LDLIBS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += -lrte_pmd_vdev_netvsc _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
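The skeleton in patch 4/8 hands its arguments to rte_kvargs_parse() together with the list of accepted keys (iface and mac). As a rough illustration of what key validation of that kind involves, here is a plain C sketch. count_valid_pairs() is a hypothetical helper and a deliberate simplification, not the rte_kvargs implementation.

```c
#include <stddef.h>
#include <string.h>

/* Keys accepted by the skeleton (VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC). */
static const char *const valid_keys[] = { "iface", "mac", NULL };

/*
 * Hypothetical helper: count the key=value pairs in a comma-separated list.
 * Returns -1 if a pair lacks '=' or uses a key outside valid_keys.
 * An empty string yields 0.
 */
int
count_valid_pairs(const char *args)
{
	int count = 0;

	while (*args != '\0') {
		size_t pair_len = strcspn(args, ",");
		const char *eq = memchr(args, '=', pair_len);
		size_t key_len;
		size_t i;

		if (eq == NULL)
			return -1;
		key_len = (size_t)(eq - args);
		/* Require an exact key match, not a prefix. */
		for (i = 0; valid_keys[i] != NULL; ++i)
			if (strlen(valid_keys[i]) == key_len &&
			    strncmp(valid_keys[i], args, key_len) == 0)
				break;
		if (valid_keys[i] == NULL)
			return -1;
		++count;
		args += pair_len;
		if (*args == ',')
			++args;
	}
	return count;
}
```

For example, "iface=eth0,mac=00:11:22:33:44:55" counts as two valid pairs, while "foo=bar" is rejected. Repeating a key is allowed, which matches the documented ability to give iface several times.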
* [dpdk-dev] [PATCH v5 5/8] net/vdev_netvsc: implement core functionality 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (3 preceding siblings ...) 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad ` (3 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil As described in more details in the attached documentation (see patch contents), this virtual device driver manages NetVSC interfaces in virtual machines hosted by Hyper-V/Azure platforms. This driver does not manage traffic nor Ethernet devices directly; it acts as a thin configuration layer that automatically instantiates and controls fail-safe PMD instances combining tap and PCI sub-devices, so that each NetVSC interface is exposed as a single consolidated port to DPDK applications. PCI sub-devices being hot-pluggable (e.g. during VM migration), applications automatically benefit from increased throughput when present and automatic fallback on NetVSC otherwise without interruption thanks to fail-safe's hot-plug handling. Once initialized, the sole job of the vdev_netvsc driver is to regularly scan for PCI devices to associate with NetVSC interfaces and feed their addresses to corresponding fail-safe instances. 
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 70 +++++ drivers/net/vdev_netvsc/Makefile | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 550 +++++++++++++++++++++++++++++++++- 3 files changed, 623 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index a952908..fde1fb8 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -12,9 +12,79 @@ platforms. .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v +Implementation details +---------------------- + +Each instance of this driver effectively needs to drive two devices: the +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from +this point on) counterpart sharing the same MAC address. + +Physical devices are part of the host system and cannot be maintained during +VM migration. From a VM standpoint they appear as hot-plug devices that come +and go without prior notice. + +When the physical device is present, egress and most of the ingress traffic +flows through it; only multicasts and other hypervisor control still flow +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. + +To avoid unnecessary code duplication and ensure maximum performance, +handling of physical devices is left to their original PMDs; this virtual +device driver (also known as *vdev*) manages other PMDs as summarized by the +following block diagram:: + + .------------------. + | DPDK application | + `--------+---------' + | + .------+------. + | DPDK ethdev | + `------+------' Control + | | + .------------+------------. v .--------------------. + | failsafe PMD +---------+ vdev_netvsc driver | + `--+-------------------+--' `--------------------' + | | + | .........|......... + | : | : + .----+----. : .----+----. 
: + | tap PMD | : | any PMD | : + `----+----' : `----+----' : <-- Hot-pluggable + | : | : + .------+-------. : .-----+-----. : + | NetVSC-based | : | SR-IOV VF | : + | netdevice | : | device | : + `--------------' : `-----------' : + :.................: + + +This driver implementation may be temporary and should be improved or removed +either when hot-plug will be fully supported in EAL and bus drivers or when +a new NetVSC driver will be integrated. + Build options ------------- - ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) Toggle compilation of this driver. + +Run-time parameters +------------------- + +To invoke this driver, applications have to explicitly provide the +``--vdev=net_vdev_netvsc`` EAL option. + +The following device parameters are supported: + +- ``iface`` [string] + + Provide a specific NetVSC interface (netdevice) name to attach this driver + to. Can be provided multiple times for additional instances. + +- ``mac`` [string] + + Same as ``iface`` except a suitable NetVSC interface is located using its + MAC address. + +Not specifying either ``iface`` or ``mac`` makes this driver attach itself to +all NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile index 2fb059d..f2b2ac5 100644 --- a/drivers/net/vdev_netvsc/Makefile +++ b/drivers/net/vdev_netvsc/Makefile @@ -13,6 +13,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += -D_XOPEN_SOURCE=600 +CFLAGS += -D_BSD_SOURCE +CFLAGS += -D_DEFAULT_SOURCE CFLAGS += $(WERROR_FLAGS) # Dependencies. @@ -20,6 +23,7 @@ LDLIBS += -lrte_bus_vdev LDLIBS += -lrte_eal LDLIBS += -lrte_ethdev LDLIBS += -lrte_kvargs +LDLIBS += -lrte_net # Source files. 
SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index e895b32..21c3265 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -3,17 +3,42 @@ * Copyright 2017 Mellanox Technologies, Ltd. */ +#include <errno.h> +#include <fcntl.h> +#include <inttypes.h> +#include <linux/sockios.h> +#include <net/if.h> +#include <net/if_arp.h> +#include <netinet/ip.h> +#include <stdarg.h> #include <stddef.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdio.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/queue.h> +#include <sys/socket.h> +#include <unistd.h> +#include <rte_alarm.h> +#include <rte_bus.h> #include <rte_bus_vdev.h> #include <rte_common.h> #include <rte_config.h> +#include <rte_dev.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_PROBE_MS 1000 + +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" #define DRV_LOG(level, ...) \ rte_log(RTE_LOG_ ## level, \ @@ -25,12 +50,495 @@ /** Driver-specific log messages type. */ static int vdev_netvsc_logtype; +/** Context structure for a vdev_netvsc instance. */ +struct vdev_netvsc_ctx { + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ + unsigned int id; /**< Unique ID. */ + char name[64]; /**< Unique name. */ + char devname[64]; /**< Fail-safe instance name. */ + char devargs[256]; /**< Fail-safe device arguments. */ + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ + unsigned int if_index; /**< NetVSC netdevice index. */ + struct ether_addr if_addr; /**< NetVSC MAC address. */ + int pipe[2]; /**< Fail-safe communication pipe. */ + char yield[256]; /**< PCI sub-device arguments. 
*/ +}; + +/** Context list is common to all driver instances. */ +static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); + +/** Number of entries in context list. */ +static unsigned int vdev_netvsc_ctx_count; + /** Number of driver instances relying on context list. */ static unsigned int vdev_netvsc_ctx_inst; /** + * Destroy a vdev_netvsc context instance. + * + * @param ctx + * Context to destroy. + */ +static void +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) +{ + if (ctx->pipe[0] != -1) + close(ctx->pipe[0]); + if (ctx->pipe[1] != -1) + close(ctx->pipe[1]); + free(ctx); +} + +/** + * Iterate over system network interfaces. + * + * This function runs a given callback function for each netdevice found on + * the system. + * + * @param func + * Callback function pointer. List traversal is aborted when this function + * returns a nonzero value. + * @param ... + * Variable parameter list passed as @p va_list to @p func. + * + * @return + * 0 when the entire list is traversed successfully, a negative error code + * in case of failure, or the nonzero value returned by @p func when list + * traversal is aborted. + */ +static int +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap), ...)
+{ + struct if_nameindex *iface = if_nameindex(); + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + unsigned int i; + int ret = 0; + + if (!iface) { + ret = -ENOBUFS; + DRV_LOG(ERR, "cannot retrieve system network interfaces"); + goto error; + } + if (s == -1) { + ret = -errno; + DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno)); + goto error; + } + for (i = 0; iface[i].if_name; ++i) { + struct ifreq req; + struct ether_addr eth_addr; + va_list ap; + + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { + DRV_LOG(WARNING, "cannot retrieve information about" + " interface \"%s\": %s", + req.ifr_name, rte_strerror(errno)); + continue; + } + if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) { + DRV_LOG(DEBUG, "interface %s is non-ethernet device", + req.ifr_name); + continue; + } + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, + RTE_DIM(eth_addr.addr_bytes)); + va_start(ap, func); + ret = func(&iface[i], &eth_addr, ap); + va_end(ap); + if (ret) + break; + } +error: + if (s != -1) + close(s); + if (iface) + if_freenameindex(iface); + return ret; +} + +/** + * Determine if a network interface is NetVSC. + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * + * @return + * A nonzero value when interface is detected as NetVSC. In case of error, + * rte_errno is updated and 0 returned.
+ */ +static int +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) +{ + static const char temp[] = "/sys/class/net/%s/device/class_id"; + char path[sizeof(temp) + IF_NAMESIZE]; + FILE *f; + int ret; + int len = 0; + + ret = snprintf(path, sizeof(path), temp, iface->if_name); + if (ret == -1 || (size_t)ret >= sizeof(path)) { + rte_errno = ENOBUFS; + return 0; + } + f = fopen(path, "r"); + if (!f) { + rte_errno = errno; + return 0; + } + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); + if (ret == EOF) + rte_errno = errno; + ret = len == (int)strlen(NETVSC_CLASS_ID); + fclose(f); + return ret; +} + +/** + * Retrieve network interface data from sysfs symbolic link. + * + * @param[out] buf + * Output data buffer. + * @param size + * Output buffer size. + * @param[in] if_name + * Netdevice name. + * @param[in] relpath + * Symbolic link path relative to netdevice sysfs entry. + * + * @return + * 0 on success, a negative error code otherwise. + */ +static int +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, + const char *relpath) +{ + int ret; + + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); + if (ret == -1 || (size_t)ret >= size) + return -ENOBUFS; + ret = readlink(buf, buf, size); + if (ret == -1) + return -errno; + if ((size_t)ret >= size - 1) + return -ENOBUFS; + buf[ret] = '\0'; + return 0; +} + +/** + * Probe a network interface to associate with vdev_netvsc context. + * + * This function determines if the network device matches the properties of + * the NetVSC interface associated with the vdev_netvsc context and + * communicates its bus address to the fail-safe PMD instance if so. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. 
+ * @param ap + * Variable arguments list comprising: + * + * - struct vdev_netvsc_ctx *ctx: + * Context to associate network interface with. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_device_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; + const char *addr; + size_t len; + int ret; + + /* Skip non-matching or unwanted NetVSC interfaces. */ + if (ctx->if_index == iface->if_index) { + if (!strcmp(ctx->if_name, iface->if_name)) + return 0; + DRV_LOG(DEBUG, + "NetVSC interface \"%s\" (index %u) renamed \"%s\"", + ctx->if_name, ctx->if_index, iface->if_name); + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + return 0; + } + if (vdev_netvsc_iface_is_netvsc(iface)) + return 0; + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) + return 0; + /* Look for associated PCI device. */ + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device/subsystem"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + if (strcmp(addr, "pci")) + return 0; + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + len = strlen(addr); + if (!len) + return 0; + /* Send PCI device argument to fail-safe PMD instance. 
*/ + if (strcmp(addr, ctx->yield)) + DRV_LOG(DEBUG, "associating PCI device \"%s\" with NetVSC" + " interface \"%s\" (index %u)", addr, ctx->if_name, + ctx->if_index); + memmove(buf, addr, len + 1); + addr = buf; + buf[len] = '\n'; + ret = write(ctx->pipe[1], addr, len + 1); + buf[len] = '\0'; + if (ret == -1) { + if (errno == EINTR || errno == EAGAIN) + return 1; + DRV_LOG(WARNING, "cannot associate PCI device name \"%s\" with" + " interface \"%s\": %s", addr, ctx->if_name, + rte_strerror(errno)); + return 1; + } + if ((size_t)ret != len + 1) { + /* + * Attempt to override previous partial write, no need to + * recover if that fails. + */ + ret = write(ctx->pipe[1], "\n", 1); + (void)ret; + return 1; + } + fsync(ctx->pipe[1]); + memcpy(ctx->yield, addr, len + 1); + return 1; +} + +/** + * Alarm callback that regularly probes system network interfaces. + * + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as + * long as an vdev_netvsc context instance exists. + * + * @param arg + * Ignored. + */ +static void +vdev_netvsc_alarm(__rte_unused void *arg) +{ + struct vdev_netvsc_ctx *ctx; + int ret; + + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + if (ret) + break; + } + if (!vdev_netvsc_ctx_count) + return; + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", + rte_strerror(-ret)); + } +} + +/** + * Probe a NetVSC interface to generate a vdev_netvsc context from. + * + * This function instantiates vdev_netvsc contexts either for all NetVSC + * devices found on the system or only a subset provided as device + * arguments. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. 
+ * @param ap + * Variable arguments list comprising: + * + * - const char *name: + * Name associated with current driver instance. + * + * - struct rte_kvargs *kvargs: + * Device arguments provided to current driver instance. + * + * - unsigned int specified: + * Number of specific netdevices provided as device arguments. + * + * - unsigned int *matched: + * The number of specified netdevices matched by this function. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + const char *name = va_arg(ap, const char *); + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + unsigned int specified = va_arg(ap, unsigned int); + unsigned int *matched = va_arg(ap, unsigned int *); + unsigned int i; + struct vdev_netvsc_ctx *ctx; + int ret; + + /* Probe all interfaces when none are specified. */ + if (specified) { + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { + if (!strcmp(pair->value, iface->if_name)) + break; + } else if (!strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) { + struct ether_addr tmp; + + if (sscanf(pair->value, + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":" + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, + &tmp.addr_bytes[0], + &tmp.addr_bytes[1], + &tmp.addr_bytes[2], + &tmp.addr_bytes[3], + &tmp.addr_bytes[4], + &tmp.addr_bytes[5]) != 6) { + DRV_LOG(ERR, + "invalid MAC address format" + " \"%s\"", + pair->value); + return -EINVAL; + } + if (is_same_ether_addr(eth_addr, &tmp)) + break; + } + } + if (i == kvargs->count) + return 0; + ++(*matched); + } + /* Weed out interfaces already handled. 
*/ + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) + if (ctx->if_index == iface->if_index) + break; + if (ctx) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is already handled," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + if (!vdev_netvsc_iface_is_netvsc(iface)) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is not NetVSC," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + /* Create interface context. */ + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + ret = -errno; + DRV_LOG(ERR, "cannot allocate context for interface \"%s\": %s", + iface->if_name, rte_strerror(errno)); + goto error; + } + ctx->id = vdev_netvsc_ctx_count; + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + ctx->if_index = iface->if_index; + ctx->if_addr = *eth_addr; + ctx->pipe[0] = -1; + ctx->pipe[1] = -1; + ctx->yield[0] = '\0'; + if (pipe(ctx->pipe) == -1) { + ret = -errno; + DRV_LOG(ERR, + "cannot allocate control pipe for interface \"%s\": %s", + ctx->if_name, rte_strerror(errno)); + goto error; + } + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { + int flf = fcntl(ctx->pipe[i], F_GETFL); + + if (flf != -1 && + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) + continue; + ret = -errno; + DRV_LOG(ERR, "cannot toggle non-blocking flag on control file" + " descriptor #%u (%d): %s", i, ctx->pipe[i], + rte_strerror(errno)); + goto error; + } + /* Generate virtual device name and arguments. 
*/ + i = 0; + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", + name, ctx->id); + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) + ++i; + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", + ctx->name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) + ++i; + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), + "fd(%d),dev(net_tap_%s,remote=%s)", + ctx->pipe[0], ctx->name, ctx->if_name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) + ++i; + if (i) { + ret = -ENOBUFS; + DRV_LOG(ERR, "generated virtual device name or argument list" + " too long for interface \"%s\"", ctx->if_name); + goto error; + } + /* Request virtual device generation. */ + DRV_LOG(DEBUG, "generating virtual device \"%s\" with arguments \"%s\"", + ctx->devname, ctx->devargs); + vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); + if (ret) + goto error; + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); + ++vdev_netvsc_ctx_count; + DRV_LOG(DEBUG, "added NetVSC interface \"%s\" to context list", + ctx->if_name); + return 0; +error: + if (ctx) + vdev_netvsc_ctx_destroy(ctx); + return ret; +} + +/** * Probe NetVSC interfaces. * + * This function probes system netdevices according to the specified device + * arguments and starts a periodic alarm callback to notify the resulting + * fail-safe PMD instances of their sub-devices whereabouts. + * * @param dev * Virtual device context for driver instance. * @@ -49,12 +557,40 @@ const char *args = rte_vdev_device_args(dev); struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", vdev_netvsc_arg); + unsigned int specified = 0; + unsigned int matched = 0; + unsigned int i; + int ret; DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); if (!kvargs) { DRV_LOG(ERR, "cannot parse arguments list"); goto error; } + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + ++specified; + } + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + /* Gather interfaces. */ + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, + specified, &matched); + if (ret < 0) + goto error; + if (matched < specified) + DRV_LOG(WARNING, + "some of the specified parameters did not match" + " recognized network interfaces"); + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to schedule alarm callback: %s", + rte_strerror(-ret)); + goto error; + } error: if (kvargs) rte_kvargs_free(kvargs); @@ -65,6 +601,9 @@ /** * Remove driver instance. * + * The alarm callback and underlying vdev_netvsc context instances are only + * destroyed after the last PMD instance is removed. + * * @param dev * Virtual device context for driver instance. * @@ -74,7 +613,16 @@ static int vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) { - --vdev_netvsc_ctx_inst; + if (--vdev_netvsc_ctx_inst) + return 0; + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { + struct vdev_netvsc_ctx *ctx = LIST_FIRST(&vdev_netvsc_ctx_list); + + LIST_REMOVE(ctx, entry); + --vdev_netvsc_ctx_count; + vdev_netvsc_ctx_destroy(ctx); + } return 0; } -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
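Note on the control pipe used above: vdev_netvsc_device_probe() writes the PCI address of the matching VF as a newline-terminated string to a non-blocking pipe, and the fail-safe "fd" parameter from this series keeps only the last line it can read. The following is a minimal, self-contained sketch of that handshake using plain POSIX calls. The helper names publish_line() and read_last_line() and the 128-byte buffers are assumptions made for illustration; they are not part of either driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: publish one newline-terminated definition. */
static int
publish_line(int wfd, const char *text)
{
	char buf[128];
	int len;

	assert(text != NULL);
	len = snprintf(buf, sizeof(buf), "%s\n", text);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -ENOBUFS;
	return write(wfd, buf, (size_t)len) == (ssize_t)len ? 0 : -EIO;
}

/*
 * Hypothetical helper: drain the read end without blocking and keep only
 * the last complete chunk read. Returns 0 on success, -ENODEV when nothing
 * was available, another negative errno value on failure.
 */
static int
read_last_line(int rfd, char *out, size_t size)
{
	char line[128];
	FILE *fp;
	int count = 0;
	int flags;
	int err;
	int fd;

	assert(out != NULL && size > 0);
	/* Work on a copy so that fclose() leaves the caller's fd open. */
	fd = dup(rfd);
	if (fd == -1)
		return -errno;
	flags = fcntl(fd, F_GETFL);
	if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		goto fail;
	fp = fdopen(fd, "r");
	if (fp == NULL)
		goto fail;
	/* Each iteration overwrites the previous line; the last one wins. */
	while (fgets(line, sizeof(line), fp) != NULL) {
		line[strcspn(line, "\n")] = '\0';
		snprintf(out, size, "%s", line);
		++count;
	}
	fclose(fp);
	return count ? 0 : -ENODEV;
fail:
	err = errno;
	close(fd);
	return -err;
}
```

With a pipe created through pipe(), publishing "0002:00:02.0" and then "0003:00:02.0" before a single read leaves only the second address in the output buffer, while a read on an empty pipe reports -ENODEV instead of blocking.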
* [dpdk-dev] [PATCH v5 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (4 preceding siblings ...) 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 5/8] net/vdev_netvsc: implement core functionality Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad ` (2 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Raslan Darawsheh NetVSC netdevices which are already routed should not be probed because Hyper-V uses them for management purposes. Prevent probing of routed NetVSC devices. Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 2 +- drivers/net/vdev_netvsc/vdev_netvsc.c | 46 +++++++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -87,4 +87,4 @@ The following device parameters are supported: MAC address. Not specifying either ``iface`` or ``mac`` makes this driver attach itself to -all NetVSC interfaces found on the system. +all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 21c3265..0055d0b 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -39,6 +39,7 @@ #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ @@ -198,6 +199,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = } /** + * Determine if a network interface has a route. + * + * @param[in] name + * Network device name. + * + * @return + * A nonzero value when interface has an route. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +vdev_netvsc_has_route(const char *name) +{ + FILE *fp; + int ret = 0; + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; + char *netdev; + + fp = fopen("/proc/net/route", "r"); + if (!fp) { + rte_errno = errno; + return 0; + } + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { + netdev = strtok(route, "\t"); + if (strcmp(netdev, name) == 0) { + ret = 1; + break; + } + /* Move file pointer to the next line. */ + while (strchr(route, '\n') == NULL && + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) + ; + } + fclose(fp); + return ret; +} + +/** * Retrieve network interface data from sysfs symbolic link. * * @param[out] buf @@ -459,6 +498,13 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = iface->if_name, iface->if_index); return 0; } + /* Routed NetVSC should not be probed. */ + if (vdev_netvsc_has_route(iface->if_name)) { + DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", + iface->if_name, iface->if_index); + if (!specified) + return 0; + } /* Create interface context. */ ctx = calloc(1, sizeof(*ctx)); if (!ctx) { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
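Note on the route lookup above: it compares the first tab-separated field of each /proc/net/route line against the interface name. One detail worth keeping in mind when reading it is that strtok() overwrites the first tab with a NUL byte, so a later strchr(route, '\n') on the same buffer only sees the interface name and not the end of the line. The sketch below is one possible way to write the same lookup, assuming it is acceptable to record whether a chunk ends its line before truncating it. The function name route_table_has_iface() and the FILE * parameter are assumptions made so the logic can be exercised without /proc; this is not the code from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 when a line of the route table read from
 * @p fp starts with interface name @p name, 0 otherwise.
 */
static int
route_table_has_iface(FILE *fp, const char *name)
{
	char line[300];
	int at_line_start = 1;

	assert(fp != NULL && name != NULL);
	while (fgets(line, sizeof(line), fp) != NULL) {
		/* Check for the newline before the buffer is truncated. */
		int ends_line = strchr(line, '\n') != NULL;

		if (at_line_start) {
			/* The first tab-separated field is the netdevice. */
			line[strcspn(line, "\t\n")] = '\0';
			if (strcmp(line, name) == 0)
				return 1;
		}
		/* Chunks that continue an overlong line are skipped. */
		at_line_start = ends_line;
	}
	return 0;
}
```

The header row of the route table is treated like any other line here, so an interface literally named "Iface" would match; a caller that cares could skip the first line.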
* [dpdk-dev] [PATCH v5 7/8] net/vdev_netvsc: add "force" parameter 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (5 preceding siblings ...) 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 8/8] net/vdev_netvsc: add automatic probing Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen, Adrien Mazarguil This parameter allows specifying any non-NetVSC interface or routed NetVSC interfaces to use with tap sub-devices for development purposes. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 5 +++++ drivers/net/vdev_netvsc/vdev_netvsc.c | 30 +++++++++++++++++++----------- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index f779862..3c26990 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -86,5 +86,10 @@ The following device parameters are supported: Same as ``iface`` except a suitable NetVSC interface is located using its MAC address. +- ``force`` [int] + + If nonzero, forces the use of specified interfaces even if not detected as + NetVSC or detected as routed NETVSC. + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. 
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 0055d0b..2d03033 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -36,6 +36,7 @@ #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_ARG_FORCE "force" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -419,6 +420,9 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = * - struct rte_kvargs *kvargs: * Device arguments provided to current driver instance. * + * - int force: + * Accept specified interface even if not detected as NetVSC. + * * - unsigned int specified: * Number of specific netdevices provided as device arguments. * @@ -436,6 +440,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = { const char *name = va_arg(ap, const char *); struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + int force = va_arg(ap, int); unsigned int specified = va_arg(ap, unsigned int); unsigned int *matched = va_arg(ap, unsigned int *); unsigned int i; @@ -490,20 +495,18 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = return 0; } if (!vdev_netvsc_iface_is_netvsc(iface)) { - if (!specified) + if (!specified || !force) return 0; DRV_LOG(WARNING, - "interface \"%s\" (index %u) is not NetVSC," - " skipping", + "using non-NetVSC interface \"%s\" (index %u)", iface->if_name, iface->if_index); - return 0; } /* Routed NetVSC should not be probed. */ if (vdev_netvsc_has_route(iface->if_name)) { - DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", - iface->if_name, iface->if_index); - if (!specified) + if (!specified || !force) return 0; + DRV_LOG(WARNING, "using routed NetVSC interface \"%s\"" + " (index %u)", iface->if_name, iface->if_index); } /* Create interface context. 
*/ ctx = calloc(1, sizeof(*ctx)); @@ -597,6 +600,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = static const char *const vdev_netvsc_arg[] = { VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, + VDEV_NETVSC_ARG_FORCE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -605,6 +609,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = vdev_netvsc_arg); unsigned int specified = 0; unsigned int matched = 0; + int force = 0; unsigned int i; int ret; @@ -616,14 +621,16 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = for (i = 0; i != kvargs->count; ++i) { const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; - if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || - !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) + force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, - specified, &matched); + force, specified, &matched); if (ret < 0) goto error; if (matched < specified) @@ -682,7 +689,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " - VDEV_NETVSC_ARG_MAC "=<string>"); + VDEV_NETVSC_ARG_MAC "=<string> " + VDEV_NETVSC_ARG_FORCE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
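Note on the acceptance rules after this patch: an interface is used when it is an unrouted NetVSC device, or when it was explicitly specified through "iface" or "mac" and "force" is nonzero. The following sketch restates those two checks from vdev_netvsc_netvsc_probe() as a single predicate. The function names are hypothetical and only mirror the conditions visible in the diff; parse_flag() reproduces the "!!atoi(value)" normalization used for the "force" argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Normalize an integer device argument to 0 or 1, as "!!atoi()" does. */
static int
parse_flag(const char *value)
{
	return !!atoi(value);
}

/*
 * Hypothetical predicate combining the two checks from the probe function:
 * non-NetVSC and routed interfaces are only accepted when they were
 * explicitly specified and "force" is set.
 */
static bool
iface_accepted(bool is_netvsc, bool routed, bool specified, bool force)
{
	if (!is_netvsc && (!specified || !force))
		return false;
	if (routed && (!specified || !force))
		return false;
	return true;
}

/* Small self-check walking through the interesting combinations. */
static void
iface_accepted_selfcheck(void)
{
	/* Plain unrouted NetVSC: always accepted. */
	assert(iface_accepted(true, false, false, false));
	/* Routed NetVSC found by automatic scan: rejected even with force. */
	assert(!iface_accepted(true, true, false, true));
	/* Routed NetVSC named explicitly with force=1: accepted. */
	assert(iface_accepted(true, true, true, true));
	/* Non-NetVSC named explicitly but without force: rejected. */
	assert(!iface_accepted(false, false, true, false));
}
```

In other words, "force" has no effect on its own; it only widens what an explicit "iface" or "mac" argument may select.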
* [dpdk-dev] [PATCH v5 8/8] net/vdev_netvsc: add automatic probing 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (6 preceding siblings ...) 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad @ 2018-01-18 10:01 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 10:01 UTC (permalink / raw) To: Ferruh Yigit; +Cc: Thomas Monjalon, dev, stephen Using DPDK in Hyper-V VM systems requires the vdev_netvsc driver to pair each NetVSC netdevice with the PCI device sharing its MAC address through the fail-safe PMD. Add a vdev_netvsc custom scan to the vdev bus to allow automatic probing in Hyper-V VM systems unless the driver was already specified on the command line. Add an "ignore" parameter to disable this auto-detection. Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 9 ++++-- drivers/net/vdev_netvsc/vdev_netvsc.c | 55 +++++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index 3c26990..55d130a 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -71,8 +71,8 @@ Build options Run-time parameters ------------------- -To invoke this driver, applications have to explicitly provide the -``--vdev=net_vdev_netvsc`` EAL option. +This driver is invoked automatically in Hyper-V VM systems unless the user +invoked it by command line using ``--vdev=net_vdev_netvsc`` EAL option. The following device parameters are supported: @@ -91,5 +91,10 @@ The following device parameters are supported: If nonzero, forces the use of specified interfaces even if not detected as NetVSC or detected as routed NETVSC. 
+- ``ignore`` [int] + + If nonzero, makes the driver skip probing (used to disable the + auto-detection in Hyper-V VMs). + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 2d03033..a8a1a7f 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -30,13 +30,16 @@ #include <rte_errno.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_hypervisor.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_DRIVER_NAME RTE_STR(VDEV_NETVSC_DRIVER) #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" #define VDEV_NETVSC_ARG_FORCE "force" +#define VDEV_NETVSC_ARG_IGNORE "ignore" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -45,7 +48,7 @@ #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ vdev_netvsc_logtype, \ - RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT(VDEV_NETVSC_DRIVER_NAME ": " \ RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ RTE_FMT_TAIL(__VA_ARGS__,))) @@ -601,6 +604,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, VDEV_NETVSC_ARG_FORCE, + VDEV_NETVSC_ARG_IGNORE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -610,6 +614,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = unsigned int specified = 0; unsigned int matched = 0; int force = 0; + int ignore = 0; unsigned int i; int ret; @@ -623,10 +628,17 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IGNORE)) + ignore = !!atoi(pair->value); else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } + if (ignore) { + if (kvargs) + rte_kvargs_free(kvargs); + return 0; + } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, @@ -690,7 +702,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " VDEV_NETVSC_ARG_MAC "=<string> " - VDEV_NETVSC_ARG_FORCE "=<int>"); + VDEV_NETVSC_ARG_FORCE "=<int> " + VDEV_NETVSC_ARG_IGNORE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) @@ -699,3 +712,41 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (vdev_netvsc_logtype >= 0) rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); } + +/** Compare function for vdev find device operation. 
*/ +static int +vdev_netvsc_cmp_rte_device(const struct rte_device *dev1, + __rte_unused const void *_dev2) +{ + return strcmp(dev1->devargs->name, VDEV_NETVSC_DRIVER_NAME); +} + +/** + * A callback called by vdev bus scan function to ensure this driver probing + * automatically in Hyper-V VM system unless it already exists in the + * devargs list. + */ +static void +vdev_netvsc_scan_callback(__rte_unused void *arg) +{ + struct rte_vdev_device *dev; + struct rte_devargs *devargs; + struct rte_bus *vbus = rte_bus_find_by_name("vdev"); + + TAILQ_FOREACH(devargs, &devargs_list, next) + if (!strcmp(devargs->name, VDEV_NETVSC_DRIVER_NAME)) + return; + dev = (struct rte_vdev_device *)vbus->find_device(NULL, + vdev_netvsc_cmp_rte_device, VDEV_NETVSC_DRIVER_NAME); + if (dev) + return; + if (rte_eal_devargs_add(RTE_DEVTYPE_VIRTUAL, VDEV_NETVSC_DRIVER_NAME)) + DRV_LOG(ERR, "unable to add netvsc devargs."); +} + +/** Initialize the custom scan. */ +RTE_INIT(vdev_netvsc_custom_scan_add) +{ + if (rte_hypervisor_get() == RTE_HYPERVISOR_HYPERV) + rte_vdev_add_custom_scan(vdev_netvsc_scan_callback, NULL); +} -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
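Note on the custom scan above: the callback only queues a net_vdev_netvsc devargs entry when no entry with that name is already present, which keeps repeated bus scans from adding the driver twice. The sketch below models that "add once" behavior with a plain array. The structure and function names are hypothetical stand-ins chosen for illustration; they do not correspond to the EAL devargs API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_DEVARGS 8

/* Hypothetical stand-in for the list of device arguments. */
struct sketch_devargs {
	const char *names[SKETCH_MAX_DEVARGS];
	size_t count;
};

/*
 * Add @p name unless an entry with the same name already exists.
 * Returns 1 when added, 0 when already present, -1 when the list is full.
 */
static int
sketch_devargs_add_once(struct sketch_devargs *list, const char *name)
{
	size_t i;

	assert(list != NULL && name != NULL);
	for (i = 0; i != list->count; ++i)
		if (strcmp(list->names[i], name) == 0)
			return 0;
	if (list->count == SKETCH_MAX_DEVARGS)
		return -1;
	list->names[list->count++] = name;
	return 1;
}
```

Calling the helper twice with "net_vdev_netvsc" leaves a single entry in the list, which is the property the scan callback relies on.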
* [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (7 preceding siblings ...) 2018-01-18 10:01 ` [dpdk-dev] [PATCH v5 8/8] net/vdev_netvsc: add automatic probing Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 1/8] net/failsafe: fix invalid free Matan Azrad ` (8 more replies) 8 siblings, 9 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev Virtual machines hosted by Hyper-V/Azure platforms are fitted with simplified virtual network devices named NetVSC that are used for fast communication between VM to VM, VM to hypervisor, and the outside. They appear as standard system netdevices to user-land applications, the main difference being they are implemented on top of VMBUS instead of emulated PCI devices. While this reads like a case for a standard DPDK PMD, there is more to it. To accelerate outside communication, NetVSC devices as they appear in a VM can be paired with physical SR-IOV virtual function (VF) devices owned by that same VM. Both netdevices share the same MAC address in that case. When paired, egress and most of the ingress traffic flow through the VF device, while part of it (e.g. multicasts, hypervisor control data) still flows through NetVSC. Moreover VF devices are not retained and disappear during VM migration; from a VM standpoint, they can be hot-plugged anytime with NetVSC acting as a fallback. Running DPDK applications in such a context involves driving VF devices using their dedicated PMDs in a vendor-independent fashion (to benefit from maximum performance without writing dedicated code) while simultaneously listening to NetVSC and handling the related hot-plug events. 
This new virtual driver (referred to as "vdev_netvsc" from this point on) automatically coordinates the Hyper-V/Azure-specific management part described above by relying on vendor-specific, failsafe and tap PMDs to expose a single consolidated Ethernet device usable directly by existing applications.

        .------------------.
        | DPDK application |
        `--------+---------'
                 |
          .------+------.
          | DPDK ethdev |
          `------+------'       Control
                 |                 |
    .------------+------------.    v    .--------------------.
    |      failsafe PMD       +---------+ vdev_netvsc driver |
    `--+-------------------+--'         `--------------------'
       |                   |
       |          .........|.........
       |          :        |        :
  .----+----.     :   .----+----.   :
  | tap PMD |     :   | any PMD |   :
  `----+----'     :   `----+----'   : <-- Hot-pluggable
       |          :        |        :
.------+-------.  :  .-----+-----.  :
| NetVSC-based |  :  | SR-IOV VF |  :
|  netdevice   |  :  |  device   |  :
`--------------'  :  `-----------'  :
                  :.................:

v2 changes(Adrien):
- Renamed driver from "hyperv" to "vdev_netvsc". This change covers documentation and symbols prefix.
- Driver is now tagged EXPERIMENTAL.
- Replaced ether_addr_from_str() with a basic sscanf() call.
- Removed debugging code (memset() poisoning).
- Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments.
- Removed hyperv_basename().
- Discarded unused variables through __rte_unused.
- Added separate but necessary free() bugfix for failsafe PMD.
- Added file descriptor input support to failsafe PMD.
- Replaced temporary bash execution; failsafe now reads device definitions directly through a pipe without an intermediate bash one-liner.
- Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG().
- Added dynamic log type (pmd.vdev_netvsc).
- Modified initialization code to probe devices immediately during startup.
- Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more appropriate than "ret >= sizeof(foo) - 1").

v3 changes(Matan):
- Fixed clang compilation in V2.
- Removed hotplug remove code from the new driver.
- Supported probed sub-devices getting in fail-safe.
- Added automatic probing for HyperV VM systems.
- Added option to ignore the automatic probing.
- Skipped routed NetVSC devices probing.
- Adjusted documentation and semantics.
- Replaced maintainer.

v4 changes(Matan):
- Align descriptions of context struct (Stephen suggestion).
- Skip non-ethernet devices in netdev loop (Stephen suggestion).
- Use different variable names in "add fd parameter" (Gaetan suggestion).
- Change name of get port id function in "add automatic probing" (Gaetan suggestion).
- Update internal fail-safe devargs in case of probed device (Gaetan suggestion).
- Use a different commit title instead of "support probed sub-devices getting" (Gaetan suggestion).

v5 changes(Matan):
- Improve fail-safe documentation as Gaetan suggested.
- Fix fcntl parameter.

v6 changes:
- fp!=NULL => fp==NULL in "add fd parameter".

Adrien Mazarguil (1):
  net/failsafe: fix invalid free

Matan Azrad (7):
  net/failsafe: add "fd" parameter
  net/failsafe: add probed etherdev capture
  net/vdev_netvsc: introduce Hyper-V platform driver
  net/vdev_netvsc: implement core functionality
  net/vdev_netvsc: skip routed netvsc probing
  net/vdev_netvsc: add "force" parameter
  net/vdev_netvsc: add automatic probing

 MAINTAINERS                                     |   6 +
 config/common_base                              |   5 +
 config/common_linuxapp                          |   1 +
 doc/guides/nics/fail_safe.rst                   |  26 +
 doc/guides/nics/features/vdev_netvsc.ini        |  12 +
 doc/guides/nics/index.rst                       |   1 +
 doc/guides/nics/vdev_netvsc.rst                 | 100 +++
 drivers/net/Makefile                            |   1 +
 drivers/net/failsafe/failsafe_args.c            |  84 ++-
 drivers/net/failsafe/failsafe_eal.c             |  78 ++-
 drivers/net/failsafe/failsafe_private.h         |   5 +
 drivers/net/vdev_netvsc/Makefile                |  31 +
 .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map |   4 +
 drivers/net/vdev_netvsc/vdev_netvsc.c           | 752 +++++++++++++++++++++
 mk/rte.app.mk                                   |   1 +
 15 files changed, 1083 insertions(+), 24 deletions(-)
 create mode 100644 doc/guides/nics/features/vdev_netvsc.ini
 create mode 100644 doc/guides/nics/vdev_netvsc.rst
 create mode 100644 drivers/net/vdev_netvsc/Makefile
 create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map
 create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c

-- 
1.8.3.1

^ permalink raw reply [flat|nested] 112+ messages in thread
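Note on the snprintf() item in the v2 changelog of this cover letter: snprintf() returns the number of characters the full output needs, excluding the terminating NUL, so the output was truncated exactly when that value is greater than or equal to the buffer size. The sketch below illustrates the check with the sysfs path format used by the driver. The function name sketch_sysfs_path() is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "/sys/class/net/<if_name>/<relpath>" into
 * @p buf. Returns 0 on success, -ENOBUFS when the result does not fit.
 */
static int
sketch_sysfs_path(char *buf, size_t size, const char *if_name,
		  const char *relpath)
{
	int ret;

	assert(buf != NULL && if_name != NULL && relpath != NULL);
	ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath);
	/*
	 * ret == size - 1 still fits (string plus NUL fill the buffer),
	 * so only ret >= size indicates truncation.
	 */
	if (ret < 0 || (size_t)ret >= size)
		return -ENOBUFS;
	return 0;
}
```

For "eth0" and "device" the formatted path is 26 characters long, so a 27-byte buffer succeeds and a 26-byte buffer is reported as too small.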
* [dpdk-dev] [PATCH v6 1/8] net/failsafe: fix invalid free 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 2/8] net/failsafe: add "fd" parameter Matan Azrad ` (7 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev, stable From: Adrien Mazarguil <adrien.mazarguil@6wind.com> rte_free() is not supposed to work with pointers returned by calloc(). Fixes: a0194d828100 ("net/failsafe: add flexible device definition") Cc: stable@dpdk.org Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> --- drivers/net/failsafe/failsafe_args.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index cfc83e3..ec63ac9 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -407,7 +407,7 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t i; FOREACH_SUBDEV(sdev, i, dev) { - rte_free(sdev->cmdline); + free(sdev->cmdline); sdev->cmdline = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
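Note on the fix above: memory obtained from the C library allocator has to be released with free(), while rte_free() is meant for memory obtained from the rte_malloc() family. The sketch below shows the pairing with a trimmed-down, hypothetical stand-in for the fail-safe sub-device structure; only the cmdline member is modeled and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the fail-safe sub-device. */
struct sketch_sub_device {
	char *cmdline;
};

/* Store a private copy of @p text allocated with the C library. */
static int
sketch_set_cmdline(struct sketch_sub_device *sdev, const char *text)
{
	size_t len;
	char *copy;

	assert(sdev != NULL && text != NULL);
	len = strlen(text);
	copy = calloc(1, len + 1); /* libc allocation */
	if (copy == NULL)
		return -1;
	memcpy(copy, text, len);
	free(sdev->cmdline);
	sdev->cmdline = copy;
	return 0;
}

/* Release the copy with the matching libc deallocator. */
static void
sketch_clear_cmdline(struct sketch_sub_device *sdev)
{
	assert(sdev != NULL);
	free(sdev->cmdline); /* pairs with calloc() above */
	sdev->cmdline = NULL;
}
```

Resetting the pointer to NULL after free() makes a second cleanup pass harmless, which matches the pattern used in the patched loop.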
* [dpdk-dev] [PATCH v6 2/8] net/failsafe: add "fd" parameter 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 1/8] net/failsafe: fix invalid free Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 3/8] net/failsafe: add probed etherdev capture Matan Azrad ` (6 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev This parameter enables applications to provide device definitions through an arbitrary file descriptor number. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 9 ++++ drivers/net/failsafe/failsafe_args.c | 80 ++++++++++++++++++++++++++++++++- drivers/net/failsafe/failsafe_private.h | 3 ++ 3 files changed, 91 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index c4e3d2e..5b1b47e 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -106,6 +106,15 @@ Fail-safe command line parameters All commas within the ``shell command`` are replaced by spaces before executing the command. This helps using scripts to specify devices. +- **fd(<file descriptor number>)** parameter + + This parameter reads a device definition from an arbitrary file descriptor + number in ``<iface>`` format as described above. + + The file descriptor is read in non-blocking mode and is never closed in + order to take only the last line into account (unlike ``exec()``) at every + probe attempt. 
+ - **mac** parameter [MAC address] This parameter allows the user to set a default MAC address to the fail-safe diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index ec63ac9..a1fb3fa 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -31,7 +31,11 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include <fcntl.h> +#include <stdio.h> +#include <stdlib.h> #include <string.h> +#include <unistd.h> #include <errno.h> #include <rte_debug.h> @@ -161,6 +165,67 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } static int +fs_read_fd(struct sub_device *sdev, char *fd_str) +{ + FILE *fp = NULL; + int fd = -1; + /* store possible newline as well */ + char output[DEVARGS_MAXLEN + 1]; + int err = -ENODEV; + int oflags; + int lcount; + + RTE_ASSERT(fd_str != NULL || sdev->fd_str != NULL); + if (sdev->fd_str == NULL) { + sdev->fd_str = strdup(fd_str); + if (sdev->fd_str == NULL) { + ERROR("Command line allocation failed"); + return -ENOMEM; + } + } + errno = 0; + fd = strtol(fd_str, &fd_str, 0); + if (errno || *fd_str || fd < 0) { + ERROR("Parsing FD number failed"); + goto error; + } + /* Fiddle with copy of file descriptor */ + fd = dup(fd); + if (fd == -1) + goto error; + oflags = fcntl(fd, F_GETFL); + if (oflags == -1) + goto error; + if (fcntl(fd, F_SETFL, oflags | O_NONBLOCK) == -1) + goto error; + fp = fdopen(fd, "r"); + if (fp == NULL) + goto error; + fd = -1; + /* Only take the last line into account */ + lcount = 0; + while (fgets(output, sizeof(output), fp)) + ++lcount; + if (lcount == 0) + goto error; + else if (ferror(fp) && errno != EAGAIN) + goto error; + /* Line must end with a newline character */ + fs_sanitize_cmdline(output); + if (output[0] == '\0') + goto error; + err = fs_parse_device(sdev, output); + if (err) + ERROR("Parsing device '%s' failed", output); +error: + if (fp) + fclose(fp); + if (fd != -1) + close(fd); + 
return err; +} + +static int fs_parse_device_param(struct rte_eth_dev *dev, const char *param, uint8_t head) { @@ -202,6 +267,14 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, } if (ret) goto free_args; + } else if (strncmp(param, "fd(", 3) == 0) { + ret = fs_read_fd(sdev, args); + if (ret == -ENODEV) { + DEBUG("Reading device info from FD failed"); + ret = 0; + } + if (ret) + goto free_args; } else { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; @@ -409,6 +482,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, FOREACH_SUBDEV(sdev, i, dev) { free(sdev->cmdline); sdev->cmdline = NULL; + free(sdev->fd_str); + sdev->fd_str = NULL; free(sdev->devargs.args); sdev->devargs.args = NULL; } @@ -424,7 +499,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, param[b] != '\0') b++; if (strncmp(param, "dev", b) != 0 && - strncmp(param, "exec", b) != 0) { + strncmp(param, "exec", b) != 0 && + strncmp(param, "fd(", b) != 0) { ERROR("Unrecognized device type: %.*s", (int)b, param); return -EINVAL; } @@ -463,6 +539,8 @@ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, continue; if (sdev->cmdline) ret = fs_execute_cmd(sdev, sdev->cmdline); + else if (sdev->fd_str) + ret = fs_read_fd(sdev, sdev->fd_str); else ret = fs_parse_sub_device(sdev); if (ret == 0) diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index 54b5b91..5e04ffe 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -48,6 +48,7 @@ #define PMD_FAILSAFE_PARAM_STRING \ "dev(<ifc>)," \ "exec(<shell command>)," \ + "fd(<fd number>)," \ "mac=mac_addr," \ "hotplug_poll=u64" \ "" @@ -112,6 +113,8 @@ struct sub_device { struct fs_stats stats_snapshot; /* Some device are defined as a command line */ char *cmdline; + /* Others are retrieved through a file descriptor */ + char *fd_str; /* fail-safe device backreference */ struct 
rte_eth_dev *fs_dev; /* flag calling for recollection */ -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v6 3/8] net/failsafe: add probed etherdev capture 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 1/8] net/failsafe: fix invalid free Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 2/8] net/failsafe: add "fd" parameter Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 22:34 ` Thomas Monjalon 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad ` (5 subsequent siblings) 8 siblings, 1 reply; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev Previous fail-safe code didn't support probed sub-devices capture and failed when it tried to probe them. Skip fail-safe sub-device probing when it already was probed. Signed-off-by: Matan Azrad <matan@mellanox.com> Cc: Gaetan Rivet <gaetan.rivet@6wind.com> --- doc/guides/nics/fail_safe.rst | 17 +++++++ drivers/net/failsafe/failsafe_args.c | 2 - drivers/net/failsafe/failsafe_eal.c | 78 ++++++++++++++++++++++++--------- drivers/net/failsafe/failsafe_private.h | 2 + 4 files changed, 77 insertions(+), 22 deletions(-) diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst index 5b1b47e..3f72b59 100644 --- a/doc/guides/nics/fail_safe.rst +++ b/doc/guides/nics/fail_safe.rst @@ -93,6 +93,14 @@ Fail-safe command line parameters additional sub-device parameters if need be. They will be passed on to the sub-device. +.. note:: + + In case of whitelist sub-device probed by EAL, fail-safe PMD will take the device + as is, which means that EAL device options are taken in this case. + When trying to use a PCI device automatically probed in blacklist mode, + the syntax for the fail-safe must be with the full PCI id: + Domain:Bus:Device.Function. See the usage example section. 
+ - **exec(<shell command>)** parameter This parameter allows the user to provide a command to the fail-safe PMD to @@ -169,6 +177,15 @@ This section shows some example of using **testpmd** with a fail-safe PMD. $RTE_TARGET/build/app/testpmd -c 0xff -n 4 --no-pci \ --vdev='net_failsafe0,exec(echo 84:00.0)' -- -i +#. Start testpmd, automatically probing the device 84:00.0 and using it with + the fail-safe. + + .. code-block:: console + + $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \ + --vdev 'net_failsafe0,dev(0000:84:00.0),dev(net_ring0)' -- -i + + Using the Fail-safe PMD from an application ------------------------------------------- diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c index a1fb3fa..b049b75 100644 --- a/drivers/net/failsafe/failsafe_args.c +++ b/drivers/net/failsafe/failsafe_args.c @@ -45,8 +45,6 @@ #include "failsafe_private.h" -#define DEVARGS_MAXLEN 4096 - /* Callback used when a new device is found in devargs */ typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params, uint8_t head); diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c index 19d26f5..33a5adf 100644 --- a/drivers/net/failsafe/failsafe_eal.c +++ b/drivers/net/failsafe/failsafe_eal.c @@ -36,39 +36,77 @@ #include "failsafe_private.h" static int +fs_ethdev_portid_get(const char *name, uint16_t *port_id) +{ + uint16_t pid; + size_t len; + + if (name == NULL) { + DEBUG("Null pointer is specified\n"); + return -EINVAL; + } + len = strlen(name); + RTE_ETH_FOREACH_DEV(pid) { + if (!strncmp(name, rte_eth_devices[pid].device->name, len)) { + *port_id = pid; + return 0; + } + } + return -ENODEV; +} + +static int fs_bus_init(struct rte_eth_dev *dev) { struct sub_device *sdev; struct rte_devargs *da; uint8_t i; - uint16_t j; + uint16_t pid; int ret; FOREACH_SUBDEV(sdev, i, dev) { if (sdev->state != DEV_PARSED) continue; da = &sdev->devargs; - ret = rte_eal_hotplug_add(da->bus->name, - da->name, - da->args); - if 
(ret) { - ERROR("sub_device %d probe failed %s%s%s", i, - rte_errno ? "(" : "", - rte_errno ? strerror(rte_errno) : "", - rte_errno ? ")" : ""); - continue; - } - RTE_ETH_FOREACH_DEV(j) { - if (strcmp(rte_eth_devices[j].device->name, - da->name) == 0) { - ETH(sdev) = &rte_eth_devices[j]; - break; + if (fs_ethdev_portid_get(da->name, &pid) != 0) { + ret = rte_eal_hotplug_add(da->bus->name, + da->name, + da->args); + if (ret) { + ERROR("sub_device %d probe failed %s%s%s", i, + rte_errno ? "(" : "", + rte_errno ? strerror(rte_errno) : "", + rte_errno ? ")" : ""); + continue; } + if (fs_ethdev_portid_get(da->name, &pid) != 0) { + ERROR("sub_device %d init went wrong", i); + return -ENODEV; + } + } else { + char devstr[DEVARGS_MAXLEN] = ""; + struct rte_devargs *probed_da = + rte_eth_devices[pid].device->devargs; + + /* Take control of device probed by EAL options. */ + free(da->args); + memset(da, 0, sizeof(*da)); + if (probed_da != NULL) + snprintf(devstr, sizeof(devstr), "%s,%s", + probed_da->name, probed_da->args); + else + snprintf(devstr, sizeof(devstr), "%s", + rte_eth_devices[pid].device->name); + ret = rte_eal_devargs_parse(devstr, da); + if (ret) { + ERROR("Probed devargs parsing failed with code" + " %d", ret); + return ret; + } + INFO("Taking control of a probed sub device" + " %d named %s", i, da->name); } - if (ETH(sdev) == NULL) { - ERROR("sub_device %d init went wrong", i); - return -ENODEV; - } + ETH(sdev) = &rte_eth_devices[pid]; SUB_ID(sdev) = i; sdev->fs_dev = dev; sdev->dev = ETH(sdev)->device; diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h index 5e04ffe..9fcf72e 100644 --- a/drivers/net/failsafe/failsafe_private.h +++ b/drivers/net/failsafe/failsafe_private.h @@ -58,6 +58,8 @@ #define FAILSAFE_MAX_ETHPORTS 2 #define FAILSAFE_MAX_ETHADDR 128 +#define DEVARGS_MAXLEN 4096 + /* TYPES */ struct rxq { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
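The lookup added in patch 3/8 compares the requested sub-device name against registered device names with `strncmp()` bounded by the length of the requested name. The sketch below reproduces only that comparison over a plain array of strings, to show which spellings match. `sketch_find_port` and the sample names are hypothetical; the real code iterates over `rte_eth_devices` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the first entry in names that
 * starts with name, or -1 when there is none.
 */
static int
sketch_find_port(const char *name, const char *const names[], size_t count)
{
	size_t len;
	size_t i;

	assert(names != NULL);
	if (name == NULL)
		return -1;
	len = strlen(name);
	for (i = 0; i < count; ++i)
		if (strncmp(name, names[i], len) == 0)
			return (int)i;
	return -1;
}
```

With entries `"0000:84:00.0"` and `"net_ring0"`, the full address `0000:84:00.0` is found while the short form `84:00.0` is not, which is consistent with the documentation note asking for the full Domain:Bus:Device.Function syntax. Because the comparison is a prefix match, `net_ring` would also match `net_ring0`.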
* Re: [dpdk-dev] [PATCH v6 3/8] net/failsafe: add probed etherdev capture 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 3/8] net/failsafe: add probed etherdev capture Matan Azrad @ 2018-01-18 22:34 ` Thomas Monjalon 0 siblings, 0 replies; 112+ messages in thread From: Thomas Monjalon @ 2018-01-18 22:34 UTC (permalink / raw) To: Matan Azrad; +Cc: dev, Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet 18/01/2018 14:51, Matan Azrad: > Previous fail-safe code didn't support probed sub-devices capture and > failed when it tried to probe them. > > Skip fail-safe sub-device probing when it already was probed. > > Signed-off-by: Matan Azrad <matan@mellanox.com> > Cc: Gaetan Rivet <gaetan.rivet@6wind.com> Was acked by Gaetan in v5. ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v6 4/8] net/vdev_netvsc: introduce Hyper-V platform driver 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (2 preceding siblings ...) 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 3/8] net/failsafe: add probed etherdev capture Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 5/8] net/vdev_netvsc: implement core functionality Matan Azrad ` (4 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev This patch lays the groundwork for this driver (draft documentation, copyright notices, code base skeleton and build system hooks). While it can be successfully compiled and invoked, it's an empty shell at this stage. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- MAINTAINERS | 6 ++ config/common_base | 5 ++ config/common_linuxapp | 1 + doc/guides/nics/features/vdev_netvsc.ini | 12 +++ doc/guides/nics/index.rst | 1 + doc/guides/nics/vdev_netvsc.rst | 20 +++++ drivers/net/Makefile | 1 + drivers/net/vdev_netvsc/Makefile | 27 ++++++ .../vdev_netvsc/rte_pmd_vdev_netvsc_version.map | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 99 ++++++++++++++++++++++ mk/rte.app.mk | 1 + 11 files changed, 177 insertions(+) create mode 100644 doc/guides/nics/features/vdev_netvsc.ini create mode 100644 doc/guides/nics/vdev_netvsc.rst create mode 100644 drivers/net/vdev_netvsc/Makefile create mode 100644 drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map create mode 100644 drivers/net/vdev_netvsc/vdev_netvsc.c diff --git a/MAINTAINERS b/MAINTAINERS index af8de4f..97efbb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -462,6 +462,12 @@ F: drivers/net/mrvl/ F: doc/guides/nics/mrvl.rst F: doc/guides/nics/features/mrvl.ini +Microsoft vdev-netvsc - EXPERIMENTAL +M: Matan Azrad 
<matan@mellanox.com> +F: drivers/net/vdev-netvsc/ +F: doc/guides/nics/vdev-netvsc.rst +F: doc/guides/nics/features/vdev-netvsc.ini + Netcope szedata2 M: Matej Vido <vido@cesnet.cz> F: drivers/net/szedata2/ diff --git a/config/common_base b/config/common_base index 90508a8..664ff21 100644 --- a/config/common_base +++ b/config/common_base @@ -279,6 +279,11 @@ CONFIG_RTE_LIBRTE_NFP_DEBUG_RX=n CONFIG_RTE_LIBRTE_MRVL_PMD=n # +# Compile virtual device driver for NetVSC on Hyper-V/Azure +# +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=n + +# # Compile burst-oriented Broadcom BNXT PMD driver # CONFIG_RTE_LIBRTE_BNXT_PMD=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 74c7d64..e043262 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -47,6 +47,7 @@ CONFIG_RTE_LIBRTE_PMD_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_PMD_TAP=y CONFIG_RTE_LIBRTE_AVP_PMD=y +CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD=y CONFIG_RTE_LIBRTE_NFP_PMD=y CONFIG_RTE_LIBRTE_POWER=y CONFIG_RTE_VIRTIO_USER=y diff --git a/doc/guides/nics/features/vdev_netvsc.ini b/doc/guides/nics/features/vdev_netvsc.ini new file mode 100644 index 0000000..cfc5cb9 --- /dev/null +++ b/doc/guides/nics/features/vdev_netvsc.ini @@ -0,0 +1,12 @@ +; +; Supported features of the 'vdev_netvsc' network poll mode driver. +; +; Refer to default.ini for the full list of available PMD features. +; +[Features] +ARMv7 = Y +ARMv8 = Y +Power8 = Y +x86-32 = Y +x86-64 = Y +Usage doc = Y diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst index 23babe9..5666046 100644 --- a/doc/guides/nics/index.rst +++ b/doc/guides/nics/index.rst @@ -64,6 +64,7 @@ Network Interface Controller Drivers szedata2 tap thunderx + vdev_netvsc virtio vhost vmxnet3 diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst new file mode 100644 index 0000000..a952908 --- /dev/null +++ b/doc/guides/nics/vdev_netvsc.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright 2017 6WIND S.A. 
+ Copyright 2017 Mellanox Technologies, Ltd. + +VDEV_NETVSC driver +================== + +The VDEV_NETVSC driver (librte_pmd_vdev_netvsc) provides support for NetVSC +interfaces and associated SR-IOV virtual function (VF) devices found in +Linux virtual machines running on Microsoft Hyper-V_ (including Azure) +platforms. + +.. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v + +Build options +------------- + +- ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) + + Toggle compilation of this driver. diff --git a/drivers/net/Makefile b/drivers/net/Makefile index c2fd7f5..e112732 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -39,6 +39,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += sfc DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += tap DIRS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += thunderx +DIRS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile new file mode 100644 index 0000000..2fb059d --- /dev/null +++ b/drivers/net/vdev_netvsc/Makefile @@ -0,0 +1,27 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright 2017 6WIND S.A. +# Copyright 2017 Mellanox Technologies, Ltd. + +include $(RTE_SDK)/mk/rte.vars.mk + +# Properties of the generated library. +LIB = librte_pmd_vdev_netvsc.a +LIBABIVER := 1 +EXPORT_MAP := rte_pmd_vdev_netvsc_version.map + +# Additional compilation flags. +CFLAGS += -O3 +CFLAGS += -g +CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += $(WERROR_FLAGS) + +# Dependencies. +LDLIBS += -lrte_bus_vdev +LDLIBS += -lrte_eal +LDLIBS += -lrte_ethdev +LDLIBS += -lrte_kvargs + +# Source files. 
+SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map new file mode 100644 index 0000000..179140f --- /dev/null +++ b/drivers/net/vdev_netvsc/rte_pmd_vdev_netvsc_version.map @@ -0,0 +1,4 @@ +DPDK_18.02 { + + local: *; +}; diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c new file mode 100644 index 0000000..e895b32 --- /dev/null +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright 2017 6WIND S.A. + * Copyright 2017 Mellanox Technologies, Ltd. + */ + +#include <stddef.h> + +#include <rte_bus_vdev.h> +#include <rte_common.h> +#include <rte_config.h> +#include <rte_kvargs.h> +#include <rte_log.h> + +#define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_ARG_IFACE "iface" +#define VDEV_NETVSC_ARG_MAC "mac" + +#define DRV_LOG(level, ...) \ + rte_log(RTE_LOG_ ## level, \ + vdev_netvsc_logtype, \ + RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ + RTE_FMT_TAIL(__VA_ARGS__,))) + +/** Driver-specific log messages type. */ +static int vdev_netvsc_logtype; + +/** Number of driver instances relying on context list. */ +static unsigned int vdev_netvsc_ctx_inst; + +/** + * Probe NetVSC interfaces. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0, even in case of errors. + */ +static int +vdev_netvsc_vdev_probe(struct rte_vdev_device *dev) +{ + static const char *const vdev_netvsc_arg[] = { + VDEV_NETVSC_ARG_IFACE, + VDEV_NETVSC_ARG_MAC, + NULL, + }; + const char *name = rte_vdev_device_name(dev); + const char *args = rte_vdev_device_args(dev); + struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", + vdev_netvsc_arg); + + DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); + if (!kvargs) { + DRV_LOG(ERR, "cannot parse arguments list"); + goto error; + } +error: + if (kvargs) + rte_kvargs_free(kvargs); + ++vdev_netvsc_ctx_inst; + return 0; +} + +/** + * Remove driver instance. + * + * @param dev + * Virtual device context for driver instance. + * + * @return + * Always 0. + */ +static int +vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) +{ + --vdev_netvsc_ctx_inst; + return 0; +} + +/** Virtual device descriptor. */ +static struct rte_vdev_driver vdev_netvsc_vdev = { + .probe = vdev_netvsc_vdev_probe, + .remove = vdev_netvsc_vdev_remove, +}; + +RTE_PMD_REGISTER_VDEV(VDEV_NETVSC_DRIVER, vdev_netvsc_vdev); +RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); +RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, + VDEV_NETVSC_ARG_IFACE "=<string> " + VDEV_NETVSC_ARG_MAC "=<string>"); + +/** Initialize driver log type. */ +RTE_INIT(vdev_netvsc_init_log) +{ + vdev_netvsc_logtype = rte_log_register("pmd.vdev_netvsc"); + if (vdev_netvsc_logtype >= 0) + rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); +} diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 78f23c5..2f8af49 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -157,6 +157,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_SFC_EFX_PMD) += -lrte_pmd_sfc_efx _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += -lrte_pmd_szedata2 -lsze2 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_TAP) += -lrte_pmd_tap _LDLIBS-$(CONFIG_RTE_LIBRTE_THUNDERX_NICVF_PMD) += -lrte_pmd_thunderx_nicvf +_LDLIBS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += -lrte_pmd_vdev_netvsc _LDLIBS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += -lrte_pmd_virtio ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y) _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST) += -lrte_pmd_vhost -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v6 5/8] net/vdev_netvsc: implement core functionality 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (3 preceding siblings ...) 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 4/8] net/vdev_netvsc: introduce Hyper-V platform driver Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad ` (3 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev As described in more details in the attached documentation (see patch contents), this virtual device driver manages NetVSC interfaces in virtual machines hosted by Hyper-V/Azure platforms. This driver does not manage traffic nor Ethernet devices directly; it acts as a thin configuration layer that automatically instantiates and controls fail-safe PMD instances combining tap and PCI sub-devices, so that each NetVSC interface is exposed as a single consolidated port to DPDK applications. PCI sub-devices being hot-pluggable (e.g. during VM migration), applications automatically benefit from increased throughput when present and automatic fallback on NetVSC otherwise without interruption thanks to fail-safe's hot-plug handling. Once initialized, the sole job of the vdev_netvsc driver is to regularly scan for PCI devices to associate with NetVSC interfaces and feed their addresses to corresponding fail-safe instances. 
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 70 +++++ drivers/net/vdev_netvsc/Makefile | 4 + drivers/net/vdev_netvsc/vdev_netvsc.c | 550 +++++++++++++++++++++++++++++++++- 3 files changed, 623 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index a952908..fde1fb8 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -12,9 +12,79 @@ platforms. .. _Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-hyper-v +Implementation details +---------------------- + +Each instance of this driver effectively needs to drive two devices: the +NetVSC interface proper and its SR-IOV VF (referred to as "physical" from +this point on) counterpart sharing the same MAC address. + +Physical devices are part of the host system and cannot be maintained during +VM migration. From a VM standpoint they appear as hot-plug devices that come +and go without prior notice. + +When the physical device is present, egress and most of the ingress traffic +flows through it; only multicasts and other hypervisor control still flow +through NetVSC. Otherwise, NetVSC acts as a fallback for all traffic. + +To avoid unnecessary code duplication and ensure maximum performance, +handling of physical devices is left to their original PMDs; this virtual +device driver (also known as *vdev*) manages other PMDs as summarized by the +following block diagram:: + + .------------------. + | DPDK application | + `--------+---------' + | + .------+------. + | DPDK ethdev | + `------+------' Control + | | + .------------+------------. v .--------------------. + | failsafe PMD +---------+ vdev_netvsc driver | + `--+-------------------+--' `--------------------' + | | + | .........|......... + | : | : + .----+----. : .----+----. 
: + | tap PMD | : | any PMD | : + `----+----' : `----+----' : <-- Hot-pluggable + | : | : + .------+-------. : .-----+-----. : + | NetVSC-based | : | SR-IOV VF | : + | netdevice | : | device | : + `--------------' : `-----------' : + :.................: + + +This driver implementation may be temporary and should be improved or removed +either when hot-plug will be fully supported in EAL and bus drivers or when +a new NetVSC driver will be integrated. + Build options ------------- - ``CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD`` (default ``y``) Toggle compilation of this driver. + +Run-time parameters +------------------- + +To invoke this driver, applications have to explicitly provide the +``--vdev=net_vdev_netvsc`` EAL option. + +The following device parameters are supported: + +- ``iface`` [string] + + Provide a specific NetVSC interface (netdevice) name to attach this driver + to. Can be provided multiple times for additional instances. + +- ``mac`` [string] + + Same as ``iface`` except a suitable NetVSC interface is located using its + MAC address. + +Not specifying either ``iface`` or ``mac`` makes this driver attach itself to +all NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/Makefile b/drivers/net/vdev_netvsc/Makefile index 2fb059d..f2b2ac5 100644 --- a/drivers/net/vdev_netvsc/Makefile +++ b/drivers/net/vdev_netvsc/Makefile @@ -13,6 +13,9 @@ EXPORT_MAP := rte_pmd_vdev_netvsc_version.map CFLAGS += -O3 CFLAGS += -g CFLAGS += -std=c11 -pedantic -Wall -Wextra +CFLAGS += -D_XOPEN_SOURCE=600 +CFLAGS += -D_BSD_SOURCE +CFLAGS += -D_DEFAULT_SOURCE CFLAGS += $(WERROR_FLAGS) # Dependencies. @@ -20,6 +23,7 @@ LDLIBS += -lrte_bus_vdev LDLIBS += -lrte_eal LDLIBS += -lrte_ethdev LDLIBS += -lrte_kvargs +LDLIBS += -lrte_net # Source files. 
SRCS-$(CONFIG_RTE_LIBRTE_VDEV_NETVSC_PMD) += vdev_netvsc.c diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index e895b32..21c3265 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -3,17 +3,42 @@ * Copyright 2017 Mellanox Technologies, Ltd. */ +#include <errno.h> +#include <fcntl.h> +#include <inttypes.h> +#include <linux/sockios.h> +#include <net/if.h> +#include <net/if_arp.h> +#include <netinet/ip.h> +#include <stdarg.h> #include <stddef.h> +#include <stdlib.h> +#include <stdint.h> +#include <stdio.h> +#include <string.h> +#include <sys/ioctl.h> +#include <sys/queue.h> +#include <sys/socket.h> +#include <unistd.h> +#include <rte_alarm.h> +#include <rte_bus.h> #include <rte_bus_vdev.h> #include <rte_common.h> #include <rte_config.h> +#include <rte_dev.h> +#include <rte_errno.h> +#include <rte_ethdev.h> +#include <rte_ether.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_PROBE_MS 1000 + +#define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" #define DRV_LOG(level, ...) \ rte_log(RTE_LOG_ ## level, \ @@ -25,12 +50,495 @@ /** Driver-specific log messages type. */ static int vdev_netvsc_logtype; +/** Context structure for a vdev_netvsc instance. */ +struct vdev_netvsc_ctx { + LIST_ENTRY(vdev_netvsc_ctx) entry; /**< Next entry in list. */ + unsigned int id; /**< Unique ID. */ + char name[64]; /**< Unique name. */ + char devname[64]; /**< Fail-safe instance name. */ + char devargs[256]; /**< Fail-safe device arguments. */ + char if_name[IF_NAMESIZE]; /**< NetVSC netdevice name. */ + unsigned int if_index; /**< NetVSC netdevice index. */ + struct ether_addr if_addr; /**< NetVSC MAC address. */ + int pipe[2]; /**< Fail-safe communication pipe. */ + char yield[256]; /**< PCI sub-device arguments. 
*/ +}; + +/** Context list is common to all driver instances. */ +static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = + LIST_HEAD_INITIALIZER(vdev_netvsc_ctx_list); + +/** Number of entries in context list. */ +static unsigned int vdev_netvsc_ctx_count; + /** Number of driver instances relying on context list. */ static unsigned int vdev_netvsc_ctx_inst; /** + * Destroy a vdev_netvsc context instance. + * + * @param ctx + * Context to destroy. + */ +static void +vdev_netvsc_ctx_destroy(struct vdev_netvsc_ctx *ctx) +{ + if (ctx->pipe[0] != -1) + close(ctx->pipe[0]); + if (ctx->pipe[1] != -1) + close(ctx->pipe[1]); + free(ctx); +} + +/** + * Iterate over system network interfaces. + * + * This function runs a given callback function for each netdevice found on + * the system. + * + * @param func + * Callback function pointer. List traversal is aborted when this function + * returns a nonzero value. + * @param ... + * Variable parameter list passed as @p va_list to @p func. + * + * @return + * 0 when the entire list is traversed successfully, a negative error code + * in case or failure, or the nonzero value returned by @p func when list + * traversal is aborted. + */ +static int +vdev_netvsc_foreach_iface(int (*func)(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap), ...) 
+{ + struct if_nameindex *iface = if_nameindex(); + int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP); + unsigned int i; + int ret = 0; + + if (!iface) { + ret = -ENOBUFS; + DRV_LOG(ERR, "cannot retrieve system network interfaces"); + goto error; + } + if (s == -1) { + ret = -errno; + DRV_LOG(ERR, "cannot open socket: %s", rte_strerror(errno)); + goto error; + } + for (i = 0; iface[i].if_name; ++i) { + struct ifreq req; + struct ether_addr eth_addr; + va_list ap; + + strncpy(req.ifr_name, iface[i].if_name, sizeof(req.ifr_name)); + if (ioctl(s, SIOCGIFHWADDR, &req) == -1) { + DRV_LOG(WARNING, "cannot retrieve information about" + " interface \"%s\": %s", + req.ifr_name, rte_strerror(errno)); + continue; + } + if (req.ifr_hwaddr.sa_family != ARPHRD_ETHER) { + DRV_LOG(DEBUG, "interface %s is non-ethernet device", + req.ifr_name); + continue; + } + memcpy(eth_addr.addr_bytes, req.ifr_hwaddr.sa_data, + RTE_DIM(eth_addr.addr_bytes)); + va_start(ap, func); + ret = func(&iface[i], &eth_addr, ap); + va_end(ap); + if (ret) + break; + } +error: + if (s != -1) + close(s); + if (iface) + if_freenameindex(iface); + return ret; +} + +/** + * Determine if a network interface is NetVSC. + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * + * @return + * A nonzero value when interface is detected as NetVSC. In case of error, + * rte_errno is updated and 0 returned. 
+ */ +static int +vdev_netvsc_iface_is_netvsc(const struct if_nameindex *iface) +{ + static const char temp[] = "/sys/class/net/%s/device/class_id"; + char path[sizeof(temp) + IF_NAMESIZE]; + FILE *f; + int ret; + int len = 0; + + ret = snprintf(path, sizeof(path), temp, iface->if_name); + if (ret == -1 || (size_t)ret >= sizeof(path)) { + rte_errno = ENOBUFS; + return 0; + } + f = fopen(path, "r"); + if (!f) { + rte_errno = errno; + return 0; + } + ret = fscanf(f, NETVSC_CLASS_ID "%n", &len); + if (ret == EOF) + rte_errno = errno; + ret = len == (int)strlen(NETVSC_CLASS_ID); + fclose(f); + return ret; +} + +/** + * Retrieve network interface data from sysfs symbolic link. + * + * @param[out] buf + * Output data buffer. + * @param size + * Output buffer size. + * @param[in] if_name + * Netdevice name. + * @param[in] relpath + * Symbolic link path relative to netdevice sysfs entry. + * + * @return + * 0 on success, a negative error code otherwise. + */ +static int +vdev_netvsc_sysfs_readlink(char *buf, size_t size, const char *if_name, + const char *relpath) +{ + int ret; + + ret = snprintf(buf, size, "/sys/class/net/%s/%s", if_name, relpath); + if (ret == -1 || (size_t)ret >= size) + return -ENOBUFS; + ret = readlink(buf, buf, size); + if (ret == -1) + return -errno; + if ((size_t)ret >= size - 1) + return -ENOBUFS; + buf[ret] = '\0'; + return 0; +} + +/** + * Probe a network interface to associate with vdev_netvsc context. + * + * This function determines if the network device matches the properties of + * the NetVSC interface associated with the vdev_netvsc context and + * communicates its bus address to the fail-safe PMD instance if so. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. 
+ * @param ap + * Variable arguments list comprising: + * + * - struct vdev_netvsc_ctx *ctx: + * Context to associate network interface with. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_device_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + struct vdev_netvsc_ctx *ctx = va_arg(ap, struct vdev_netvsc_ctx *); + char buf[RTE_MAX(sizeof(ctx->yield), 256u)]; + const char *addr; + size_t len; + int ret; + + /* Skip non-matching or unwanted NetVSC interfaces. */ + if (ctx->if_index == iface->if_index) { + if (!strcmp(ctx->if_name, iface->if_name)) + return 0; + DRV_LOG(DEBUG, + "NetVSC interface \"%s\" (index %u) renamed \"%s\"", + ctx->if_name, ctx->if_index, iface->if_name); + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + return 0; + } + if (vdev_netvsc_iface_is_netvsc(iface)) + return 0; + if (!is_same_ether_addr(eth_addr, &ctx->if_addr)) + return 0; + /* Look for associated PCI device. */ + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device/subsystem"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + if (strcmp(addr, "pci")) + return 0; + ret = vdev_netvsc_sysfs_readlink(buf, sizeof(buf), iface->if_name, + "device"); + if (ret) + return 0; + addr = strrchr(buf, '/'); + addr = addr ? addr + 1 : buf; + len = strlen(addr); + if (!len) + return 0; + /* Send PCI device argument to fail-safe PMD instance. 
*/ + if (strcmp(addr, ctx->yield)) + DRV_LOG(DEBUG, "associating PCI device \"%s\" with NetVSC" + " interface \"%s\" (index %u)", addr, ctx->if_name, + ctx->if_index); + memmove(buf, addr, len + 1); + addr = buf; + buf[len] = '\n'; + ret = write(ctx->pipe[1], addr, len + 1); + buf[len] = '\0'; + if (ret == -1) { + if (errno == EINTR || errno == EAGAIN) + return 1; + DRV_LOG(WARNING, "cannot associate PCI device name \"%s\" with" + " interface \"%s\": %s", addr, ctx->if_name, + rte_strerror(errno)); + return 1; + } + if ((size_t)ret != len + 1) { + /* + * Attempt to override previous partial write, no need to + * recover if that fails. + */ + ret = write(ctx->pipe[1], "\n", 1); + (void)ret; + return 1; + } + fsync(ctx->pipe[1]); + memcpy(ctx->yield, addr, len + 1); + return 1; +} + +/** + * Alarm callback that regularly probes system network interfaces. + * + * This callback runs at a frequency determined by VDEV_NETVSC_PROBE_MS as + * long as an vdev_netvsc context instance exists. + * + * @param arg + * Ignored. + */ +static void +vdev_netvsc_alarm(__rte_unused void *arg) +{ + struct vdev_netvsc_ctx *ctx; + int ret; + + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) { + ret = vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + if (ret) + break; + } + if (!vdev_netvsc_ctx_count) + return; + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to reschedule alarm callback: %s", + rte_strerror(-ret)); + } +} + +/** + * Probe a NetVSC interface to generate a vdev_netvsc context from. + * + * This function instantiates vdev_netvsc contexts either for all NetVSC + * devices found on the system or only a subset provided as device + * arguments. + * + * It is normally used with vdev_netvsc_foreach_iface(). + * + * @param[in] iface + * Pointer to netdevice description structure (name and index). + * @param[in] eth_addr + * MAC address associated with @p iface. 
+ * @param ap + * Variable arguments list comprising: + * + * - const char *name: + * Name associated with current driver instance. + * + * - struct rte_kvargs *kvargs: + * Device arguments provided to current driver instance. + * + * - unsigned int specified: + * Number of specific netdevices provided as device arguments. + * + * - unsigned int *matched: + * The number of specified netdevices matched by this function. + * + * @return + * A nonzero value when interface matches, 0 otherwise or in case of + * error. + */ +static int +vdev_netvsc_netvsc_probe(const struct if_nameindex *iface, + const struct ether_addr *eth_addr, + va_list ap) +{ + const char *name = va_arg(ap, const char *); + struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + unsigned int specified = va_arg(ap, unsigned int); + unsigned int *matched = va_arg(ap, unsigned int *); + unsigned int i; + struct vdev_netvsc_ctx *ctx; + int ret; + + /* Probe all interfaces when none are specified. */ + if (specified) { + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE)) { + if (!strcmp(pair->value, iface->if_name)) + break; + } else if (!strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) { + struct ether_addr tmp; + + if (sscanf(pair->value, + "%" SCNx8 ":%" SCNx8 ":%" SCNx8 ":" + "%" SCNx8 ":%" SCNx8 ":%" SCNx8, + &tmp.addr_bytes[0], + &tmp.addr_bytes[1], + &tmp.addr_bytes[2], + &tmp.addr_bytes[3], + &tmp.addr_bytes[4], + &tmp.addr_bytes[5]) != 6) { + DRV_LOG(ERR, + "invalid MAC address format" + " \"%s\"", + pair->value); + return -EINVAL; + } + if (is_same_ether_addr(eth_addr, &tmp)) + break; + } + } + if (i == kvargs->count) + return 0; + ++(*matched); + } + /* Weed out interfaces already handled. 
*/ + LIST_FOREACH(ctx, &vdev_netvsc_ctx_list, entry) + if (ctx->if_index == iface->if_index) + break; + if (ctx) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is already handled," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + if (!vdev_netvsc_iface_is_netvsc(iface)) { + if (!specified) + return 0; + DRV_LOG(WARNING, + "interface \"%s\" (index %u) is not NetVSC," + " skipping", + iface->if_name, iface->if_index); + return 0; + } + /* Create interface context. */ + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + ret = -errno; + DRV_LOG(ERR, "cannot allocate context for interface \"%s\": %s", + iface->if_name, rte_strerror(errno)); + goto error; + } + ctx->id = vdev_netvsc_ctx_count; + strncpy(ctx->if_name, iface->if_name, sizeof(ctx->if_name)); + ctx->if_index = iface->if_index; + ctx->if_addr = *eth_addr; + ctx->pipe[0] = -1; + ctx->pipe[1] = -1; + ctx->yield[0] = '\0'; + if (pipe(ctx->pipe) == -1) { + ret = -errno; + DRV_LOG(ERR, + "cannot allocate control pipe for interface \"%s\": %s", + ctx->if_name, rte_strerror(errno)); + goto error; + } + for (i = 0; i != RTE_DIM(ctx->pipe); ++i) { + int flf = fcntl(ctx->pipe[i], F_GETFL); + + if (flf != -1 && + fcntl(ctx->pipe[i], F_SETFL, flf | O_NONBLOCK) != -1) + continue; + ret = -errno; + DRV_LOG(ERR, "cannot toggle non-blocking flag on control file" + " descriptor #%u (%d): %s", i, ctx->pipe[i], + rte_strerror(errno)); + goto error; + } + /* Generate virtual device name and arguments. 
*/ + i = 0; + ret = snprintf(ctx->name, sizeof(ctx->name), "%s_id%u", + name, ctx->id); + if (ret == -1 || (size_t)ret >= sizeof(ctx->name)) + ++i; + ret = snprintf(ctx->devname, sizeof(ctx->devname), "net_failsafe_%s", + ctx->name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devname)) + ++i; + ret = snprintf(ctx->devargs, sizeof(ctx->devargs), + "fd(%d),dev(net_tap_%s,remote=%s)", + ctx->pipe[0], ctx->name, ctx->if_name); + if (ret == -1 || (size_t)ret >= sizeof(ctx->devargs)) + ++i; + if (i) { + ret = -ENOBUFS; + DRV_LOG(ERR, "generated virtual device name or argument list" + " too long for interface \"%s\"", ctx->if_name); + goto error; + } + /* Request virtual device generation. */ + DRV_LOG(DEBUG, "generating virtual device \"%s\" with arguments \"%s\"", + ctx->devname, ctx->devargs); + vdev_netvsc_foreach_iface(vdev_netvsc_device_probe, ctx); + ret = rte_eal_hotplug_add("vdev", ctx->devname, ctx->devargs); + if (ret) + goto error; + LIST_INSERT_HEAD(&vdev_netvsc_ctx_list, ctx, entry); + ++vdev_netvsc_ctx_count; + DRV_LOG(DEBUG, "added NetVSC interface \"%s\" to context list", + ctx->if_name); + return 0; +error: + if (ctx) + vdev_netvsc_ctx_destroy(ctx); + return ret; +} + +/** * Probe NetVSC interfaces. * + * This function probes system netdevices according to the specified device + * arguments and starts a periodic alarm callback to notify the resulting + * fail-safe PMD instances of their sub-devices whereabouts. + * * @param dev * Virtual device context for driver instance. * @@ -49,12 +557,40 @@ const char *args = rte_vdev_device_args(dev); struct rte_kvargs *kvargs = rte_kvargs_parse(args ? 
args : "", vdev_netvsc_arg); + unsigned int specified = 0; + unsigned int matched = 0; + unsigned int i; + int ret; DRV_LOG(DEBUG, "invoked as \"%s\", using arguments \"%s\"", name, args); if (!kvargs) { DRV_LOG(ERR, "cannot parse arguments list"); goto error; } + for (i = 0; i != kvargs->count; ++i) { + const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; + + if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + ++specified; + } + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + /* Gather interfaces. */ + ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, + specified, &matched); + if (ret < 0) + goto error; + if (matched < specified) + DRV_LOG(WARNING, + "some of the specified parameters did not match" + " recognized network interfaces"); + ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000, + vdev_netvsc_alarm, NULL); + if (ret < 0) { + DRV_LOG(ERR, "unable to schedule alarm callback: %s", + rte_strerror(-ret)); + goto error; + } error: if (kvargs) rte_kvargs_free(kvargs); @@ -65,6 +601,9 @@ /** * Remove driver instance. * + * The alarm callback and underlying vdev_netvsc context instances are only + * destroyed after the last PMD instance is removed. + * * @param dev * Virtual device context for driver instance. * @@ -74,7 +613,16 @@ static int vdev_netvsc_vdev_remove(__rte_unused struct rte_vdev_device *dev) { - --vdev_netvsc_ctx_inst; + if (--vdev_netvsc_ctx_inst) + return 0; + rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); + while (!LIST_EMPTY(&vdev_netvsc_ctx_list)) { + struct vdev_netvsc_ctx *ctx = LIST_FIRST(&vdev_netvsc_ctx_list); + + LIST_REMOVE(ctx, entry); + --vdev_netvsc_ctx_count; + vdev_netvsc_ctx_destroy(ctx); + } return 0; } -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
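The hand-off in vdev_netvsc_device_probe() above boils down to writing one newline-terminated PCI address to the control pipe. A minimal standalone sketch of that step follows; the helper name send_device_addr() and the DEVICE_ADDR_MAX limit are assumptions made for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define DEVICE_ADDR_MAX 64 /* Assumed limit, not taken from the patch. */

/*
 * Hypothetical helper: send one newline-terminated device address
 * through a pipe with a single write() call.
 * Returns 0 when the whole line was written, -1 otherwise.
 */
static int
send_device_addr(int fd, const char *addr)
{
	char line[DEVICE_ADDR_MAX];
	size_t len;
	ssize_t ret;

	assert(addr != NULL);
	len = strlen(addr);
	if (len == 0 || len + 1 > sizeof(line))
		return -1;
	memcpy(line, addr, len);
	line[len] = '\n';
	ret = write(fd, line, len + 1);
	if (ret == -1)
		return -1;
	if ((size_t)ret != len + 1) {
		/* Terminate the partial line so the reader can discard it. */
		ret = write(fd, "\n", 1);
		(void)ret;
		return -1;
	}
	return 0;
}
```

Building the line in a local buffer keeps the caller's string untouched, whereas the patch temporarily replaces the terminating NUL in place. Either approach yields the same bytes on the pipe.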
* [dpdk-dev] [PATCH v6 6/8] net/vdev_netvsc: skip routed netvsc probing 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (4 preceding siblings ...) 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 5/8] net/vdev_netvsc: implement core functionality Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad ` (2 subsequent siblings) 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet Cc: Thomas Monjalon, dev, Raslan Darawsheh NetVSC netdevices which are already routed should not be probed because they are used for management purposes by the HyperV. prevent routed netvsc devices probing. Signed-off-by: Raslan Darawsheh <rasland@mellanox.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 2 +- drivers/net/vdev_netvsc/vdev_netvsc.c | 46 +++++++++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index fde1fb8..f779862 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -87,4 +87,4 @@ The following device parameters are supported: MAC address. Not specifying either ``iface`` or ``mac`` makes this driver attach itself to -all NetVSC interfaces found on the system. +all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 21c3265..0055d0b 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -39,6 +39,7 @@ #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" +#define NETVSC_MAX_ROUTE_LINE_SIZE 300 #define DRV_LOG(level, ...) 
\ rte_log(RTE_LOG_ ## level, \ @@ -198,6 +199,44 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = } /** + * Determine if a network interface has a route. + * + * @param[in] name + * Network device name. + * + * @return + * A nonzero value when interface has an route. In case of error, + * rte_errno is updated and 0 returned. + */ +static int +vdev_netvsc_has_route(const char *name) +{ + FILE *fp; + int ret = 0; + char route[NETVSC_MAX_ROUTE_LINE_SIZE]; + char *netdev; + + fp = fopen("/proc/net/route", "r"); + if (!fp) { + rte_errno = errno; + return 0; + } + while (fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) { + netdev = strtok(route, "\t"); + if (strcmp(netdev, name) == 0) { + ret = 1; + break; + } + /* Move file pointer to the next line. */ + while (strchr(route, '\n') == NULL && + fgets(route, NETVSC_MAX_ROUTE_LINE_SIZE, fp) != NULL) + ; + } + fclose(fp); + return ret; +} + +/** * Retrieve network interface data from sysfs symbolic link. * * @param[out] buf @@ -459,6 +498,13 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = iface->if_name, iface->if_index); return 0; } + /* Routed NetVSC should not be probed. */ + if (vdev_netvsc_has_route(iface->if_name)) { + DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", + iface->if_name, iface->if_index); + if (!specified) + return 0; + } /* Create interface context. */ ctx = calloc(1, sizeof(*ctx)); if (!ctx) { -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
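The route lookup above compares the first tab-separated field of each /proc/net/route line against the interface name. One detail worth keeping in mind: strtok() overwrites the first tab with a NUL byte, so a later strchr(route, '\n') on the same buffer only sees the first field. The sketch below is one possible way to handle this, assuming a hypothetical helper iface_has_route() that takes an already opened stream; it records whether the line was complete before splitting it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ROUTE_LINE_SIZE 300 /* Same size as NETVSC_MAX_ROUTE_LINE_SIZE. */

/*
 * Hypothetical helper: return 1 when the first tab-separated field of
 * any line read from fp equals name, 0 otherwise.
 */
static int
iface_has_route(FILE *fp, const char *name)
{
	char line[ROUTE_LINE_SIZE];
	int ret = 0;

	assert(fp != NULL && name != NULL);
	while (fgets(line, sizeof(line), fp) != NULL) {
		/* Remember whether the whole line fit before altering it. */
		int complete = strchr(line, '\n') != NULL;
		size_t len = strcspn(line, "\t\n");

		line[len] = '\0';
		if (strcmp(line, name) == 0) {
			ret = 1;
			break;
		}
		/* Discard the remainder of an overlong line. */
		while (!complete && fgets(line, sizeof(line), fp) != NULL)
			complete = strchr(line, '\n') != NULL;
	}
	return ret;
}
```

Taking a FILE pointer instead of opening /proc/net/route inside the helper is only a convenience so the logic can be exercised against a temporary file.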
* [dpdk-dev] [PATCH v6 7/8] net/vdev_netvsc: add "force" parameter 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (5 preceding siblings ...) 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 6/8] net/vdev_netvsc: skip routed netvsc probing Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 8/8] net/vdev_netvsc: add automatic probing Matan Azrad 2018-01-20 1:15 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Ferruh Yigit 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev This parameter allows specifying any non-NetVSC interface or routed NetVSC interfaces to use with tap sub-devices for development purposes. Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com> Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 5 +++++ drivers/net/vdev_netvsc/vdev_netvsc.c | 30 +++++++++++++++++++----------- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index f779862..3c26990 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -86,5 +86,10 @@ The following device parameters are supported: Same as ``iface`` except a suitable NetVSC interface is located using its MAC address. +- ``force`` [int] + + If nonzero, forces the use of specified interfaces even if not detected as + NetVSC or detected as routed NETVSC. + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. 
diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 0055d0b..2d03033 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -36,6 +36,7 @@ #define VDEV_NETVSC_DRIVER net_vdev_netvsc #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" +#define VDEV_NETVSC_ARG_FORCE "force" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -419,6 +420,9 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = * - struct rte_kvargs *kvargs: * Device arguments provided to current driver instance. * + * - int force: + * Accept specified interface even if not detected as NetVSC. + * * - unsigned int specified: * Number of specific netdevices provided as device arguments. * @@ -436,6 +440,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = { const char *name = va_arg(ap, const char *); struct rte_kvargs *kvargs = va_arg(ap, struct rte_kvargs *); + int force = va_arg(ap, int); unsigned int specified = va_arg(ap, unsigned int); unsigned int *matched = va_arg(ap, unsigned int *); unsigned int i; @@ -490,20 +495,18 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = return 0; } if (!vdev_netvsc_iface_is_netvsc(iface)) { - if (!specified) + if (!specified || !force) return 0; DRV_LOG(WARNING, - "interface \"%s\" (index %u) is not NetVSC," - " skipping", + "using non-NetVSC interface \"%s\" (index %u)", iface->if_name, iface->if_index); - return 0; } /* Routed NetVSC should not be probed. */ if (vdev_netvsc_has_route(iface->if_name)) { - DRV_LOG(WARNING, "NetVSC interface \"%s\" (index %u) is routed", - iface->if_name, iface->if_index); - if (!specified) + if (!specified || !force) return 0; + DRV_LOG(WARNING, "using routed NetVSC interface \"%s\"" + " (index %u)", iface->if_name, iface->if_index); } /* Create interface context. 
*/ ctx = calloc(1, sizeof(*ctx)); @@ -597,6 +600,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = static const char *const vdev_netvsc_arg[] = { VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, + VDEV_NETVSC_ARG_FORCE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -605,6 +609,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = vdev_netvsc_arg); unsigned int specified = 0; unsigned int matched = 0; + int force = 0; unsigned int i; int ret; @@ -616,14 +621,16 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = for (i = 0; i != kvargs->count; ++i) { const struct rte_kvargs_pair *pair = &kvargs->pairs[i]; - if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || - !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) + if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) + force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || + !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, - specified, &matched); + force, specified, &matched); if (ret < 0) goto error; if (matched < specified) @@ -682,7 +689,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_ALIAS(VDEV_NETVSC_DRIVER, eth_vdev_netvsc); RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " - VDEV_NETVSC_ARG_MAC "=<string>"); + VDEV_NETVSC_ARG_MAC "=<string> " + VDEV_NETVSC_ARG_FORCE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
* [dpdk-dev] [PATCH v6 8/8] net/vdev_netvsc: add automatic probing 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad ` (6 preceding siblings ...) 2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 7/8] net/vdev_netvsc: add "force" parameter Matan Azrad @ 2018-01-18 13:51 ` Matan Azrad 2018-01-20 1:15 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Ferruh Yigit 8 siblings, 0 replies; 112+ messages in thread From: Matan Azrad @ 2018-01-18 13:51 UTC (permalink / raw) To: Ferruh Yigit, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev Using DPDK in Hyper-V VM systems requires vdev_netvsc driver to pair the NetVSC netdev device with the same MAC address PCI device by fail-safe PMD. Add vdev_netvsc custom scan in vdev bus to allow automatic probing in Hyper-V VM systems unless it was already specified by command line. Add "ignore" parameter to disable this auto-detection. Signed-off-by: Matan Azrad <matan@mellanox.com> --- doc/guides/nics/vdev_netvsc.rst | 9 ++++-- drivers/net/vdev_netvsc/vdev_netvsc.c | 55 +++++++++++++++++++++++++++++++++-- 2 files changed, 60 insertions(+), 4 deletions(-) diff --git a/doc/guides/nics/vdev_netvsc.rst b/doc/guides/nics/vdev_netvsc.rst index 3c26990..55d130a 100644 --- a/doc/guides/nics/vdev_netvsc.rst +++ b/doc/guides/nics/vdev_netvsc.rst @@ -71,8 +71,8 @@ Build options Run-time parameters ------------------- -To invoke this driver, applications have to explicitly provide the -``--vdev=net_vdev_netvsc`` EAL option. +This driver is invoked automatically in Hyper-V VM systems unless the user +invoked it by command line using ``--vdev=net_vdev_netvsc`` EAL option. The following device parameters are supported: @@ -91,5 +91,10 @@ The following device parameters are supported: If nonzero, forces the use of specified interfaces even if not detected as NetVSC or detected as routed NETVSC. 
+- ``ignore`` [int] + + If nonzero, ignores the driver running (actually used to disable the + auto-detection in Hyper-V VM). + Not specifying either ``iface`` or ``mac`` makes this driver attach itself to all unrouted NetVSC interfaces found on the system. diff --git a/drivers/net/vdev_netvsc/vdev_netvsc.c b/drivers/net/vdev_netvsc/vdev_netvsc.c index 2d03033..a8a1a7f 100644 --- a/drivers/net/vdev_netvsc/vdev_netvsc.c +++ b/drivers/net/vdev_netvsc/vdev_netvsc.c @@ -30,13 +30,16 @@ #include <rte_errno.h> #include <rte_ethdev.h> #include <rte_ether.h> +#include <rte_hypervisor.h> #include <rte_kvargs.h> #include <rte_log.h> #define VDEV_NETVSC_DRIVER net_vdev_netvsc +#define VDEV_NETVSC_DRIVER_NAME RTE_STR(VDEV_NETVSC_DRIVER) #define VDEV_NETVSC_ARG_IFACE "iface" #define VDEV_NETVSC_ARG_MAC "mac" #define VDEV_NETVSC_ARG_FORCE "force" +#define VDEV_NETVSC_ARG_IGNORE "ignore" #define VDEV_NETVSC_PROBE_MS 1000 #define NETVSC_CLASS_ID "{f8615163-df3e-46c5-913f-f2d2f965ed0e}" @@ -45,7 +48,7 @@ #define DRV_LOG(level, ...)
\ rte_log(RTE_LOG_ ## level, \ vdev_netvsc_logtype, \ - RTE_FMT(RTE_STR(VDEV_NETVSC_DRIVER) ": " \ + RTE_FMT(VDEV_NETVSC_DRIVER_NAME ": " \ RTE_FMT_HEAD(__VA_ARGS__,) "\n", \ RTE_FMT_TAIL(__VA_ARGS__,))) @@ -601,6 +604,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = VDEV_NETVSC_ARG_IFACE, VDEV_NETVSC_ARG_MAC, VDEV_NETVSC_ARG_FORCE, + VDEV_NETVSC_ARG_IGNORE, NULL, }; const char *name = rte_vdev_device_name(dev); @@ -610,6 +614,7 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = unsigned int specified = 0; unsigned int matched = 0; int force = 0; + int ignore = 0; unsigned int i; int ret; @@ -623,10 +628,17 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (!strcmp(pair->key, VDEV_NETVSC_ARG_FORCE)) force = !!atoi(pair->value); + else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IGNORE)) + ignore = !!atoi(pair->value); else if (!strcmp(pair->key, VDEV_NETVSC_ARG_IFACE) || !strcmp(pair->key, VDEV_NETVSC_ARG_MAC)) ++specified; } + if (ignore) { + if (kvargs) + rte_kvargs_free(kvargs); + return 0; + } rte_eal_alarm_cancel(vdev_netvsc_alarm, NULL); /* Gather interfaces. */ ret = vdev_netvsc_foreach_iface(vdev_netvsc_netvsc_probe, name, kvargs, @@ -690,7 +702,8 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = RTE_PMD_REGISTER_PARAM_STRING(net_vdev_netvsc, VDEV_NETVSC_ARG_IFACE "=<string> " VDEV_NETVSC_ARG_MAC "=<string> " - VDEV_NETVSC_ARG_FORCE "=<int>"); + VDEV_NETVSC_ARG_FORCE "=<int> " + VDEV_NETVSC_ARG_IGNORE "=<int>"); /** Initialize driver log type. */ RTE_INIT(vdev_netvsc_init_log) @@ -699,3 +712,41 @@ static LIST_HEAD(, vdev_netvsc_ctx) vdev_netvsc_ctx_list = if (vdev_netvsc_logtype >= 0) rte_log_set_level(vdev_netvsc_logtype, RTE_LOG_NOTICE); } + +/** Compare function for vdev find device operation. 
*/ +static int +vdev_netvsc_cmp_rte_device(const struct rte_device *dev1, + __rte_unused const void *_dev2) +{ + return strcmp(dev1->devargs->name, VDEV_NETVSC_DRIVER_NAME); +} + +/** + * A callback called by vdev bus scan function to ensure this driver probing + * automatically in Hyper-V VM system unless it already exists in the + * devargs list. + */ +static void +vdev_netvsc_scan_callback(__rte_unused void *arg) +{ + struct rte_vdev_device *dev; + struct rte_devargs *devargs; + struct rte_bus *vbus = rte_bus_find_by_name("vdev"); + + TAILQ_FOREACH(devargs, &devargs_list, next) + if (!strcmp(devargs->name, VDEV_NETVSC_DRIVER_NAME)) + return; + dev = (struct rte_vdev_device *)vbus->find_device(NULL, + vdev_netvsc_cmp_rte_device, VDEV_NETVSC_DRIVER_NAME); + if (dev) + return; + if (rte_eal_devargs_add(RTE_DEVTYPE_VIRTUAL, VDEV_NETVSC_DRIVER_NAME)) + DRV_LOG(ERR, "unable to add netvsc devargs."); +} + +/** Initialize the custom scan. */ +RTE_INIT(vdev_netvsc_custom_scan_add) +{ + if (rte_hypervisor_get() == RTE_HYPERVISOR_HYPERV) + rte_vdev_add_custom_scan(vdev_netvsc_scan_callback, NULL); +} -- 1.8.3.1 ^ permalink raw reply [flat|nested] 112+ messages in thread
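The probe function above derives three values from the device arguments: the number of iface/mac entries, and the normalized force and ignore flags. The driver relies on rte_kvargs for tokenizing; the sketch below uses a plain string scanner as a stand-in so it can run without DPDK. parse_opts() and struct netvsc_opts are hypothetical names, and the 256-byte limit is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result structure mirroring the locals in the probe. */
struct netvsc_opts {
	unsigned int specified; /* Number of iface= and mac= entries. */
	int force;              /* 0 or 1. */
	int ignore;             /* 0 or 1. */
};

/*
 * Parse a "key=value,key=value" list. Returns 0 on success, -1 on an
 * unknown key, a missing '=' or an overlong argument string.
 */
static int
parse_opts(const char *args, struct netvsc_opts *opts)
{
	char buf[256];
	char *pair = buf;

	assert(args != NULL && opts != NULL);
	memset(opts, 0, sizeof(*opts));
	if (strlen(args) >= sizeof(buf))
		return -1;
	strcpy(buf, args);
	while (*pair != '\0') {
		char *next = pair + strcspn(pair, ",");
		char *value;

		if (*next != '\0')
			*next++ = '\0';
		value = strchr(pair, '=');
		if (value == NULL)
			return -1;
		*value++ = '\0';
		if (!strcmp(pair, "force"))
			opts->force = !!atoi(value);
		else if (!strcmp(pair, "ignore"))
			opts->ignore = !!atoi(value);
		else if (!strcmp(pair, "iface") || !strcmp(pair, "mac"))
			++opts->specified;
		else
			return -1;
		pair = next;
	}
	return 0;
}
```

As in the patch, any nonzero integer enables a flag, and a value atoi() cannot parse counts as zero.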
* Re: [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms
  2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 0/8] Introduce virtual driver for Hyper-V/Azure platforms Matan Azrad
  ` (7 preceding siblings ...)
  2018-01-18 13:51 ` [dpdk-dev] [PATCH v6 8/8] net/vdev_netvsc: add automatic probing Matan Azrad
@ 2018-01-20 1:15 ` Ferruh Yigit
  8 siblings, 0 replies; 112+ messages in thread
From: Ferruh Yigit @ 2018-01-20 1:15 UTC (permalink / raw)
  To: Matan Azrad, Adrien Mazarguil, Gaetan Rivet; +Cc: Thomas Monjalon, dev

On 1/18/2018 1:51 PM, Matan Azrad wrote:
> Virtual machines hosted by Hyper-V/Azure platforms are fitted with
> simplified virtual network devices named NetVSC that are used for fast
> communication between VM to VM, VM to hypervisor, and the outside.
>
> They appear as standard system netdevices to user-land applications, the
> main difference being they are implemented on top of VMBUS instead of
> emulated PCI devices.
>
> While this reads like a case for a standard DPDK PMD, there is more to it.
>
> To accelerate outside communication, NetVSC devices as they appear in a VM
> can be paired with physical SR-IOV virtual function (VF) devices owned by
> that same VM. Both netdevices share the same MAC address in that case.
>
> When paired, egress and most of the ingress traffic flow through the VF
> device, while part of it (e.g. multicasts, hypervisor control data) still
> flows through NetVSC. Moreover VF devices are not retained and disappear
> during VM migration; from a VM standpoint, they can be hot-plugged anytime
> with NetVSC acting as a fallback.
>
> Running DPDK applications in such a context involves driving VF devices
> using their dedicated PMDs in a vendor-independent fashion (to benefit from
> maximum performance without writing dedicated code) while simultaneously
> listening to NetVSC and handling the related hot-plug events.
>
> This new virtual driver (referred to as "vdev_netvsc" from this point on)
> automatically coordinates the Hyper-V/Azure-specific management part
> described above by relying on vendor-specific, failsafe and tap PMDs to
> expose a single consolidated Ethernet device usable directly by existing
> applications.
>
>          .------------------.
>          | DPDK application |
>          `--------+---------'
>                   |
>            .------+------.
>            | DPDK ethdev |
>            `------+------'       Control
>                   |                 |
>      .------------+------------.    v    .--------------------.
>      |       failsafe PMD      +---------+ vdev_netvsc driver |
>      `--+-------------------+--'         `--------------------'
>         |                   |
>         |          .........|.........
>         |          :        |        :
>    .----+----.     :   .----+----.   :
>    | tap PMD |     :   | any PMD |   :
>    `----+----'     :   `----+----'   : <-- Hot-pluggable
>         |          :        |        :
>  .------+-------.  :  .-----+-----.  :
>  | NetVSC-based |  :  | SR-IOV VF |  :
>  |   netdevice  |  :  |   device  |  :
>  `--------------'  :  `-----------'  :
>                    :.................:
>
>
>
> v2 changes (Adrien):
>
> - Renamed driver from "hyperv" to "vdev_netvsc". This change covers
>   documentation and symbols prefix.
> - Driver is now tagged EXPERIMENTAL.
> - Replaced ether_addr_from_str() with a basic sscanf() call.
> - Removed debugging code (memset() poisoning).
> - Fixed hyperv_iface_is_netvsc()'s buffer allocation according to comments.
> - Removed hyperv_basename().
> - Discarded unused variables through __rte_unused.
> - Added separate but necessary free() bugfix for failsafe PMD.
> - Added file descriptor input support to failsafe PMD.
> - Replaced temporary bash execution; failsafe now reads device definitions
>   directly through a pipe without an intermediate bash one-liner.
> - Expanded DEBUG/INFO/WARN/ERROR() macros as PMD_DRV_LOG().
> - Added dynamic log type (pmd.vdev_netvsc).
> - Modified initialization code to probe devices immediately during startup.
> - Fixed several snprintf() return value checks ("ret >= sizeof(foo)" is more
>   appropriate than "ret >= sizeof(foo) - 1").
>
> v3 changes (Matan):
> - Fixed clang compilation in V2.
> - Removed hotplug remove code from the new driver.
> - Added support for getting probed sub-devices in fail-safe.
> - Added automatic probing for HyperV VM systems.
> - Added option to ignore the automatic probing.
> - Skipped routed NetVSC devices probing.
> - Adjusted documentation and semantics.
> - Replaced maintainer.
>
> v4 changes (Matan):
> - Align descriptions of context struct (Stephen suggestion).
> - Skip non-ethernet devices in netdev loop (Stephen suggestion).
> - Use different variable names in "add fd parameter" (Gaetan suggestion).
> - Change name of get port id function in "add automatic probing" (Gaetan suggestion).
> - Update internal fail-safe devargs in case of probed device (Gaetan suggestion).
> - Use a different commit title instead of "support probed sub-devices getting" (Gaetan suggestion).
>
> v5 changes (Matan):
> - Improve fail-safe documentation as Gaetan suggested.
> - Fix fcntl parameter.
>
> v6 changes:
> - fp!=NULL => fp==NULL in "add fd parameter".
>
> Adrien Mazarguil (1):
>   net/failsafe: fix invalid free
>
> Matan Azrad (7):
>   net/failsafe: add "fd" parameter
>   net/failsafe: add probed etherdev capture
>   net/vdev_netvsc: introduce Hyper-V platform driver
>   net/vdev_netvsc: implement core functionality
>   net/vdev_netvsc: skip routed netvsc probing
>   net/vdev_netvsc: add "force" parameter
>   net/vdev_netvsc: add automatic probing

Series applied to dpdk-next-net/master, thanks.

^ permalink raw reply [flat|nested] 112+ messages in thread
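The v2 changelog item about snprintf() return value checks can be illustrated with the fail-safe argument string that patch 5/8 generates. snprintf() returns the length the full string would have had, excluding the terminating NUL, so truncation occurred exactly when that value is greater than or equal to the buffer size. The helper name format_devargs() below is an assumption made for the example; the format string is the one used by the driver.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the fail-safe device arguments.
 * Returns 0 on success, -1 on encoding error or truncation.
 */
static int
format_devargs(char *buf, size_t size, int fd, const char *name,
	       const char *if_name)
{
	int ret;

	assert(buf != NULL && name != NULL && if_name != NULL);
	ret = snprintf(buf, size, "fd(%d),dev(net_tap_%s,remote=%s)",
		       fd, name, if_name);
	/* A result equal to size means the NUL byte did not fit. */
	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```

Comparing against "size - 1" instead would reject a string that fits exactly, which is why the changelog describes "ret >= sizeof(foo)" as the more appropriate test.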