From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id 78B3B1F5 for ; Mon, 17 Jul 2017 15:57:30 +0200 (CEST) Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 17 Jul 2017 06:57:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,374,1496127600"; d="scan'208";a="108918014" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.255.132.50]) ([10.255.132.50]) by orsmga004.jf.intel.com with ESMTP; 17 Jul 2017 06:56:56 -0700 To: Gaetan Rivet , dev@dpdk.org References: <7390b7f14a1925ece0c55c6b1df8da358c725017.1500130634.git.gaetan.rivet@6wind.com> From: Ferruh Yigit Message-ID: <784ab02a-2454-f209-ac52-ff1e53ec1a63@intel.com> Date: Mon, 17 Jul 2017 14:56:54 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <7390b7f14a1925ece0c55c6b1df8da358c725017.1500130634.git.gaetan.rivet@6wind.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH v10 03/11] net/failsafe: add fail-safe PMD X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Jul 2017 13:57:32 -0000 On 7/15/2017 6:57 PM, Gaetan Rivet wrote: > Introduce the fail-safe poll mode driver initialization and enable its > build infrastructure. > > This PMD allows for applications to benefit from true hot-plugging > support without having to implement it. > > It intercepts and manages Ethernet device removal events issued by > slave PMDs and re-initializes them transparently when brought back. > It also allows defining a contingency to the removal of a device, by > designating a fail-over device that will take on transmitting operations > if the preferred device is removed. > > Applications only see a fail-safe instance, without caring for > underlying activity ensuring their continued operations. All PMD in a single patch is hard to review, I am sure some details missed during the review, but taking account the histroy of the PMD I accept this as it is, but I will rely on your support to fix issues in the future. > > Signed-off-by: Gaetan Rivet > Acked-by: Olga Shern <...> > +Usage example > +~~~~~~~~~~~~~ > + > +This section shows some example of using **testpmd** with a fail-safe PMD. > + > +#. Request huge pages: > + > + .. code-block:: console > + > + echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages I think this is extra for usage sample, if you want there is generic guide [1] that you ca reference. [1] http://dpdk.org/doc/guides/nics/build_and_test.html > + > +#. Start testpmd. The slave device should be blacklisted from normal EAL > + operations to avoid probing it twice when in PCI blacklist mode. > + > + .. code-block:: console > + > + $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \ > + -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)' > + -b 84:00.0 -b 00:04.0 -- -i Do you think does it make sense to stress sub-device shouldn't be probed by EAL, I believe it is not clear above. > + > + Note that PCI blacklist mode is the default PCI operating mode. In this > + configuration, the fail-safe cannot proceed with its slaves if they have > + been probed beforehand. > + > +#. Alternatively, it can be used alongside any other device in whitelist mode. > + > + .. code-block:: console > + > + $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \ > + -w 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)' > + -w 81:00.0 -- -i > + <...> > +[Features] > +Link status = Y > +MTU update = Y > +Promiscuous mode = Y > +Allmulticast mode = Y > +VLAN filter = Y > +Packet type parsing = Y I am not sure how to document some of these features, because they depends on sub-device capability. I guess if sub-device doesn't support packet type parsing, this feature won't be supported? > +Basic stats = Y > +Stats per queue = Y > +Unicast MAC filter = Y > +Queue start/stop = Y > +Jumbo frame = Y > +Multicast MAC filter = Y Is above ones supported by PMD, I don't see them unless I miss something. + "Flow Control" seems supported. <...> > +# This lib depends upon: > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf DEPDIRS-y no more used, can be removed. > + > +# Basic CFLAGS: > +CFLAGS += -std=gnu99 -Wall -Wextra -Wall should be coming from $(WERROR_FLAGS), no need to add here. And are you sure about gnu99, mlx drivers tends to enforce a standard and they updated to c11, do you want to do same here. > +CFLAGS += -O3 > +CFLAGS += -I. > +CFLAGS += -D_DEFAULT_SOURCE > +CFLAGS += -D_XOPEN_SOURCE=700 Is there a reason for these variables, or are these copy-paste? > +CFLAGS += $(WERROR_FLAGS) > +CFLAGS += -Wno-strict-prototypes > +CFLAGS += -pedantic -DPEDANTIC Again, just question, is pedantic mode intentional, or copy-paste? <...> > +static int > +fs_eth_dev_create(struct rte_vdev_device *vdev) > +{ > + struct rte_eth_dev *dev; > + struct ether_addr *mac; > + struct fs_priv *priv; > + struct sub_device *sdev; > + const char *params; > + unsigned int socket_id; > + uint8_t i; > + int ret; > + > + dev = NULL; > + priv = NULL; > + params = rte_vdev_device_args(vdev); > + socket_id = rte_socket_id(); > + INFO("Creating fail-safe device on NUMA socket %u", > + socket_id); No line break required. > + dev = rte_eth_vdev_allocate(vdev, sizeof(*priv)); > + if (dev == NULL) { > + ERROR("Unable to allocate rte_eth_dev"); > + return -1; > + } > + priv = dev->data->dev_private; > + PRIV(dev)->dev = dev; Altough this is valid, what about? priv = PRIV(dev); priv->dev = dev; > + dev->dev_ops = &failsafe_ops; > + TAILQ_INIT(&dev->link_intr_cbs); Not required, rte_eth_dev_allocate() initializes this. > + dev->data->dev_flags = 0x0; Not required to set zero, dev->data already memset to zero. > + dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0]; > + dev->data->dev_link = eth_link; > + PRIV(dev)->nb_mac_addr = 1; > + dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst; > + dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst; > + if (params == NULL) { I would prefer input control as first thing in the function, before allocating the device. > + ERROR("This PMD requires sub-devices, none provided"); > + goto free_dev; > + } > + ret = fs_sub_device_create(dev, params); This function looks like just allocates memory for sub devices, does it make sense to rename it as fs_sub_device_alloc()? > + if (ret) { > + ERROR("Could not allocate sub_devices"); > + goto free_dev; > + } <...> > +free_args: > + failsafe_args_free(dev); > +free_subs: > + fs_sub_device_free(dev); > +free_dev: > + rte_eth_dev_release_port(dev); Device private data should be freed. > + return -1; > +} <...> > +static int > +rte_pmd_failsafe_probe(struct rte_vdev_device *vdev) > +{ > + const char *name; > + > + name = rte_vdev_device_name(vdev); > + if (vdev == NULL) > + return -EINVAL; I think you don't need this check, if name is NULL, probe shouldn't be called, same for remove(). > + INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s", > + name); Line break not required. <...> > +RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv); > +RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe); I belive alias is not required for new PMDs, this is for backward compability for old drivers. <...> > +int > +failsafe_args_parse(struct rte_eth_dev *dev, const char *params) > +{ > + struct fs_priv *priv; > + char mut_params[DEVARGS_MAXLEN] = ""; Out of curiosity, what does "mut" stands for? > + struct rte_kvargs *kvlist = NULL; > + unsigned int arg_count; > + size_t n; > + int ret; > + > + if (dev == NULL || params == NULL) > + return -EINVAL; This check looks like redundant. > + priv = PRIV(dev); > + ret = 0; > + priv->subs_tx = FAILSAFE_MAX_ETHPORTS; > + /* default parameters */ > + mac_from_arg = 0; This is global value, I believe it is better to set default value where variable defined with a comment. Here is easy to miss the default value. > + n = snprintf(mut_params, sizeof(mut_params), "%s", params); > + if (n >= sizeof(mut_params)) { > + ERROR("Parameter string too long (>=%zu)", > + sizeof(mut_params)); > + return -ENOMEM; > + } > + ret = fs_parse_sub_devices(fs_parse_device_param, > + dev, params); Why the device argument is not defined as dev=xxx, instead of current dev(xxx). "dev=xxx" will be compatible with rest of the argument usage, and it will be possible to use kvargs to parse it, which will make this code simpler I believe. What is the reason of using different syntax? > + if (ret < 0) > + return ret; > + ret = fs_remove_sub_devices_definition(mut_params); > + if (ret < 0) > + return ret; > + if (strnlen(mut_params, sizeof(mut_params)) > 0) { > + kvlist = rte_kvargs_parse(mut_params, > + pmd_failsafe_init_parameters); > + if (kvlist == NULL) { > + ERROR("Error parsing parameters, usage:\n" > + PMD_FAILSAFE_PARAM_STRING); > + return -1; > + } > + /* MAC addr */ > + arg_count = rte_kvargs_count(kvlist, > + PMD_FAILSAFE_MAC_KVARG); > + if (arg_count == 1) { > + ret = rte_kvargs_process(kvlist, > + PMD_FAILSAFE_MAC_KVARG, > + &fs_get_mac_addr_arg, > + &dev->data->mac_addrs[0]); > + if (ret < 0) > + goto free_kvlist; > + mac_from_arg = 1; > + } Is ignoring the case mac defined more than once intentional? > + } > +free_kvlist: > + rte_kvargs_free(kvlist); > + return ret; > +} > + <...> > +static int > +fs_count_device(struct rte_eth_dev *dev, const char *param, > + uint8_t head __rte_unused) > +{ > + size_t b = 0; > + > + while (param[b] != '(' && > + param[b] != '\0') > + b++; > + if (strncmp(param, "dev", b) && > + strncmp(param, "exec", b)) { I believe param "exec" will be introduced in further patches? > + ERROR("Unrecognized device type: %.*s", (int)b, param); > + return -EINVAL; > + } > + PRIV(dev)->subs_tail += 1; > + return 0; > +} > + <...> > +static int > +fs_bus_init(struct rte_eth_dev *dev) > +{ > + struct sub_device *sdev; > + struct rte_devargs *da; > + uint8_t i; > + int ret; > + > + FOREACH_SUBDEV(sdev, i, dev) { Can FOREACH_SUBDEV_ST(..., DEV_PARSED) be used here? And what do you think renaming "FOREACH_SUBDEV_ST" to "FOREACH_SUBDEV_STATE"? > + if (sdev->state != DEV_PARSED) > + continue; > + da = &sdev->devargs; > + ret = rte_eal_hotplug_add(da->bus->name, > + da->name, > + da->args); <...> > + /* > + * We only update TX_SUBDEV if we are not started. > + * If a sub_device is emitting, we will switch the TX_SUBDEV to the > + * preferred port only upon starting it, so that the switch is smoother. > + */ > + if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) Can you please document concept of the "prefered sub device" in documentation? > + if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) && > + (TX_SUBDEV(dev) == NULL || > + (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) { > + DEBUG("Switching tx_dev to preferred sub_device"); > + PRIV(dev)->subs_tx = 0; > + } <...> > +static struct rte_eth_dev_info default_infos = { > + .driver_name = pmd_failsafe_driver_name, This should be dev->device->driver->name, but already overwriiten by rte_eth_dev_info_get() so you can drop this. > + /* Max possible number of elements */ <...> > + FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) { > + DEBUG("Closing sub_device %d", i); > + rte_eth_dev_close(PORT_ID(sdev)); > + sdev->state = DEV_ACTIVE - 1; Should it be better to set state to DEV_PROBED? Instead of calculation. > + } > + fs_dev_free_queues(dev); > +} > + <...> > +static void > +fs_stats_get(struct rte_eth_dev *dev, > + struct rte_eth_stats *stats) > +{ > + memset(stats, 0, sizeof(*stats)); memset not required, done by API > + if (TX_SUBDEV(dev) == NULL) > + return; > + rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats); > +} > + <...> > + sdev = TX_SUBDEV(dev); > + rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos); > + PRIV(dev)->infos.rx_offload_capa = rx_offload_capa; Is intention &= ? > + PRIV(dev)->infos.tx_offload_capa &= > + default_infos.tx_offload_capa; > + PRIV(dev)->infos.flow_type_rss_offloads &= > + default_infos.flow_type_rss_offloads; > + } > + rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos)); > +} <...> > + FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) { > + DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i); > + ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu); > + if (ret) { > + ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d" > + " with error %d", i, ret); You can prefer to not break the log message. > + return ret; > + } > + } > + return 0; > +} <...> > + > +#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000 Is this related to next patches in the set? <...> > +enum dev_state { > + DEV_UNDEFINED = 0, Setting value not required. > + DEV_PARSED, > + DEV_PROBED, > + DEV_ACTIVE, > + DEV_STARTED, > +}; <...>