From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4E21441B9B; Wed, 1 Feb 2023 09:27:57 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0604141148; Wed, 1 Feb 2023 09:27:57 +0100 (CET) Received: from out5-smtp.messagingengine.com (out5-smtp.messagingengine.com [66.111.4.29]) by mails.dpdk.org (Postfix) with ESMTP id 9251A4113F for ; Wed, 1 Feb 2023 09:27:55 +0100 (CET) Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailout.nyi.internal (Postfix) with ESMTP id 1AD5C5C0130; Wed, 1 Feb 2023 03:27:53 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 01 Feb 2023 03:27:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=monjalon.net; h= cc:cc:content-transfer-encoding:content-type:date:date:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1675240073; x= 1675326473; bh=UGmD57fq4KsvLGfBztiQd10AIpgBNkDD/s6a/DTkA9c=; b=n 6WJHnnuigtqlCLQhkx5yCORgRuQgW9h4eS8k5mURA+0ZA2u985yjChrXhO9+IZ99 EeBkgE9t4JnH+QIohP/bCagM4TeKQteIUO/dDS5cvjANG5sYFJjDDaTS8OmL3KMU T/hul/hj9FIMP2SiG/fkd2JY4PoesPpuUbMw82K2MFN1QgjtaVDh9Ba3C/ck2eVM DCDaCCBEzh2ymJHE+7lunpr1t9t/9Hk0SVjfTag4klEK5zZ07pUFNzL1Jbbk+OqX V4gZwaUGOnck3EP542jTx4a0zPzaiS9emH2XJgIJu5Daf3eRMv7uZQvnj8BR63m/ 41gPlzAg6bgszAcUiH3Eg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1675240073; x= 1675326473; bh=UGmD57fq4KsvLGfBztiQd10AIpgBNkDD/s6a/DTkA9c=; b=j 6eJhKAYy5FGAZHeZpLB4SghRaUQkm3WLxIbIkywv4MGt/wQTf4m6I6iowLv6mx1a tETmTxAmYdAQgSUo5SbM2SoesHMgtL5Au7qIVejhADg2fWC+7yjqL8Zp/V7Cw7Qr Ya+C6KDupNDoWhIuDt/ZohJKAVxvkpcLdtDMfbjvEbhyKDz8AmyYeCa2ua2hUrvb YabsMyLtFFKt9QrdQXsnmGorzeVALWZPVxkVEn1KTeJPP+qyED67XUfwKRQ+gEAt PsnAd0IdNShTKCSbWjTU+Pm+lrk74+r6S7vF4JTo4yJCLNWyJM3/Dk6ZRVaL7xk2 Z9Bw211yqBR5sW4o3y08A== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrudefhedguddvvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpefhvfevufffkfgjfhgggfgtsehtufertddttddvnecuhfhrohhmpefvhhho mhgrshcuofhonhhjrghlohhnuceothhhohhmrghssehmohhnjhgrlhhonhdrnhgvtheqne cuggftrfgrthhtvghrnheptdejieeifeehtdffgfdvleetueeffeehueejgfeuteeftddt ieekgfekudehtdfgnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilh hfrhhomhepthhhohhmrghssehmohhnjhgrlhhonhdrnhgvth X-ME-Proxy: Feedback-ID: i47234305:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 1 Feb 2023 03:27:51 -0500 (EST) From: Thomas Monjalon To: Rongwei Liu , Andrew Rybchenko Cc: dev@dpdk.org, matan@nvidia.com, viacheslavo@nvidia.com, orika@nvidia.com, jerinjacobk@gmail.com, stephen@networkplumber.org, rasland@nvidia.com, Ferruh Yigit , jerinj@marvell.com Subject: Re: [PATCH v4 2/3] ethdev: add standby state for live migration Date: Wed, 01 Feb 2023 09:27:49 +0100 Message-ID: <2738874.BEx9A2HvPv@thomas> In-Reply-To: <798721cd-92b8-3ee1-0aaa-20a63bb1cd67@oktetlabs.ru> References: <20221221090017.3715030-2-rongweil@nvidia.com> <20230118154447.595231-3-rongweil@nvidia.com> <798721cd-92b8-3ee1-0aaa-20a63bb1cd67@oktetlabs.ru> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org 01/02/2023 08:52, Andrew Rybchenko: > On 1/18/23 18:44, Rongwei Liu wrote: > > +* **Added process state in ethdev to improve live migration.** > > + > > + Hot upgrade of an application may be accelerated by configuring > > + the new application in standby state while the old one is still active. > > + Such double ethdev configuration of the same device is possible > > + with the added process state API. > > + > > Will we have the same API for other device classes? Any ideas > how to avoid duplication? (I'm afraid we don't have generic > place to put such functionality). It could be needed to have such feature in other classes, yes. That's a device feature, so it must be in each class, with different flags/options per class. > > + /* Check if all devices support process role. */ > > + RTE_ETH_FOREACH_DEV(port_id) { > > + dev = &rte_eth_devices[port_id]; > > + if (*dev->dev_ops->process_set_role != NULL && > > + *dev->dev_ops->dev_infos_get != NULL && > > + (*dev->dev_ops->dev_infos_get)(dev, &dev_info) == 0 && > > + (dev_info.dev_capa & RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0) > > Why do we need both non-NULL callback and dev_capa flag? > Isn't just non-NULL callback sufficient? We could have a callback returning ENOTSUP. Anyway we need the capability for application query. > > + continue; > > + rte_errno = ENOTSUP; > > + return -rte_errno; > > + } > > + /* Call the driver callbacks. */ > > + RTE_ETH_FOREACH_DEV(port_id) { > > + dev = &rte_eth_devices[port_id]; > > + if ((*dev->dev_ops->process_set_role)(dev, standby, flags) < 0) > > Please, add error logging on failure. good idea > > > + goto failure; > > IMHO it would be useful to have info logs on success. When setting the role, we could have debug log, yes. > > + ret++; > > ret variable name is very confusing, please, be > more specific in variable naming. > > > + } > > + return ret; > > + > > +failure: > > + /* Rollback all changed devices in case one failed. */ > > + if (ret) { > > Compare vs 0 > > > + RTE_ETH_FOREACH_DEV(port_id) { > > + dev = &rte_eth_devices[port_id]; > > + (*dev->dev_ops->process_set_role)(dev, !standby, flags); > > IMHO, it would be useful to have error logs on rollback > failure and info logs on success. > > > + if (--ret == 0) > > + break; > > + } > > + } > > + rte_errno = EPERM; > > Why is it always EPERM? Isn't it better to forward > failure code from driver? > > > + return -rte_errno; > > +} Thanks for the good detailed review Andrew. > > --- a/lib/ethdev/rte_ethdev.h > > +++ b/lib/ethdev/rte_ethdev.h > > @@ -1606,6 +1606,8 @@ struct rte_eth_conf { > > #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP RTE_BIT64(3) > > /** Device supports keeping shared flow objects across restart. */ > > #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4) > > +/** Device supports process role changing. @see rte_eth_process_set_active */ > > There is no rte_eth_process_set_active() > > > +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE RTE_BIT64(5) > > /**@}*/ > > > > /* > > @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t owner_id); > > int rte_eth_dev_owner_get(const uint16_t port_id, > > struct rte_eth_dev_owner *owner); > > > > +/** > > + * @warning > > + * @b EXPERIMENTAL: this API may change without prior notice > > + * > > + * Set the role of the process to active or standby, > > + * affecting network traffic handling. > > + * > > + * If one device does not support this operation or fails, > > + * the whole operation is failed and rolled back. > > + * > > + * It is forbidden to have multiple processes with the same role > > + * unless only one of them is configured to handle the traffic. > > Why is it not allowed to have many processes in standby? > > > + * > > + * The application is active by default. > > + * The configuration from the active process is effective immediately > > + * while the configuration from the standby process is queued by hardware. > > I'm afraid error handling could be tricky in this case. > I think it makes sense to highlight that configuration > propagation failures will be returned on attempt to set > the process active. > > > + * When configuring the device from a standby process, > > + * it has no effect except for below situations: > > + * - traffic not handled by the active process configuration > > + * - no active process > > + * > > + * When a process is changed from a standby to an active role, > > + * all preceding configurations that are queued by hardware > > + * should become effective immediately. > > + * Before role transition, all the traffic handling configurations > > + * set by the active process should be flushed first. > > + * > > + * In summary, the operations are expected to happen in this order > > + * in "old" and "new" applications: > > + * device: already configured by the old application > > + * new: start as active > > + * new: probe the same device > > + * new: set as standby > > + * new: configure the device > > + * device: has configurations from old and new applications > > + * old: clear its device configuration > > + * device: has only 1 configuration from new application > > + * new: set as active > > + * device: downtime for connecting all to the new application > > + * old: shutdown > > + * > > + * @param standby > > + * Role active if false, standby if true. > > Typically API with boolean parameters is bad. May be in this > particular case it is better to have two functions: > rte_eth_process_set_active() and rte_eth_process_set_standby(). Why? It could be an enum as well. > > + * @param flags > > + * Role specific flags. > > Please, define namespace for flags. > > > + * @return > > + * Positive value on success, -rte_errno value on error: > > + * - (> 0) Number of switched devices. > > Isn't is success if there is no devices to switch? It should be >= 0 I guess.