From: Thomas Monjalon <thomas@monjalon.net>
To: Rongwei Liu <rongweil@nvidia.com>,
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: dev@dpdk.org, matan@nvidia.com, viacheslavo@nvidia.com,
orika@nvidia.com, jerinjacobk@gmail.com,
stephen@networkplumber.org, rasland@nvidia.com,
Ferruh Yigit <ferruh.yigit@amd.com>,
jerinj@marvell.com
Subject: Re: [PATCH v4 2/3] ethdev: add standby state for live migration
Date: Wed, 01 Feb 2023 09:27:49 +0100 [thread overview]
Message-ID: <2738874.BEx9A2HvPv@thomas> (raw)
In-Reply-To: <798721cd-92b8-3ee1-0aaa-20a63bb1cd67@oktetlabs.ru>
01/02/2023 08:52, Andrew Rybchenko:
> On 1/18/23 18:44, Rongwei Liu wrote:
> > +* **Added process state in ethdev to improve live migration.**
> > +
> > + Hot upgrade of an application may be accelerated by configuring
> > + the new application in standby state while the old one is still active.
> > + Such double ethdev configuration of the same device is possible
> > + with the added process state API.
> > +
>
> Will we have the same API for other device classes? Any ideas
> how to avoid duplication? (I'm afraid we don't have generic
> place to put such functionality).
It could be needed to have such feature in other classes, yes.
That's a device feature, so it must be in each class,
with different flags/options per class.
> > + /* Check if all devices support process role. */
> > + RTE_ETH_FOREACH_DEV(port_id) {
> > + dev = &rte_eth_devices[port_id];
> > + if (*dev->dev_ops->process_set_role != NULL &&
> > + *dev->dev_ops->dev_infos_get != NULL &&
> > + (*dev->dev_ops->dev_infos_get)(dev, &dev_info) == 0 &&
> > + (dev_info.dev_capa & RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)
>
> Why do we need both non-NULL callback and dev_capa flag?
> Isn't just non-NULL callback sufficient?
We could have a callback returning ENOTSUP.
Anyway we need the capability for application query.
> > + continue;
> > + rte_errno = ENOTSUP;
> > + return -rte_errno;
> > + }
> > + /* Call the driver callbacks. */
> > + RTE_ETH_FOREACH_DEV(port_id) {
> > + dev = &rte_eth_devices[port_id];
> > + if ((*dev->dev_ops->process_set_role)(dev, standby, flags) < 0)
>
> Please, add error logging on failure.
good idea
>
> > + goto failure;
>
> IMHO it would be useful to have info logs on success.
When setting the role, we could have debug log, yes.
> > + ret++;
>
> ret variable name is very confusing, please, be
> more specific in variable naming.
>
> > + }
> > + return ret;
> > +
> > +failure:
> > + /* Rollback all changed devices in case one failed. */
> > + if (ret) {
>
> Compare vs 0
>
> > + RTE_ETH_FOREACH_DEV(port_id) {
> > + dev = &rte_eth_devices[port_id];
> > + (*dev->dev_ops->process_set_role)(dev, !standby, flags);
>
> IMHO, it would be useful to have error logs on rollback
> failure and info logs on success.
>
> > + if (--ret == 0)
> > + break;
> > + }
> > + }
> > + rte_errno = EPERM;
>
> Why is it always EPERM? Isn't it better to forward
> failure code from driver?
>
> > + return -rte_errno;
> > +}
Thanks for the good detailed review Andrew.
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
> > #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP RTE_BIT64(3)
> > /** Device supports keeping shared flow objects across restart. */
> > #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> > +/** Device supports process role changing. @see rte_eth_process_set_active */
>
> There is no rte_eth_process_set_active()
>
> > +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE RTE_BIT64(5)
> > /**@}*/
> >
> > /*
> > @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t owner_id);
> > int rte_eth_dev_owner_get(const uint16_t port_id,
> > struct rte_eth_dev_owner *owner);
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Set the role of the process to active or standby,
> > + * affecting network traffic handling.
> > + *
> > + * If one device does not support this operation or fails,
> > + * the whole operation is failed and rolled back.
> > + *
> > + * It is forbidden to have multiple processes with the same role
> > + * unless only one of them is configured to handle the traffic.
>
> Why is it not allowed to have many processes in standby?
>
> > + *
> > + * The application is active by default.
> > + * The configuration from the active process is effective immediately
> > + * while the configuration from the standby process is queued by hardware.
>
> I'm afraid error handling could be tricky in this case.
> I think it makes sense to highlight that configuration
> propagation failures will be returned on attempt to set
> the process active.
>
> > + * When configuring the device from a standby process,
> > + * it has no effect except for below situations:
> > + * - traffic not handled by the active process configuration
> > + * - no active process
> > + *
> > + * When a process is changed from a standby to an active role,
> > + * all preceding configurations that are queued by hardware
> > + * should become effective immediately.
> > + * Before role transition, all the traffic handling configurations
> > + * set by the active process should be flushed first.
> > + *
> > + * In summary, the operations are expected to happen in this order
> > + * in "old" and "new" applications:
> > + * device: already configured by the old application
> > + * new: start as active
> > + * new: probe the same device
> > + * new: set as standby
> > + * new: configure the device
> > + * device: has configurations from old and new applications
> > + * old: clear its device configuration
> > + * device: has only 1 configuration from new application
> > + * new: set as active
> > + * device: downtime for connecting all to the new application
> > + * old: shutdown
> > + *
> > + * @param standby
> > + * Role active if false, standby if true.
>
> Typically API with boolean parameters is bad. May be in this
> particular case it is better to have two functions:
> rte_eth_process_set_active() and rte_eth_process_set_standby().
Why?
It could be an enum as well.
> > + * @param flags
> > + * Role specific flags.
>
> Please, define namespace for flags.
>
> > + * @return
> > + * Positive value on success, -rte_errno value on error:
> > + * - (> 0) Number of switched devices.
>
> Isn't is success if there is no devices to switch?
It should be >= 0 I guess.
next prev parent reply other threads:[~2023-02-01 8:27 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-01 8:20 [RFC 0/2] add API to set process to primary or secondary Rongwei Liu
2022-12-01 8:20 ` [RFC 1/2] ethdev: add group description Rongwei Liu
2022-12-01 8:20 ` [RFC 2/2] ethdev: add API to set process to primary or secondary Rongwei Liu
2022-12-01 15:10 ` Stephen Hemminger
2022-12-02 3:27 ` Rongwei Liu
2022-12-05 16:08 ` Stephen Hemminger
2022-12-06 3:47 ` Rongwei Liu
2022-12-06 5:54 ` Stephen Hemminger
2022-12-21 9:00 ` [RFC v3 0/2] add API to set process to active or standby Rongwei Liu
2022-12-21 9:00 ` [RFC v3 1/2] ethdev: add group description Rongwei Liu
2023-01-18 15:44 ` [PATCH v4 0/3] add API for live migration Rongwei Liu
2023-01-18 15:44 ` [PATCH v4 1/3] ethdev: add flow rule group description Rongwei Liu
2023-01-31 11:53 ` Ori Kam
2023-02-06 12:15 ` Rongwei Liu
2023-02-07 2:57 ` [PATCH v5] " Rongwei Liu
2023-02-08 20:28 ` Ferruh Yigit
2023-02-09 2:06 ` Rongwei Liu
2023-02-09 7:32 ` [PATCH v6] " Rongwei Liu
2023-02-09 8:01 ` Ori Kam
2023-02-09 11:26 ` Ferruh Yigit
2023-01-18 15:44 ` [PATCH v4 2/3] ethdev: add standby state for live migration Rongwei Liu
2023-01-31 13:50 ` Ori Kam
2023-01-31 18:14 ` Jerin Jacob
2023-01-31 22:55 ` Thomas Monjalon
2023-02-01 7:32 ` Andrew Rybchenko
2023-02-01 8:31 ` Thomas Monjalon
2023-02-01 8:40 ` Jerin Jacob
2023-02-01 8:46 ` Thomas Monjalon
2023-02-02 10:23 ` Rongwei Liu
2023-02-01 7:52 ` Andrew Rybchenko
2023-02-01 8:27 ` Thomas Monjalon [this message]
2023-02-01 8:40 ` Andrew Rybchenko
2023-01-18 15:44 ` [PATCH v4 3/3] ethdev: add standby flags " Rongwei Liu
2023-01-23 13:20 ` Jerin Jacob
2023-01-30 2:47 ` Rongwei Liu
2023-01-30 17:10 ` Jerin Jacob
2023-01-31 2:53 ` Rongwei Liu
2023-01-31 8:45 ` Jerin Jacob
2023-01-31 9:01 ` Rongwei Liu
2023-01-31 14:37 ` Jerin Jacob
2023-01-31 14:45 ` Ori Kam
2023-01-31 17:50 ` Thomas Monjalon
2023-01-31 18:10 ` Jerin Jacob
2022-12-21 9:00 ` [RFC v3 2/2] ethdev: add API to set process to active or standby Rongwei Liu
2022-12-21 9:12 ` Jerin Jacob
2022-12-21 9:32 ` Rongwei Liu
2022-12-21 10:59 ` Jerin Jacob
2022-12-21 12:05 ` Rongwei Liu
2022-12-21 12:44 ` Jerin Jacob
2022-12-21 12:50 ` Rongwei Liu
2022-12-21 13:12 ` Jerin Jacob
2022-12-21 14:33 ` Rongwei Liu
2022-12-26 16:44 ` Ori Kam
2023-01-15 22:46 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2738874.BEx9A2HvPv@thomas \
--to=thomas@monjalon.net \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@amd.com \
--cc=jerinj@marvell.com \
--cc=jerinjacobk@gmail.com \
--cc=matan@nvidia.com \
--cc=orika@nvidia.com \
--cc=rasland@nvidia.com \
--cc=rongweil@nvidia.com \
--cc=stephen@networkplumber.org \
--cc=viacheslavo@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).