DPDK patches and discussions
 help / color / mirror / Atom feed
From: Dariusz Sosnowski <dsosnowski@nvidia.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>,
	"NBU-Contact-Thomas Monjalon (EXTERNAL)" <thomas@monjalon.net>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Bruce Richardson <bruce.richardson@intel.com>
Subject: RE: [RFC 0/4] ethdev: rework config restore
Date: Thu, 10 Oct 2024 12:08:48 +0000	[thread overview]
Message-ID: <CH3PR12MB8460ED019BDDFF59B8DE8592A4782@CH3PR12MB8460.namprd12.prod.outlook.com> (raw)
In-Reply-To: <5ef19db7-f6fe-4a48-8497-e17b33a262b5@amd.com>

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, October 10, 2024 01:17
> To: Dariusz Sosnowski <dsosnowski@nvidia.com>; Konstantin Ananyev
> <konstantin.ananyev@huawei.com>; NBU-Contact-Thomas Monjalon (EXTERNAL)
> <thomas@monjalon.net>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; Bruce Richardson <bruce.richardson@intel.com>
> Subject: Re: [RFC 0/4] ethdev: rework config restore
> 
> External email: Use caution opening links or attachments
> 
> 
> On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, October 9, 2024 03:08
> >> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>; Dariusz
> >> Sosnowski <dsosnowski@nvidia.com>; NBU-Contact-Thomas Monjalon
> >> (EXTERNAL) <thomas@monjalon.net>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; Bruce Richardson <bruce.richardson@intel.com>
> >> Subject: Re: [RFC 0/4] ethdev: rework config restore
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> On 10/8/2024 6:21 PM, Konstantin Ananyev wrote:
> >>>
> >>>
> >>>>>>>> We have been working on optimizing the latency of calls to
> >>>>>>>> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of the
> >>>>>>>> work requires changes in the implementation of
> >>>>>>>> .dev_start() PMD callback, but I also wanted to start a
> >>>>>>>> discussion regarding configuration restore.
> >>>>>>>>
> >>>>>>>> rte_eth_dev_start() does a few things on top of calling
> >>>>>>>> .dev_start()
> >> callback:
> >>>>>>>>
> >>>>>>>> - Before calling it:
> >>>>>>>>     - eth_dev_mac_restore() - if device supports
> >>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR;
> >>>>>>>> - After calling it:
> >>>>>>>>     - eth_dev_mac_restore() - if device does not support
> >>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR;
> >>>>>>>>     - restore promiscuous config
> >>>>>>>>     - restore all multicast config
> >>>>>>>>
> >>>>>>>> eth_dev_mac_restore() iterates over all known MAC addresses -
> >>>>>>>> stored in rte_eth_dev_data.mac_addrs array - and calls
> >>>>>>>> .mac_addr_set() and .mac_addr_add() callbacks to apply these
> >>>>>>>> MAC
> >> addresses.
> >>>>>>>>
> >>>>>>>> Promiscuous config restore checks if promiscuous mode is
> >>>>>>>> enabled or not, and calls .promiscuous_enable() or
> >>>>>>>> .promiscuous_disable()
> >> callback.
> >>>>>>>>
> >>>>>>>> All multicast config restore checks if all multicast mode is
> >>>>>>>> enabled or not, and calls .allmulticast_enable() or
> >>>>>>>> .allmulticast_disable()
> >> callback.
> >>>>>>>>
> >>>>>>>> Callbacks are called directly in all of these cases, to bypass
> >>>>>>>> the checks for applying the same configuration, which exist in
> >>>>>>>> relevant
> >> APIs.
> >>>>>>>> Checks are bypassed to force drivers to reapply the configuration.
> >>>>>>>>
> >>>>>>>> Let's consider what happens in the following sequence of API calls.
> >>>>>>>>
> >>>>>>>> 1. rte_eth_dev_configure()
> >>>>>>>> 2. rte_eth_tx_queue_setup()
> >>>>>>>> 3. rte_eth_rx_queue_setup()
> >>>>>>>> 4. rte_eth_promiscuous_enable()
> >>>>>>>>     - Call dev->dev_ops->promiscuous_enable()
> >>>>>>>>     - Stores promiscuous state in dev->data->promiscuous 5.
> >>>>>>>> rte_eth_allmulticast_enable()
> >>>>>>>>     - Call dev->dev_ops->allmulticast_enable()
> >>>>>>>>     - Stores allmulticast state in dev->data->allmulticast 6.
> >>>>>>>> rte_eth_dev_start()
> >>>>>>>>     - Call dev->dev_ops->dev_start()
> >>>>>>>>     - Call dev->dev_ops->mac_addr_set() - apply default MAC address
> >>>>>>>>     - Call dev->dev_ops->promiscuous_enable()
> >>>>>>>>     - Call dev->dev_ops->allmulticast_enable()
> >>>>>>>>
> >>>>>>>> Even though all configuration is available in dev->data after
> >>>>>>>> step 5, library forces reapplying this configuration in step 6.
> >>>>>>>>
> >>>>>>>> In mlx5 PMD case all relevant callbacks require communication
> >>>>>>>> with the kernel driver, to configure the device (mlx5 PMD must
> >>>>>>>> create/destroy new kernel flow rules and/or change netdev config).
> >>>>>>>>
> >>>>>>>> mlx5 PMD handles applying all configuration in .dev_start(), so
> >>>>>>>> the following forced callbacks force additional communication
> >>>>>>>> with the kernel. The
> >>>>>>> same configuration is applied multiple times.
> >>>>>>>>
> >>>>>>>> As an optimization, mlx5 PMD could check if a given
> >>>>>>>> configuration was applied, but this would duplicate the
> >>>>>>>> functionality of the library (for example
> >>>>>>>> rte_eth_promiscuous_enable() does not call the driver if
> >>>>>>>> dev->data->promiscuous is set).
> >>>>>>>>
> >>>>>>>> Question: Since all of the configuration is available before
> >>>>>>>> .dev_start() callback is called, why ethdev library does not
> >>>>>>>> expect .dev_start() to
> >>>>>>> take this configuration into account?
> >>>>>>>> In other words, why library has to reapply the configuration?
> >>>>>>>>
> >>>>>>>> I could not find any particular reason why configuration
> >>>>>>>> restore exists as part of the process (it was in the initial DPDK commit).
> >>>>>>>>
> >>>>>>>
> >>>>>>> My assumption is .dev_stop() cause these values reset in some
> >>>>>>> devices, so
> >>>>>>> .dev_start() restores them back.
> >>>>>>> @Bruce or @Konstantin may remember the history.
> >>>>>
> >>>>> Yep, as I remember, at least some Intel PMDs calling hw_reset() ad
> >>>>> dec_stop() and even dev_start() to make sure that HW is in a clean
> >>>>> (known)
> >> state.
> >>>>>
> >>>>>>>
> >>>>>>> But I agree this is device specific behavior, and can be managed
> >>>>>>> by what device requires.
> >>>>>
> >>>>> Probably yes.
> >>>>>
> >>>>>>>
> >>>>>>>> The patches included in this RFC, propose a mechanism which
> >>>>>>>> would help with managing which drivers rely on forceful configuration
> restore.
> >>>>>>>> Drivers could advertise if forceful configuration restore is
> >>>>>>>> needed through `RTE_ETH_DEV_*_FORCE_RESTORE` device flag. If
> >>>>>>>> this flag is set, then the driver in question requires ethdev
> >>>>>>>> to forcefully restore
> >>>>>>> configuration.
> >>>>>>>>
> >>>>>>>
> >>>>>>> OK to use flag for it, but not sure about using 'dev_info->dev_flags'
> >>>>>>> (RTE_ETH_DEV_*) for this, as this flag is shared with user and
> >>>>>>> this is all dpdk internal.
> >>>>>>>
> >>>>>>> What about to have a dedicated flag for it? We can have a
> >>>>>>> dedicated set of flag values for restore.
> >>>>>>
> >>>>>> Agreed. What do you think about the following?
> >>>>>
> >>>>> Instead of exposing that, can we probably make it transparent to
> >>>>> the user and probably ethdev layer too?
> >>>>>
> >>>>
> >>>> +1 to make it transparent to user, but not sure if we can make it
> >>>> transparent to ethdev layer.
> >>>
> >>> Just to be clear:
> >>> Let say, using example from above:
> >>>
> >>>  rte_eth_dev_start()
> >>>      - Call dev->dev_ops->dev_start()
> >>>      - Call dev->dev_ops->mac_addr_set() - apply default MAC address
> >>>      - Call dev->dev_ops->promiscuous_enable()
> >>>      - Call dev->dev_ops->allmulticast_enable()
> >>>
> >>> We probably can introduce ethdev internal function (still visible to
> >>> PMDs) that would do last 3 steps:
> >>> ethdev_replay_user_conf(...)
> >>>      - Call dev->dev_ops->mac_addr_set() - apply default MAC address
> >>>      - Call dev->dev_ops->promiscuous_enable()
> >>>      - Call dev->dev_ops->allmulticast_enable()
> >>>
> >>> And let PMD itself to decide does it needs to call it at dev_start() or not.
> >>> So it will become:
> >>> rte_eth_dev_start()
> >>>      - Call dev->dev_ops->dev_start()
> >>>       -Call ethdev_replay_user_conf(.)
> >>>               - Call dev->dev_ops->mac_addr_set() - apply default MAC address
> >>>               - Call dev->dev_ops->promiscuous_enable()
> >>>               -Call dev->dev_ops->allmulticast_enable()
> >>>
> >>> For PMDs that do need to restore user provided config And
> >>> rte_eth_dev_start()
> >>>      - Call dev->dev_ops->dev_start()
> >>>
> >>> For those who do not.
> >>>
> >>
> >> OK, got it what you mean.
> >> Pushing restore functionality to PMDs works, but this may be doing
> >> redundant work on each PMD.
> >>
> >> Instead Dariusz suggests PMD to provide a flag to ehtdev to what to
> >> restore and common code in ethdev does the work.
> >> My below dedicated data struct comment is to have this flag in a new
> >> struct, overall like following:
> >>
> >> rte_eth_dev_start()
> >>    - Call dev->dev_ops->dev_start()
> >>    - Call dev->dev_ops->get_restore_flags(ethdev, RTE_ETH_START, &flags)
> >>    - if (flags & MAC) dev->dev_ops->mac_addr_set()
> >>    - if (flags & PROMISC) dev->dev_ops->promiscuous_enable()
> >>    - ...
> >
> > Could you please explain what is the benefit of exposing flags through dev_ops
> callback vs a dedicated flags field in rte_eth_dev_data?
> > In both solutions:
> > - config restore is transparent to the user,
> > - drivers can omit config restore (either by not implementing the
> > callback or not providing the flags),
> > - an ABI change is introduced (not a huge concern, at least for 24.11).
> >
> > I understand that my initial proposal with "internal_flags" was too
> > vague, but renaming and splitting this field into:
> >
> > - dev_start_restore_flags
> > - dev_reset_restore_flags
> > - and so on...
> >
> > seems sufficient, at least in my opinion.
> >
> 
> Hi Dariusz,
> 
> Putting flags to rte_eth_dev_data works, and it is easier since there is direct access
> from rte_eth_dev to rte_eth_dev_data, so you don't need new dev_ops. So this is
> a valid option.
> 
> But benefit of new dev_ops is to keep "struct rte_eth_dev_data" clean.
> 
> "struct rte_eth_dev_data" is integral data structure for ethdev and it is used in
> multiple locations, mostly related to the datapath and all drivers needs to deal
> with fields of this struct.
> Like [rx]_queues, dev_private, dev_conf all important and used a lot.
> 
> I want to protect "struct rte_eth_dev_data" from noise as much as possible,
> though what is noise is not always that clear.
> 
> This restore flag is not critical, and I expect most of the drivers won't care and
> populate this restore flag at all. That is why to me it is better have dedicated struct
> for it and only drivers care about restore feature know it.

I see. Thank you very much for the explanation.

In this case, it looks like adding this to dev_ops is the way to go.

So, summarizing it all:

1. dev_ops should be extended with a callback with the following signature and enums/flags:

enum rte_eth_dev_operation op {
	RTE_ETH_START,
	RTE_ETH_STOP,
	RTE_ETH_RESET,
};

#define RTE_ETH_RESTORE_MAC_ADDR RTE_BIT32(0)
#define RTE_ETH_RESTORE_PROMISC RTE_BIT32(1)
#define RTE_ETH_RESTORE_ALLMULTI RTE_BIT32(2)

void (*get_restore_flags)(
	struct rte_eth_dev *dev,
	enum rte_eth_dev_operation op,
	uint32_t *flags);

2. rte_eth_dev_start() will work as follows:

- Call dev->dev_ops->dev_start()
- Call dev->dev_ops->get_restore_flags(dev, RTE_ETH_START, &flags). If callback is not provided, assume flags == 0.
- if (flags & RTE_ETH_RESTORE_MAC_ADDR) - restore MAC addresses
- and so on...

Also, I would like to add the following:

3. Patchset introducing this change should add get_restore_flags() implementation to all drivers, which informs that all config should be restored.
This would preserve the current behavior.
Later, this could be refined driver by driver.

What do you think?

Also, there's an open question about 'stop' and 'reset' operations.
At the moment, ethdev layer does not do any config manipulation during these operations.
Maybe we should limit get_restore_flags() to 'start' only?

> 
> 
> 
> >>
> >> So PMDs only will provide what to restore with an internal API and
> >> common ethdev layer will restore it.
> >> If no restore required PMD may not implement .get_restore_flags() at all.
> >>
> >> Additionally, RTE_ETH_START, RTE_ETH_RESET etc flag can be provided
> >> to internal API to get what to restore in different states...
> >>
> >>>> Suggested 'internal_flag' in "struct rte_eth_dev_data" can be
> >>>> confusing and open to interpretation what to use it for and by time
> >>>> become source of defect.
> >>>
> >>> Yes, same thoughts.
> >>>
> >>>> Instead what do you think to have a separate, dedicated data struct for it?
> >>>
> >>> Hmm... not sure I understood you here...
> >>>
> >>>>
> >>>>> Might be we can move this restoration code into the new ethdev
> >>>>> helper function,(ethdevd_user_config_restore()  or so) that PMD
> >>>>> can invoke
> >> during its dev_start() if needed?
> >>>>>
> >>>>>>
> >>>>>> #define RTE_ETH_DEV_INTERNAL_PROMISC_FORCE_RESTORE
> RTE_BIT32(0)
> >>>>>> #define RTE_ETH_DEV_INTERNAL_ALLMULTI_FORCE_RESTORE
> RTE_BIT32(1)
> >>>>>> #define RTE_ETH_DEV_INTERNAL_MAC_ADDR_FORCE_RESTORE
> >> RTE_BIT32(2)
> >>>>>>
> >>>>>> struct rte_eth_dev_data {
> >>>>>>    /* snip */
> >>>>>>
> >>>>>>    uint32_t dev_flags;
> >>>>>>
> >>>>>>    /**
> >>>>>>     * Internal device capabilities, used only by ethdev library.
> >>>>>>     * Certain functionalities provided by the library might
> enabled/disabled,
> >>>>>>     * based on driver exposing certain capabilities.
> >>>>>>     */
> >>>>>>    uint32_t internal_flags;
> >>>>>>
> >>>>>>    /* snip */
> >>>>>> };
> >>>>>>
> >>>>>>> Also perhaps we have go into details what needs to be restored
> >>>>>>> after 'stop' and what needs to be restored after 'reset' and use
> >>>>>>> similar
> >> mechanism etc...
> >>>>>>
> >>>>>> I think we should look into that.
> >>>>>> Any 'codification' of semantics between drivers and ethdev
> >>>>>> library is good in
> >> my opinion.
> >>>>>>
> >>>>>> At least right now, ethdev does not change any configuration in
> >>>>>> 'stop' and
> >> 'reset' from what I see.
> >>>>>> But that's on library side only.
> >>>>>>
> >>>>>>>> This way, if we would conclude that it makes sense for
> >>>>>>>> .dev_start() to handle all starting configuration aspects, we
> >>>>>>>> could track which drivers still rely
> >>>>>>> on configuration restore.
> >>>>>>>>
> >>>>>>>> Dariusz Sosnowski (4):
> >>>>>>>>   ethdev: rework config restore
> >>>>>>>>   ethdev: omit promiscuous config restore if not required
> >>>>>>>>   ethdev: omit all multicast config restore if not required
> >>>>>>>>   ethdev: omit MAC address restore if not required
> >>>>>>>>
> >>>>>>>>  lib/ethdev/rte_ethdev.c | 39
> >>>>>>>> ++++++++++++++++++++++++++++++++++-----
> >>>>>>>>  lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++
> >>>>>>>>  2 files changed, 52 insertions(+), 5 deletions(-)
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 2.39.5
> >>>>>>>>
> >>>>>
> >>>
> >


  reply	other threads:[~2024-10-10 12:08 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-18  9:21 Dariusz Sosnowski
2024-09-18  9:21 ` [RFC 1/4] " Dariusz Sosnowski
2024-09-18  9:21 ` [RFC 2/4] ethdev: omit promiscuous config restore if not required Dariusz Sosnowski
2024-09-18  9:22 ` [RFC 3/4] ethdev: omit all multicast " Dariusz Sosnowski
2024-09-18  9:22 ` [RFC 4/4] ethdev: omit MAC address " Dariusz Sosnowski
2024-09-29 23:31 ` [RFC 0/4] ethdev: rework config restore Ferruh Yigit
2024-10-04 19:13   ` Dariusz Sosnowski
2024-10-07  9:27     ` Konstantin Ananyev
2024-10-07 22:56       ` Ferruh Yigit
2024-10-08 17:21         ` Konstantin Ananyev
2024-10-09  1:07           ` Ferruh Yigit
2024-10-09 10:54             ` Konstantin Ananyev
2024-10-09 16:18             ` Dariusz Sosnowski
2024-10-09 23:16               ` Ferruh Yigit
2024-10-10 12:08                 ` Dariusz Sosnowski [this message]
2024-10-10 12:51                   ` Ferruh Yigit
2024-10-10 16:23                     ` Dariusz Sosnowski
2024-10-10 17:08                       ` Ferruh Yigit
2024-10-10 22:58                         ` Konstantin Ananyev
2024-10-11  0:02                           ` Ferruh Yigit
2024-10-11  8:23                             ` Dariusz Sosnowski
2024-10-11  8:29                             ` Konstantin Ananyev
2024-10-11  9:37                               ` Dariusz Sosnowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CH3PR12MB8460ED019BDDFF59B8DE8592A4782@CH3PR12MB8460.namprd12.prod.outlook.com \
    --to=dsosnowski@nvidia.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=konstantin.ananyev@huawei.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).