From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Konstantin Ananyev <konstantin.ananyev@huawei.com>,
	"NBU-Contact-Thomas Monjalon (EXTERNAL)" <thomas@monjalon.net>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Bruce Richardson <bruce.richardson@intel.com>
Subject: Re: [RFC 0/4] ethdev: rework config restore
Date: Thu, 10 Oct 2024 00:16:48 +0100	[thread overview]
Message-ID: <5ef19db7-f6fe-4a48-8497-e17b33a262b5@amd.com> (raw)
In-Reply-To: <CH3PR12MB8460918E2D36BE4B4CF5B731A47F2@CH3PR12MB8460.namprd12.prod.outlook.com>
On 10/9/2024 5:18 PM, Dariusz Sosnowski wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, October 9, 2024 03:08
>> To: Konstantin Ananyev <konstantin.ananyev@huawei.com>; Dariusz Sosnowski
>> <dsosnowski@nvidia.com>; NBU-Contact-Thomas Monjalon (EXTERNAL)
>> <thomas@monjalon.net>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; Bruce Richardson <bruce.richardson@intel.com>
>> Subject: Re: [RFC 0/4] ethdev: rework config restore
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On 10/8/2024 6:21 PM, Konstantin Ananyev wrote:
>>>
>>>
>>>>>>>> We have been working on optimizing the latency of calls to
>>>>>>>> rte_eth_dev_start(), on ports spawned by mlx5 PMD. Most of the
>>>>>>>> work requires changes in the implementation of
>>>>>>>> .dev_start() PMD callback, but I also wanted to start a
>>>>>>>> discussion regarding configuration restore.
>>>>>>>>
>>>>>>>> rte_eth_dev_start() does a few things on top of calling .dev_start()
>> callback:
>>>>>>>>
>>>>>>>> - Before calling it:
>>>>>>>>     - eth_dev_mac_restore() - if device supports
>>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR;
>>>>>>>> - After calling it:
>>>>>>>>     - eth_dev_mac_restore() - if device does not support
>>>>>>> RTE_ETH_DEV_NOLIVE_MAC_ADDR;
>>>>>>>>     - restore promiscuous config
>>>>>>>>     - restore all multicast config
>>>>>>>>
>>>>>>>> eth_dev_mac_restore() iterates over all known MAC addresses -
>>>>>>>> stored in rte_eth_dev_data.mac_addrs array - and calls
>>>>>>>> .mac_addr_set() and .mac_addr_add() callbacks to apply these MAC
>> addresses.
>>>>>>>>
>>>>>>>> Promiscuous config restore checks if promiscuous mode is enabled
>>>>>>>> or not, and calls .promiscuous_enable() or .promiscuous_disable()
>> callback.
>>>>>>>>
>>>>>>>> All multicast config restore checks if all multicast mode is
>>>>>>>> enabled or not, and calls .allmulticast_enable() or .allmulticast_disable()
>> callback.
>>>>>>>>
>>>>>>>> Callbacks are called directly in all of these cases, to bypass
>>>>>>>> the checks for applying the same configuration, which exist in relevant
>> APIs.
>>>>>>>> Checks are bypassed to force drivers to reapply the configuration.
>>>>>>>>
>>>>>>>> Let's consider what happens in the following sequence of API calls.
>>>>>>>>
>>>>>>>> 1. rte_eth_dev_configure()
>>>>>>>> 2. rte_eth_tx_queue_setup()
>>>>>>>> 3. rte_eth_rx_queue_setup()
>>>>>>>> 4. rte_eth_promiscuous_enable()
>>>>>>>>     - Call dev->dev_ops->promiscuous_enable()
>>>>>>>>     - Stores promiscuous state in dev->data->promiscuous 5.
>>>>>>>> rte_eth_allmulticast_enable()
>>>>>>>>     - Call dev->dev_ops->allmulticast_enable()
>>>>>>>>     - Stores allmulticast state in dev->data->allmulticast 6.
>>>>>>>> rte_eth_dev_start()
>>>>>>>>     - Call dev->dev_ops->dev_start()
>>>>>>>>     - Call dev->dev_ops->mac_addr_set() - apply default MAC address
>>>>>>>>     - Call dev->dev_ops->promiscuous_enable()
>>>>>>>>     - Call dev->dev_ops->allmulticast_enable()
>>>>>>>>
>>>>>>>> Even though all configuration is available in dev->data after
>>>>>>>> step 5, library forces reapplying this configuration in step 6.
>>>>>>>>
>>>>>>>> In mlx5 PMD case all relevant callbacks require communication
>>>>>>>> with the kernel driver, to configure the device (mlx5 PMD must
>>>>>>>> create/destroy new kernel flow rules and/or change netdev config).
>>>>>>>>
>>>>>>>> mlx5 PMD handles applying all configuration in .dev_start(), so
>>>>>>>> the following forced callbacks force additional communication
>>>>>>>> with the kernel. The
>>>>>>> same configuration is applied multiple times.
>>>>>>>>
>>>>>>>> As an optimization, mlx5 PMD could check if a given configuration
>>>>>>>> was applied, but this would duplicate the functionality of the
>>>>>>>> library (for example rte_eth_promiscuous_enable() does not call
>>>>>>>> the driver if
>>>>>>>> dev->data->promiscuous is set).
>>>>>>>>
>>>>>>>> Question: Since all of the configuration is available before
>>>>>>>> .dev_start() callback is called, why ethdev library does not
>>>>>>>> expect .dev_start() to
>>>>>>> take this configuration into account?
>>>>>>>> In other words, why library has to reapply the configuration?
>>>>>>>>
>>>>>>>> I could not find any particular reason why configuration restore
>>>>>>>> exists as part of the process (it was in the initial DPDK commit).
>>>>>>>>
>>>>>>>
>>>>>>> My assumption is .dev_stop() cause these values reset in some
>>>>>>> devices, so
>>>>>>> .dev_start() restores them back.
>>>>>>> @Bruce or @Konstantin may remember the history.
>>>>>
>>>>> Yep, as I remember, at least some Intel PMDs calling hw_reset() ad
>>>>> dec_stop() and even dev_start() to make sure that HW is in a clean (known)
>> state.
>>>>>
>>>>>>>
>>>>>>> But I agree this is device specific behavior, and can be managed
>>>>>>> by what device requires.
>>>>>
>>>>> Probably yes.
>>>>>
>>>>>>>
>>>>>>>> The patches included in this RFC, propose a mechanism which would
>>>>>>>> help with managing which drivers rely on forceful configuration restore.
>>>>>>>> Drivers could advertise if forceful configuration restore is
>>>>>>>> needed through `RTE_ETH_DEV_*_FORCE_RESTORE` device flag. If this
>>>>>>>> flag is set, then the driver in question requires ethdev to
>>>>>>>> forcefully restore
>>>>>>> configuration.
>>>>>>>>
>>>>>>>
>>>>>>> OK to use flag for it, but not sure about using 'dev_info->dev_flags'
>>>>>>> (RTE_ETH_DEV_*) for this, as this flag is shared with user and
>>>>>>> this is all dpdk internal.
>>>>>>>
>>>>>>> What about to have a dedicated flag for it? We can have a
>>>>>>> dedicated set of flag values for restore.
>>>>>>
>>>>>> Agreed. What do you think about the following?
>>>>>
>>>>> Instead of exposing that, can we probably make it transparent to the
>>>>> user and probably ethdev layer too?
>>>>>
>>>>
>>>> +1 to make it transparent to user, but not sure if we can make it
>>>> transparent to ethdev layer.
>>>
>>> Just to be clear:
>>> Let say, using example from above:
>>>
>>>  rte_eth_dev_start()
>>>      - Call dev->dev_ops->dev_start()
>>>      - Call dev->dev_ops->mac_addr_set() - apply default MAC address
>>>      - Call dev->dev_ops->promiscuous_enable()
>>>      - Call dev->dev_ops->allmulticast_enable()
>>>
>>> We probably can introduce ethdev internal function (still visible to
>>> PMDs) that would do last 3 steps:
>>> ethdev_replay_user_conf(...)
>>>      - Call dev->dev_ops->mac_addr_set() - apply default MAC address
>>>      - Call dev->dev_ops->promiscuous_enable()
>>>      - Call dev->dev_ops->allmulticast_enable()
>>>
>>> And let PMD itself to decide does it needs to call it at dev_start() or not.
>>> So it will become:
>>> rte_eth_dev_start()
>>>      - Call dev->dev_ops->dev_start()
>>>       -Call ethdev_replay_user_conf(.)
>>>               - Call dev->dev_ops->mac_addr_set() - apply default MAC address
>>>               - Call dev->dev_ops->promiscuous_enable()
>>>               -Call dev->dev_ops->allmulticast_enable()
>>>
>>> For PMDs that do need to restore user provided config And
>>> rte_eth_dev_start()
>>>      - Call dev->dev_ops->dev_start()
>>>
>>> For those who do not.
>>>
>>
>> OK, got it what you mean.
>> Pushing restore functionality to PMDs works, but this may be doing redundant
>> work on each PMD.
>>
>> Instead Dariusz suggests PMD to provide a flag to ehtdev to what to restore and
>> common code in ethdev does the work.
>> My below dedicated data struct comment is to have this flag in a new struct,
>> overall like following:
>>
>> rte_eth_dev_start()
>>    - Call dev->dev_ops->dev_start()
>>    - Call dev->dev_ops->get_restore_flags(ethdev, RTE_ETH_START, &flags)
>>    - if (flags & MAC) dev->dev_ops->mac_addr_set()
>>    - if (flags & PROMISC) dev->dev_ops->promiscuous_enable()
>>    - ...
> 
> Could you please explain what is the benefit of exposing flags through dev_ops callback vs a dedicated flags field in rte_eth_dev_data?
> In both solutions:
> - config restore is transparent to the user,
> - drivers can omit config restore (either by not implementing the callback or not providing the flags),
> - an ABI change is introduced (not a huge concern, at least for 24.11).
> 
> I understand that my initial proposal with "internal_flags" was too vague,
> but renaming and splitting this field into:
> 
> - dev_start_restore_flags
> - dev_reset_restore_flags
> - and so on...
> 
> seems sufficient, at least in my opinion.
> 
Hi Dariusz,
Putting flags to rte_eth_dev_data works, and it is easier since there is
direct access from rte_eth_dev to rte_eth_dev_data, so you don't need
new dev_ops. So this is a valid option.
But benefit of new dev_ops is to keep "struct rte_eth_dev_data" clean.
"struct rte_eth_dev_data" is integral data structure for ethdev and it
is used in multiple locations, mostly related to the datapath and all
drivers needs to deal with fields of this struct.
Like [rx]_queues, dev_private, dev_conf all important and used a lot.
I want to protect "struct rte_eth_dev_data" from noise as much as
possible, though what is noise is not always that clear.
This restore flag is not critical, and I expect most of the drivers
won't care and populate this restore flag at all. That is why to me it
is better have dedicated struct for it and only drivers care about
restore feature know it.
>>
>> So PMDs only will provide what to restore with an internal API and common
>> ethdev layer will restore it.
>> If no restore required PMD may not implement .get_restore_flags() at all.
>>
>> Additionally, RTE_ETH_START, RTE_ETH_RESET etc flag can be provided to internal
>> API to get what to restore in different states...
>>
>>>> Suggested 'internal_flag' in "struct rte_eth_dev_data" can be
>>>> confusing and open to interpretation what to use it for and by time
>>>> become source of defect.
>>>
>>> Yes, same thoughts.
>>>
>>>> Instead what do you think to have a separate, dedicated data struct for it?
>>>
>>> Hmm... not sure I understood you here...
>>>
>>>>
>>>>> Might be we can move this restoration code into the new ethdev
>>>>> helper function,(ethdevd_user_config_restore()  or so) that PMD can invoke
>> during its dev_start() if needed?
>>>>>
>>>>>>
>>>>>> #define RTE_ETH_DEV_INTERNAL_PROMISC_FORCE_RESTORE RTE_BIT32(0)
>>>>>> #define RTE_ETH_DEV_INTERNAL_ALLMULTI_FORCE_RESTORE RTE_BIT32(1)
>>>>>> #define RTE_ETH_DEV_INTERNAL_MAC_ADDR_FORCE_RESTORE
>> RTE_BIT32(2)
>>>>>>
>>>>>> struct rte_eth_dev_data {
>>>>>>    /* snip */
>>>>>>
>>>>>>    uint32_t dev_flags;
>>>>>>
>>>>>>    /**
>>>>>>     * Internal device capabilities, used only by ethdev library.
>>>>>>     * Certain functionalities provided by the library might enabled/disabled,
>>>>>>     * based on driver exposing certain capabilities.
>>>>>>     */
>>>>>>    uint32_t internal_flags;
>>>>>>
>>>>>>    /* snip */
>>>>>> };
>>>>>>
>>>>>>> Also perhaps we have go into details what needs to be restored
>>>>>>> after 'stop' and what needs to be restored after 'reset' and use similar
>> mechanism etc...
>>>>>>
>>>>>> I think we should look into that.
>>>>>> Any 'codification' of semantics between drivers and ethdev library is good in
>> my opinion.
>>>>>>
>>>>>> At least right now, ethdev does not change any configuration in 'stop' and
>> 'reset' from what I see.
>>>>>> But that's on library side only.
>>>>>>
>>>>>>>> This way, if we would conclude that it makes sense for
>>>>>>>> .dev_start() to handle all starting configuration aspects, we
>>>>>>>> could track which drivers still rely
>>>>>>> on configuration restore.
>>>>>>>>
>>>>>>>> Dariusz Sosnowski (4):
>>>>>>>>   ethdev: rework config restore
>>>>>>>>   ethdev: omit promiscuous config restore if not required
>>>>>>>>   ethdev: omit all multicast config restore if not required
>>>>>>>>   ethdev: omit MAC address restore if not required
>>>>>>>>
>>>>>>>>  lib/ethdev/rte_ethdev.c | 39
>>>>>>>> ++++++++++++++++++++++++++++++++++-----
>>>>>>>>  lib/ethdev/rte_ethdev.h | 18 ++++++++++++++++++
>>>>>>>>  2 files changed, 52 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>> --
>>>>>>>> 2.39.5
>>>>>>>>
>>>>>
>>>
> 
next prev parent reply	other threads:[~2024-10-09 23:16 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-18  9:21 Dariusz Sosnowski
2024-09-18  9:21 ` [RFC 1/4] " Dariusz Sosnowski
2024-09-18  9:21 ` [RFC 2/4] ethdev: omit promiscuous config restore if not required Dariusz Sosnowski
2024-09-18  9:22 ` [RFC 3/4] ethdev: omit all multicast " Dariusz Sosnowski
2024-09-18  9:22 ` [RFC 4/4] ethdev: omit MAC address " Dariusz Sosnowski
2024-09-29 23:31 ` [RFC 0/4] ethdev: rework config restore Ferruh Yigit
2024-10-04 19:13   ` Dariusz Sosnowski
2024-10-07  9:27     ` Konstantin Ananyev
2024-10-07 22:56       ` Ferruh Yigit
2024-10-08 17:21         ` Konstantin Ananyev
2024-10-09  1:07           ` Ferruh Yigit
2024-10-09 10:54             ` Konstantin Ananyev
2024-10-09 16:18             ` Dariusz Sosnowski
2024-10-09 23:16               ` Ferruh Yigit [this message]
2024-10-10 12:08                 ` Dariusz Sosnowski
2024-10-10 12:51                   ` Ferruh Yigit
2024-10-10 16:23                     ` Dariusz Sosnowski
2024-10-10 17:08                       ` Ferruh Yigit
2024-10-10 22:58                         ` Konstantin Ananyev
2024-10-11  0:02                           ` Ferruh Yigit
2024-10-11  8:23                             ` Dariusz Sosnowski
2024-10-11  8:29                             ` Konstantin Ananyev
2024-10-11  9:37                               ` Dariusz Sosnowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=5ef19db7-f6fe-4a48-8497-e17b33a262b5@amd.com \
    --to=ferruh.yigit@amd.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=dsosnowski@nvidia.com \
    --cc=konstantin.ananyev@huawei.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).