DPDK patches and discussions
* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  @ 2020-06-27  7:44  5%   ` Jerin Jacob
  2020-06-29 19:30  4%     ` McDaniel, Timothy
  2020-06-30 11:22  0%     ` Kinsella, Ray
  0 siblings, 2 replies; 200+ results
From: Jerin Jacob @ 2020-06-27  7:44 UTC (permalink / raw)
  To: Tim McDaniel, Ray Kinsella, Neil Horman
  Cc: Jerin Jacob, Mattias Rönnblom, dpdk-dev, Gage Eads,
	Van Haaren, Harry

> +
> +/** Event port configuration structure */
> +struct rte_event_port_conf_v20 {
> +       int32_t new_event_threshold;
> +       /**< A backpressure threshold for new event enqueues on this port.
> +        * Use for *closed system* event dev where event capacity is limited,
> +        * and cannot exceed the capacity of the event dev.
> +        * Configuring ports with different thresholds can make higher priority
> +        * traffic less likely to be backpressured.
> +        * For example, a port used to inject NIC Rx packets into the event dev
> +        * can have a lower threshold so as not to overwhelm the device,
> +        * while ports used for worker pools can have a higher threshold.
> +        * This value cannot exceed the *nb_events_limit*
> +        * which was previously supplied to rte_event_dev_configure().
> +        * This should be set to '-1' for *open system*.
> +        */
> +       uint16_t dequeue_depth;
> +       /**< Configure number of bulk dequeues for this event port.
> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> +        * which was previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
> +       uint16_t enqueue_depth;
> +       /**< Configure number of bulk enqueues for this event port.
> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> +        * which was previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
>         uint8_t disable_implicit_release;
>         /**< Configure the port not to release outstanding events in
>          * rte_event_dev_dequeue_burst(). If true, all events received through
> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>                                 struct rte_event_port_conf *port_conf);
>
> +int
> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> +                               struct rte_event_port_conf_v20 *port_conf);
> +
> +int
> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> +                                     struct rte_event_port_conf *port_conf);

Hi Timothy,

+ ABI Maintainers (Ray, Neil)

# As per my understanding, the structures can not be versioned; only
functions can be versioned.
i.e. we can not make any change to "struct rte_event_port_conf".

# We have a similar case with ethdev and it deferred to next release v20.11
http://patches.dpdk.org/patch/69113/

Regarding the API changes:
# The slow-path changes generally look good to me. I will review the
next level in the coming days.
# The following fast-path change bothers me. Could you share more
details on the change below?

diff --git a/app/test-eventdev/test_order_atq.c
b/app/test-eventdev/test_order_atq.c
index 3366cfc..8246b96 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -34,6 +34,8 @@
                        continue;
                }

+               ev.flow_id = ev.mbuf->udata64;
+
# Since RC1 is near, I am not sure how to accommodate the API changes
now and sort out the ABI issues.
# Another concern is that the eventdev spec gets bloated with versioning
files for just ONE release, as 20.11 will be free to change the ABI.
# While we discuss the API change, please send a deprecation notice for
the ABI change in 20.11, so that there is no ambiguity about this patch
for the 20.11 release.

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item
  @ 2020-06-28 14:52  0%             ` Dekel Peled
  0 siblings, 0 replies; 200+ results
From: Dekel Peled @ 2020-06-28 14:52 UTC (permalink / raw)
  To: Adrien Mazarguil, Ori Kam, Andrew Rybchenko
  Cc: ferruh.yigit, john.mcnamara, marko.kovacevic, Asaf Penso,
	Matan Azrad, Eli Britstein, dev, Ivan Malov

Hi,

This change is proposed for 20.11.
It is proposed following internal discussions in which multiple options were considered, some of them similar to the ones suggested below.
Continuing the earlier correspondence in this thread, please send any other comments/suggestions you have.

Regards,
Dekel

> -----Original Message-----
> From: Dekel Peled <dekelp@mellanox.com>
> Sent: Thursday, June 18, 2020 9:59 AM
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Ori Kam
> <orika@mellanox.com>; Andrew Rybchenko <arybchenko@solarflare.com>
> Cc: ferruh.yigit@intel.com; john.mcnamara@intel.com;
> marko.kovacevic@intel.com; Asaf Penso <asafp@mellanox.com>; Matan
> Azrad <matan@mellanox.com>; Eli Britstein <elibr@mellanox.com>;
> dev@dpdk.org; Ivan Malov <Ivan.Malov@oktetlabs.ru>
> Subject: RE: [RFC] ethdev: add fragment attribute to IPv6 item
> 
> Hi,
> 
> Kind reminder, please respond on the recent correspondence so we can
> conclude this issue.
> 
> Regards,
> Dekel
> 
> > -----Original Message-----
> > From: Dekel Peled <dekelp@mellanox.com>
> > Sent: Wednesday, June 3, 2020 3:11 PM
> > To: Ori Kam <orika@mellanox.com>; Adrien Mazarguil
> > <adrien.mazarguil@6wind.com>
> > Cc: Andrew Rybchenko <arybchenko@solarflare.com>;
> > ferruh.yigit@intel.com; john.mcnamara@intel.com;
> > marko.kovacevic@intel.com; Asaf Penso <asafp@mellanox.com>; Matan
> > Azrad <matan@mellanox.com>; Eli Britstein <elibr@mellanox.com>;
> > dev@dpdk.org; Ivan Malov <Ivan.Malov@oktetlabs.ru>
> > Subject: RE: [RFC] ethdev: add fragment attribute to IPv6 item
> >
> > Hi, PSB.
> >
> > > -----Original Message-----
> > > From: Ori Kam <orika@mellanox.com>
> > > Sent: Wednesday, June 3, 2020 11:16 AM
> > > To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Dekel Peled
> > > <dekelp@mellanox.com>; ferruh.yigit@intel.com;
> > > john.mcnamara@intel.com; marko.kovacevic@intel.com; Asaf Penso
> > > <asafp@mellanox.com>; Matan Azrad <matan@mellanox.com>; Eli
> > Britstein
> > > <elibr@mellanox.com>; dev@dpdk.org; Ivan Malov
> > > <Ivan.Malov@oktetlabs.ru>
> > > Subject: RE: [RFC] ethdev: add fragment attribute to IPv6 item
> > >
> > > Hi Adrien,
> > >
> > > Great to hear from you again.
> > >
> > > > -----Original Message-----
> > > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > Sent: Tuesday, June 2, 2020 10:04 PM
> > > > To: Ori Kam <orika@mellanox.com>
> > > > Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Dekel Peled
> > > > <dekelp@mellanox.com>; ferruh.yigit@intel.com;
> > > > john.mcnamara@intel.com; marko.kovacevic@intel.com; Asaf Penso
> > > > <asafp@mellanox.com>; Matan Azrad <matan@mellanox.com>; Eli
> > > Britstein
> > > > <elibr@mellanox.com>; dev@dpdk.org; Ivan Malov
> > > > <Ivan.Malov@oktetlabs.ru>
> > > > Subject: Re: [RFC] ethdev: add fragment attribute to IPv6 item
> > > >
> > > > Hi Ori, Andrew, Delek,
> >
> > It's Dekel, not Delek ;-)
> >
> > > >
> > > > (been a while eh?)
> > > >
> > > > On Tue, Jun 02, 2020 at 06:28:41PM +0000, Ori Kam wrote:
> > > > > Hi Andrew,
> > > > >
> > > > > PSB,
> > > > [...]
> > > > > > > diff --git a/lib/librte_ethdev/rte_flow.h
> > > > > > > b/lib/librte_ethdev/rte_flow.h index b0e4199..3bc8ce1 100644
> > > > > > > --- a/lib/librte_ethdev/rte_flow.h
> > > > > > > +++ b/lib/librte_ethdev/rte_flow.h
> > > > > > > @@ -787,6 +787,8 @@ struct rte_flow_item_ipv4 {
> > > > > > >   */
> > > > > > >  struct rte_flow_item_ipv6 {
> > > > > > >  	struct rte_ipv6_hdr hdr; /**< IPv6 header definition. */
> > > > > > > +	uint32_t is_frag:1; /**< Is IPv6 packet fragmented/non-
> > > fragmented. */
> > > > > > > +	uint32_t reserved:31; /**< Reserved, must be zero. */
> > > > > >
> > > > > > The solution is simple, but hardly generic, and sets an example
> > > > > > for future extensions. I doubt that it is the right way to go.
> > > > > >
> > > > > I agree with you that this is not the most generic way possible,
> > > > > but the IPv6 extensions are very unique, so the solution is also
> > > > > unique. In general, I'm always in favor of finding the most
> > > > > generic way, but sometimes it is better to keep things simple
> > > > > and see how it goes.
> > > >
> > > > Same feeling here, it doesn't look right.
> > > >
> > > > > > Maybe we should add a 256-bit string with one bit for each IP
> > > > > > protocol number and apply it to extension headers only?
> > > > > > If bit A is set in the mask:
> > > > > >  - if bit A is set in spec as well, an extension header with
> > > > > >    IP protocol number A (bit 1 << A) must be present
> > > > > >  - if bit A is clear in spec, an extension header with
> > > > > >    IP protocol number A (bit 1 << A) must be absent
> > > > > > If bit A is clear in the mask, the corresponding extension
> > > > > > header may be present or absent (i.e. don't care).
> > > > > >
> > > > > There are only 12 possible extension headers and currently none
> > > > > of them are supported in rte_flow. So adding logic to parse the
> > > > > 256 bits just to get a max of 12 possible values is overkill.
> > > > > Also, if we disregard the case of the extension, the application
> > > > > must select only one next proto. For example, the application
> > > > > can't select udp + tcp. There is the option to add a flag for
> > > > > each of the possible extensions; does that make more sense to you?
> > > >
> > > > Each of these extension headers has its own structure; we first
> > > > need the ability to match them properly by adding the necessary
> > > > pattern items.
> > > >
> > > > > > The RFC indirectly touches IPv6 proto (next header) matching
> > > > > > logic.
> > > > > >
> > > > > > If logic used in ETH+VLAN is applied on IPv6 as well, it would
> > > > > > make pattern specification and handling complicated. E.g.:
> > > > > >   eth / ipv6 / udp / end
> > > > > > should match UDP over IPv6 without any extension headers only.
> > > > > >
> > > > > The issue with VLAN I agree is different since by definition
> > > > > VLAN is layer 2.5. We can add the same logic also to the VLAN
> > > > > case, maybe it will be easier.
> > > > > In any case, in your example above and according to the RFC we
> > > > > will get all ipv6 udp traffic with and without extensions.
> > > > >
> > > > > > And how to specify UDP over IPv6 regardless of extension headers?
> > > > >
> > > > > Please see above the rule will be eth / ipv6 /udp.
> > > > >
> > > > > >   eth / ipv6 / ipv6_ext / udp / end with a convention that
> > > > > > ipv6_ext is optional if spec and mask are NULL (or mask is empty).
> > > > > >
> > > > > I would guess that this flow should match all ipv6 packets that
> > > > > have one ext and whose next proto is udp.
> > > >
> > > > In my opinion RTE_FLOW_ITEM_TYPE_IPV6_EXT is a bit useless on
> > > > its own. It's only for matching packets that contain some kind of
> > > > extension header, not a specific one; more about that below.
> > > >
> > > > > > I'm wondering if any driver treats it this way?
> > > > > >
> > > > > I'm not sure; we can support only the frag ext by default, but
> > > > > if required we can support other exts.
> > > > >
> > > > > > I agree that the problem really comes when we'd like to match
> > > > > > IPv6 frags, or even worse, non-fragments.
> > > > > >
> > > > > > Two patterns for fragments:
> > > > > >   eth / ipv6 (proto=FRAGMENT) / end
> > > > > >   eth / ipv6 / ipv6_ext (next_hdr=FRAGMENT) / end
> > > > > >
> > > > > > Any sensible solution for not-fragments with any other
> > > > > > extension headers?
> > > > > >
> > > > > The one proposed in this mail 😊
> > > > >
> > > > > > INVERT exists, but is hardly useful, since it simply says that
> > > > > > packets which do not match the pattern without INVERT match
> > > > > > the pattern with INVERT, and
> > > > > >   invert / eth / ipv6 (proto=FRAGMENT) / end will match ARP,
> > > > > > IPv4, IPv6 with an extension header before the fragment header,
> > > > > > and so on.
> > > > > >
> > > > > I agree with you; INVERT in this case doesn't help.
> > > > > We were considering adding some kind of not mask / item per item,
> > > > > something along this line: the user requests ipv6 unfragmented
> > > > > udp packets. The flow would look something like this:
> > > > > Eth / ipv6 / Not (Ipv6.proto = frag_proto) / udp
> > > > > But it makes the rules much harder to use, and I don't think
> > > > > that there is any HW that supports not, and adding such a
> > > > > feature to all items is overkill.
> > > > >
> > > > >
> > > > > > Bit string suggested above will allow to match:
> > > > > >  - UDP over IPv6 with any extension headers:
> > > > > >     eth / ipv6 (ext_hdrs mask empty) / udp / end
> > > > > >  - UDP over IPv6 without any extension headers:
> > > > > >     eth / ipv6 (ext_hdrs mask full, spec empty) / udp / end
> > > > > >  - UDP over IPv6 without fragment header:
> > > > > >     eth / ipv6 (ext.spec & ~FRAGMENT, ext.mask | FRAGMENT) /
> > > > > > udp / end
> > > > > >  - UDP over IPv6 with fragment header
> > > > > >     eth / ipv6 (ext.spec | FRAGMENT, ext.mask | FRAGMENT) /
> > > > > > udp / end
> > > > > >
> > > > > > where FRAGMENT is 1 << IPPROTO_FRAGMENT.
> > > > > >
> > > > > Please see my response regarding this above.
> > > > >
> > > > > > Above I intentionally keep 'proto' unspecified in ipv6 since
> > > > > > otherwise it would specify the next header after IPv6 header.
> > > > > >
> > > > > > Extension headers mask should be empty by default.
> > > >
> > > > This is a deliberate design choice/issue with rte_flow: an empty
> > > > pattern matches everything; adding items only narrows the selection.
> > > > As Andrew said, there is currently no way to provide a specific
> > > > item to reject; it can only be done globally on a pattern through
> > > > INVERT, which no PMD implements so far.
> > > >
> > > > So we have two requirements here: the ability to specifically
> > > > match
> > > > IPv6 fragment headers and the ability to reject them.
> > > >
> > > > To match IPv6 fragment headers, we need a dedicated pattern item.
> > > > The generic RTE_FLOW_ITEM_TYPE_IPV6_EXT is useless for that on its
> > > > own; it must be completed with RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG
> > > > and an associated object
> > >
> > > Yes, we must add EXT_FRAG to be able to match on the FRAG bits.
> > >
> >
> > Please see previous RFC I sent.
> > [RFC] ethdev: add IPv6 fragment extension header item
> > http://mails.dpdk.org/archives/dev/2020-March/160255.html
> > It is complemented by this RFC.
> >
> > > > to match individual fields if needed (like all the other
> > > > protocols/headers).
> > > >
> > > > Then to reject a pattern item... My preference goes to a new "NOT"
> > > > meta item affecting the meaning of the item coming immediately
> > > > after in the pattern list. That would be ultra generic, wouldn't
> > > > break any ABI/API and, like INVERT, wouldn't even require a new
> > > > object associated with it.
> > > >
> > > > To match UDPv6 traffic when there is no fragment header, one could
> > > > then do something like:
> > > >
> > > >  eth / ipv6 / not / ipv6_ext_frag / udp
> > > >
> > > > PMD support would be trivial to implement (I'm sure!)
> > > >
> > > I agree with you, as I said above. The issue is not the PMD; the issues are:
> > > 1. Think about the rule you stated above: from a logic point of view
> > > there is some contradiction. You are saying the ipv6 next proto is udp,
> > > but you also say not frag; this logic applies only to the IPv6 ext.
> > > 2. A HW issue: I don't know of HW that knows how to support not on an
> > > item. So adding something to all items for only one case is overkill.
> > >
> > >
> > >
> > > > We may later implement other kinds of "operator" items as Andrew
> > > > suggested, for bit-wise stuff and so on. Let's keep adding
> > > > features on a needed basis though.
> > > >
> > > > --
> > > > Adrien Mazarguil
> > > > 6WIND
> > >
> > > Best,
> > > Ori


* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  @ 2020-06-29  7:42  3%       ` Dmitry Kozlyuk
  2020-06-29  8:12  0%         ` Tal Shnaiderman
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2020-06-29  7:42 UTC (permalink / raw)
  To: Ranjit Menon
  Cc: Fady Bader, dev, Dmitry Malloy, Narcisa Ana Maria Vasile,
	Tal Shnaiderman, Thomas Monjalon, Olivier Matz

On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:
> On 6/28/2020 7:20 AM, Fady Bader wrote:
> > Hi Dmitry,
> > I'm trying to run test-pmd on Windows and I ran into this error with cmdline.
> >
> > The error log message is :
> > In file included from ../app/test-pmd/cmdline_flow.c:23:
> > ..\lib\librte_cmdline/cmdline_parse_num.h:24:2: error: 'INT64' redeclared as different kind of symbol
> >    INT64
> >
> > In file included from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/winnt.h:150,
> >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
> >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/windef.h:8,
> >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
> >                   from ..\lib/librte_eal/windows/include/rte_windows.h:22,
> >                   from ..\lib/librte_eal/windows/include/pthread.h:20,
> >                   from ..\lib/librte_eal/include/rte_per_lcore.h:25,
> >                   from ..\lib/librte_eal/include/rte_errno.h:18,
> >                   from ..\lib\librte_ethdev/rte_ethdev.h:156,
> >                   from ../app/test-pmd/cmdline_flow.c:18:
> > C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/basetsd.h:32:44: note: previous declaration of 'INT64' was here
> >     __MINGW_EXTENSION typedef signed __int64 INT64,*PINT64;
> >
> > The same error is for the other types defined in cmdline_numtype.
> >
> > This problem with windows.h is popping up in many places; among them are
> > cmdline, test-pmd, and librte_net.
> > We should find a way to exclude windows.h from the places that don't need
> > it. Are there any suggestions on how this can be done?
> 
> We ran into this same issue when working with the code that is on the 
> draft repo.
> 
> The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types in 
> Windows headers for integer types. We found that it is easier to change 
> the enum in cmdline_parse_num.h than try to play with the include order 
> of headers. AFAIK, the enums were only used to determine the type in a 
> series of switch() statements in librte_cmdline, so we simply renamed 
> the enums. Not sure, if that will be acceptable here.

+1 for renaming enum values. It's not a problem of librte_cmdline itself but a
problem of its consumption on Windows; however, renaming enum values doesn't
break ABI and will make the librte_cmdline API "namespaced".

I don't see a clean way not to expose windows.h, because pthread.h depends on
it, and if we hide the implementation, librte_eal would have to export pthread
symbols on Windows, which is a hack (or is it?).

-- 
Dmitry Kozlyuk


* [dpdk-dev] [PATCH v5 0/3] RCU integration with LPM library
  @ 2020-06-29  8:02  3% ` Ruifeng Wang
    2020-07-07 14:40  3% ` [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-06-29  8:02 UTC (permalink / raw)
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. The series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This provides
the user a handle to enable the RCU support integrated in the LPM
library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v5:
No default value for reclaim_thd; this allows reclamation to be triggered on every call.
Pass the LPM pointer instead of tbl8 as the argument of the reclaim callback free function.
Updated the group_idx check at tbl8 allocation.
Use enums instead of defines for the different reclamation modes.
The RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid an ABI change.

v4:
Allow the user to configure the defer queue: size, reclaim threshold, max entries.
Return the defer queue handle so the user can manually trigger reclamation.
Add blocking mode support, in which the defer queue will not be created.

Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 129 +++++++-
 lib/librte_lpm/rte_lpm.h           |  59 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 995 insertions(+), 17 deletions(-)

-- 
2.17.1



* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  2020-06-29  7:42  3%       ` Dmitry Kozlyuk
@ 2020-06-29  8:12  0%         ` Tal Shnaiderman
  2020-06-29 23:56  0%           ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Tal Shnaiderman @ 2020-06-29  8:12 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Ranjit Menon
  Cc: Fady Bader, dev, Dmitry Malloy, Narcisa Ana Maria Vasile,
	Thomas Monjalon, Olivier Matz

> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> Subject: Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
> 
> On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:
> > On 6/28/2020 7:20 AM, Fady Bader wrote:
> > > Hi Dmitry,
> > > I'm trying to run test-pmd on Windows and I ran into this error with
> cmdline.
> > >
> > > The error log message is :
> > > In file included from ../app/test-pmd/cmdline_flow.c:23:
> > > ..\lib\librte_cmdline/cmdline_parse_num.h:24:2: error: 'INT64'
> redeclared as different kind of symbol
> > >    INT64
> > >
> > > In file included from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/winnt.h:150,
> > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/minwindef.h:163,
> > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/windef.h:8,
> > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/windows.h:69,
> > >                   from ..\lib/librte_eal/windows/include/rte_windows.h:22,
> > >                   from ..\lib/librte_eal/windows/include/pthread.h:20,
> > >                   from ..\lib/librte_eal/include/rte_per_lcore.h:25,
> > >                   from ..\lib/librte_eal/include/rte_errno.h:18,
> > >                   from ..\lib\librte_ethdev/rte_ethdev.h:156,
> > >                   from ../app/test-pmd/cmdline_flow.c:18:
> > > C:/mingw-w64/x86_64/mingw64/x86_64-w64-
> mingw32/include/basetsd.h:32:44: note: previous declaration of 'INT64' was
> here
> > >     __MINGW_EXTENSION typedef signed __int64 INT64,*PINT64;
> > >
> > > The same error is for the other types defined in cmdline_numtype.
> > >
> > > This problem with windows.h is popping up in many places; among
> > > them are cmdline, test-pmd, and librte_net.
> > > We should find a way to exclude windows.h from the places that
> > > don't need it. Are there any suggestions on how this can be done?
> >
> > We ran into this same issue when working with the code that is on the
> > draft repo.
> >
> > The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types
> > in Windows headers for integer types. We found that it is easier to
> > change the enum in cmdline_parse_num.h than try to play with the
> > include order of headers. AFAIK, the enums were only used to determine
> > the type in a series of switch() statements in librte_cmdline, so we
> > simply renamed the enums. Not sure, if that will be acceptable here.
> 
> +1 for renaming enum values. It's not a problem of librte_cmdline itself
> but a problem of its consumption on Windows; however, renaming enum
> values doesn't break ABI and will make the librte_cmdline API "namespaced".
> 
> I don't see a clean way not to expose windows.h, because pthread.h
> depends on it, and if we hide the implementation, librte_eal would have
> to export pthread symbols on Windows, which is a hack (or is it?).

test-pmd redefines BOOLEAN and PATTERN in the index enum; I'm not sure how many more conflicts we will face because of this huge include.

Also, DPDK applications will inherit it unknowingly; I'm not sure whether this is common for Windows libraries.

> 
> --
> Dmitry Kozlyuk


* Re: [dpdk-dev] [EXT] [PATCH v4 4/9] eal: introduce thread uninit helper
    @ 2020-06-29  8:59  0%     ` Sunil Kumar Kori
  2020-06-30  9:42  0%     ` [dpdk-dev] " Olivier Matz
  2 siblings, 0 replies; 200+ results
From: Sunil Kumar Kori @ 2020-06-29  8:59 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, Jerin Jacob Kollanukkaran, Neil Horman,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon

>-----Original Message-----
>From: David Marchand <david.marchand@redhat.com>
>Sent: Friday, June 26, 2020 8:18 PM
>To: dev@dpdk.org
>Cc: jerinjacobk@gmail.com; bruce.richardson@intel.com; mdr@ashroe.eu;
>thomas@monjalon.net; arybchenko@solarflare.com; ktraynor@redhat.com;
>ian.stokes@intel.com; i.maximets@ovn.org; Jerin Jacob Kollanukkaran
><jerinj@marvell.com>; Sunil Kumar Kori <skori@marvell.com>; Neil Horman
><nhorman@tuxdriver.com>; Harini Ramakrishnan
><harini.ramakrishnan@microsoft.com>; Omar Cardona
><ocardona@microsoft.com>; Pallavi Kadam <pallavi.kadam@intel.com>;
>Ranjit Menon <ranjit.menon@intel.com>
>Subject: [EXT] [PATCH v4 4/9] eal: introduce thread uninit helper
>
>External Email
>
>----------------------------------------------------------------------
>This is a preparation step for dynamically unregistering threads.
>
>Since we explicitly allocate a per thread trace buffer in rte_thread_init, add an
>internal helper to free this buffer.
>
>Signed-off-by: David Marchand <david.marchand@redhat.com>
>---
>Note: I preferred renaming the current internal function to free all threads
>trace buffers (new name trace_mem_free()) and reuse the previous name
>(trace_mem_per_thread_free()) when freeing this buffer for a given thread.
>
>Changes since v2:
>- added missing stub for windows tracing support,
>- moved free symbol to exported (experimental) ABI as a counterpart of
>  the alloc symbol we already had,
>
>Changes since v1:
>- rebased on master, removed Windows workaround wrt traces support,
>
>---
> lib/librte_eal/common/eal_common_thread.c |  9 ++++
>lib/librte_eal/common/eal_common_trace.c  | 51 +++++++++++++++++++----
> lib/librte_eal/common/eal_thread.h        |  5 +++
> lib/librte_eal/common/eal_trace.h         |  2 +-
> lib/librte_eal/include/rte_trace_point.h  |  9 ++++
> lib/librte_eal/rte_eal_version.map        |  3 ++
> lib/librte_eal/windows/eal.c              |  5 +++
> 7 files changed, 75 insertions(+), 9 deletions(-)
>
>diff --git a/lib/librte_eal/common/eal_common_thread.c
>b/lib/librte_eal/common/eal_common_thread.c
>index afb30236c5..3b30cc99d9 100644
>--- a/lib/librte_eal/common/eal_common_thread.c
>+++ b/lib/librte_eal/common/eal_common_thread.c
>@@ -20,6 +20,7 @@
> #include "eal_internal_cfg.h"
> #include "eal_private.h"
> #include "eal_thread.h"
>+#include "eal_trace.h"
>
> RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
>RTE_DEFINE_PER_LCORE(int, _thread_id) = -1; @@ -161,6 +162,14 @@
>rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
> 	__rte_trace_mem_per_thread_alloc();
> }
>
>+void
>+rte_thread_uninit(void)
>+{

This needs to check whether trace is enabled or not, similar to trace_mem_free().
>+	__rte_trace_mem_per_thread_free();
>+
>+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY; }
>+
> struct rte_thread_ctrl_params {
> 	void *(*start_routine)(void *);
> 	void *arg;

[snipped]

>2.23.0



* Re: [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper
  @ 2020-06-29  9:07  0%       ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-06-29  9:07 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dpdk-dev, Richardson, Bruce, Ray Kinsella, Thomas Monjalon,
	Andrew Rybchenko, Kevin Traynor, Ian Stokes, Ilya Maximets,
	Jerin Jacob, Sunil Kumar Kori, Neil Horman, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

On Fri, Jun 26, 2020 at 5:00 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Fri, Jun 26, 2020 at 8:18 PM David Marchand
> <david.marchand@redhat.com> wrote:
> >
> > This is a preparation step for dynamically unregistering threads.
> >
> > Since we explicitly allocate a per thread trace buffer in
> > rte_thread_init, add an internal helper to free this buffer.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> > Note: I preferred renaming the current internal function to free all
> > threads trace buffers (new name trace_mem_free()) and reuse the previous
> > name (trace_mem_per_thread_free()) when freeing this buffer for a given
> > thread.
> >
> > Changes since v2:
> > - added missing stub for windows tracing support,
> > - moved free symbol to exported (experimental) ABI as a counterpart of
> >   the alloc symbol we already had,
> >
> > Changes since v1:
> > - rebased on master, removed Windows workaround wrt traces support,
>
> > +/**
> > + * Uninitialize per-lcore info for current thread.
> > + */
> > +void rte_thread_uninit(void);
> > +
>
> Is it a public API? I guess not as it not adding in .map file.
> If it is private API, Is n't it better to change as eal_thread_ like
> another private API in eal_thread.h?

Before this series, we have:
- rte_thread_ public APIs for both EAL and non-EAL threads (declared
in rte_eal_interrupts.h and rte_lcore.h),
- eal_thread_ internal APIs that apply to EAL threads (declared in
eal_thread.h),

I guess __rte_thread_ could do the trick and I will move this to eal_private.h.


-- 
David Marchand



* [dpdk-dev] [PATCH v10 10/10] build: generate version.map file for MinGW on Windows
  @ 2020-06-29 12:37  4%   ` talshn
  0 siblings, 0 replies; 200+ results
From: talshn @ 2020-06-29 12:37 UTC (permalink / raw)
  To: dev
  Cc: thomas, pallavi.kadam, dmitry.kozliuk, david.marchand, grive,
	ranjit.menon, navasile, harini.ramakrishnan, ocardona,
	anatoly.burakov, fady, bruce.richardson, Tal Shnaiderman

From: Tal Shnaiderman <talshn@mellanox.com>

The MinGW build for Windows has special cases where exported
functions contain an additional prefix:

__emutls_v.per_lcore__*

To avoid adding those prefixed functions to the version.map file,
the map_to_def.py script was modified to create a map file for MinGW
with the needed changes.

The file name was changed to map_to_win.py, and the lib/meson.build map
output was unified with the drivers/meson.build output.

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
---
 buildtools/{map_to_def.py => map_to_win.py} | 11 ++++++++++-
 buildtools/meson.build                      |  4 ++--
 drivers/meson.build                         | 12 +++++++++---
 lib/meson.build                             | 19 ++++++++++++++-----
 4 files changed, 35 insertions(+), 11 deletions(-)
 rename buildtools/{map_to_def.py => map_to_win.py} (69%)

diff --git a/buildtools/map_to_def.py b/buildtools/map_to_win.py
similarity index 69%
rename from buildtools/map_to_def.py
rename to buildtools/map_to_win.py
index 6775b54a9d..2990b58634 100644
--- a/buildtools/map_to_def.py
+++ b/buildtools/map_to_win.py
@@ -10,12 +10,21 @@
 def is_function_line(ln):
     return ln.startswith('\t') and ln.endswith(';\n') and ":" not in ln
 
+# MinGW keeps the original .map file but replaces per_lcore* to __emutls_v.per_lcore*
+def create_mingw_map_file(input_map, output_map):
+    with open(input_map) as f_in, open(output_map, 'w') as f_out:
+        f_out.writelines([lines.replace('per_lcore', '__emutls_v.per_lcore') for lines in f_in.readlines()])
 
 def main(args):
     if not args[1].endswith('version.map') or \
-            not args[2].endswith('exports.def'):
+            not args[2].endswith('exports.def') and \
+            not args[2].endswith('mingw.map'):
         return 1
 
+    if args[2].endswith('mingw.map'):
+        create_mingw_map_file(args[1], args[2])
+        return 0
+
 # special case, allow override if an def file already exists alongside map file
     override_file = join(dirname(args[1]), basename(args[2]))
     if exists(override_file):
diff --git a/buildtools/meson.build b/buildtools/meson.build
index d5f8291beb..f9d2fdf74b 100644
--- a/buildtools/meson.build
+++ b/buildtools/meson.build
@@ -9,14 +9,14 @@ list_dir_globs = find_program('list-dir-globs.py')
 check_symbols = find_program('check-symbols.sh')
 ldflags_ibverbs_static = find_program('options-ibverbs-static.sh')
 
-# set up map-to-def script using python, either built-in or external
+# set up map-to-win script using python, either built-in or external
 python3 = import('python').find_installation(required: false)
 if python3.found()
 	py3 = [python3]
 else
 	py3 = ['meson', 'runpython']
 endif
-map_to_def_cmd = py3 + files('map_to_def.py')
+map_to_win_cmd = py3 + files('map_to_win.py')
 sphinx_wrapper = py3 + files('call-sphinx-build.py')
 
 # stable ABI always starts with "DPDK_"
diff --git a/drivers/meson.build b/drivers/meson.build
index 646a7d5eb5..2cd8505d10 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -152,16 +152,22 @@ foreach class:dpdk_driver_classes
 			implib = 'lib' + lib_name + '.dll.a'
 
 			def_file = custom_target(lib_name + '_def',
-				command: [map_to_def_cmd, '@INPUT@', '@OUTPUT@'],
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
 				input: version_map,
 				output: '@0@_exports.def'.format(lib_name))
-			lk_deps = [version_map, def_file]
+
+			mingw_map = custom_target(lib_name + '_mingw',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
+				input: version_map,
+				output: '@0@_mingw.map'.format(lib_name))
+
+			lk_deps = [version_map, def_file, mingw_map]
 			if is_windows
 				if is_ms_linker
 					lk_args = ['-Wl,/def:' + def_file.full_path(),
 						'-Wl,/implib:drivers\\' + implib]
 				else
-					lk_args = []
+					lk_args = ['-Wl,--version-script=' + mingw_map.full_path()]
 				endif
 			else
 				lk_args = ['-Wl,--version-script=' + version_map]
diff --git a/lib/meson.build b/lib/meson.build
index a8fd317a18..af66610fcb 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -149,19 +149,28 @@ foreach l:libraries
 					meson.current_source_dir(), dir_name, name)
 			implib = dir_name + '.dll.a'
 
-			def_file = custom_target(name + '_def',
-				command: [map_to_def_cmd, '@INPUT@', '@OUTPUT@'],
+			def_file = custom_target(libname + '_def',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
 				input: version_map,
-				output: 'rte_@0@_exports.def'.format(name))
+				output: '@0@_exports.def'.format(libname))
+
+			mingw_map = custom_target(libname + '_mingw',
+				command: [map_to_win_cmd, '@INPUT@', '@OUTPUT@'],
+				input: version_map,
+				output: '@0@_mingw.map'.format(libname))
 
 			if is_ms_linker
 				lk_args = ['-Wl,/def:' + def_file.full_path(),
 					'-Wl,/implib:lib\\' + implib]
 			else
-				lk_args = ['-Wl,--version-script=' + version_map]
+				if is_windows
+					lk_args = ['-Wl,--version-script=' + mingw_map.full_path()]
+				else
+					lk_args = ['-Wl,--version-script=' + version_map]
+				endif
 			endif
 
-			lk_deps = [version_map, def_file]
+			lk_deps = [version_map, def_file, mingw_map]
 			if not is_windows
 				# on unix systems check the output of the
 				# check-symbols.sh script, using it as a
-- 
2.16.1.windows.4


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration
  @ 2020-06-29 14:08  4%   ` Matan Azrad
  0 siblings, 0 replies; 200+ results
From: Matan Azrad @ 2020-06-29 14:08 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Xiao Wang

To prepare for per-queue operations in the vDPA device, the following
experimental API needed to change:

The API ``rte_vhost_host_notifier_ctrl`` was changed to be per queue
instead of per device.

A ``qid`` parameter was added to the API arguments list.

Setting the parameter to the value RTE_VHOST_QUEUE_ALL configures the
host notifier for all the device queues, as was done before this patch.

Signed-off-by: Matan Azrad <matan@mellanox.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 doc/guides/rel_notes/release_20_08.rst |  3 +++
 drivers/vdpa/ifc/ifcvf_vdpa.c          |  6 +++---
 drivers/vdpa/mlx5/mlx5_vdpa.c          |  6 ++++--
 lib/librte_vhost/rte_vdpa.h            |  8 ++++++--
 lib/librte_vhost/rte_vhost.h           |  1 -
 lib/librte_vhost/vhost_user.c          | 18 ++++++++++++++----
 6 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 44383b8..2d5a3f7 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -125,6 +125,9 @@ API Changes
 
 * ``rte_page_sizes`` enumeration is replaced with ``RTE_PGSIZE_xxx`` defines.
 
+* vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
+  queue and not per device, a qid parameter was added to the arguments list.
+
 
 ABI Changes
 -----------
diff --git a/drivers/vdpa/ifc/ifcvf_vdpa.c b/drivers/vdpa/ifc/ifcvf_vdpa.c
index ec97178..6a2fed3 100644
--- a/drivers/vdpa/ifc/ifcvf_vdpa.c
+++ b/drivers/vdpa/ifc/ifcvf_vdpa.c
@@ -839,7 +839,7 @@ struct internal_list {
 	vdpa_ifcvf_stop(internal);
 	vdpa_disable_vfio_intr(internal);
 
-	ret = rte_vhost_host_notifier_ctrl(vid, false);
+	ret = rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, false);
 	if (ret && ret != -ENOTSUP)
 		goto error;
 
@@ -858,7 +858,7 @@ struct internal_list {
 	if (ret)
 		goto stop_vf;
 
-	rte_vhost_host_notifier_ctrl(vid, true);
+	rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true);
 
 	internal->sw_fallback_running = true;
 
@@ -893,7 +893,7 @@ struct internal_list {
 	rte_atomic32_set(&internal->dev_attached, 1);
 	update_datapath(internal);
 
-	if (rte_vhost_host_notifier_ctrl(vid, true) != 0)
+	if (rte_vhost_host_notifier_ctrl(vid, RTE_VHOST_QUEUE_ALL, true) != 0)
 		DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did);
 
 	return 0;
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 159653f..97f87c5 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -146,7 +146,8 @@
 	int ret;
 
 	if (priv->direct_notifier) {
-		ret = rte_vhost_host_notifier_ctrl(priv->vid, false);
+		ret = rte_vhost_host_notifier_ctrl(priv->vid,
+						   RTE_VHOST_QUEUE_ALL, false);
 		if (ret != 0) {
 			DRV_LOG(INFO, "Direct HW notifier FD cannot be "
 				"destroyed for device %d: %d.", priv->vid, ret);
@@ -154,7 +155,8 @@
 		}
 		priv->direct_notifier = 0;
 	}
-	ret = rte_vhost_host_notifier_ctrl(priv->vid, true);
+	ret = rte_vhost_host_notifier_ctrl(priv->vid, RTE_VHOST_QUEUE_ALL,
+					   true);
 	if (ret != 0)
 		DRV_LOG(INFO, "Direct HW notifier FD cannot be configured for"
 			" device %d: %d.", priv->vid, ret);
diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
index ecb3d91..fd42085 100644
--- a/lib/librte_vhost/rte_vdpa.h
+++ b/lib/librte_vhost/rte_vdpa.h
@@ -202,22 +202,26 @@ struct rte_vdpa_device *
 int
 rte_vdpa_get_device_num(void);
 
+#define RTE_VHOST_QUEUE_ALL UINT16_MAX
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice
  *
- * Enable/Disable host notifier mapping for a vdpa port.
+ * Enable/Disable host notifier mapping for a vdpa queue.
  *
  * @param vid
  *  vhost device id
  * @param enable
  *  true for host notifier map, false for host notifier unmap
+ * @param qid
+ *  vhost queue id, RTE_VHOST_QUEUE_ALL to configure all the device queues
  * @return
  *  0 on success, -1 on failure
  */
 __rte_experimental
 int
-rte_vhost_host_notifier_ctrl(int vid, bool enable);
+rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable);
 
 /**
  * @warning
diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
index 329ed8a..1ac7eaf 100644
--- a/lib/librte_vhost/rte_vhost.h
+++ b/lib/librte_vhost/rte_vhost.h
@@ -107,7 +107,6 @@
 #define VHOST_USER_F_PROTOCOL_FEATURES	30
 #endif
 
-
 /**
  * Information relating to memory regions including offsets to
  * addresses in QEMUs memory file.
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ea9cd10..4e1af91 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2951,13 +2951,13 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev,
 	return process_slave_message_reply(dev, &msg);
 }
 
-int rte_vhost_host_notifier_ctrl(int vid, bool enable)
+int rte_vhost_host_notifier_ctrl(int vid, uint16_t qid, bool enable)
 {
 	struct virtio_net *dev;
 	struct rte_vdpa_device *vdpa_dev;
 	int vfio_device_fd, did, ret = 0;
 	uint64_t offset, size;
-	unsigned int i;
+	unsigned int i, q_start, q_last;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -2981,6 +2981,16 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 	if (!vdpa_dev)
 		return -ENODEV;
 
+	if (qid == RTE_VHOST_QUEUE_ALL) {
+		q_start = 0;
+		q_last = dev->nr_vring - 1;
+	} else {
+		if (qid >= dev->nr_vring)
+			return -EINVAL;
+		q_start = qid;
+		q_last = qid;
+	}
+
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_vfio_device_fd, -ENOTSUP);
 	RTE_FUNC_PTR_OR_ERR_RET(vdpa_dev->ops->get_notify_area, -ENOTSUP);
 
@@ -2989,7 +2999,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		return -ENOTSUP;
 
 	if (enable) {
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			if (vdpa_dev->ops->get_notify_area(vid, i, &offset,
 					&size) < 0) {
 				ret = -ENOTSUP;
@@ -3004,7 +3014,7 @@ int rte_vhost_host_notifier_ctrl(int vid, bool enable)
 		}
 	} else {
 disable:
-		for (i = 0; i < dev->nr_vring; i++) {
+		for (i = q_start; i <= q_last; i++) {
 			vhost_user_slave_set_vring_host_notifier(dev, i, -1,
 					0, 0);
 		}
-- 
1.8.3.1


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 4/6] bus/mlx5_pci: register a PCI driver
  @ 2020-06-29 15:49  2%     ` Gaëtan Rivet
  0 siblings, 0 replies; 200+ results
From: Gaëtan Rivet @ 2020-06-29 15:49 UTC (permalink / raw)
  To: Parav Pandit; +Cc: ferruh.yigit, thomas, dev, orika, matan

On 21/06/20 19:11 +0000, Parav Pandit wrote:
> Create an mlx5 bus driver framework for invoking drivers of
> multiple classes that have registered with the mlx5_pci bus
> driver.
> 
> Validate user class arguments for supported class combinations.
> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
> Changelog:
> v1->v2:
>  - Address comments from Thomas and Gaetan
>  - Enhanced driver to honor RTE_PCI_DRV_PROBE_AGAIN drv_flag
>  - Use anonymous structure for class search and code changes around it
>  - Define static for class comination array
>  - Use RTE_DIM to find array size
>  - Added OOM check for strdup()
>  - Renamed copy variable to nstr_orig
>  - Returning negative error code
>  - Returning directly if match entry found
>  - Use compat condition check
>  - Avoided cutting error message string
>  - Use uint32_t datatype instead of enum mlx5_class
>  - Changed logic to parse device arguments only once during probe()
>  - Added check to fail driver probe if multiple classes register with
>    DMA ops
>  - Renamed function to parse_class_options
> ---
>  drivers/bus/mlx5_pci/Makefile           |   2 +
>  drivers/bus/mlx5_pci/meson.build        |   2 +-
>  drivers/bus/mlx5_pci/mlx5_pci_bus.c     | 290 ++++++++++++++++++++++++
>  drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h |   1 +
>  4 files changed, 294 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mlx5_pci/Makefile b/drivers/bus/mlx5_pci/Makefile
> index 7db977ba8..e53ed8856 100644
> --- a/drivers/bus/mlx5_pci/Makefile
> +++ b/drivers/bus/mlx5_pci/Makefile
> @@ -13,7 +13,9 @@ CFLAGS += $(WERROR_FLAGS)
>  CFLAGS += -I$(RTE_SDK)/drivers/common/mlx5
>  CFLAGS += -I$(BUILDDIR)/drivers/common/mlx5
>  CFLAGS += -I$(RTE_SDK)/drivers/bus/pci
> +CFLAGS += -D_DEFAULT_SOURCE
>  LDLIBS += -lrte_eal
> +LDLIBS += -lrte_kvargs
>  LDLIBS += -lrte_common_mlx5
>  LDLIBS += -lrte_pci -lrte_bus_pci
>  
> diff --git a/drivers/bus/mlx5_pci/meson.build b/drivers/bus/mlx5_pci/meson.build
> index cc4a84e23..5111baa4e 100644
> --- a/drivers/bus/mlx5_pci/meson.build
> +++ b/drivers/bus/mlx5_pci/meson.build
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2020 Mellanox Technologies Ltd
>  
> -deps += ['pci', 'bus_pci', 'common_mlx5']
> +deps += ['pci', 'bus_pci', 'common_mlx5', 'kvargs']
>  install_headers('rte_bus_mlx5_pci.h')
>  sources = files('mlx5_pci_bus.c')
> diff --git a/drivers/bus/mlx5_pci/mlx5_pci_bus.c b/drivers/bus/mlx5_pci/mlx5_pci_bus.c
> index 66db3c7b0..e8f1649a3 100644
> --- a/drivers/bus/mlx5_pci/mlx5_pci_bus.c
> +++ b/drivers/bus/mlx5_pci/mlx5_pci_bus.c
> @@ -3,12 +3,302 @@
>   */
>  
>  #include "rte_bus_mlx5_pci.h"
> +#include <mlx5_common_utils.h>
>  
>  static TAILQ_HEAD(mlx5_pci_bus_drv_head, rte_mlx5_pci_driver) drv_list =
>  				TAILQ_HEAD_INITIALIZER(drv_list);
>  
> +static const struct {
> +	const char *name;
> +	unsigned int dev_class;

Let me quote you when you refused to follow my comment:

>> Yes, I acked to changed to define, but I forgot that I use the enum here.
>> So I am going to keep the enum as code reads more clear with enum.

You refused to use a fixed-width integer type as per my past comments,
for readability reasons, but changed the type to "unsigned int" instead.

I insisted in the previous commit on uint32_t for exposed ABI (even if
between internal libs). Here I accept some leeway given the
compilation-unit scope of the definition. But in any case, the right
choice is certainly *NOT* a vague type.

> +} mlx5_classes[] = {

mlx5_class_names.

> +	{ .name = "vdpa", .dev_class = MLX5_CLASS_VDPA },
> +	{ .name = "net", .dev_class = MLX5_CLASS_NET },
> +};
> +
> +static const unsigned int mlx5_valid_class_combo[] = {
> +	MLX5_CLASS_NET,
> +	MLX5_CLASS_VDPA,
> +	/* New class combination should be added here */

This comment seems redundant, new class combo will be added wherever
appropriate, leave it to future dev.

> +};
> +
> +static int class_name_to_val(const char *class_name)

I think mlx5_class_from_name() is better.
(with mlx5_ namespace.)

> +{
> +	unsigned int i;

In general, size_t is the type of array iterators in C.

> +
> +	for (i = 0; i < RTE_DIM(mlx5_classes); i++) {
> +		if (strcmp(class_name, mlx5_classes[i].name) == 0)
> +			return mlx5_classes[i].dev_class;
> +
> +	}
> +	return -EINVAL;

You're mixing signed int and enum mlx5_class as return type.
Please find another way of signaling error that will make you keep the enum.

You have a sentinel value describing explicitly an invalid class, it seems perfectly
suited instead of -EINVAL. Use it instead.

> +}
> +
> +static int
> +mlx5_bus_opt_handler(__rte_unused const char *key, const char *class_names,
> +		     void *opaque)
> +{
> +	int *ret = opaque;
> +	char *nstr_org;
> +	int class_val;
> +	char *found;
> +	char *nstr;
> +
> +	*ret = 0;
> +	nstr = strdup(class_names);
> +	if (!nstr) {

Please be explicit and use (nstr == NULL).

> +		*ret = -ENOMEM;
> +		return *ret;
> +	}
> +
> +	nstr_org = nstr;

nstr_orig is more readable.

> +	while (nstr) {
        while (nstr != NULL) {

> +		/* Extract each individual class name */
> +		found = strsep(&nstr, ":");
> +		if (!found)

        ditto

> +			continue;
> +
> +		/* Check if its a valid class */
> +		class_val = class_name_to_val(found);
> +		if (class_val < 0) {

if (class_val == MLX5_CLASS_INVALID),
with the proper API change.

> +			*ret = -EINVAL;
> +			goto err;
> +		}
> +
> +		*ret |= class_val;

Once again, mixing ints and enum mlx5_class.
You don't *have* to set *ret on error.

* Change your opaque out_arg to uint32_t, stop using variable width types for bitmaps.

* Do not set it on error, use a tmp u32 for parsing and only set it once everything is ok.

* rte_kvargs_process() will mask your error values anyway, so instead set rte_errno and return -1.
  On negative return, it will itself return -1. Check for < 0 in bus_options_valid()

> +	}
> +err:
> +	free(nstr_org);
> +	if (*ret < 0)
> +		DRV_LOG(ERR, "Invalid mlx5 class options %s. Maybe typo in device class argument setting?",

Find a way to give the exact source of error. If it is an invalid name, show which token failed to be parsed
(meaning move your error code before nstr_orig is freed). Remove the "Maybe" formulation.

By the way, Thomas' comment was correct instead of mine, you should just cut your format string after
the "%s.".

> +			class_names);
> +	return *ret;
> +}
> +
> +static int
> +parse_class_options(const struct rte_devargs *devargs)
> +{
> +	const char *key = MLX5_CLASS_ARG_NAME;
> +	struct rte_kvargs *kvlist;
> +	int ret = 0;
> +
> +	if (devargs == NULL)
> +		return 0;
> +	kvlist = rte_kvargs_parse(devargs->args, NULL);
> +	if (kvlist == NULL)
> +		return 0;
> +	if (rte_kvargs_count(kvlist, key))
> +		rte_kvargs_process(kvlist, key, mlx5_bus_opt_handler, &ret);

Set ret to rte_kvargs_process() return value instead, define a specific u32 for bitmap.
Find a way to output the bitmap *separately* from the error code, or
set MLX5_CLASS_INVALID in the bitmap before returning it as sole return value for this function.
(meaning having a proper bit value for MLX5_CLASS_INVALID, if you go this way.)

I already said it in previous review, I will reformulate: stop overloading your types,
relying on implicit casts between correct and incorrect values, and merging your returned values
and the error channel.

Please be proactive into cleaning up your APIs.

> +	rte_kvargs_free(kvlist);
> +	return ret;
> +}
> +
>  void
>  rte_mlx5_pci_driver_register(struct rte_mlx5_pci_driver *driver)
>  {
>  	TAILQ_INSERT_TAIL(&drv_list, driver, next);
>  }
> +
> +static bool
> +mlx5_bus_match(const struct rte_mlx5_pci_driver *drv,
> +	       const struct rte_pci_device *pci_dev)
> +{
> +	const struct rte_pci_id *id_table;
> +
> +	for (id_table = drv->pci_driver.id_table; id_table->vendor_id != 0;
> +	     id_table++) {
> +		/* check if device's ids match the class driver's ones */
> +		if (id_table->vendor_id != pci_dev->id.vendor_id &&
> +				id_table->vendor_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->device_id != pci_dev->id.device_id &&
> +				id_table->device_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->subsystem_vendor_id !=
> +		    pci_dev->id.subsystem_vendor_id &&
> +		    id_table->subsystem_vendor_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->subsystem_device_id !=
> +		    pci_dev->id.subsystem_device_id &&
> +		    id_table->subsystem_device_id != PCI_ANY_ID)
> +			continue;
> +		if (id_table->class_id != pci_dev->id.class_id &&
> +				id_table->class_id != RTE_CLASS_ANY_ID)
> +			continue;
> +
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static int is_valid_class_combo(uint32_t user_classes)
> +{
> +	unsigned int i;

size_t

> +
> +	/* Verify if user specified valid supported combination */
                                    a valid combination.
> +	for (i = 0; i < RTE_DIM(mlx5_valid_class_combo); i++) {
> +		if (mlx5_valid_class_combo[i] == user_classes)

You simplified the scope of this function, which is good.
However, given the more limited scope, now it becomes a boolean
yes/no.

You are returning (0 | false) for yes, which is not ok.

reading if (is_valid_class_combo(combo)) { handle_error(combo); } is pretty
awkward.

While you're at it, you might want to use a proper bool instead.

> +			return 0;
> +	}
> +	/* Not found any valid class combination */
> +	return -EINVAL;
> +}
> +
> +static int validate_single_class_dma_ops(void)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +	int dma_map_classes = 0;
> +
> +	TAILQ_FOREACH(class, &drv_list, next) {
> +		if (class->pci_driver.dma_map)
> +			dma_map_classes++;
> +	}
> +	if (dma_map_classes > 1) {
> +		DRV_LOG(ERR, "Multiple classes with DMA ops is unsupported");
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +/**
> + * DPDK callback to register to probe multiple PCI class devices.
> + *
> + * @param[in] pci_drv
> + *   PCI driver structure.
> + * @param[in] dev
> + *   PCI device information.
> + *
> + * @return
> + *   0 on success, 1 to skip this driver, a negative errno value otherwise
> + *   and rte_errno is set.
> + */
> +static int
> +mlx5_bus_pci_probe(struct rte_pci_driver *drv __rte_unused,

drv is not unused, you are passing it to sub-drivers below.

> +		   struct rte_pci_device *dev)
> +{
> +	struct rte_mlx5_pci_driver *class;

This compilation unit targets a C compiler. I think only
headers should ensure compat with C++, but this name is still not great.

driver seems more appropriate anyway to designate a driver.

> +	uint32_t user_classes = 0;
> +	int ret;
> +

Mixing ret and user_classes as you do afterward is the result of the above API issues
already outlined. I won't go over them again, please fix everything to have proper
type discipline.

> +	ret = validate_single_class_dma_ops();
> +	if (ret)
> +		return ret;
> +
> +	ret = parse_class_options(dev->device.devargs);
> +	if (ret < 0)
> +		return ret;
> +
> +	user_classes = ret;
> +	if (user_classes) {
> +		/* Validate combination here */
> +		ret = is_valid_class_combo(user_classes);
> +		if (ret) {
> +			DRV_LOG(ERR, "Unsupported mlx5 classes supplied");
> +			return ret;
> +		}
> +	}
> +
> +	/* Default to net class */
> +	if (user_classes == 0)
> +		user_classes = MLX5_CLASS_NET;
> +
> +	TAILQ_FOREACH(class, &drv_list, next) {
> +		if (!mlx5_bus_match(class, dev))
> +			continue;
> +
> +		if ((class->dev_class & user_classes) == 0)
> +			continue;
> +
> +		ret = -EINVAL;
> +		if (class->loaded) {
> +			/* If already loaded and class driver can handle
> +			 * reprobe, probe such class driver again.
> +			 */
> +			if (class->pci_driver.drv_flags & RTE_PCI_DRV_PROBE_AGAIN)
> +				ret = class->pci_driver.probe(drv, dev);

Using "drv" here instead of "class" means you are overriding the DRV_FLAG set by the
sub-driver.

Why not use "class" instead? dev->driver is setup by the upper layer, so will be correctly set
to drv instead of class.

> +		} else {
> +			ret = class->pci_driver.probe(drv, dev);
> +		}

You are ignoring probe() < 0 here, seems wrong.

> +		if (!ret)
> +			class->loaded = true;

The loaded flag is not properly set.
You will set it on the first successful probe, even on further errors.

Instead, use a u32 to mark each properly probed class, then set loaded outside of this loop,
only if this "probed" bitmap matches exactly the "user_classes" bitmap.

This also means not silently ignoring a dev and class mismatch. If this is the behavior you
explicitly want, then you will need to unset the mismatched class in user_classes, so that the
exact match on probed is correct. Otherwise, logging an error is more appropriate.

> +	}
> +	return 0;
> +}
> +
> +/**
> + * DPDK callback to remove one or more class devices for a PCI device.
> + *
> + * This function removes all class devices belong to a given PCI device.
> + *
> + * @param[in] pci_dev
> + *   Pointer to the PCI device.
> + *
> + * @return
> + *   0 on success, the function cannot fail.
> + */
> +static int
> +mlx5_bus_pci_remove(struct rte_pci_device *dev)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +
> +	/* Remove each class driver in reverse order */
> +	TAILQ_FOREACH_REVERSE(class, &drv_list, mlx5_pci_bus_drv_head, next) {
> +		if (class->loaded)
> +			class->pci_driver.remove(dev);
> +	}
> +	return 0;
> +}
> +
> +static int
> +mlx5_bus_pci_dma_map(struct rte_pci_device *dev, void *addr,
> +		     uint64_t iova, size_t len)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +	int ret = -EINVAL;
> +
> +	TAILQ_FOREACH(class, &drv_list, next) {
> +		if (!class->pci_driver.dma_map)
> +			continue;
> +
> +		return class->pci_driver.dma_map(dev, addr, iova, len);
> +	}
> +	return ret;
> +}
> +
> +static int
> +mlx5_bus_pci_dma_unmap(struct rte_pci_device *dev, void *addr,
> +		       uint64_t iova, size_t len)
> +{
> +	struct rte_mlx5_pci_driver *class;
> +	int ret = -EINVAL;
> +
> +	TAILQ_FOREACH_REVERSE(class, &drv_list, mlx5_pci_bus_drv_head, next) {
> +		if (!class->pci_driver.dma_unmap)
> +			continue;
> +

I see no additional logging about edge-cases that were discussed previously.
You can add them to the register function.

> +		return class->pci_driver.dma_unmap(dev, addr, iova, len);
> +	}
> +	return ret;
> +}
> +
> +static const struct rte_pci_id mlx5_bus_pci_id_map[] = {
> +	{
> +		.vendor_id = 0
> +	}
> +};
> +
> +static struct rte_pci_driver mlx5_bus_driver = {
> +	.driver = {
> +		.name = "mlx5_bus_pci",
> +	},
> +	.id_table = mlx5_bus_pci_id_map,
> +	.probe = mlx5_bus_pci_probe,
> +	.remove = mlx5_bus_pci_remove,
> +	.dma_map = mlx5_bus_pci_dma_map,
> +	.dma_unmap = mlx5_bus_pci_dma_unmap,
> +	.drv_flags = RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV |
> +		     RTE_PCI_DRV_PROBE_AGAIN,
> +};
> +
> +RTE_PMD_REGISTER_PCI(mlx5_bus, mlx5_bus_driver);
> +RTE_PMD_REGISTER_PCI_TABLE(mlx5_bus, mlx5_bus_pci_id_map);
> diff --git a/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h b/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h
> index 571f7dfd6..c8cd7187b 100644
> --- a/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h
> +++ b/drivers/bus/mlx5_pci/rte_bus_mlx5_pci.h
> @@ -55,6 +55,7 @@ struct rte_mlx5_pci_driver {
>  	enum mlx5_class dev_class;		/**< Class of this driver */
>  	struct rte_pci_driver pci_driver;	/**< Inherit core pci driver. */
>  	TAILQ_ENTRY(rte_mlx5_pci_driver) next;
> +	bool loaded;
>  };
>  
>  /**
> -- 
> 2.25.4
> 

-- 
Gaëtan

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-27  7:44  5%   ` Jerin Jacob
@ 2020-06-29 19:30  4%     ` McDaniel, Timothy
  2020-06-30  4:21  0%       ` Jerin Jacob
  2020-06-30 11:22  0%     ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: McDaniel, Timothy @ 2020-06-29 19:30 UTC (permalink / raw)
  To: Jerin Jacob, Ray Kinsella, Neil Horman
  Cc: Jerin Jacob, Mattias Rönnblom, dpdk-dev, Eads, Gage,
	Van Haaren, Harry

-----Original Message-----
From: Jerin Jacob <jerinjacobk@gmail.com> 
Sent: Saturday, June 27, 2020 2:45 AM
To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites

> +
> +/** Event port configuration structure */
> +struct rte_event_port_conf_v20 {
> +       int32_t new_event_threshold;
> +       /**< A backpressure threshold for new event enqueues on this port.
> +        * Use for *closed system* event dev where event capacity is limited,
> +        * and cannot exceed the capacity of the event dev.
> +        * Configuring ports with different thresholds can make higher priority
> +        * traffic less likely to  be backpressured.
> +        * For example, a port used to inject NIC Rx packets into the event dev
> +        * can have a lower threshold so as not to overwhelm the device,
> +        * while ports used for worker pools can have a higher threshold.
> +        * This value cannot exceed the *nb_events_limit*
> +        * which was previously supplied to rte_event_dev_configure().
> +        * This should be set to '-1' for *open system*.
> +        */
> +       uint16_t dequeue_depth;
> +       /**< Configure number of bulk dequeues for this event port.
> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> +        * which previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
> +       uint16_t enqueue_depth;
> +       /**< Configure number of bulk enqueues for this event port.
> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> +        * which previously supplied to rte_event_dev_configure().
> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> +        */
>         uint8_t disable_implicit_release;
>         /**< Configure the port not to release outstanding events in
>          * rte_event_dev_dequeue_burst(). If true, all events received through
> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>                                 struct rte_event_port_conf *port_conf);
>
> +int
> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> +                               struct rte_event_port_conf_v20 *port_conf);
> +
> +int
> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> +                                     struct rte_event_port_conf *port_conf);

Hi Timothy,

+ ABI Maintainers (Ray, Neil)

# As per my understanding, structures cannot be versioned; only
functions can be versioned.
i.e we can not make any change to " struct rte_event_port_conf"

# We have a similar case with ethdev and it deferred to next release v20.11
http://patches.dpdk.org/patch/69113/

Regarding the API changes:
# The slow path changes general looks good to me. I will review the
next level in the coming days
# The following fast path change bothers me. Could you share more
details on the change below?

diff --git a/app/test-eventdev/test_order_atq.c
b/app/test-eventdev/test_order_atq.c
index 3366cfc..8246b96 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -34,6 +34,8 @@
                        continue;
                }

+               ev.flow_id = ev.mbuf->udata64;
+
# Since RC1 is near, I am not sure how to accommodate the API changes
now and sort out the ABI stuff.
# My other concern is that the eventdev spec gets bloated with versioning
files just for ONE release, as 20.11 will be OK to change the ABI.
# While we discuss the API change, please send a deprecation notice for
the ABI change for 20.11,
so that there is no ambiguity about this patch for the 20.11 release.

Hello Jerin,

Thank you for the review comments.

With regard to your comments on the fast path flow_id change: the Intel DLB hardware
is not capable of transferring the flow_id as part of the event itself, so we require
a mechanism to accomplish this. Our workaround is to require the application to embed
the flow_id within the data payload. The new flag, #define RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9),
can be used by applications to determine whether they need to embed the flow_id, or whether
it is automatically propagated and present in the received event.

What we should have done is to wrap the assignment with a conditional.  

if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
	ev.flow_id = ev.mbuf->udata64;

This would minimize/eliminate any performance impact due to the processor's branch prediction logic.
The assignment then becomes in essence a NOOP for all event devices that are capable of carrying the flow_id as part of the event payload itself.

Thanks,
Tim




^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2 0/4] add PPC and Windows cross-compilation to meson test
  @ 2020-06-29 23:15  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-06-29 23:15 UTC (permalink / raw)
  To: dev; +Cc: david.marchand, bruce.richardson, drc, dmitry.kozliuk

16/06/2020 00:22, Thomas Monjalon:
> In order to better support PPC and Windows,
> their compilation is tested on Linux with Meson
> with the script test-meson-builds.sh,
> supposed to be called in every CI labs.
> 
> 
> Thomas Monjalon (4):
>   devtools: shrink cross-compilation test definition
>   devtools: allow non-standard toolchain in meson test
>   devtools: add ppc64 in meson build test
>   devtools: add Windows cross-build test with MinGW
> 
> 
> v2: update some explanations and fix ABI check


Applied



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  2020-06-29  8:12  0%         ` Tal Shnaiderman
@ 2020-06-29 23:56  0%           ` Dmitry Kozlyuk
  2020-07-08  1:09  0%             ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2020-06-29 23:56 UTC (permalink / raw)
  To: Tal Shnaiderman
  Cc: Ranjit Menon, Fady Bader, dev, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Thomas Monjalon, Olivier Matz

On Mon, 29 Jun 2020 08:12:51 +0000, Tal Shnaiderman wrote:
> > From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > Subject: Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
> > 
> > On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:  
> > > On 6/28/2020 7:20 AM, Fady Bader wrote:  
> > > > Hi Dmitry,
> > > > I'm trying to run test-pmd on Windows and I ran into this error with cmdline.
> > > >
> > > > The error log message is:
> > > > In file included from ../app/test-pmd/cmdline_flow.c:23:
> > > > ..\lib\librte_cmdline/cmdline_parse_num.h:24:2: error: 'INT64' redeclared as different kind of symbol
> > > >    INT64
> > > >
> > > > In file included from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/winnt.h:150,
> > > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
> > > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/windef.h:8,
> > > >                   from C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
> > > >                   from ..\lib/librte_eal/windows/include/rte_windows.h:22,
> > > >                   from ..\lib/librte_eal/windows/include/pthread.h:20,
> > > >                   from ..\lib/librte_eal/include/rte_per_lcore.h:25,
> > > >                   from ..\lib/librte_eal/include/rte_errno.h:18,
> > > >                   from ..\lib\librte_ethdev/rte_ethdev.h:156,
> > > >                   from ../app/test-pmd/cmdline_flow.c:18:
> > > > C:/mingw-w64/x86_64/mingw64/x86_64-w64-mingw32/include/basetsd.h:32:44: note: previous declaration of 'INT64' was here
> > > >     __MINGW_EXTENSION typedef signed __int64 INT64,*PINT64;
> > > >
> > > > The same error occurs for the other types defined in cmdline_numtype.
> > > >
> > > > This problem with windows.h is popping up in many places, among
> > > > them cmdline, test-pmd, and librte_net.
> > > > We should find a way to exclude windows.h from the places that do
> > > > not need it. Are there any suggestions on how this can be done?
> > >
> > > We ran into this same issue when working with the code that is on the
> > > draft repo.
> > >
> > > The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types
> > > in Windows headers for integer types. We found that it is easier to
> > > change the enum in cmdline_parse_num.h than try to play with the
> > > include order of headers. AFAIK, the enums were only used to determine
> > > the type in a series of switch() statements in librte_cmdline, so we
> > > simply renamed the enums. Not sure if that will be acceptable here.
> > 
> > +1 for renaming enum values. It's not a problem of librte_cmdline itself
> > but a problem of its consumption on Windows; however, renaming enum values
> > doesn't break ABI and will make the librte_cmdline API "namespaced".
> > 
> > I don't see a clean way not to expose windows.h, because pthread.h
> > depends on it, and if we hide implementation, librte_eal would have to
> > export pthread symbols on Windows, which is a hack (or is it?).  
> 
> test_pmd redefines BOOLEAN and PATTERN in the index enum; I'm not sure how many more conflicts we will face because of this huge include.
>
> Also, DPDK applications will inherit it unknowingly; I'm not sure if this is common for Windows libraries.

I never hit these particular conflicts, but you're right that there will be
more, e.g. I remember particularly nasty clashes in failsafe PMD, unrelated
to cmdline token names.


We could take the same approach as with the networking headers: copy the
required declarations instead of including them from the SDK. Here's a list
of what pthread.h uses:

CloseHandle
CreateThread
DeleteSynchronizationBarrier
EnterSynchronizationBarrier
GetThreadAffinityMask
InitializeSynchronizationBarrier
OpenThread
SetPriorityClass
SetThreadAffinityMask
SetThreadPriority
TerminateThread

Windows has a strict compatibility policy, so prototypes are unlikely to ever
change. None of the functions used takes string parameters, so none is affected
by the A/W macros. Looks a bit messy, but at least it's limited in scope.


An external pthread library would solve the problem, but as I've reported
earlier, I failed to find a good one: [1] and [3] are tied to MinGW, although
of high quality, [2] seems outdated.

[1] winpthreads:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/mingw-w64-libraries/winpthreads/
[2] pthreads-win32: https://sourceware.org/pthreads-win32/
[3] mcfgthread: https://github.com/lhmouse/mcfgthread

-- 
Dmitry Kozlyuk

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-29 19:30  4%     ` McDaniel, Timothy
@ 2020-06-30  4:21  0%       ` Jerin Jacob
  2020-06-30 15:37  0%         ` McDaniel, Timothy
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30  4:21 UTC (permalink / raw)
  To: McDaniel, Timothy
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Saturday, June 27, 2020 2:45 AM
> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
> > +
> > +/** Event port configuration structure */
> > +struct rte_event_port_conf_v20 {
> > +       int32_t new_event_threshold;
> > +       /**< A backpressure threshold for new event enqueues on this port.
> > +        * Use for *closed system* event dev where event capacity is limited,
> > +        * and cannot exceed the capacity of the event dev.
> > +        * Configuring ports with different thresholds can make higher priority
> > +        * traffic less likely to  be backpressured.
> > +        * For example, a port used to inject NIC Rx packets into the event dev
> > +        * can have a lower threshold so as not to overwhelm the device,
> > +        * while ports used for worker pools can have a higher threshold.
> > +        * This value cannot exceed the *nb_events_limit*
> > +        * which was previously supplied to rte_event_dev_configure().
> > +        * This should be set to '-1' for *open system*.
> > +        */
> > +       uint16_t dequeue_depth;
> > +       /**< Configure number of bulk dequeues for this event port.
> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> > +        * which previously supplied to rte_event_dev_configure().
> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> > +        */
> > +       uint16_t enqueue_depth;
> > +       /**< Configure number of bulk enqueues for this event port.
> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> > +        * which previously supplied to rte_event_dev_configure().
> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> > +        */
> >         uint8_t disable_implicit_release;
> >         /**< Configure the port not to release outstanding events in
> >          * rte_event_dev_dequeue_burst(). If true, all events received through
> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >                                 struct rte_event_port_conf *port_conf);
> >
> > +int
> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> > +                               struct rte_event_port_conf_v20 *port_conf);
> > +
> > +int
> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> > +                                     struct rte_event_port_conf *port_conf);
>
> Hi Timothy,
>
> + ABI Maintainers (Ray, Neil)
>
> # As per my understanding, the structures can not be versioned, only
> function can be versioned.
> i.e we can not make any change to " struct rte_event_port_conf"
>
> # We have a similar case with ethdev and it deferred to next release v20.11
> http://patches.dpdk.org/patch/69113/
>
> Regarding the API changes:
> # The slow path changes general looks good to me. I will review the
> next level in the coming days
> # The following fast path changes bothers to me. Could you share more
> details on below change?
>
> diff --git a/app/test-eventdev/test_order_atq.c
> b/app/test-eventdev/test_order_atq.c
> index 3366cfc..8246b96 100644
> --- a/app/test-eventdev/test_order_atq.c
> +++ b/app/test-eventdev/test_order_atq.c
> @@ -34,6 +34,8 @@
>                         continue;
>                 }
>
> +               ev.flow_id = ev.mbuf->udata64;
> +
> # Since RC1 is near, I am not sure how to accommodate the API changes
> now and sort out ABI stuffs.
> # Other concern is eventdev spec get bloated with versioning files
> just for ONE release as 20.11 will be OK to change the ABI.
> # While we discuss the API change, Please send deprecation notice for
> ABI change for 20.11,
> so that there is no ambiguity of this patch for the 20.11 release.
>
> Hello Jerin,
>
> Thank you for the review comments.
>
> With regard to your comments regarding the fast path flow_id change, the Intel DLB hardware
> is not capable of transferring the flow_id as part of the event itself. We therefore require a mechanism
> to accomplish this. What we have done to work around this is to require the application to embed the flow_id
> within the data payload. The new flag, #define RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
> by applications to determine if they need to embed the flow_id, or if its automatically propagated and present in the
> received event.
>
> What we should have done is to wrap the assignment with a conditional.
>
> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>         ev.flow_id = ev.mbuf->udata64;

Two problems with this approach:
1) we are assuming the mbuf udata64 field is available for the DLB driver
2) it won't work with other adapters; eventdev has no dependency on mbuf

Question:
1) In the case of DLB hardware, on dequeue(), what does the HW return? Is it
only the event pointer, without any other metadata like schedule_type,
etc.?


>
> This would minimize/eliminate any performance impact due to the processor's branch prediction logic.
> The assignment then becomes in essence a NOOP for all event devices that are capable of carrying the flow_id as part of the event payload itself.
>
> Thanks,
> Tim
>
>
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  @ 2020-06-30  7:30  4% ` David Marchand
  2020-06-30  7:35  3%   ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-06-30  7:30 UTC (permalink / raw)
  To: Nicolas Chautru; +Cc: dev, Thomas Monjalon, Akhil Goyal

Hello Nicolas,

On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
<nicolas.chautru@intel.com> wrote:
>
> Planning to move bbdev API to stable from 20.11 (ABI version 21)
> and remove experimental tag.
> Sending now to advertise and get any feedback.
> Some manual rebase will be required later on notably as the
> actual release note which is not there yet.

Cool that we want to stabilize this API.
My concern is that we have drivers from a single vendor.
I would hate to see a new vendor unable to submit a driver (or having
to wait until the next ABI breakage window) because of the current
API/ABI.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [dpdk-announce] DPDK Userspace CFP now open; help celebrate 10 years of DPDK
@ 2020-06-29 22:36  3% Jill Lovato
  0 siblings, 0 replies; 200+ results
From: Jill Lovato @ 2020-06-29 22:36 UTC (permalink / raw)
  To: announce

DPDK Community,

We are moving forward with a virtual experience for DPDK Userspace
<https://events.linuxfoundation.org/dpdk-userspace-summit/program/cfp/#overview>
this year, happening September 22-23. The Call for Proposals is now open
<https://events.linuxfoundation.org/dpdk-userspace-summit/program/cfp/>
and will
close on July 12. Please plan to join us and get your submissions in
quickly. As usual, we are looking for presentations that showcase the
following:

-- Enhancements and additions to DPDK libraries, functional or
performance-wise
-- New networking technologies and their applicability to DPDK
-- Hardware NIC capabilities and offloads
-- Hardware datapath accelerators (compression, crypto, baseband, GPU,
regex, etc)
-- Virtualization and container networking
-- Debug tooling (tracing, dumps, metrics, telemetry, monitoring)
-- DPDK consumability (API/ABI compatibility, OS integration, packaging)
-- Project infrastructure, security, testing and workflow
-- Developer stories, technical challenges when integrating or developing
with DPDK
-- Feedback from usage and deployment of DPDK applications (OSS or
proprietary)

Separately, DPDK celebrates 10 years as a project during 2020! We are
working to create a virtual yearbook and would love to hear about your
favorite moments from DPDK over the years. Please take a few moments to
share your thoughts via this brief Google form (folks in China, please
email pr@dpdk.org and we will send you a Word version):

Please also check out the latest blog post, which includes updates from the
Governing and Tech Boards:


Many thanks,
Jill



*Jill Lovato*
Senior PR Manager
The Linux Foundation
jlovato@linuxfoundation.org
Phone: +1.503.703.8268

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-06-30  7:30  4% ` David Marchand
@ 2020-06-30  7:35  3%   ` Akhil Goyal
  2020-07-02 17:54  0%     ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2020-06-30  7:35 UTC (permalink / raw)
  To: David Marchand, Nicolas Chautru; +Cc: dev, Thomas Monjalon


> 
> Hello Nicolas,
> 
> On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> <nicolas.chautru@intel.com> wrote:
> >
> > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > and remove experimental tag.
> > Sending now to advertise and get any feedback.
> > Some manual rebase will be required later on notably as the
> > actual release note which is not there yet.
> 
> Cool that we want to stabilize this API.
> My concern is that we have drivers from a single vendor.
> I would hate to see a new vendor unable to submit a driver (or having
> to wait until the next ABI breakage window) because of the current
> API/ABI.
> 
> 

+1 from my side. I am not sure how acceptable this is for all the vendors/customers.
It has not been reviewed by most of the vendors who may add support in the future.
It is not good to remove the experimental tag, as we have a long one-year cycle to break the API/ABI.

Regards,
Akhil

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id
  @ 2020-06-30  9:34  0%     ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-06-30  9:34 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, jerinjacobk, bruce.richardson, mdr, thomas, arybchenko,
	ktraynor, ian.stokes, i.maximets, Neil Horman, Cunming Liang,
	Konstantin Ananyev

On Fri, Jun 26, 2020 at 04:47:29PM +0200, David Marchand wrote:
> Because of the inline accessor + static declaration in rte_gettid(),
> we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
> Each compilation unit will pay a cost when accessing this information
> for the first time.
> 
> $ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
> 0000000000000054 d per_lcore__thread_id.5037
> 0000000000000040 d per_lcore__thread_id.5103
> 0000000000000048 d per_lcore__thread_id.5259
> 000000000000004c d per_lcore__thread_id.5259
> 0000000000000044 d per_lcore__thread_id.5933
> 0000000000000058 d per_lcore__thread_id.6261
> 0000000000000050 d per_lcore__thread_id.7378
> 000000000000005c d per_lcore__thread_id.7496
> 000000000000000c d per_lcore__thread_id.8016
> 0000000000000010 d per_lcore__thread_id.8431
> 
> Make it global as part of the DPDK_21 stable ABI.
> 
> Fixes: ef76436c6834 ("eal: get unique thread id")
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>

Reviewed-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper
      2020-06-29  8:59  0%     ` [dpdk-dev] [EXT] " Sunil Kumar Kori
@ 2020-06-30  9:42  0%     ` Olivier Matz
  2 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-06-30  9:42 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, jerinjacobk, bruce.richardson, mdr, thomas, arybchenko,
	ktraynor, ian.stokes, i.maximets, Jerin Jacob, Sunil Kumar Kori,
	Neil Horman, Harini Ramakrishnan, Omar Cardona, Pallavi Kadam,
	Ranjit Menon

On Fri, Jun 26, 2020 at 04:47:31PM +0200, David Marchand wrote:
> This is a preparation step for dynamically unregistering threads.
> 
> Since we explicitly allocate a per thread trace buffer in
> rte_thread_init, add an internal helper to free this buffer.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Note: I preferred renaming the current internal function to free all
> threads trace buffers (new name trace_mem_free()) and reuse the previous
> name (trace_mem_per_thread_free()) when freeing this buffer for a given
> thread.
> 
> Changes since v2:
> - added missing stub for windows tracing support,
> - moved free symbol to exported (experimental) ABI as a counterpart of
>   the alloc symbol we already had,
> 
> Changes since v1:
> - rebased on master, removed Windows workaround wrt traces support,
> 
> ---
>  lib/librte_eal/common/eal_common_thread.c |  9 ++++
>  lib/librte_eal/common/eal_common_trace.c  | 51 +++++++++++++++++++----
>  lib/librte_eal/common/eal_thread.h        |  5 +++
>  lib/librte_eal/common/eal_trace.h         |  2 +-
>  lib/librte_eal/include/rte_trace_point.h  |  9 ++++
>  lib/librte_eal/rte_eal_version.map        |  3 ++
>  lib/librte_eal/windows/eal.c              |  5 +++
>  7 files changed, 75 insertions(+), 9 deletions(-)

[...]

> diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
> index 875553d7e5..3e620d76ed 100644
> --- a/lib/librte_eal/common/eal_common_trace.c
> +++ b/lib/librte_eal/common/eal_common_trace.c
> @@ -101,7 +101,7 @@ eal_trace_fini(void)
>  {
>  	if (!rte_trace_is_enabled())
>  		return;
> -	trace_mem_per_thread_free();
> +	trace_mem_free();
>  	trace_metadata_destroy();
>  	eal_trace_args_free();
>  }
> @@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
>  	rte_spinlock_unlock(&trace->lock);
>  }
>  
> +static void
> +trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
> +{
> +	if (meta->area == TRACE_AREA_HUGEPAGE)
> +		eal_free_no_trace(meta->mem);
> +	else if (meta->area == TRACE_AREA_HEAP)
> +		free(meta->mem);
> +}
> +
> +void
> +__rte_trace_mem_per_thread_free(void)
> +{
> +	struct trace *trace = trace_obj_get();
> +	struct __rte_trace_header *header;
> +	uint32_t count;
> +
> +	if (RTE_PER_LCORE(trace_mem) == NULL)
> +		return;
> +
> +	header = RTE_PER_LCORE(trace_mem);

nit:

	header = RTE_PER_LCORE(trace_mem);
	if (header == NULL)
		return;

[...]

> diff --git a/lib/librte_eal/include/rte_trace_point.h b/lib/librte_eal/include/rte_trace_point.h
> index 377c2414aa..686b86fdb1 100644
> --- a/lib/librte_eal/include/rte_trace_point.h
> +++ b/lib/librte_eal/include/rte_trace_point.h
> @@ -230,6 +230,15 @@ __rte_trace_point_fp_is_enabled(void)
>  __rte_experimental
>  void __rte_trace_mem_per_thread_alloc(void);
>  
> +/**
> + * @internal
> + *
> + * Free trace memory buffer per thread.
> + *
> + */
> +__rte_experimental
> +void __rte_trace_mem_per_thread_free(void);

Maybe the doc comment could be reworded a bit
(and the empty line can be removed by the way).

> +
>  /**
>   * @internal
>   *
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 0d42d44ce9..5831eea4b0 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -393,6 +393,9 @@ EXPERIMENTAL {
>  	rte_trace_point_lookup;
>  	rte_trace_regexp;
>  	rte_trace_save;
> +
> +	# added in 20.08
> +	__rte_trace_mem_per_thread_free;

Is it really needed to export this function?


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 20.11] eal: simplify exit functions
  @ 2020-06-30 10:26  0% ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-06-30 10:26 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: david.marchand, bruce.richardson, John McNamara, Marko Kovacevic,
	Neil Horman


On 24/06/2020 10:36, Thomas Monjalon wrote:
> The option RTE_EAL_ALWAYS_PANIC_ON_ERROR was off by default,
> and not customizable with meson. It is completely removed.
> 
> The function rte_dump_registers is a trace of the bare metal support
> era, and was not supported in userland. It is completely removed.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
> Because the empty function rte_dump_registers is part of the ABI,
> this change is planned for DPDK 20.11.
> ---
>  app/test/test_debug.c                    |  3 ---
>  config/common_base                       |  1 -
>  doc/guides/howto/debug_troubleshoot.rst  |  2 +-
>  lib/librte_eal/common/eal_common_debug.c | 17 +----------------
>  lib/librte_eal/include/rte_debug.h       |  7 -------
>  lib/librte_eal/rte_eal_version.map       |  1 -
>  6 files changed, 2 insertions(+), 29 deletions(-)
> 
> diff --git a/app/test/test_debug.c b/app/test/test_debug.c
> index 25eab97e2a..834a7386f5 100644
> --- a/app/test/test_debug.c
> +++ b/app/test/test_debug.c
> @@ -66,13 +66,11 @@ test_exit_val(int exit_val)
>  	}
>  	wait(&status);
>  	printf("Child process status: %d\n", status);
> -#ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
>  	if(!WIFEXITED(status) || WEXITSTATUS(status) != (uint8_t)exit_val){
>  		printf("Child process terminated with incorrect status (expected = %d)!\n",
>  				exit_val);
>  		return -1;
>  	}
> -#endif
>  	return 0;
>  }
>  
> @@ -113,7 +111,6 @@ static int
>  test_debug(void)
>  {
>  	rte_dump_stack();
> -	rte_dump_registers();
>  	if (test_panic() < 0)
>  		return -1;
>  	if (test_exit() < 0)
> diff --git a/config/common_base b/config/common_base
> index c7d5c73215..42ad399b17 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -103,7 +103,6 @@ CONFIG_RTE_ENABLE_TRACE_FP=n
>  CONFIG_RTE_LOG_HISTORY=256
>  CONFIG_RTE_BACKTRACE=y
>  CONFIG_RTE_LIBEAL_USE_HPET=n
> -CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n
>  CONFIG_RTE_EAL_IGB_UIO=n
>  CONFIG_RTE_EAL_VFIO=n
>  CONFIG_RTE_MAX_VFIO_GROUPS=64
> diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
> index cef016b2fe..1ed8be5a04 100644
> --- a/doc/guides/howto/debug_troubleshoot.rst
> +++ b/doc/guides/howto/debug_troubleshoot.rst
> @@ -313,7 +313,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
>     * For high-performance execution logic ensure running it on correct NUMA
>       and non-master core.
>  
> -   * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
> +   * Analyze run logic with ``rte_dump_stack`` and
>       ``rte_memdump`` for more insights.
>  
>     * Make use of objdump to ensure opcode is matching to the desired state.
> diff --git a/lib/librte_eal/common/eal_common_debug.c b/lib/librte_eal/common/eal_common_debug.c
> index 722468754d..15418e957f 100644
> --- a/lib/librte_eal/common/eal_common_debug.c
> +++ b/lib/librte_eal/common/eal_common_debug.c
> @@ -7,14 +7,6 @@
>  #include <rte_log.h>
>  #include <rte_debug.h>
>  
> -/* not implemented */
> -void
> -rte_dump_registers(void)
> -{
> -	return;
> -}
> -
> -/* call abort(), it will generate a coredump if enabled */
>  void
>  __rte_panic(const char *funcname, const char *format, ...)
>  {
> @@ -25,8 +17,7 @@ __rte_panic(const char *funcname, const char *format, ...)
>  	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
>  	va_end(ap);
>  	rte_dump_stack();
> -	rte_dump_registers();
> -	abort();
> +	abort(); /* generate a coredump if enabled */
>  }
>  
>  /*
> @@ -46,14 +37,8 @@ rte_exit(int exit_code, const char *format, ...)
>  	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
>  	va_end(ap);
>  
> -#ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
>  	if (rte_eal_cleanup() != 0)
>  		RTE_LOG(CRIT, EAL,
>  			"EAL could not release all resources\n");
>  	exit(exit_code);
> -#else
> -	rte_dump_stack();
> -	rte_dump_registers();
> -	abort();
> -#endif
>  }
> diff --git a/lib/librte_eal/include/rte_debug.h b/lib/librte_eal/include/rte_debug.h
> index 50052c5a90..c4bc71ce28 100644
> --- a/lib/librte_eal/include/rte_debug.h
> +++ b/lib/librte_eal/include/rte_debug.h
> @@ -26,13 +26,6 @@ extern "C" {
>   */
>  void rte_dump_stack(void);
>  
> -/**
> - * Dump the registers of the calling core to the console.
> - *
> - * Note: Not implemented in a userapp environment; use gdb instead.
> - */
> -void rte_dump_registers(void);
> -
>  /**
>   * Provide notification of a critical non-recoverable error and terminate
>   * execution abnormally.
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 196eef5afa..3f36e46b3b 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -37,7 +37,6 @@ DPDK_20.0 {
>  	rte_devargs_remove;
>  	rte_devargs_type_count;
>  	rte_dump_physmem_layout;
> -	rte_dump_registers;
>  	rte_dump_stack;
>  	rte_dump_tailq;
>  	rte_eal_alarm_cancel;
> 

Acked-by: Ray Kinsella <mdr@ashroe.eu>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR
  @ 2020-06-30 10:35  3%         ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-06-30 10:35 UTC (permalink / raw)
  To: Bruce Richardson, David Marchand
  Cc: Ruifeng Wang, Vladimir Medvedkin, John McNamara, Marko Kovacevic,
	Neil Horman, dev, Ananyev, Konstantin, Honnappa Nagarahalli, nd



On 29/06/2020 13:55, Bruce Richardson wrote:
> On Mon, Jun 29, 2020 at 01:56:07PM +0200, David Marchand wrote:
>> On Mon, Jun 29, 2020 at 10:03 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>>>
>>> Currently, the tbl8 group is freed even though the readers might be
>>> using the tbl8 group entries. The freed tbl8 group can be reallocated
>>> quickly. This results in incorrect lookup results.
>>>
>>> RCU QSBR process is integrated for safe tbl8 group reclaim.
>>> Refer to RCU documentation to understand various aspects of
>>> integrating RCU library into other libraries.
>>>
>>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>>> ---
>>>  doc/guides/prog_guide/lpm_lib.rst  |  32 +++++++
>>>  lib/librte_lpm/Makefile            |   2 +-
>>>  lib/librte_lpm/meson.build         |   1 +
>>>  lib/librte_lpm/rte_lpm.c           | 129 ++++++++++++++++++++++++++---
>>>  lib/librte_lpm/rte_lpm.h           |  59 +++++++++++++
>>>  lib/librte_lpm/rte_lpm_version.map |   6 ++
>>>  6 files changed, 216 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
>>> index 1609a57d0..7cc99044a 100644
>>> --- a/doc/guides/prog_guide/lpm_lib.rst
>>> +++ b/doc/guides/prog_guide/lpm_lib.rst
>>> @@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
>>>  Prefix expansion is one of the keys of this algorithm,
>>>  since it improves the speed dramatically by adding redundancy.
>>>
>>> +Deletion
>>> +~~~~~~~~
>>> +
>>> +When deleting a rule, a replacement rule is searched for. Replacement rule is an existing rule that has
>>> +the longest prefix match with the rule to be deleted, but has smaller depth.
>>> +
>>> +If a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
>>> +value with the replacement rule.
>>> +
>>> +If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
>>> +
>>> +Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
>>> +
>>> +After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry are freed in following cases:
>>> +
>>> +*   All tbl8s in the group are empty .
>>> +
>>> +*   All tbl8s in the group have the same values and with depth no greater than 24.
>>> +
>>> +Free of tbl8s have different behaviors:
>>> +
>>> +*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
>>> +
>>> +*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
>>> +
>>> +When the LPM is not using RCU, tbl8 group can be freed immediately even though the readers might be using
>>> +the tbl8 group entries. This might result in incorrect lookup results.
>>> +
>>> +RCU QSBR process is integrated for safe tbl8 group reclamation. Application has certain responsibilities
>>> +while using this feature. Please refer to the resource reclamation framework of :ref:`RCU library <RCU_Library>`
>>> +for more details.
>>> +
>>
>> Would the lpm6 library benefit from the same?
>> Asking as I do not see much code shared between lpm and lpm6.
>>
>> [...]
>>
>>> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
>>> index 38ab512a4..41e9c49b8 100644
>>> --- a/lib/librte_lpm/rte_lpm.c
>>> +++ b/lib/librte_lpm/rte_lpm.c
>>> @@ -1,5 +1,6 @@
>>>  /* SPDX-License-Identifier: BSD-3-Clause
>>>   * Copyright(c) 2010-2014 Intel Corporation
>>> + * Copyright(c) 2020 Arm Limited
>>>   */
>>>
>>>  #include <string.h>
>>> @@ -245,13 +246,84 @@ rte_lpm_free(struct rte_lpm *lpm)
>>>                 TAILQ_REMOVE(lpm_list, te, next);
>>>
>>>         rte_mcfg_tailq_write_unlock();
>>> -
>>> +#ifdef ALLOW_EXPERIMENTAL_API
>>> +       if (lpm->dq)
>>> +               rte_rcu_qsbr_dq_delete(lpm->dq);
>>> +#endif
>>
>> All DPDK code under lib/ is compiled with the ALLOW_EXPERIMENTAL_API flag set.
>> There is no need to protect against this flag in rte_lpm.c.
>>
>> [...]
>>
>>> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>>> index b9d49ac87..7889f21b3 100644
>>> --- a/lib/librte_lpm/rte_lpm.h
>>> +++ b/lib/librte_lpm/rte_lpm.h
>>
>>> @@ -130,6 +143,28 @@ struct rte_lpm {
>>>                         __rte_cache_aligned; /**< LPM tbl24 table. */
>>>         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>>>         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
>>> +#ifdef ALLOW_EXPERIMENTAL_API
>>> +       /* RCU config. */
>>> +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
>>> +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
>>> +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
>>> +#endif
>>> +};
>>
>> This is more a comment/question for the lpm maintainers.
>>
>> Afaics, the rte_lpm structure is exported/public because of lookup
>> which is inlined.
>> But most of the structure can be hidden and stored in a private
>> structure that would embed the exposed rte_lpm.
>> The slowpath functions would only have to translate from publicly
>> exposed to internal representation (via container_of).
>>
>> This patch could do this and be the first step to hide the unneeded
>> exposure of other fields (later/in 20.11 ?).
>>
>> Thoughts?
>>
> Hiding as much of the structures as possible is always a good idea, so if
> that is possible in this patchset I would support such a move.
> 
> /Bruce
> 

Agreed - I acked the change as it doesn't break ABI compatibility.
Bruce and David's comments still hold for 20.11+. 

Ray K

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-27  7:44  5%   ` Jerin Jacob
  2020-06-29 19:30  4%     ` McDaniel, Timothy
@ 2020-06-30 11:22  0%     ` Kinsella, Ray
  2020-06-30 11:30  0%       ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-06-30 11:22 UTC (permalink / raw)
  To: Jerin Jacob, Tim McDaniel, Neil Horman
  Cc: Jerin Jacob, Mattias Rönnblom, dpdk-dev, Gage Eads,
	Van Haaren, Harry



On 27/06/2020 08:44, Jerin Jacob wrote:
>> +
>> +/** Event port configuration structure */
>> +struct rte_event_port_conf_v20 {
>> +       int32_t new_event_threshold;
>> +       /**< A backpressure threshold for new event enqueues on this port.
>> +        * Use for *closed system* event dev where event capacity is limited,
>> +        * and cannot exceed the capacity of the event dev.
>> +        * Configuring ports with different thresholds can make higher priority
>> +        * traffic less likely to  be backpressured.
>> +        * For example, a port used to inject NIC Rx packets into the event dev
>> +        * can have a lower threshold so as not to overwhelm the device,
>> +        * while ports used for worker pools can have a higher threshold.
>> +        * This value cannot exceed the *nb_events_limit*
>> +        * which was previously supplied to rte_event_dev_configure().
>> +        * This should be set to '-1' for *open system*.
>> +        */
>> +       uint16_t dequeue_depth;
>> +       /**< Configure number of bulk dequeues for this event port.
>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> +        * which previously supplied to rte_event_dev_configure().
>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>> +        */
>> +       uint16_t enqueue_depth;
>> +       /**< Configure number of bulk enqueues for this event port.
>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> +        * which previously supplied to rte_event_dev_configure().
>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>> +        */
>>         uint8_t disable_implicit_release;
>>         /**< Configure the port not to release outstanding events in
>>          * rte_event_dev_dequeue_burst(). If true, all events received through
>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>                                 struct rte_event_port_conf *port_conf);
>>
>> +int
>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> +                               struct rte_event_port_conf_v20 *port_conf);
>> +
>> +int
>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> +                                     struct rte_event_port_conf *port_conf);
> 
> Hi Timothy,
> 
> + ABI Maintainers (Ray, Neil)
> 
> # As per my understanding, the structures can not be versioned, only
> function can be versioned.
> i.e we can not make any change to " struct rte_event_port_conf"

So the answer is (as always): depends

It is when the structure is being used in inline functions that you run into trouble
- as knowledge of the structure is embedded in the linked application.

However, if the structure is _strictly_ being used as a non-inlined function parameter,
it can be safe to version in this way.

So just to be clear, it is still the function that is actually being versioned here.

> 
> # We have a similar case with ethdev and it deferred to next release v20.11
> http://patches.dpdk.org/patch/69113/

Yes - I spent a while looking at this one, but I am struggling to recall
why, when I looked at it, we didn't suggest function versioning as a potential solution in this case.

Looking back at it now, it looks like it would have been ok.

> 
> Regarding the API changes:
> # The slow path changes general looks good to me. I will review the
> next level in the coming days
> # The following fast path changes bothers to me. Could you share more
> details on below change?
> 
> diff --git a/app/test-eventdev/test_order_atq.c
> b/app/test-eventdev/test_order_atq.c
> index 3366cfc..8246b96 100644
> --- a/app/test-eventdev/test_order_atq.c
> +++ b/app/test-eventdev/test_order_atq.c
> @@ -34,6 +34,8 @@
>                         continue;
>                 }
> 
> +               ev.flow_id = ev.mbuf->udata64;
> +
> # Since RC1 is near, I am not sure how to accommodate the API changes
> now and sort out ABI stuffs.
> # Other concern is eventdev spec get bloated with versioning files
> just for ONE release as 20.11 will be OK to change the ABI.
> # While we discuss the API change, Please send deprecation notice for
> ABI change for 20.11,
> so that there is no ambiguity of this patch for the 20.11 release.
> 


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 11:22  0%     ` Kinsella, Ray
@ 2020-06-30 11:30  0%       ` Jerin Jacob
  2020-06-30 11:36  0%         ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30 11:30 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry

On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
>
>
> On 27/06/2020 08:44, Jerin Jacob wrote:
> >> +
> >> +/** Event port configuration structure */
> >> +struct rte_event_port_conf_v20 {
> >> +       int32_t new_event_threshold;
> >> +       /**< A backpressure threshold for new event enqueues on this port.
> >> +        * Use for *closed system* event dev where event capacity is limited,
> >> +        * and cannot exceed the capacity of the event dev.
> >> +        * Configuring ports with different thresholds can make higher priority
> >> +        * traffic less likely to  be backpressured.
> >> +        * For example, a port used to inject NIC Rx packets into the event dev
> >> +        * can have a lower threshold so as not to overwhelm the device,
> >> +        * while ports used for worker pools can have a higher threshold.
> >> +        * This value cannot exceed the *nb_events_limit*
> >> +        * which was previously supplied to rte_event_dev_configure().
> >> +        * This should be set to '-1' for *open system*.
> >> +        */
> >> +       uint16_t dequeue_depth;
> >> +       /**< Configure number of bulk dequeues for this event port.
> >> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >> +        * which previously supplied to rte_event_dev_configure().
> >> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >> +        */
> >> +       uint16_t enqueue_depth;
> >> +       /**< Configure number of bulk enqueues for this event port.
> >> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >> +        * which previously supplied to rte_event_dev_configure().
> >> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >> +        */
> >>         uint8_t disable_implicit_release;
> >>         /**< Configure the port not to release outstanding events in
> >>          * rte_event_dev_dequeue_burst(). If true, all events received through
> >> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >>                                 struct rte_event_port_conf *port_conf);
> >>
> >> +int
> >> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >> +                               struct rte_event_port_conf_v20 *port_conf);
> >> +
> >> +int
> >> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >> +                                     struct rte_event_port_conf *port_conf);
> >
> > Hi Timothy,
> >
> > + ABI Maintainers (Ray, Neil)
> >
> > # As per my understanding, the structures can not be versioned, only
> > function can be versioned.
> > i.e we can not make any change to " struct rte_event_port_conf"
>
> So the answer is (as always): depends
>
> If the structure is being use in inline functions is when you run into trouble
> - as knowledge of the structure is embedded in the linked application.
>
> However if the structure is _strictly_ being used as a non-inlined function parameter,
> It can be safe to version in this way.

But doesn't the optimization applied when building the consumer code
also matter?
i.e. the compiler can "inline" it based on the optimization level, even
when the source code doesn't explicitly request it.


>
> So just to be clear, it is still the function that is actually being versioned here.
>
> >
> > # We have a similar case with ethdev and it deferred to next release v20.11
> > http://patches.dpdk.org/patch/69113/
>
> Yes - I spent a why looking at this one, but I am struggling to recall,
> why when I looked it we didn't suggest function versioning as a potential solution in this case.
>
> Looking back at it now, looks like it would have been ok.

Ok.

>
> >
> > Regarding the API changes:
> > # The slow path changes general looks good to me. I will review the
> > next level in the coming days
> > # The following fast path changes bothers to me. Could you share more
> > details on below change?
> >
> > diff --git a/app/test-eventdev/test_order_atq.c
> > b/app/test-eventdev/test_order_atq.c
> > index 3366cfc..8246b96 100644
> > --- a/app/test-eventdev/test_order_atq.c
> > +++ b/app/test-eventdev/test_order_atq.c
> > @@ -34,6 +34,8 @@
> >                         continue;
> >                 }
> >
> > +               ev.flow_id = ev.mbuf->udata64;
> > +
> > # Since RC1 is near, I am not sure how to accommodate the API changes
> > now and sort out ABI stuffs.
> > # Other concern is eventdev spec get bloated with versioning files
> > just for ONE release as 20.11 will be OK to change the ABI.
> > # While we discuss the API change, Please send deprecation notice for
> > ABI change for 20.11,
> > so that there is no ambiguity of this patch for the 20.11 release.
> >


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 11:30  0%       ` Jerin Jacob
@ 2020-06-30 11:36  0%         ` Kinsella, Ray
  2020-06-30 12:14  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-06-30 11:36 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry



On 30/06/2020 12:30, Jerin Jacob wrote:
> On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>>
>>
>> On 27/06/2020 08:44, Jerin Jacob wrote:
>>>> +
>>>> +/** Event port configuration structure */
>>>> +struct rte_event_port_conf_v20 {
>>>> +       int32_t new_event_threshold;
>>>> +       /**< A backpressure threshold for new event enqueues on this port.
>>>> +        * Use for *closed system* event dev where event capacity is limited,
>>>> +        * and cannot exceed the capacity of the event dev.
>>>> +        * Configuring ports with different thresholds can make higher priority
>>>> +        * traffic less likely to  be backpressured.
>>>> +        * For example, a port used to inject NIC Rx packets into the event dev
>>>> +        * can have a lower threshold so as not to overwhelm the device,
>>>> +        * while ports used for worker pools can have a higher threshold.
>>>> +        * This value cannot exceed the *nb_events_limit*
>>>> +        * which was previously supplied to rte_event_dev_configure().
>>>> +        * This should be set to '-1' for *open system*.
>>>> +        */
>>>> +       uint16_t dequeue_depth;
>>>> +       /**< Configure number of bulk dequeues for this event port.
>>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>> +        * which previously supplied to rte_event_dev_configure().
>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>> +        */
>>>> +       uint16_t enqueue_depth;
>>>> +       /**< Configure number of bulk enqueues for this event port.
>>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>> +        * which previously supplied to rte_event_dev_configure().
>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>> +        */
>>>>         uint8_t disable_implicit_release;
>>>>         /**< Configure the port not to release outstanding events in
>>>>          * rte_event_dev_dequeue_burst(). If true, all events received through
>>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>>                                 struct rte_event_port_conf *port_conf);
>>>>
>>>> +int
>>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>> +                               struct rte_event_port_conf_v20 *port_conf);
>>>> +
>>>> +int
>>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>> +                                     struct rte_event_port_conf *port_conf);
>>>
>>> Hi Timothy,
>>>
>>> + ABI Maintainers (Ray, Neil)
>>>
>>> # As per my understanding, the structures can not be versioned, only
>>> function can be versioned.
>>> i.e we can not make any change to " struct rte_event_port_conf"
>>
>> So the answer is (as always): depends
>>
>> If the structure is being use in inline functions is when you run into trouble
>> - as knowledge of the structure is embedded in the linked application.
>>
>> However if the structure is _strictly_ being used as a non-inlined function parameter,
>> It can be safe to version in this way.
> 
> But based on the optimization applied when building the consumer code
> matters. Right?
> i.e compiler can "inline" it, based on the optimization even the
> source code explicitly mentions it.

Well, a compiler will typically only inline within the confines of a given object file - or
the whole binary, if LTO is enabled.

If a function symbol is exported from a library however, it won't be inlined in a linked application.
The compiler doesn't have enough information to inline it.
All the compiler will know about it is its offset in memory and its signature.

> 
> 
>>
>> So just to be clear, it is still the function that is actually being versioned here.
>>
>>>
>>> # We have a similar case with ethdev and it deferred to next release v20.11
>>> http://patches.dpdk.org/patch/69113/
>>
>> Yes - I spent a why looking at this one, but I am struggling to recall,
>> why when I looked it we didn't suggest function versioning as a potential solution in this case.
>>
>> Looking back at it now, looks like it would have been ok.
> 
> Ok.
> 
>>
>>>
>>> Regarding the API changes:
>>> # The slow path changes general looks good to me. I will review the
>>> next level in the coming days
>>> # The following fast path changes bothers to me. Could you share more
>>> details on below change?
>>>
>>> diff --git a/app/test-eventdev/test_order_atq.c
>>> b/app/test-eventdev/test_order_atq.c
>>> index 3366cfc..8246b96 100644
>>> --- a/app/test-eventdev/test_order_atq.c
>>> +++ b/app/test-eventdev/test_order_atq.c
>>> @@ -34,6 +34,8 @@
>>>                         continue;
>>>                 }
>>>
>>> +               ev.flow_id = ev.mbuf->udata64;
>>> +
>>> # Since RC1 is near, I am not sure how to accommodate the API changes
>>> now and sort out ABI stuffs.
>>> # Other concern is eventdev spec get bloated with versioning files
>>> just for ONE release as 20.11 will be OK to change the ABI.
>>> # While we discuss the API change, Please send deprecation notice for
>>> ABI change for 20.11,
>>> so that there is no ambiguity of this patch for the 20.11 release.
>>>


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 11:36  0%         ` Kinsella, Ray
@ 2020-06-30 12:14  0%           ` Jerin Jacob
  2020-07-02 15:21  0%             ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30 12:14 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry

On Tue, Jun 30, 2020 at 5:06 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
>
>
> On 30/06/2020 12:30, Jerin Jacob wrote:
> > On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
> >>
> >>
> >>
> >> On 27/06/2020 08:44, Jerin Jacob wrote:
> >>>> +
> >>>> +/** Event port configuration structure */
> >>>> +struct rte_event_port_conf_v20 {
> >>>> +       int32_t new_event_threshold;
> >>>> +       /**< A backpressure threshold for new event enqueues on this port.
> >>>> +        * Use for *closed system* event dev where event capacity is limited,
> >>>> +        * and cannot exceed the capacity of the event dev.
> >>>> +        * Configuring ports with different thresholds can make higher priority
> >>>> +        * traffic less likely to  be backpressured.
> >>>> +        * For example, a port used to inject NIC Rx packets into the event dev
> >>>> +        * can have a lower threshold so as not to overwhelm the device,
> >>>> +        * while ports used for worker pools can have a higher threshold.
> >>>> +        * This value cannot exceed the *nb_events_limit*
> >>>> +        * which was previously supplied to rte_event_dev_configure().
> >>>> +        * This should be set to '-1' for *open system*.
> >>>> +        */
> >>>> +       uint16_t dequeue_depth;
> >>>> +       /**< Configure number of bulk dequeues for this event port.
> >>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >>>> +        * which previously supplied to rte_event_dev_configure().
> >>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >>>> +        */
> >>>> +       uint16_t enqueue_depth;
> >>>> +       /**< Configure number of bulk enqueues for this event port.
> >>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >>>> +        * which previously supplied to rte_event_dev_configure().
> >>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >>>> +        */
> >>>>         uint8_t disable_implicit_release;
> >>>>         /**< Configure the port not to release outstanding events in
> >>>>          * rte_event_dev_dequeue_burst(). If true, all events received through
> >>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >>>>                                 struct rte_event_port_conf *port_conf);
> >>>>
> >>>> +int
> >>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >>>> +                               struct rte_event_port_conf_v20 *port_conf);
> >>>> +
> >>>> +int
> >>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >>>> +                                     struct rte_event_port_conf *port_conf);
> >>>
> >>> Hi Timothy,
> >>>
> >>> + ABI Maintainers (Ray, Neil)
> >>>
> >>> # As per my understanding, the structures can not be versioned, only
> >>> function can be versioned.
> >>> i.e we can not make any change to " struct rte_event_port_conf"
> >>
> >> So the answer is (as always): depends
> >>
> >> If the structure is being use in inline functions is when you run into trouble
> >> - as knowledge of the structure is embedded in the linked application.
> >>
> >> However if the structure is _strictly_ being used as a non-inlined function parameter,
> >> It can be safe to version in this way.
> >
> > But based on the optimization applied when building the consumer code
> > matters. Right?
> > i.e compiler can "inline" it, based on the optimization even the
> > source code explicitly mentions it.
>
> Well a compiler will typically only inline within the confines of a given object file, or
> binary, if LTO is enabled.

>
> If a function symbol is exported from library however, it won't be inlined in a linked application.

Yes, with respect to that function.
But an application can use struct rte_event_port_conf in its own code,
and it can be embedded in other structures.
Right?


> The compiler doesn't have enough information to inline it.
> All the compiler will know about it is it's offset in memory, and it's signature.
>
> >
> >
> >>
> >> So just to be clear, it is still the function that is actually being versioned here.
> >>
> >>>
> >>> # We have a similar case with ethdev and it deferred to next release v20.11
> >>> http://patches.dpdk.org/patch/69113/
> >>
> >> Yes - I spent a why looking at this one, but I am struggling to recall,
> >> why when I looked it we didn't suggest function versioning as a potential solution in this case.
> >>
> >> Looking back at it now, looks like it would have been ok.
> >
> > Ok.
> >
> >>
> >>>
> >>> Regarding the API changes:
> >>> # The slow path changes general looks good to me. I will review the
> >>> next level in the coming days
> >>> # The following fast path changes bothers to me. Could you share more
> >>> details on below change?
> >>>
> >>> diff --git a/app/test-eventdev/test_order_atq.c
> >>> b/app/test-eventdev/test_order_atq.c
> >>> index 3366cfc..8246b96 100644
> >>> --- a/app/test-eventdev/test_order_atq.c
> >>> +++ b/app/test-eventdev/test_order_atq.c
> >>> @@ -34,6 +34,8 @@
> >>>                         continue;
> >>>                 }
> >>>
> >>> +               ev.flow_id = ev.mbuf->udata64;
> >>> +
> >>> # Since RC1 is near, I am not sure how to accommodate the API changes
> >>> now and sort out ABI stuffs.
> >>> # Other concern is eventdev spec get bloated with versioning files
> >>> just for ONE release as 20.11 will be OK to change the ABI.
> >>> # While we discuss the API change, Please send deprecation notice for
> >>> ABI change for 20.11,
> >>> so that there is no ambiguity of this patch for the 20.11 release.
> >>>


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30  4:21  0%       ` Jerin Jacob
@ 2020-06-30 15:37  0%         ` McDaniel, Timothy
  2020-06-30 15:57  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: McDaniel, Timothy @ 2020-06-30 15:37 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Jerin Jacob <jerinjacobk@gmail.com>
>Sent: Monday, June 29, 2020 11:21 PM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
><timothy.mcdaniel@intel.com> wrote:
>>
>> -----Original Message-----
>> From: Jerin Jacob <jerinjacobk@gmail.com>
>> Sent: Saturday, June 27, 2020 2:45 AM
>> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>>
>> > +
>> > +/** Event port configuration structure */
>> > +struct rte_event_port_conf_v20 {
>> > +       int32_t new_event_threshold;
>> > +       /**< A backpressure threshold for new event enqueues on this port.
>> > +        * Use for *closed system* event dev where event capacity is limited,
>> > +        * and cannot exceed the capacity of the event dev.
>> > +        * Configuring ports with different thresholds can make higher priority
>> > +        * traffic less likely to  be backpressured.
>> > +        * For example, a port used to inject NIC Rx packets into the event dev
>> > +        * can have a lower threshold so as not to overwhelm the device,
>> > +        * while ports used for worker pools can have a higher threshold.
>> > +        * This value cannot exceed the *nb_events_limit*
>> > +        * which was previously supplied to rte_event_dev_configure().
>> > +        * This should be set to '-1' for *open system*.
>> > +        */
>> > +       uint16_t dequeue_depth;
>> > +       /**< Configure number of bulk dequeues for this event port.
>> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> > +        * which previously supplied to rte_event_dev_configure().
>> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>> > +        */
>> > +       uint16_t enqueue_depth;
>> > +       /**< Configure number of bulk enqueues for this event port.
>> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> > +        * which previously supplied to rte_event_dev_configure().
>> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>> > +        */
>> >         uint8_t disable_implicit_release;
>> >         /**< Configure the port not to release outstanding events in
>> >          * rte_event_dev_dequeue_burst(). If true, all events received through
>> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>> >                                 struct rte_event_port_conf *port_conf);
>> >
>> > +int
>> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> > +                               struct rte_event_port_conf_v20 *port_conf);
>> > +
>> > +int
>> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> > +                                     struct rte_event_port_conf *port_conf);
>>
>> Hi Timothy,
>>
>> + ABI Maintainers (Ray, Neil)
>>
>> # As per my understanding, the structures can not be versioned, only
>> function can be versioned.
>> i.e we can not make any change to " struct rte_event_port_conf"
>>
>> # We have a similar case with ethdev and it deferred to next release v20.11
>> http://patches.dpdk.org/patch/69113/
>>
>> Regarding the API changes:
>> # The slow path changes general looks good to me. I will review the
>> next level in the coming days
>> # The following fast path changes bothers to me. Could you share more
>> details on below change?
>>
>> diff --git a/app/test-eventdev/test_order_atq.c
>> b/app/test-eventdev/test_order_atq.c
>> index 3366cfc..8246b96 100644
>> --- a/app/test-eventdev/test_order_atq.c
>> +++ b/app/test-eventdev/test_order_atq.c
>> @@ -34,6 +34,8 @@
>>                         continue;
>>                 }
>>
>> +               ev.flow_id = ev.mbuf->udata64;
>> +
>> # Since RC1 is near, I am not sure how to accommodate the API changes
>> now and sort out ABI stuffs.
>> # Other concern is eventdev spec get bloated with versioning files
>> just for ONE release as 20.11 will be OK to change the ABI.
>> # While we discuss the API change, Please send deprecation notice for
>> ABI change for 20.11,
>> so that there is no ambiguity of this patch for the 20.11 release.
>>
>> Hello Jerin,
>>
>> Thank you for the review comments.
>>
>> With regard to your comments regarding the fast path flow_id change, the Intel
>DLB hardware
>> is not capable of transferring the flow_id as part of the event itself. We
>therefore require a mechanism
>> to accomplish this. What we have done to work around this is to require the
>application to embed the flow_id
>> within the data payload. The new flag, #define
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>> by applications to determine if they need to embed the flow_id, or if its
>automatically propagated and present in the
>> received event.
>>
>> What we should have done is to wrap the assignment with a conditional.
>>
>> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>>         ev.flow_id = ev.mbuf->udata64;
>
>Two problems with this approach,
>1) we are assuming mbuf udata64 field is available for DLB driver
>2) It won't work with another adapter, eventdev has no dependency with mbuf
>

This snippet is not intended to suggest that udata64 always be used to store the flow ID, but as an example of how an application could do it. Some applications won’t need to carry the flow ID through; others can select an unused field in the event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate the flow ID in pipeline stages that require it.

>Question:
>1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>only event pointer and not have any other metadata like schedule_type
>etc.
>

The DLB device provides a 16B “queue entry” that consists of:

*	8B event data
*	Queue ID
*	Priority
*	Scheduling type
*	19 bits of carried-through data
*	Assorted error/debug/reserved bits that are set by the device (not carried-through)

 For the carried-through 19b, we use 12b for event_type and sub_event_type.

>
>>
>> This would minimize/eliminate any performance impact due to the processor's
>branch prediction logic.
>> The assignment then becomes in essence a NOOP for all event devices that are
>capable of carrying the flow_id as part of the event payload itself.
>>
>> Thanks,
>> Tim
>>
>>
>>
>> Thanks,
>> Tim


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 15:37  0%         ` McDaniel, Timothy
@ 2020-06-30 15:57  0%           ` Jerin Jacob
  2020-06-30 19:26  0%             ` McDaniel, Timothy
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-06-30 15:57 UTC (permalink / raw)
  To: McDaniel, Timothy
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> >-----Original Message-----
> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >Sent: Monday, June 29, 2020 11:21 PM
> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >
> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
> ><timothy.mcdaniel@intel.com> wrote:
> >>
> >> -----Original Message-----
> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> Sent: Saturday, June 27, 2020 2:45 AM
> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >>
> >> > +
> >> > +/** Event port configuration structure */
> >> > +struct rte_event_port_conf_v20 {
> >> > +       int32_t new_event_threshold;
> >> > +       /**< A backpressure threshold for new event enqueues on this port.
> >> > +        * Use for *closed system* event dev where event capacity is limited,
> >> > +        * and cannot exceed the capacity of the event dev.
> >> > +        * Configuring ports with different thresholds can make higher priority
> >> > +        * traffic less likely to  be backpressured.
> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
> >> > +        * can have a lower threshold so as not to overwhelm the device,
> >> > +        * while ports used for worker pools can have a higher threshold.
> >> > +        * This value cannot exceed the *nb_events_limit*
> >> > +        * which was previously supplied to rte_event_dev_configure().
> >> > +        * This should be set to '-1' for *open system*.
> >> > +        */
> >> > +       uint16_t dequeue_depth;
> >> > +       /**< Configure number of bulk dequeues for this event port.
> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >> > +        * which previously supplied to rte_event_dev_configure().
> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
> >capable.
> >> > +        */
> >> > +       uint16_t enqueue_depth;
> >> > +       /**< Configure number of bulk enqueues for this event port.
> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >> > +        * which previously supplied to rte_event_dev_configure().
> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
> >capable.
> >> > +        */
> >> >         uint8_t disable_implicit_release;
> >> >         /**< Configure the port not to release outstanding events in
> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >> >                                 struct rte_event_port_conf *port_conf);
> >> >
> >> > +int
> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >> > +                               struct rte_event_port_conf_v20 *port_conf);
> >> > +
> >> > +int
> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >> > +                                     struct rte_event_port_conf *port_conf);
> >>
> >> Hi Timothy,
> >>
> >> + ABI Maintainers (Ray, Neil)
> >>
> >> # As per my understanding, the structures can not be versioned, only
> >> function can be versioned.
> >> i.e we can not make any change to " struct rte_event_port_conf"
> >>
> >> # We have a similar case with ethdev and it deferred to next release v20.11
> >> http://patches.dpdk.org/patch/69113/
> >>
> >> Regarding the API changes:
> >> # The slow path changes general looks good to me. I will review the
> >> next level in the coming days
> >> # The following fast path changes bothers to me. Could you share more
> >> details on below change?
> >>
> >> diff --git a/app/test-eventdev/test_order_atq.c
> >> b/app/test-eventdev/test_order_atq.c
> >> index 3366cfc..8246b96 100644
> >> --- a/app/test-eventdev/test_order_atq.c
> >> +++ b/app/test-eventdev/test_order_atq.c
> >> @@ -34,6 +34,8 @@
> >>                         continue;
> >>                 }
> >>
> >> +               ev.flow_id = ev.mbuf->udata64;
> >> +
> >> # Since RC1 is near, I am not sure how to accommodate the API changes
> >> now and sort out ABI stuffs.
> >> # Other concern is eventdev spec get bloated with versioning files
> >> just for ONE release as 20.11 will be OK to change the ABI.
> >> # While we discuss the API change, Please send deprecation notice for
> >> ABI change for 20.11,
> >> so that there is no ambiguity of this patch for the 20.11 release.
> >>
> >> Hello Jerin,
> >>
> >> Thank you for the review comments.
> >>
> >> With regard to your comments regarding the fast path flow_id change, the Intel
> >DLB hardware
> >> is not capable of transferring the flow_id as part of the event itself. We
> >therefore require a mechanism
> >> to accomplish this. What we have done to work around this is to require the
> >application to embed the flow_id
> >> within the data payload. The new flag, #define
> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
> >> by applications to determine if they need to embed the flow_id, or if its
> >automatically propagated and present in the
> >> received event.
> >>
> >> What we should have done is to wrap the assignment with a conditional.
> >>
> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
> >>         ev.flow_id = ev.mbuf->udata64;
> >
> >Two problems with this approach,
> >1) we are assuming mbuf udata64 field is available for DLB driver
> >2) It won't work with another adapter, eventdev has no dependency with mbuf
> >
>
> This snippet is not intended to suggest that udata64 always be used to store the flow ID, but as an example of how an application could do it. Some applications won’t need to carry the flow ID through; others can select an unused field in the event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate the flow ID in pipeline stages that require it.

OK.
>
> >Question:
> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
> >only event pointer and not have any other metadata like schedule_type
> >etc.
> >
>
> The DLB device provides a 16B “queue entry” that consists of:
>
> *       8B event data
> *       Queue ID
> *       Priority
> *       Scheduling type
> *       19 bits of carried-through data
> *       Assorted error/debug/reserved bits that are set by the device (not carried-through)
>
>  For the carried-through 19b, we use 12b for event_type and sub_event_type.

I can only think of TWO options to help
1) Since the event pointer is always cache aligned, you could grab the
6 LSBs (2^6 = 64B) plus the 7 spare bits (19b - 12b) of the
carried-through structure
2) Have a separate mempool driver built on the existing drivers, i.e. the
"event pointer" plus or minus some offset can hold any amount of custom data.
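
Option 1 could look roughly like this. A 64B-aligned event pointer has 6 always-zero LSBs, and 7 spare carried-through bits remain (19b - 12b), so up to 13 bits of flow ID can ride alongside the pointer. The names below are illustrative, not a real driver API:

```c
#include <stdint.h>
#include <assert.h>

#define PTR_LSB_BITS 6 /* 2^6 = 64B cache-line alignment */

/* Pack the low 6 bits of a 13b flow ID into the pointer LSBs and the
 * upper 7 bits into the spare carried-through field. */
static inline uint64_t
pack_event(const void *ev_ptr, uint16_t flow13, uint8_t *spare_out)
{
	uintptr_t p = (uintptr_t)ev_ptr;

	assert((p & 0x3f) == 0); /* must be 64B aligned */
	*spare_out = flow13 >> PTR_LSB_BITS;
	return p | (flow13 & 0x3f);
}

/* Recover the original pointer and the 13b flow ID. */
static inline void *
unpack_event(uint64_t word, uint8_t spare, uint16_t *flow13)
{
	*flow13 = (uint16_t)(((uint16_t)spare << PTR_LSB_BITS) | (word & 0x3f));
	return (void *)(uintptr_t)(word & ~0x3fULL);
}
```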


>
> >
> >>
> >> This would minimize/eliminate any performance impact due to the processor's
> >branch prediction logic.

I think, if we need to change the common fastpath, we should instead make
it a template that generates the code at compile time, for absolute zero
overhead, and select the variant at runtime.
See app/test-eventdev/test_order_atq.c, function worker_wrapper(), which
_creates_ the worker at compile time based on runtime capability.
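
A minimal sketch of that template idea: one worker body stamped out twice by the preprocessor, with and without the flow-ID restore, so the per-event branch is folded away by the compiler and the variant is picked once at startup. Names are illustrative, not the actual test-eventdev code:

```c
#include <stdint.h>
#include <stdbool.h>

struct ev { uint32_t flow_id; uint64_t u64; };

/* Stamp out one worker per capability at compile time; the constant
 * restore_flow condition is folded away, leaving no per-event branch. */
#define WORKER_FN(name, restore_flow)                            \
	static int name(struct ev *e, int n)                     \
	{                                                        \
		for (int i = 0; i < n; i++) {                    \
			if (restore_flow)                        \
				e[i].flow_id = (uint32_t)e[i].u64; \
			/* ... normal per-event processing ... */ \
		}                                                \
		return n;                                        \
	}

WORKER_FN(worker_carry_flow_id, 0)   /* dev carries flow_id itself */
WORKER_FN(worker_restore_flow_id, 1) /* dev lacks CARRY_FLOW_ID cap */

/* Select once at startup based on the device capability flag. */
static int (*select_worker(bool carries_flow_id))(struct ev *, int)
{
	return carries_flow_id ? worker_carry_flow_id
			       : worker_restore_flow_id;
}
```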



> >> The assignment then becomes in essence a NOOP for all event devices that are
> >capable of carrying the flow_id as part of the event payload itself.
> >>
> >> Thanks,
> >> Tim
> >>
> >>
> >>
> >> Thanks,
> >> Tim


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 15:57  0%           ` Jerin Jacob
@ 2020-06-30 19:26  0%             ` McDaniel, Timothy
  2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
  2020-07-01  4:50  3%               ` Jerin Jacob
  0 siblings, 2 replies; 200+ results
From: McDaniel, Timothy @ 2020-06-30 19:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Jerin Jacob <jerinjacobk@gmail.com>
>Sent: Tuesday, June 30, 2020 10:58 AM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
><timothy.mcdaniel@intel.com> wrote:
>>
>> >-----Original Message-----
>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>> >Sent: Monday, June 29, 2020 11:21 PM
>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
>> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>> >
>> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>> ><timothy.mcdaniel@intel.com> wrote:
>> >>
>> >> -----Original Message-----
>> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>> >> Sent: Saturday, June 27, 2020 2:45 AM
>> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
>> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
>> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>> >>
>> >> > +
>> >> > +/** Event port configuration structure */
>> >> > +struct rte_event_port_conf_v20 {
>> >> > +       int32_t new_event_threshold;
>> >> > +       /**< A backpressure threshold for new event enqueues on this port.
>> >> > +        * Use for *closed system* event dev where event capacity is limited,
>> >> > +        * and cannot exceed the capacity of the event dev.
>> >> > +        * Configuring ports with different thresholds can make higher priority
>> >> > +        * traffic less likely to  be backpressured.
>> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
>> >> > +        * can have a lower threshold so as not to overwhelm the device,
>> >> > +        * while ports used for worker pools can have a higher threshold.
>> >> > +        * This value cannot exceed the *nb_events_limit*
>> >> > +        * which was previously supplied to rte_event_dev_configure().
>> >> > +        * This should be set to '-1' for *open system*.
>> >> > +        */
>> >> > +       uint16_t dequeue_depth;
>> >> > +       /**< Configure number of bulk dequeues for this event port.
>> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>> >capable.
>> >> > +        */
>> >> > +       uint16_t enqueue_depth;
>> >> > +       /**< Configure number of bulk enqueues for this event port.
>> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>> >capable.
>> >> > +        */
>> >> >         uint8_t disable_implicit_release;
>> >> >         /**< Configure the port not to release outstanding events in
>> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
>> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>> >> >                                 struct rte_event_port_conf *port_conf);
>> >> >
>> >> > +int
>> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>> >> > +
>> >> > +int
>> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> >> > +                                     struct rte_event_port_conf *port_conf);
>> >>
>> >> Hi Timothy,
>> >>
>> >> + ABI Maintainers (Ray, Neil)
>> >>
>> >> # As per my understanding, the structures can not be versioned, only
>> >> function can be versioned.
>> >> i.e we can not make any change to " struct rte_event_port_conf"
>> >>
>> >> # We have a similar case with ethdev and it deferred to next release v20.11
>> >> http://patches.dpdk.org/patch/69113/
>> >>
>> >> Regarding the API changes:
>> >> # The slow path changes general looks good to me. I will review the
>> >> next level in the coming days
>> >> # The following fast path changes bothers to me. Could you share more
>> >> details on below change?
>> >>
>> >> diff --git a/app/test-eventdev/test_order_atq.c
>> >> b/app/test-eventdev/test_order_atq.c
>> >> index 3366cfc..8246b96 100644
>> >> --- a/app/test-eventdev/test_order_atq.c
>> >> +++ b/app/test-eventdev/test_order_atq.c
>> >> @@ -34,6 +34,8 @@
>> >>                         continue;
>> >>                 }
>> >>
>> >> +               ev.flow_id = ev.mbuf->udata64;
>> >> +
>> >> # Since RC1 is near, I am not sure how to accommodate the API changes
>> >> now and sort out ABI stuffs.
>> >> # Other concern is eventdev spec get bloated with versioning files
>> >> just for ONE release as 20.11 will be OK to change the ABI.
>> >> # While we discuss the API change, Please send deprecation notice for
>> >> ABI change for 20.11,
>> >> so that there is no ambiguity of this patch for the 20.11 release.
>> >>
>> >> Hello Jerin,
>> >>
>> >> Thank you for the review comments.
>> >>
>> >> With regard to your comments regarding the fast path flow_id change, the
>Intel
>> >DLB hardware
>> >> is not capable of transferring the flow_id as part of the event itself. We
>> >therefore require a mechanism
>> >> to accomplish this. What we have done to work around this is to require the
>> >application to embed the flow_id
>> >> within the data payload. The new flag, #define
>> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>> >> by applications to determine if they need to embed the flow_id, or if its
>> >automatically propagated and present in the
>> >> received event.
>> >>
>> >> What we should have done is to wrap the assignment with a conditional.
>> >>
>> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>> >>         ev.flow_id = ev.mbuf->udata64;
>> >
>> >Two problems with this approach,
>> >1) we are assuming mbuf udata64 field is available for DLB driver
>> >2) It won't work with another adapter, eventdev has no dependency with mbuf
>> >
>>
>> This snippet is not intended to suggest that udata64 always be used to store the
>flow ID, but as an example of how an application could do it. Some applications
>won’t need to carry the flow ID through; others can select an unused field in the
>event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-generate
>the flow ID in pipeline stages that require it.
>
>OK.
>>
>> >Question:
>> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>> >only event pointer and not have any other metadata like schedule_type
>> >etc.
>> >
>>
>> The DLB device provides a 16B “queue entry” that consists of:
>>
>> *       8B event data
>> *       Queue ID
>> *       Priority
>> *       Scheduling type
>> *       19 bits of carried-through data
>> *       Assorted error/debug/reserved bits that are set by the device (not carried-
>through)
>>
>>  For the carried-through 19b, we use 12b for event_type and sub_event_type.
>
>I can only think of TWO options to help
>1) Since event pointer always cache aligned, You could grab LSB
>6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>structure
>2) Have separate mempool driver using existing drivers, ie "event
>pointer" + or - some offset have any amount of custom data.
>

We can't guarantee that the event will contain a pointer -- it's possible that 8B is inline data (i.e. struct rte_event's u64 field).

It's really an application decision -- for example an app could allocate space in the 'mbuf private data' to store the flow ID, if the event device lacks that carry-flow-ID capability and the other mbuf fields can't be used for whatever reason.
We modified the tests, sample apps to show how this might be done, not necessarily how it must be done.
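
The mbuf-private-data option could look roughly like this. The layout is modeled generically here so it is self-contained; in real DPDK the application would pass a priv_size to rte_pktmbuf_pool_create() and reach the area with rte_mbuf_to_priv(). The names below are illustrative assumptions:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct rte_mbuf; the private area sits immediately
 * after the mbuf header, sized at pool-creation time. */
struct fake_mbuf { uint64_t header[8]; };

struct ev_priv { uint32_t flow_id; };

/* Return the per-mbuf private area (models rte_mbuf_to_priv()). */
static inline struct ev_priv *
mbuf_priv(struct fake_mbuf *m)
{
	return (struct ev_priv *)(m + 1);
}
```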

>
>>
>> >
>> >>
>> >> This would minimize/eliminate any performance impact due to the
>processor's
>> >branch prediction logic.
>
>I think, If we need to change common fastpath, better we need to make
>it template to create code for compile-time to have absolute zero
>overhead
>and use runtime.
>See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>_create_ worker at compile time based on runtime capability.
>

Yes, that would be perfect.  Thanks for the example!

>
>
>> >> The assignment then becomes in essence a NOOP for all event devices that
>are
>> >capable of carrying the flow_id as part of the event payload itself.
>> >>
>> >> Thanks,
>> >> Tim
>> >>
>> >>
>> >>
>> >> Thanks,
>> >> Tim


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 19:26  0%             ` McDaniel, Timothy
@ 2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
  2020-06-30 21:07  0%                 ` McDaniel, Timothy
  2020-07-01  4:50  3%               ` Jerin Jacob
  1 sibling, 1 reply; 200+ results
From: Pavan Nikhilesh Bhagavatula @ 2020-06-30 20:40 UTC (permalink / raw)
  To: McDaniel, Timothy, Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob Kollanukkaran,
	Mattias Rönnblom, dpdk-dev, Eads, Gage, Van Haaren, Harry



>-----Original Message-----
>From: dev <dev-bounces@dpdk.org> On Behalf Of McDaniel, Timothy
>Sent: Wednesday, July 1, 2020 12:57 AM
>To: Jerin Jacob <jerinjacobk@gmail.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>; Jerin Jacob Kollanukkaran
><jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage <gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>
>>-----Original Message-----
>>From: Jerin Jacob <jerinjacobk@gmail.com>
>>Sent: Tuesday, June 30, 2020 10:58 AM
>>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>;
>>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage
>><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>>
>>On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
>><timothy.mcdaniel@intel.com> wrote:
>>>
>>> >-----Original Message-----
>>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>>> >Sent: Monday, June 29, 2020 11:21 PM
>>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>;
>>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>Eads, Gage
>>> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>>> >
>>> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>>> ><timothy.mcdaniel@intel.com> wrote:
>>> >>
>>> >> -----Original Message-----
>>> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>>> >> Sent: Saturday, June 27, 2020 2:45 AM
>>> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray
>Kinsella
>>> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>>> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>Eads, Gage
>>> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>>> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>>> >>
>>> >> > +
>>> >> > +/** Event port configuration structure */
>>> >> > +struct rte_event_port_conf_v20 {
>>> >> > +       int32_t new_event_threshold;
>>> >> > +       /**< A backpressure threshold for new event enqueues on
>this port.
>>> >> > +        * Use for *closed system* event dev where event capacity
>is limited,
>>> >> > +        * and cannot exceed the capacity of the event dev.
>>> >> > +        * Configuring ports with different thresholds can make
>higher priority
>>> >> > +        * traffic less likely to  be backpressured.
>>> >> > +        * For example, a port used to inject NIC Rx packets into
>the event dev
>>> >> > +        * can have a lower threshold so as not to overwhelm the
>device,
>>> >> > +        * while ports used for worker pools can have a higher
>threshold.
>>> >> > +        * This value cannot exceed the *nb_events_limit*
>>> >> > +        * which was previously supplied to
>rte_event_dev_configure().
>>> >> > +        * This should be set to '-1' for *open system*.
>>> >> > +        */
>>> >> > +       uint16_t dequeue_depth;
>>> >> > +       /**< Configure number of bulk dequeues for this event
>port.
>>> >> > +        * This value cannot exceed the
>*nb_event_port_dequeue_depth*
>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>>> >capable.
>>> >> > +        */
>>> >> > +       uint16_t enqueue_depth;
>>> >> > +       /**< Configure number of bulk enqueues for this event
>port.
>>> >> > +        * This value cannot exceed the
>*nb_event_port_enqueue_depth*
>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>>> >capable.
>>> >> > +        */
>>> >> >         uint8_t disable_implicit_release;
>>> >> >         /**< Configure the port not to release outstanding events
>in
>>> >> >          * rte_event_dev_dequeue_burst(). If true, all events
>received through
>>> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t
>port_id,
>>> >> >                                 struct rte_event_port_conf *port_conf);
>>> >> >
>>> >> > +int
>>> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t
>port_id,
>>> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>>> >> > +
>>> >> > +int
>>> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t
>port_id,
>>> >> > +                                     struct rte_event_port_conf *port_conf);
>>> >>
>>> >> Hi Timothy,
>>> >>
>>> >> + ABI Maintainers (Ray, Neil)
>>> >>
>>> >> # As per my understanding, the structures can not be versioned,
>only
>>> >> function can be versioned.
>>> >> i.e we can not make any change to " struct rte_event_port_conf"
>>> >>
>>> >> # We have a similar case with ethdev and it deferred to next
>release v20.11
>>> >> http://patches.dpdk.org/patch/69113/
>>> >>
>>> >> Regarding the API changes:
>>> >> # The slow path changes general looks good to me. I will review
>the
>>> >> next level in the coming days
>>> >> # The following fast path changes bothers to me. Could you share
>more
>>> >> details on below change?
>>> >>
>>> >> diff --git a/app/test-eventdev/test_order_atq.c
>>> >> b/app/test-eventdev/test_order_atq.c
>>> >> index 3366cfc..8246b96 100644
>>> >> --- a/app/test-eventdev/test_order_atq.c
>>> >> +++ b/app/test-eventdev/test_order_atq.c
>>> >> @@ -34,6 +34,8 @@
>>> >>                         continue;
>>> >>                 }
>>> >>
>>> >> +               ev.flow_id = ev.mbuf->udata64;
>>> >> +
>>> >> # Since RC1 is near, I am not sure how to accommodate the API
>changes
>>> >> now and sort out ABI stuffs.
>>> >> # Other concern is eventdev spec get bloated with versioning files
>>> >> just for ONE release as 20.11 will be OK to change the ABI.
>>> >> # While we discuss the API change, Please send deprecation
>notice for
>>> >> ABI change for 20.11,
>>> >> so that there is no ambiguity of this patch for the 20.11 release.
>>> >>
>>> >> Hello Jerin,
>>> >>
>>> >> Thank you for the review comments.
>>> >>
>>> >> With regard to your comments regarding the fast path flow_id
>change, the
>>Intel
>>> >DLB hardware
>>> >> is not capable of transferring the flow_id as part of the event
>itself. We
>>> >therefore require a mechanism
>>> >> to accomplish this. What we have done to work around this is to
>require the
>>> >application to embed the flow_id
>>> >> within the data payload. The new flag, #define
>>> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>>> >> by applications to determine if they need to embed the flow_id,
>or if its
>>> >automatically propagated and present in the
>>> >> received event.
>>> >>
>>> >> What we should have done is to wrap the assignment with a
>conditional.
>>> >>
>>> >> if (!(device_capability_flags &
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>>> >>         ev.flow_id = ev.mbuf->udata64;
>>> >
>>> >Two problems with this approach,
>>> >1) we are assuming mbuf udata64 field is available for DLB driver
>>> >2) It won't work with another adapter, eventdev has no
>dependency with mbuf
>>> >
>>>
>>> This snippet is not intended to suggest that udata64 always be used
>to store the
>>flow ID, but as an example of how an application could do it. Some
>applications
>>won’t need to carry the flow ID through; others can select an unused
>field in the
>>event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case)
>re-generate
>>the flow ID in pipeline stages that require it.
>>
>>OK.
>>>
>>> >Question:
>>> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is
>it
>>> >only event pointer and not have any other metadata like
>schedule_type
>>> >etc.
>>> >
>>>
>>> The DLB device provides a 16B “queue entry” that consists of:
>>>
>>> *       8B event data
>>> *       Queue ID
>>> *       Priority
>>> *       Scheduling type
>>> *       19 bits of carried-through data
>>> *       Assorted error/debug/reserved bits that are set by the device
>(not carried-
>>through)
>>>
>>>  For the carried-through 19b, we use 12b for event_type and
>sub_event_type.
>>
>>I can only think of TWO options to help
>>1) Since event pointer always cache aligned, You could grab LSB
>>6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>>structure
>>2) Have separate mempool driver using existing drivers, ie "event
>>pointer" + or - some offset have any amount of custom data.
>>
>
>We can't guarantee that the event will contain a pointer -- it's possible
>that 8B is inline data (i.e. struct rte_event's u64 field).
>
>It's really an application decision -- for example an app could allocate
>space in the 'mbuf private data' to store the flow ID, if the event device
>lacks that carry-flow-ID capability and the other mbuf fields can't be
>used for whatever reason.
>We modified the tests, sample apps to show how this might be done,
>not necessarily how it must be done.
>
>>
>>>
>>> >
>>> >>
>>> >> This would minimize/eliminate any performance impact due to
>the
>>processor's
>>> >branch prediction logic.
>>
>>I think, If we need to change common fastpath, better we need to
>make
>>it template to create code for compile-time to have absolute zero
>>overhead
>>and use runtime.
>>See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>>_create_ worker at compile time based on runtime capability.
>>
>
>Yes, that would be perfect.  Thanks for the example!

Just to add: instead of having if and else, using a jump table would be much cleaner.
Ex.
	const pipeline_atq_worker_t pipeline_atq_worker_single_stage[2][2] = {
		[0][0] = pipeline_atq_worker_single_stage_fwd,
		[0][1] = pipeline_atq_worker_single_stage_tx,
		[1][0] = pipeline_atq_worker_single_stage_burst_fwd,
		[1][1] = pipeline_atq_worker_single_stage_burst_tx,
	};

	return (pipeline_atq_worker_single_stage[burst][internal_port])(arg);
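
A self-contained sketch of that dispatch: two runtime capability flags index straight into a function-pointer table, replacing nested if/else. The worker bodies are stubs for illustration:

```c
#include <stdbool.h>

typedef int (*worker_t)(void *arg);

static int w_fwd(void *arg)       { (void)arg; return 0; }
static int w_tx(void *arg)        { (void)arg; return 1; }
static int w_burst_fwd(void *arg) { (void)arg; return 2; }
static int w_burst_tx(void *arg)  { (void)arg; return 3; }

/* [burst][internal_port] jump table, filled with designated
 * initializers as in the snippet above. */
static const worker_t single_stage[2][2] = {
	[0][0] = w_fwd,
	[0][1] = w_tx,
	[1][0] = w_burst_fwd,
	[1][1] = w_burst_tx,
};

static int
run_worker(bool burst, bool internal_port, void *arg)
{
	return single_stage[burst][internal_port](arg);
}
```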

>
>>
>>
>>> >> The assignment then becomes in essence a NOOP for all event
>devices that
>>are
>>> >capable of carrying the flow_id as part of the event payload itself.
>>> >>
>>> >> Thanks,
>>> >> Tim
>>> >>
>>> >>
>>> >>
>>> >> Thanks,
>>> >> Tim


* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
@ 2020-06-30 21:07  0%                 ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2020-06-30 21:07 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob Kollanukkaran,
	Mattias Rönnblom, dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
>Sent: Tuesday, June 30, 2020 3:40 PM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Jerin Jacob
><jerinjacobk@gmail.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>
>
>>-----Original Message-----
>>From: dev <dev-bounces@dpdk.org> On Behalf Of McDaniel, Timothy
>>Sent: Wednesday, July 1, 2020 12:57 AM
>>To: Jerin Jacob <jerinjacobk@gmail.com>
>>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
>><nhorman@tuxdriver.com>; Jerin Jacob Kollanukkaran
>><jerinj@marvell.com>; Mattias Rönnblom
>><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>>Gage <gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>
>>>-----Original Message-----
>>>From: Jerin Jacob <jerinjacobk@gmail.com>
>>>Sent: Tuesday, June 30, 2020 10:58 AM
>>>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
>><nhorman@tuxdriver.com>;
>>>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>>Gage
>>><gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>>
>>>On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
>>><timothy.mcdaniel@intel.com> wrote:
>>>>
>>>> >-----Original Message-----
>>>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>>>> >Sent: Monday, June 29, 2020 11:21 PM
>>>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>>>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
>><nhorman@tuxdriver.com>;
>>>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>>Eads, Gage
>>>> ><gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>>> >
>>>> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>>>> ><timothy.mcdaniel@intel.com> wrote:
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>>>> >> Sent: Saturday, June 27, 2020 2:45 AM
>>>> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray
>>Kinsella
>>>> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>>>> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>>>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>;
>>Eads, Gage
>>>> ><gage.eads@intel.com>; Van Haaren, Harry
>><harry.van.haaren@intel.com>
>>>> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>>prerequisites
>>>> >>
>>>> >> > +
>>>> >> > +/** Event port configuration structure */
>>>> >> > +struct rte_event_port_conf_v20 {
>>>> >> > +       int32_t new_event_threshold;
>>>> >> > +       /**< A backpressure threshold for new event enqueues on this port.
>>>> >> > +        * Use for *closed system* event dev where event capacity is limited,
>>>> >> > +        * and cannot exceed the capacity of the event dev.
>>>> >> > +        * Configuring ports with different thresholds can make higher priority
>>>> >> > +        * traffic less likely to  be backpressured.
>>>> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
>>>> >> > +        * can have a lower threshold so as not to overwhelm the device,
>>>> >> > +        * while ports used for worker pools can have a higher threshold.
>>>> >> > +        * This value cannot exceed the *nb_events_limit*
>>>> >> > +        * which was previously supplied to rte_event_dev_configure().
>>>> >> > +        * This should be set to '-1' for *open system*.
>>>> >> > +        */
>>>> >> > +       uint16_t dequeue_depth;
>>>> >> > +       /**< Configure number of bulk dequeues for this event port.
>>>> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>>> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>> >> > +        */
>>>> >> > +       uint16_t enqueue_depth;
>>>> >> > +       /**< Configure number of bulk enqueues for this event port.
>>>> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>> >> > +        * which previously supplied to rte_event_dev_configure().
>>>> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>> >> > +        */
>>>> >> >         uint8_t disable_implicit_release;
>>>> >> >         /**< Configure the port not to release outstanding events in
>>>> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
>>>> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>> >> >                                 struct rte_event_port_conf *port_conf);
>>>> >> >
>>>> >> > +int
>>>> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>>>> >> > +
>>>> >> > +int
>>>> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>> >> > +                                     struct rte_event_port_conf *port_conf);
>>>> >>
>>>> >> Hi Timothy,
>>>> >>
>>>> >> + ABI Maintainers (Ray, Neil)
>>>> >>
>>>> >> # As per my understanding, the structures can not be versioned, only
>>>> >> function can be versioned.
>>>> >> i.e we can not make any change to " struct rte_event_port_conf"
>>>> >>
>>>> >> # We have a similar case with ethdev and it deferred to next release v20.11
>>>> >> http://patches.dpdk.org/patch/69113/
>>>> >>
>>>> >> Regarding the API changes:
>>>> >> # The slow path changes general looks good to me. I will review the
>>>> >> next level in the coming days
>>>> >> # The following fast path changes bothers to me. Could you share more
>>>> >> details on below change?
>>>> >>
>>>> >> diff --git a/app/test-eventdev/test_order_atq.c
>>>> >> b/app/test-eventdev/test_order_atq.c
>>>> >> index 3366cfc..8246b96 100644
>>>> >> --- a/app/test-eventdev/test_order_atq.c
>>>> >> +++ b/app/test-eventdev/test_order_atq.c
>>>> >> @@ -34,6 +34,8 @@
>>>> >>                         continue;
>>>> >>                 }
>>>> >>
>>>> >> +               ev.flow_id = ev.mbuf->udata64;
>>>> >> +
>>>> >> # Since RC1 is near, I am not sure how to accommodate the API changes
>>>> >> now and sort out ABI stuffs.
>>>> >> # Other concern is eventdev spec get bloated with versioning files
>>>> >> just for ONE release as 20.11 will be OK to change the ABI.
>>>> >> # While we discuss the API change, Please send deprecation notice for
>>>> >> ABI change for 20.11,
>>>> >> so that there is no ambiguity of this patch for the 20.11 release.
>>>> >>
>>>> >> Hello Jerin,
>>>> >>
>>>> >> Thank you for the review comments.
>>>> >>
>>>> >> With regard to your comments regarding the fast path flow_id change,
>>>> >> the Intel DLB hardware is not capable of transferring the flow_id as
>>>> >> part of the event itself. We therefore require a mechanism to
>>>> >> accomplish this. What we have done to work around this is to require
>>>> >> the application to embed the flow_id within the data payload. The new
>>>> >> flag, #define RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>>>> >> by applications to determine if they need to embed the flow_id, or if
>>>> >> its automatically propagated and present in the received event.
>>>> >>
>>>> >> What we should have done is to wrap the assignment with a conditional.
>>>> >>
>>>> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>>>> >>         ev.flow_id = ev.mbuf->udata64;
>>>> >
>>>> >Two problems with this approach,
>>>> >1) we are assuming mbuf udata64 field is available for DLB driver
>>>> >2) It won't work with another adapter, eventdev has no dependency with mbuf
>>>> >
>>>>
>>>> This snippet is not intended to suggest that udata64 always be used to
>>>> store the flow ID, but as an example of how an application could do it.
>>>> Some applications won’t need to carry the flow ID through; others can
>>>> select an unused field in the event data (e.g. hash.rss or udata64 if
>>>> using mbufs), or (worst-case) re-generate the flow ID in pipeline stages
>>>> that require it.
>>>
>>>OK.
>>>>
>>>> >Question:
>>>> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>>>> >only event pointer and not have any other metadata like schedule_type
>>>> >etc.
>>>> >
>>>>
>>>> The DLB device provides a 16B “queue entry” that consists of:
>>>>
>>>> *       8B event data
>>>> *       Queue ID
>>>> *       Priority
>>>> *       Scheduling type
>>>> *       19 bits of carried-through data
>>>> *       Assorted error/debug/reserved bits that are set by the device
>>>>         (not carried-through)
>>>>
>>>>  For the carried-through 19b, we use 12b for event_type and sub_event_type.
>>>
>>>I can only think of TWO options to help
>>>1) Since event pointer always cache aligned, You could grab LSB
>>>6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>>>structure
>>>2) Have separate mempool driver using existing drivers, ie "event
>>>pointer" + or - some offset have any amount of custom data.
>>>
>>
>>We can't guarantee that the event will contain a pointer -- it's possible
>>that 8B is inline data (i.e. struct rte_event's u64 field).
>>
>>It's really an application decision -- for example an app could allocate
>>space in the 'mbuf private data' to store the flow ID, if the event device
>>lacks that carry-flow-ID capability and the other mbuf fields can't be
>>used for whatever reason.
>>We modified the tests, sample apps to show how this might be done,
>>not necessarily how it must be done.
>>
>>>
>>>>
>>>> >
>>>> >>
>>>> >> This would minimize/eliminate any performance impact due to the
>>>> >> processor's branch prediction logic.
>>>
>>>I think, If we need to change common fastpath, better we need to make
>>>it template to create code for compile-time to have absolute zero
>>>overhead
>>>and use runtime.
>>>See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>>>_create_ worker at compile time based on runtime capability.
>>>
>>
>>Yes, that would be perfect.  Thanks for the example!
>
>Just to add, instead of having if and else, using a jumptbl would be much cleaner
>Ex.
>	const pipeline_atq_worker_t pipeline_atq_worker_single_stage[2][2][2] = {
>		[0][0] = pipeline_atq_worker_single_stage_fwd,
>		[0][1] = pipeline_atq_worker_single_stage_tx,
>		[1][0] = pipeline_atq_worker_single_stage_burst_fwd,
>		[1][1] = pipeline_atq_worker_single_stage_burst_tx,
>	};
>
>	return (pipeline_atq_worker_single_stage[burst][internal_port])(arg);
>


Thank you for the suggestion.


>>
>>>
>>>
>>>> >> The assignment then becomes in essence a NOOP for all event devices
>>>> >> that are capable of carrying the flow_id as part of the event
>>>> >> payload itself.
>>>> >>
>>>> >> Thanks,
>>>> >> Tim
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks,
>>>> >> Tim

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  @ 2020-06-30 23:15  0%       ` Honnappa Nagarahalli
  2020-07-01  7:27  0%         ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-06-30 23:15 UTC (permalink / raw)
  To: thomas, Jerin Jacob, Konstantin Ananyev, jerinj, Morten Brørup
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger,
	Honnappa Nagarahalli, nd, nd

<snip>

> Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
> 
> 26/03/2020 09:04, Morten Brørup:
> > From: Jerin Jacob
> > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > >
> > > > As was discussed here:
> > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > this RFC aimed to hide ring internals into .c and make all ring
> > > > functions non-inlined. In theory that might help to maintain ABI
> > > > stability in future.
> > > > This is just a POC to measure the impact of the proposed idea; a proper
> > > > implementation would definitely need some extra effort.
> > > > On an IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra for an
> > > > enqueue+dequeue pair. On some more realistic code, I suspect
> > > > the impact might be a bit higher.
> > > > For MP/MC bulk transfers degradation seems quite small, though for
> > > > SP/SC and/or small transfers it is more than noticeable (see exact
> > > > numbers below).
> > > > From my perspective we'd probably keep it inlined for now to avoid
> > > > any non-anticipated performance degradations.
> > > > Though interested to see perf results and opinions from other
> > > > interested parties.
> > >
> > > +1
> 
> Konstantin, thank you for doing some measurements
> 
> 
> > > My reasoning is a bit different, DPDK is used in embedded boxes too
> > > where performance has more weight than ABI stuff.
> >
> > As a network appliance vendor I can confirm that we certainly care
> > more about performance than ABI stability.
> > ABI stability is irrelevant for us;
> > and API instability is a non-recurring engineering cost each time we
> > choose to switch to a new DPDK version, which we only do if we cannot
> > avoid it, e.g. due to new drivers, security fixes or new features that
> > we want to use.
> >
> > For us, the trend pointed in the wrong direction when DPDK switched
> > the preference towards runtime configurability and deprecated compile
> > time configurability. I do understand the reasoning behind it, and the
> > impact is minimal, so we accept it.
> 
> The code can be optimized by removing some instructions with #ifdef.
> But the complexity of managing #ifdef enabling/disabling, depending on the
> platform and the use case, would be huge.
> We try to have a reasonable code "always enabled" which performs well in all
> cases. This is a design choice which makes DPDK a library, not a pool of code
> to cherry-pick.
> 
> > However, if DPDK starts sacrificing performance of the core libraries
> > for the benefits of the GNU/Linux distributors, network appliance
> > vendors may put more effort into sticking with old DPDK versions
> > instead of updating.
> 
> The initial choice regarding ABI compatibility was "do not care".
> Recently, the decision was done to care about ABI compatibility as priority
> number 2. The priority number 1 remains the performance.
> That's a reason for allowing some ABI breakages in some specific releases
> announced in advance.
> 
> > > I think we need to focus first on slow path APIs ABI stuff.
> 
> Yes we should not degrade fast path performance for the sake of avoiding
> uncertain future ABI issues.
> 
> Morten, Jerin, thank you for the feedback.
I think we have a consensus here not to make any changes to inline functions for now.
Should we mark this as 'Deferred or Rejected'?

> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 19:26  0%             ` McDaniel, Timothy
  2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
@ 2020-07-01  4:50  3%               ` Jerin Jacob
  2020-07-01 16:48  0%                 ` McDaniel, Timothy
  1 sibling, 1 reply; 200+ results
From: Jerin Jacob @ 2020-07-01  4:50 UTC (permalink / raw)
  To: McDaniel, Timothy
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

On Wed, Jul 1, 2020 at 12:57 AM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> >-----Original Message-----
> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >Sent: Tuesday, June 30, 2020 10:58 AM
> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >
> >On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
> ><timothy.mcdaniel@intel.com> wrote:
> >>
> >> >-----Original Message-----
> >> >From: Jerin Jacob <jerinjacobk@gmail.com>
> >> >Sent: Monday, June 29, 2020 11:21 PM
> >> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> >> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
> >> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> >> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >> >
> >> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
> >> ><timothy.mcdaniel@intel.com> wrote:
> >> >>
> >> >> -----Original Message-----
> >> >> From: Jerin Jacob <jerinjacobk@gmail.com>
> >> >> Sent: Saturday, June 27, 2020 2:45 AM
> >> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
> >> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> >> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
> >> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
> >> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
> >> >>
> >> >> > +
> >> >> > +/** Event port configuration structure */
> >> >> > +struct rte_event_port_conf_v20 {
> >> >> > +       int32_t new_event_threshold;
> >> >> > +       /**< A backpressure threshold for new event enqueues on this port.
> >> >> > +        * Use for *closed system* event dev where event capacity is limited,
> >> >> > +        * and cannot exceed the capacity of the event dev.
> >> >> > +        * Configuring ports with different thresholds can make higher priority
> >> >> > +        * traffic less likely to  be backpressured.
> >> >> > +        * For example, a port used to inject NIC Rx packets into the event dev
> >> >> > +        * can have a lower threshold so as not to overwhelm the device,
> >> >> > +        * while ports used for worker pools can have a higher threshold.
> >> >> > +        * This value cannot exceed the *nb_events_limit*
> >> >> > +        * which was previously supplied to rte_event_dev_configure().
> >> >> > +        * This should be set to '-1' for *open system*.
> >> >> > +        */
> >> >> > +       uint16_t dequeue_depth;
> >> >> > +       /**< Configure number of bulk dequeues for this event port.
> >> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
> >> >> > +        * which previously supplied to rte_event_dev_configure().
> >> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >> >> > +        */
> >> >> > +       uint16_t enqueue_depth;
> >> >> > +       /**< Configure number of bulk enqueues for this event port.
> >> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
> >> >> > +        * which previously supplied to rte_event_dev_configure().
> >> >> > +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
> >> >> > +        */
> >> >> >         uint8_t disable_implicit_release;
> >> >> >         /**< Configure the port not to release outstanding events in
> >> >> >          * rte_event_dev_dequeue_burst(). If true, all events received through
> >> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
> >> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
> >> >> >                                 struct rte_event_port_conf *port_conf);
> >> >> >
> >> >> > +int
> >> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
> >> >> > +                               struct rte_event_port_conf_v20 *port_conf);
> >> >> > +
> >> >> > +int
> >> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
> >> >> > +                                     struct rte_event_port_conf *port_conf);
> >> >>
> >> >> Hi Timothy,
> >> >>
> >> >> + ABI Maintainers (Ray, Neil)
> >> >>
> >> >> # As per my understanding, the structures can not be versioned, only
> >> >> function can be versioned.
> >> >> i.e we can not make any change to " struct rte_event_port_conf"
> >> >>
> >> >> # We have a similar case with ethdev and it deferred to next release v20.11
> >> >> http://patches.dpdk.org/patch/69113/
> >> >>
> >> >> Regarding the API changes:
> >> >> # The slow path changes general looks good to me. I will review the
> >> >> next level in the coming days
> >> >> # The following fast path changes bothers to me. Could you share more
> >> >> details on below change?
> >> >>
> >> >> diff --git a/app/test-eventdev/test_order_atq.c
> >> >> b/app/test-eventdev/test_order_atq.c
> >> >> index 3366cfc..8246b96 100644
> >> >> --- a/app/test-eventdev/test_order_atq.c
> >> >> +++ b/app/test-eventdev/test_order_atq.c
> >> >> @@ -34,6 +34,8 @@
> >> >>                         continue;
> >> >>                 }
> >> >>
> >> >> +               ev.flow_id = ev.mbuf->udata64;
> >> >> +
> >> >> # Since RC1 is near, I am not sure how to accommodate the API changes
> >> >> now and sort out ABI stuffs.
> >> >> # Other concern is eventdev spec get bloated with versioning files
> >> >> just for ONE release as 20.11 will be OK to change the ABI.
> >> >> # While we discuss the API change, Please send deprecation notice for
> >> >> ABI change for 20.11,
> >> >> so that there is no ambiguity of this patch for the 20.11 release.
> >> >>
> >> >> Hello Jerin,
> >> >>
> >> >> Thank you for the review comments.
> >> >>
> >> >> With regard to your comments regarding the fast path flow_id change,
> >> >> the Intel DLB hardware is not capable of transferring the flow_id as
> >> >> part of the event itself. We therefore require a mechanism to
> >> >> accomplish this. What we have done to work around this is to require
> >> >> the application to embed the flow_id within the data payload. The new
> >> >> flag, #define RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
> >> >> by applications to determine if they need to embed the flow_id, or if
> >> >> its automatically propagated and present in the received event.
> >> >>
> >> >> What we should have done is to wrap the assignment with a conditional.
> >> >>
> >> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
> >> >>         ev.flow_id = ev.mbuf->udata64;
> >> >
> >> >Two problems with this approach,
> >> >1) we are assuming mbuf udata64 field is available for DLB driver
> >> >2) It won't work with another adapter, eventdev has no dependency with mbuf
> >> >
> >>
> >> This snippet is not intended to suggest that udata64 always be used to
> >> store the flow ID, but as an example of how an application could do it.
> >> Some applications won’t need to carry the flow ID through; others can
> >> select an unused field in the event data (e.g. hash.rss or udata64 if
> >> using mbufs), or (worst-case) re-generate the flow ID in pipeline stages
> >> that require it.
> >
> >OK.
> >>
> >> >Question:
> >> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
> >> >only event pointer and not have any other metadata like schedule_type
> >> >etc.
> >> >
> >>
> >> The DLB device provides a 16B “queue entry” that consists of:
> >>
> >> *       8B event data
> >> *       Queue ID
> >> *       Priority
> >> *       Scheduling type
> >> *       19 bits of carried-through data
> >> *       Assorted error/debug/reserved bits that are set by the device
> >>         (not carried-through)
> >>
> >>  For the carried-through 19b, we use 12b for event_type and sub_event_type.
> >
> >I can only think of TWO options to help
> >1) Since event pointer always cache aligned, You could grab LSB
> >6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
> >structure
> >2) Have separate mempool driver using existing drivers, ie "event
> >pointer" + or - some offset have any amount of custom data.
> >
>
> We can't guarantee that the event will contain a pointer -- it's possible that 8B is inline data (i.e. struct rte_event's u64 field).
>
> It's really an application decision -- for example an app could allocate space in the 'mbuf private data' to store the flow ID, if the event device lacks that carry-flow-ID capability and the other mbuf fields can't be used for whatever reason.
> We modified the tests, sample apps to show how this might be done, not necessarily how it must be done.


Yeah. If HW has a limitation we can't do much. It is OK to change the
eventdev spec to support new HW limitations, i.e.
RTE_EVENT_DEV_CAP_CARRY_FLOW_ID is OK.
Please update the existing drivers that have this
RTE_EVENT_DEV_CAP_CARRY_FLOW_ID capability, which is missing in the
patch (I believe).

>
> >
> >>
> >> >
> >> >>
> >> >> This would minimize/eliminate any performance impact due to the
> >> >> processor's branch prediction logic.
> >
> >I think, If we need to change common fastpath, better we need to make
> >it template to create code for compile-time to have absolute zero
> >overhead
> >and use runtime.
> >See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
> >_create_ worker at compile time based on runtime capability.
> >
>
> Yes, that would be perfect.  Thanks for the example!

Wherever you are making a fastpath change, please follow this scheme
and send the next version.
In order to have clean and reusable code, you could have a template
function with an "if" that can be opted out at _compile_ time.
i.e

no_inline generic_worker(..., _const_ uint64_t flags)
{
..
..

if (!(flags & CAP_CARRY_FLOW_ID))
    ....

}

worker_with_out_carry_flow_id()
{
          generic_worker(.., CAP_CARRY_FLOW_ID)
}

normal_worker()
{
          generic_worker(.., 0)
}

No other controversial top-level comments on this patch series.
Once we have sorted out the ABI issues, I can review and merge.


>
> >
> >
> >> >> The assignment then becomes in essence a NOOP for all event devices
> >> >> that are capable of carrying the flow_id as part of the event payload
> >> >> itself.
> >> >>
> >> >> Thanks,
> >> >> Tim
> >> >>
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Tim

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  2020-06-30 23:15  0%       ` Honnappa Nagarahalli
@ 2020-07-01  7:27  0%         ` Morten Brørup
  2020-07-01 12:21  0%           ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2020-07-01  7:27 UTC (permalink / raw)
  To: Honnappa Nagarahalli, thomas, Jerin Jacob, Konstantin Ananyev, jerinj
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger, nd, nd

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa
> Nagarahalli
> Sent: Wednesday, July 1, 2020 1:16 AM
> 
> <snip>
> 
> > Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
> >
> > 26/03/2020 09:04, Morten Brørup:
> > > From: Jerin Jacob
> > > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > > >
> > > > > As was discussed here:
> > > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > > this RFC aimed to hide ring internals into .c and make all ring
> > > > > functions non-inlined. In theory that might help to maintain ABI
> > > > > stability in future.
> > > > > This is just a POC to measure the impact of the proposed idea; a
> > > > > proper implementation would definitely need some extra effort.
> > > > > On an IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra for
> > > > > an enqueue+dequeue pair. On some more realistic code, I suspect
> > > > > the impact might be a bit higher.
> > > > > For MP/MC bulk transfers degradation seems quite small, though for
> > > > > SP/SC and/or small transfers it is more than noticeable (see exact
> > > > > numbers below).
> > > > > From my perspective we'd probably keep it inlined for now to avoid
> > > > > any non-anticipated performance degradations.
> > > > > Though interested to see perf results and opinions from other
> > > > > interested parties.
> > > >
> > > > +1
> >
> > Konstantin, thank you for doing some measurements
> >
> >
> > > > My reasoning is a bit different, DPDK is used in embedded boxes too
> > > > where performance has more weight than ABI stuff.
> > >
> > > As a network appliance vendor I can confirm that we certainly care
> > > more about performance than ABI stability.
> > > ABI stability is irrelevant for us;
> > > and API instability is a non-recurring engineering cost each time we
> > > choose to switch to a new DPDK version, which we only do if we cannot
> > > avoid it, e.g. due to new drivers, security fixes or new features that
> > > we want to use.
> > >
> > > For us, the trend pointed in the wrong direction when DPDK switched
> > > the preference towards runtime configurability and deprecated compile
> > > time configurability. I do understand the reasoning behind it, and the
> > > impact is minimal, so we accept it.
> >
> > The code can be optimized by removing some instructions with #ifdef.
> > But the complexity of managing #ifdef enabling/disabling, depending on
> > the platform and the use case, would be huge.
> > We try to have a reasonable code "always enabled" which performs well
> > in all cases. This is a design choice which makes DPDK a library, not
> > a pool of code to cherry-pick.
> >
> > > However, if DPDK starts sacrificing performance of the core libraries
> > > for the benefits of the GNU/Linux distributors, network appliance
> > > vendors may put more effort into sticking with old DPDK versions
> > > instead of updating.
> >
> > The initial choice regarding ABI compatibility was "do not care".
> > Recently, the decision was done to care about ABI compatibility as
> > priority number 2. The priority number 1 remains the performance.
> > That's a reason for allowing some ABI breakages in some specific
> > releases announced in advance.
> >
> > > > I think we need to focus first on slow path APIs ABI stuff.
> >
> > Yes we should not degrade fast path performance for the sake of
> > avoiding uncertain future ABI issues.
> >
> > Morten, Jerin, thank you for the feedback.
> I think we have a consensus here not to make any changes to inline
> functions for now.
> Should we mark this as 'Deferred or Rejected'?

Rejected.

There is no need for this modification now, and no actual use cases for it in the road map. In other words: This modification has no use cases; it is purely academic. Many other suggestions have been rejected for the reason that they have no current use cases.

As Thomas mentioned, DPDK has transitioned towards being a library, rather than a pool of code to cherry-pick from. I have learned to live with this.

Being a library doesn't mean that functions cannot be exposed as inline code in the library header files. DPDK is mainly a high performance library with a tradition of exposing many of its internals in its API, and we should keep it this way. We certainly don't want an opaque API hiding all of its internals, passing around void pointers.

However, it was still an interesting experiment to investigate the performance cost.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  2020-07-01  7:27  0%         ` Morten Brørup
@ 2020-07-01 12:21  0%           ` Ananyev, Konstantin
  2020-07-01 14:11  0%             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2020-07-01 12:21 UTC (permalink / raw)
  To: Morten Brørup, Honnappa Nagarahalli, thomas, Jerin Jacob, jerinj
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger, nd, nd

> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa
> > Nagarahalli
> > Sent: Wednesday, July 1, 2020 1:16 AM
> >
> > <snip>
> >
> > > Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
> > >
> > > 26/03/2020 09:04, Morten Brørup:
> > > > From: Jerin Jacob
> > > > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > > > >
> > > > > > As was discussed here:
> > > > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > > > this RFC aimed to hide ring internals into .c and make all ring
> > > > > > functions non-inlined. In theory that might help to maintain ABI
> > > > > > stability in future.
> > > > > > This is just a POC to measure the impact of the proposed idea; a
> > > > > > proper implementation would definitely need some extra effort.
> > > > > > On an IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra
> > > > > > for an enqueue+dequeue pair. On some more realistic code, I
> > > > > > suspect the impact might be a bit higher.
> > > > > > For MP/MC bulk transfers degradation seems quite small, though
> > > > > > for SP/SC and/or small transfers it is more than noticeable (see
> > > > > > exact numbers below).
> > > > > > From my perspective we'd probably keep it inlined for now to
> > > > > > avoid any non-anticipated performance degradations.
> > > > > > Though interested to see perf results and opinions from other
> > > > > > interested parties.
> > > > >
> > > > > +1
> > >
> > > Konstantin, thank you for doing some measurements
> > >
> > >
> > > > > My reasoning is a bit different, DPDK is used in embedded boxes
> > > > > too where performance has more weight than ABI stuff.
> > > >
> > > > As a network appliance vendor I can confirm that we certainly care
> > > > more about performance than ABI stability.
> > > > ABI stability is irrelevant for us;
> > > > and API instability is a non-recurring engineering cost each time
> > > > we choose to switch to a new DPDK version, which we only do if we
> > > > cannot avoid it, e.g. due to new drivers, security fixes or new
> > > > features that we want to use.
> > > >
> > > > For us, the trend pointed in the wrong direction when DPDK switched
> > > > the preference towards runtime configurability and deprecated
> > > > compile time configurability. I do understand the reasoning behind
> > > > it, and the impact is minimal, so we accept it.
> > >
> > > The code can be optimized by removing some instructions with #ifdef.
> > > But the complexity of managing #ifdef enabling/disabling, depending
> > > on the platform and the use case, would be huge.
> > > We try to have a reasonable code "always enabled" which performs well
> > > in all cases. This is a design choice which makes DPDK a library, not
> > > a pool of code to cherry-pick.
> > >
> > > > However, if DPDK starts sacrificing performance of the core
> > > > libraries for the benefits of the GNU/Linux distributors, network
> > > > appliance vendors may put more effort into sticking with old DPDK
> > > > versions instead of updating.
> > >
> > > The initial choice regarding ABI compatibility was "do not care".
> > > Recently, the decision was done to care about ABI compatibility as
> > > priority number 2. The priority number 1 remains the performance.
> > > That's a reason for allowing some ABI breakages in some specific
> > > releases announced in advance.
> > >
> > > > > I think we need to focus first on slow path APIs ABI stuff.
> > >
> > > Yes we should not degrade fast path performance for the sake of
> > > avoiding uncertain future ABI issues.
> > >
> > > Morten, Jerin, thank you for the feedback.
> > I think we have a consensus here not to make any changes to inline
> > functions for now.
> > Should we mark this as 'Deferred or Rejected'?
> 
> Rejected.
> 
> There is no need for this modification now, and no actual use cases for it in the road map. In other words: This modification has no use
> cases; it is purely academic. Many other suggestions have been rejected for the reason that they have no current use cases.
> 
> As Thomas mentioned, DPDK has transitioned towards being a library, rather than a pool of code to cherry-pick from. I have learned to live
> with this.
> 
> Being a library doesn't mean that functions cannot be exposed as inline code in the library header files. DPDK is mainly a high performance
> library with a tradition of exposing many of its internals in its API, and we should keep it this way. We certainly don't want an opaque API
> hiding all of its internals, passing around void pointers.
> 
> However, it was still an interesting experiment to investigate the performance cost.

Yes, please reject it.
Konstantin



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] ring: make ring implementation non-inlined
  2020-07-01 12:21  0%           ` Ananyev, Konstantin
@ 2020-07-01 14:11  0%             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2020-07-01 14:11 UTC (permalink / raw)
  To: Ananyev, Konstantin, Morten Brørup, thomas, Jerin Jacob, jerinj
  Cc: dev, Olivier Matz, David Christensen, Stephen Hemminger, nd,
	Honnappa Nagarahalli, nd

<snip>
> > >
> > > > Subject: Re: [dpdk-dev] [RFC] ring: make ring implementation non-
> > > inlined
> > > >
> > > > 26/03/2020 09:04, Morten Brørup:
> > > > > From: Jerin Jacob
> > > > > > On Fri, Mar 20, 2020 Konstantin Ananyev wrote:
> > > > > > >
> > > > > > > As was discussed here:
> > > > > > > http://mails.dpdk.org/archives/dev/2020-February/158586.html
> > > > > > > this RFC aimed to hide ring internals into .c and make all
> > > > > > > ring functions non-inlined. In theory that might help to
> > > > > > > maintain
> > > ABI
> > > > > > > stability in future.
> > > > > > > This is just a POC to measure the impact of proposed idea,
> > > proper
> > > > > > > implementation would definitely need some extra effort.
> > > > > > > On IA box (SKX) ring_perf_autotest shows ~20-30 cycles extra
> > > for
> > > > > > > enqueue+dequeue pair. On some more realistic code, I suspect
> > > > > > > the impact it might be a bit higher.
> > > > > > > For MP/MC bulk transfers degradation seems quite small,
> > > > > > > though
> > > for
> > > > > > > SP/SC and/or small transfers it is more than noticeable (see
> > > exact
> > > > > > > numbers below).
> > > > > > > From my perspective we'd probably keep it inlined for now to
> > > avoid
> > > > > > > any non-anticipated performance degradations.
> > > > > > > Though interested to see perf results and opinions from other
> > > > > > > interested parties.
> > > > > >
> > > > > > +1
> > > >
> > > > Konstantin, thank you for doing some measures
> > > >
> > > >
> > > > > > My reasoning is a bit different, DPDK is used in embedded
> > > > > > boxes
> > > too
> > > > > > where performance has more weight than ABI stuff.
> > > > >
> > > > > As a network appliance vendor I can confirm that we certainly
> > > > > care more about performance than ABI stability.
> > > > > ABI stability is irrelevant for us; and API instability is a
> > > > > non-recurring engineering cost each time
> > > we
> > > > > choose to switch to a new DPDK version, which we only do if we
> > > cannot
> > > > > avoid it, e.g. due to new drivers, security fixes or new
> > > > > features
> > > that
> > > > > we want to use.
> > > > >
> > > > > For us, the trend pointed in the wrong direction when DPDK
> > > > > switched the preference towards runtime configurability and
> > > > > deprecated
> > > compile
> > > > > time configurability. I do understand the reasoning behind it,
> > > > > and
> > > the
> > > > > impact is minimal, so we accept it.
> > > >
> > > > The code can be optimized by removing some instructions with #ifdef.
> > > > But the complexity of managing #ifdef enabling/disabling,
> > > > depending
> > > on the
> > > > platform and the use case, would be huge.
> > > > We try to have a reasonable code "always enabled" which performs
> > > > well
> > > in all
> > > > cases. This is a design choice which makes DPDK a library, not a
> > > > pool
> > > of code
> > > > to cherry-pick.
> > > >
> > > > > However, if DPDK starts sacrificing performance of the core
> > > libraries
> > > > > for the benefits of the GNU/Linux distributors, network
> > > > > appliance vendors may put more effort into sticking with old
> > > > > DPDK versions instead of updating.
> > > >
> > > > The initial choice regarding ABI compatibility was "do not care".
> > > > Recently, the decision was done to care about ABI compatibility as
> > > priority
> > > > number 2. The priority number 1 remains the performance.
> > > > That's a reason for allowing some ABI breakages in some specific
> > > releases
> > > > announced in advance.
> > > >
> > > > > > I think we need to focus first on slow path APIs ABI stuff.
> > > >
> > > > Yes we should not degrade fast path performance for the sake of
> > > avoiding
> > > > uncertain future ABI issues.
> > > >
> > > > Morten, Jerin, thank you for the feedback.
> > > I think we have a consensus here not to make any changes to inline
> > > functions for now.
> > > Should we mark this as 'Deferred or Rejected'?
> >
> > Rejected.
> >
> > There is no need for this modification now, and no actual use cases
> > for it in the road map. In other words: This modification has no use cases; it
> is purely academic. Many other suggestions have been rejected for the reason
> that they have no current use cases.
> >
> > As Thomas mentioned, DPDK has transitioned towards being a library,
> > rather than a pool of code to cherry-pick from. I have learned to live with
> this.
> >
> > Being a library doesn't mean that functions cannot be exposed as
> > inline code in the library header files. DPDK is mainly a high
> > performance library with a tradition of exposing many of its internals in its
> API, and we should keep it this way. We certainly don't want an opaque API
> hiding all of its internals, passing around void pointers.
> >
> > However, it was still an interesting experiment to investigate the
> performance cost.
> 
> Yes, please reject it.
I just tried to mark it rejected in patchwork, I do not have the permissions (probably you are the owner of the patch). Can you please mark it?

> Konstantin
> 


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
    @ 2020-07-01 15:36  2% ` Viacheslav Ovsiienko
  2020-07-07 11:50  0%   ` Olivier Matz
  2020-07-07 12:59  2% ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-01 15:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

Some networks require precise traffic timing management. The ability
to send (and, generally speaking, receive) packets at a precisely
specified moment in time makes it possible to support connections
with Time Division Multiplexing using a contemporary general purpose
NIC without involving auxiliary hardware. For example, support for
the O-RAN Fronthaul interface is one of the promising applications of
precise time management for egress packets.

The main objective of this RFC is to specify how applications can
provide the moment in time at which packet transmission must start,
and to give a preliminary description of how this feature is
supported on the mlx5 PMD side.

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time reference (initial phase) are not
explicitly defined but always remain the same for a given port.
Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For packets being sent,
this value can be used by the PMD to schedule packet sending.

Once the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage timestamps on the receive datapath as well.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time the packet appears on the wire with
the specified packet timestamp. If the specified time is in the past,
it should be ignored; if it is in the distant future, it should be
capped at some reasonable value (in the range of seconds). These
specific cases ("too late" and "distant future") can optionally be
reported via device xstats to help applications detect time-related
problems.

No packet reordering according to timestamps is assumed, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. A timestamp can be put only in the first packet
of a burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on timestamp
with a new offload flag:

This is a palliative and is going to be replaced with a new eth_dev
API for reporting/managing the supported dynamic flags and their
related features. That API would break ABI compatibility and cannot be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

A new testpmd command is proposed to configure the sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, bursts of packets will be transmitted with specific
delays between the packets within a burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..fb5477c 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/*
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, these dynamic flag and field will be used to manage
+ * the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [EXT] RE: [RFC] mbuf: accurate packet Tx scheduling
  @ 2020-07-01 15:46  0%       ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-01 15:46 UTC (permalink / raw)
  To: Harman Kalra
  Cc: dev, Thomas Monjalon, Matan Azrad, Raslan Darawsheh, Ori Kam,
	olivier.matz, Shahaf Shuler

> -----Original Message-----
> From: Harman Kalra <hkalra@marvell.com>
> Sent: Wednesday, June 17, 2020 18:58
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Matan
> Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; Ori Kam <orika@mellanox.com>;
> olivier.matz@6wind.com; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [EXT] RE: [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling
> 
> On Wed, Jun 10, 2020 at 03:16:12PM +0000, Slava Ovsiienko wrote:
> 
> > External Email
> >
> > ----------------------------------------------------------------------
Hi, Harman

Sorry for delay - missed your reply.

[..skip..]
> > Should we waste CPU cycles to wait the desired moment of time? Can we
> > guarantee stable interrupt latency if we choose to schedule on interrupts
> approach?
> >
> > This RFC splits the responsibility - application should prepare the
> > data and specify when it desires to send, the rest is on PMD.
> 
> I agree with the fact that we cannot guarantee the delay between tx burst
> call and data on wire, hence PMD should take care of it.
> Even if the PMD is busy-waiting, it wastes CPU cycles; and if we set up an alarm
> instead, interrupt latency might be a concern for achieving precise timing. So
> how are you planning to address both of the above issues in the PMD?

It is offloaded to the hardware. A special WAIT descriptor with a timestamp
is pushed to the queue, and the hardware just waits for the appropriate moment.
This is exactly the task for the PMD - convert the timestamp into hardware-related
entities and perform the requested operation in hardware. Thus we should not
wait on the CPU in any way - loops, interrupts, etc. Let the NIC do it for us.
 
> >
> > > >
> > > > PMD reports the ability to synchronize packet sending on timestamp
> > > > with new offload flag:
> > > >
> > > > This is palliative and is going to be replaced with new eth_dev
> > > > API about reporting/managing the supported dynamic flags and its
> > > > related features. This API would break ABI compatibility and can't
> > > > be introduced at the moment, so is postponed to 20.11.
> > > >
> > > > For testing purposes it is proposed to update testpmd "txonly"
> > > > forwarding mode routine. With this update testpmd application
> > > > generates the packets and sets the dynamic timestamps according to
> > > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> registered.
> > >
> > > So what I am understanding here is "rte_dynfield_timestamp" will
> > > provide information about three parameters:
> > > - timestamp at which TX should start
> > > - intra packet gap
> > > - intra burst gap.
> > >
> > > If its about "intra packet gap" then PMD can take care, but if it is
> > > about intra burst gap, application can take care of it.
> >
> > Not sure - the intra-burst gap might be pretty small.
> > It is supposed to handle intra-burst in the same way - by specifying
> > the timestamps. Waiting is supposed to be implemented on tx_burst() retry.
> > Prepare the packets with timestamps, tx_burst - if not all packets are
> > sent - it means the queue is waiting for the schedule; retry with the remaining
> packets.
> > As option - we can implement intra-burst wait basing rte_eth_read_clock().
> 
> Yeah, I think app can make use of rte_eth_read_clock() to implement intra-
> burst gap.
> But my actual doubt was, what all information will app provide as part of
> "rte_dynfield_timestamp" - one I understand will be timestamp at which
> packets should be sent out. What else? intra-packet gap ?
The intra-packet gap is just a testpmd parameter to provide some
preliminary feature testing. If the intra-packet gap is too small, even
the hardware might not support it. In mlx5 we are going to support
scheduling with a minimal granularity of 250 nanoseconds, so the minimal
supported gap is 250 ns; at 100 Gbps line speed that means at least two
1500B packets.

With best regards, Slava

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-07-01  4:50  3%               ` Jerin Jacob
@ 2020-07-01 16:48  0%                 ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2020-07-01 16:48 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Eads, Gage, Van Haaren, Harry

>-----Original Message-----
>From: Jerin Jacob <jerinjacobk@gmail.com>
>Sent: Tuesday, June 30, 2020 11:50 PM
>To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>On Wed, Jul 1, 2020 at 12:57 AM McDaniel, Timothy
><timothy.mcdaniel@intel.com> wrote:
>>
>> >-----Original Message-----
>> >From: Jerin Jacob <jerinjacobk@gmail.com>
>> >Sent: Tuesday, June 30, 2020 10:58 AM
>> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>;
>> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads, Gage
>> ><gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>> >
>> >On Tue, Jun 30, 2020 at 9:12 PM McDaniel, Timothy
>> ><timothy.mcdaniel@intel.com> wrote:
>> >>
>> >> >-----Original Message-----
>> >> >From: Jerin Jacob <jerinjacobk@gmail.com>
>> >> >Sent: Monday, June 29, 2020 11:21 PM
>> >> >To: McDaniel, Timothy <timothy.mcdaniel@intel.com>
>> >> >Cc: Ray Kinsella <mdr@ashroe.eu>; Neil Horman
><nhorman@tuxdriver.com>;
>> >> >Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage
>> >> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>> >> >Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>> >> >
>> >> >On Tue, Jun 30, 2020 at 1:01 AM McDaniel, Timothy
>> >> ><timothy.mcdaniel@intel.com> wrote:
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Jerin Jacob <jerinjacobk@gmail.com>
>> >> >> Sent: Saturday, June 27, 2020 2:45 AM
>> >> >> To: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Ray Kinsella
>> >> ><mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
>> >> >> Cc: Jerin Jacob <jerinj@marvell.com>; Mattias Rönnblom
>> >> ><mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage
>> >> ><gage.eads@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>
>> >> >> Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream
>prerequisites
>> >> >>
>> >> >> > +
>> >> >> > +/** Event port configuration structure */
>> >> >> > +struct rte_event_port_conf_v20 {
>> >> >> > +       int32_t new_event_threshold;
>> >> >> > +       /**< A backpressure threshold for new event enqueues on this
>port.
>> >> >> > +        * Use for *closed system* event dev where event capacity is
>limited,
>> >> >> > +        * and cannot exceed the capacity of the event dev.
>> >> >> > +        * Configuring ports with different thresholds can make higher
>priority
>> >> >> > +        * traffic less likely to  be backpressured.
>> >> >> > +        * For example, a port used to inject NIC Rx packets into the event
>dev
>> >> >> > +        * can have a lower threshold so as not to overwhelm the device,
>> >> >> > +        * while ports used for worker pools can have a higher threshold.
>> >> >> > +        * This value cannot exceed the *nb_events_limit*
>> >> >> > +        * which was previously supplied to rte_event_dev_configure().
>> >> >> > +        * This should be set to '-1' for *open system*.
>> >> >> > +        */
>> >> >> > +       uint16_t dequeue_depth;
>> >> >> > +       /**< Configure number of bulk dequeues for this event port.
>> >> >> > +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>> >> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>> >> >capable.
>> >> >> > +        */
>> >> >> > +       uint16_t enqueue_depth;
>> >> >> > +       /**< Configure number of bulk enqueues for this event port.
>> >> >> > +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>> >> >> > +        * which previously supplied to rte_event_dev_configure().
>> >> >> > +        * Ignored when device is not
>RTE_EVENT_DEV_CAP_BURST_MODE
>> >> >capable.
>> >> >> > +        */
>> >> >> >         uint8_t disable_implicit_release;
>> >> >> >         /**< Configure the port not to release outstanding events in
>> >> >> >          * rte_event_dev_dequeue_burst(). If true, all events received
>through
>> >> >> > @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>> >> >> >  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>> >> >> >                                 struct rte_event_port_conf *port_conf);
>> >> >> >
>> >> >> > +int
>> >> >> > +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>> >> >> > +                               struct rte_event_port_conf_v20 *port_conf);
>> >> >> > +
>> >> >> > +int
>> >> >> > +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>> >> >> > +                                     struct rte_event_port_conf *port_conf);
>> >> >>
>> >> >> Hi Timothy,
>> >> >>
>> >> >> + ABI Maintainers (Ray, Neil)
>> >> >>
>> >> >> # As per my understanding, the structures can not be versioned, only
>> >> >> function can be versioned.
>> >> >> i.e we can not make any change to " struct rte_event_port_conf"
>> >> >>
>> >> >> # We have a similar case with ethdev and it deferred to next release
>v20.11
>> >> >> http://patches.dpdk.org/patch/69113/
>> >> >>
>> >> >> Regarding the API changes:
>> >> >> # The slow path changes general looks good to me. I will review the
>> >> >> next level in the coming days
>> >> >> # The following fast path changes bothers to me. Could you share more
>> >> >> details on below change?
>> >> >>
>> >> >> diff --git a/app/test-eventdev/test_order_atq.c
>> >> >> b/app/test-eventdev/test_order_atq.c
>> >> >> index 3366cfc..8246b96 100644
>> >> >> --- a/app/test-eventdev/test_order_atq.c
>> >> >> +++ b/app/test-eventdev/test_order_atq.c
>> >> >> @@ -34,6 +34,8 @@
>> >> >>                         continue;
>> >> >>                 }
>> >> >>
>> >> >> +               ev.flow_id = ev.mbuf->udata64;
>> >> >> +
>> >> >> # Since RC1 is near, I am not sure how to accommodate the API changes
>> >> >> now and sort out ABI stuffs.
>> >> >> # Other concern is eventdev spec get bloated with versioning files
>> >> >> just for ONE release as 20.11 will be OK to change the ABI.
>> >> >> # While we discuss the API change, Please send deprecation notice for
>> >> >> ABI change for 20.11,
>> >> >> so that there is no ambiguity of this patch for the 20.11 release.
>> >> >>
>> >> >> Hello Jerin,
>> >> >>
>> >> >> Thank you for the review comments.
>> >> >>
>> >> >> With regard to your comments regarding the fast path flow_id change,
>the
>> >Intel
>> >> >DLB hardware
>> >> >> is not capable of transferring the flow_id as part of the event itself. We
>> >> >therefore require a mechanism
>> >> >> to accomplish this. What we have done to work around this is to require
>the
>> >> >application to embed the flow_id
>> >> >> within the data payload. The new flag, #define
>> >> >RTE_EVENT_DEV_CAP_CARRY_FLOW_ID (1ULL << 9), can be used
>> >> >> by applications to determine if they need to embed the flow_id, or if its
>> >> >automatically propagated and present in the
>> >> >> received event.
>> >> >>
>> >> >> What we should have done is to wrap the assignment with a conditional.
>> >> >>
>> >> >> if (!(device_capability_flags & RTE_EVENT_DEV_CAP_CARRY_FLOW_ID))
>> >> >>         ev.flow_id = ev.mbuf->udata64;
>> >> >
>> >> >Two problems with this approach,
>> >> >1) we are assuming mbuf udata64 field is available for DLB driver
>> >> >2) It won't work with another adapter, eventdev has no dependency with
>mbuf
>> >> >
>> >>
>> >> This snippet is not intended to suggest that udata64 always be used to store
>the
>> >flow ID, but as an example of how an application could do it. Some
>applications
>> >won’t need to carry the flow ID through; others can select an unused field in
>the
>> >event data (e.g. hash.rss or udata64 if using mbufs), or (worst-case) re-
>generate
>> >the flow ID in pipeline stages that require it.
>> >
>> >OK.
>> >>
>> >> >Question:
>> >> >1) In the case of DLB hardware, on dequeue(),  what HW returns? is it
>> >> >only event pointer and not have any other metadata like schedule_type
>> >> >etc.
>> >> >
>> >>
>> >> The DLB device provides a 16B “queue entry” that consists of:
>> >>
>> >> *       8B event data
>> >> *       Queue ID
>> >> *       Priority
>> >> *       Scheduling type
>> >> *       19 bits of carried-through data
>> >> *       Assorted error/debug/reserved bits that are set by the device (not
>carried-
>> >through)
>> >>
>> >>  For the carried-through 19b, we use 12b for event_type and
>sub_event_type.
>> >
>> >I can only think of TWO options to help
>> >1) Since event pointer always cache aligned, You could grab LSB
>> >6bits(2^6 = 64B ) and 7 bits from (19b - 12b) carried through
>> >structure
>> >2) Have separate mempool driver using existing drivers, ie "event
>> >pointer" + or - some offset have any amount of custom data.
>> >
>>
>> We can't guarantee that the event will contain a pointer -- it's possible that 8B
>is inline data (i.e. struct rte_event's u64 field).
>>
>> It's really an application decision -- for example an app could allocate space in
>the 'mbuf private data' to store the flow ID, if the event device lacks that carry-
>flow-ID capability and the other mbuf fields can't be used for whatever reason.
>> We modified the tests, sample apps to show how this might be done, not
>necessarily how it must be done.
>
>
>Yeah. If HW has limitation we can't do much. It is OK to change
>eventdev spec to support new HW limitations. aka,
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID is OK.
>Please update existing drivers has this
>RTE_EVENT_DEV_CAP_CARRY_FLOW_ID capability which is missing in the
>patch(I believe)
>
>>
>> >
>> >>
>> >> >
>> >> >>
>> >> >> This would minimize/eliminate any performance impact due to the
>> >processor's
>> >> >branch prediction logic.
>> >
>> >I think, If we need to change common fastpath, better we need to make
>> >it template to create code for compile-time to have absolute zero
>> >overhead
>> >and use runtime.
>> >See app/test-eventdev/test_order_atq.c: function: worker_wrapper()
>> >_create_ worker at compile time based on runtime capability.
>> >
>>
>> Yes, that would be perfect.  Thanks for the example!
>
>Where ever you are making fastpath change, Please follow this scheme
>and send the next version.
>In order to have clean and reusable code, you could have template
>function and with "if" and it can opt-out in _compile_ time.
>i.e
>
>no_inline generic_worker(..., _const_ uint64_t flags)
>{
>..
>..
>
>if (! flags & CAP_CARRY_FLOW_ID)
>    ....
>
>}
>
>worker_with_out_carry_flow_id()
>{
>          generic_worker(.., CAP_CARRY_FLOW_ID)
>}
>
>normal_worker()
>{
>          generic_worker(.., 0)
>}
>
>No other controversial top-level comments with this patch series.
>Once we sorted out the ABI issues then I can review and merge.
>

Thanks Jerin. I'll get these changes into the v3 patch set.

>
>>
>> >
>> >
>> >> >> The assignment then becomes in essence a NOOP for all event devices
>that
>> >are
>> >> >capable of carrying the flow_id as part of the event payload itself.
>> >> >>
>> >> >> Thanks,
>> >> >> Tim
>> >> >>
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> Tim

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 22/27] doc: update references to master/slave lcore in documentation
  @ 2020-07-01 19:46  1%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-01 19:46 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

New terms are initial and worker lcores.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/contributing/coding_style.rst            |  2 +-
 doc/guides/faq/faq.rst                              |  6 +++---
 doc/guides/howto/debug_troubleshoot.rst             |  2 +-
 doc/guides/linux_gsg/eal_args.include.rst           |  4 ++--
 doc/guides/nics/bnxt.rst                            |  2 +-
 doc/guides/nics/fail_safe.rst                       |  3 ---
 doc/guides/prog_guide/env_abstraction_layer.rst     |  6 +++---
 doc/guides/prog_guide/event_ethernet_rx_adapter.rst |  2 +-
 doc/guides/prog_guide/glossary.rst                  |  8 ++++----
 doc/guides/rel_notes/release_20_08.rst              |  7 ++++++-
 doc/guides/sample_app_ug/bbdev_app.rst              |  2 +-
 doc/guides/sample_app_ug/ethtool.rst                |  4 ++--
 doc/guides/sample_app_ug/hello_world.rst            |  8 ++++----
 doc/guides/sample_app_ug/ioat.rst                   | 12 ++++++------
 doc/guides/sample_app_ug/ip_pipeline.rst            |  4 ++--
 doc/guides/sample_app_ug/keep_alive.rst             |  2 +-
 doc/guides/sample_app_ug/l2_forward_event.rst       |  4 ++--
 .../sample_app_ug/l2_forward_real_virtual.rst       |  4 ++--
 doc/guides/sample_app_ug/l3_forward_graph.rst       |  6 +++---
 doc/guides/sample_app_ug/l3_forward_power_man.rst   |  2 +-
 doc/guides/sample_app_ug/link_status_intr.rst       |  4 ++--
 doc/guides/sample_app_ug/multi_process.rst          |  6 +++---
 doc/guides/sample_app_ug/packet_ordering.rst        |  8 ++++----
 doc/guides/sample_app_ug/performance_thread.rst     |  6 +++---
 doc/guides/sample_app_ug/qos_scheduler.rst          |  4 ++--
 doc/guides/sample_app_ug/timer.rst                  | 13 +++++++------
 doc/guides/testpmd_app_ug/run_app.rst               |  2 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst         |  2 +-
 28 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
index 4efde93f6af0..321d54438f7d 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -334,7 +334,7 @@ For example:
 	typedef int (lcore_function_t)(void *);
 
 	/* launch a function of lcore_function_t type */
-	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned slave_id);
+	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned id);
 
 
 C Indentation
diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index f19c1389b6af..cb5f35923d64 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -42,13 +42,13 @@ I am running a 32-bit DPDK application on a NUMA system, and sometimes the appli
 If your system has a lot (>1 GB size) of hugepage memory, not all of it will be allocated.
 Due to hugepages typically being allocated on a local NUMA node, the hugepages allocation the application gets during the initialization depends on which
 NUMA node it is running on (the EAL does not affinitize cores until much later in the initialization process).
-Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK master core and
+Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK initial core and
 therefore all the hugepages are allocated on the wrong socket.
 
 To avoid this scenario, either lower the amount of hugepage memory available to 1 GB size (or less), or run the application with taskset
-affinitizing the application to a would-be master core.
+affinitizing the application to a would-be initial core.
 
-For example, if your EAL coremask is 0xff0, the master core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
+For example, if your EAL coremask is 0xff0, the initial core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
 
    taskset 0x10 ./l2fwd -l 4-11 -n 2
 
diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
index cef016b2fef4..fdeaabe62206 100644
--- a/doc/guides/howto/debug_troubleshoot.rst
+++ b/doc/guides/howto/debug_troubleshoot.rst
@@ -311,7 +311,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
      SERVICE. Check performance functions are mapped to run on the cores.
 
    * For high-performance execution logic ensure running it on correct NUMA
-     and non-master core.
+     and worker core.
 
    * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
      ``rte_memdump`` for more insights.
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 0fe44579689b..ca7508fb423e 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -33,9 +33,9 @@ Lcore-related options
     At a given instance only one core option ``--lcores``, ``-l`` or ``-c`` can
     be used.
 
-*   ``--master-lcore <core ID>``
+*   ``--initial-lcore <core ID>``
 
-    Core ID that is used as master.
+    Core ID that is used as initial lcore.
 
 *   ``-s <service core mask>``
 
diff --git a/doc/guides/nics/bnxt.rst b/doc/guides/nics/bnxt.rst
index a53cdad21d34..6a7314a91627 100644
--- a/doc/guides/nics/bnxt.rst
+++ b/doc/guides/nics/bnxt.rst
@@ -385,7 +385,7 @@ The application enables multiple TX and RX queues when it is started.
 
 .. code-block:: console
 
-    testpmd -l 1,3,5 --master-lcore 1 --txq=2 –rxq=2 --nb-cores=2
+    testpmd -l 1,3,5 --initial-lcore 1 --txq=2 --rxq=2 --nb-cores=2
 
 **TSS**
 
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index b4a92f663b17..3b15d6f0743d 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -236,9 +236,6 @@ Upkeep round
     (brought down or up accordingly). Additionally, any sub-device marked for
     removal is cleaned-up.
 
-Slave
-    In the context of the fail-safe PMD, synonymous to sub-device.
-
 Sub-device
     A device being utilized by the fail-safe PMD.
     This is another PMD running underneath the fail-safe PMD.
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 48a2fec066db..463245463c52 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -64,7 +64,7 @@ It consist of calls to the pthread library (more specifically, pthread_self(), p
 .. note::
 
     Initialization of objects, such as memory zones, rings, memory pools, lpm tables and hash tables,
-    should be done as part of the overall application initialization on the master lcore.
+    should be done as part of the overall application initialization on the initial lcore.
     The creation and initialization functions for these objects are not multi-thread safe.
     However, once initialized, the objects themselves can safely be used in multiple threads simultaneously.
 
@@ -186,7 +186,7 @@ very dependent on the memory allocation patterns of the application.
 
 Additional restrictions are present when running in 32-bit mode. In dynamic
 memory mode, by default maximum of 2 gigabytes of VA space will be preallocated,
-and all of it will be on master lcore NUMA node unless ``--socket-mem`` flag is
+and all of it will be on initial lcore NUMA node unless ``--socket-mem`` flag is
 used.
 
 In legacy mode, VA space will only be preallocated for segments that were
@@ -603,7 +603,7 @@ controlled with tools like taskset (Linux) or cpuset (FreeBSD),
 - with affinity restricted to 2-4, the Control Threads will end up on
   CPU 4.
 - with affinity restricted to 2-3, the Control Threads will end up on
-  CPU 2 (master lcore, which is the default when no CPU is available).
+  CPU 2 (initial lcore, which is the default when no CPU is available).
 
 .. _known_issue_label:
 
diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
index c7dda92215ea..5d015fa2d678 100644
--- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
+++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
@@ -172,7 +172,7 @@ converts the received packets to events in the same manner as packets
 received on a polled Rx queue. The interrupt thread is affinitized to the same
 CPUs as the lcores of the Rx adapter service function, if the Rx adapter
 service function has not been mapped to any lcores, the interrupt thread
-is mapped to the master lcore.
+is mapped to the initial lcore.
 
 Rx Callback for SW Rx Adapter
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst
index 21063a414729..3716efd13da2 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -124,9 +124,9 @@ LAN
 LPM
    Longest Prefix Match
 
-master lcore
+initial lcore
    The execution unit that executes the main() function and that launches
-   other lcores.
+   other lcores. Described in older versions as master lcore.
 
 mbuf
    An mbuf is a data structure used internally to carry messages (mainly
@@ -184,8 +184,8 @@ RTE
 Rx
    Reception
 
-Slave lcore
-   Any *lcore* that is not the *master lcore*.
+Worker lcore
+   Any *lcore* that is not the *initial lcore*.
 
 Socket
    A physical CPU, that includes several *cores*.
diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 5cbc4ce14446..ecbceb0d05e3 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -107,6 +107,9 @@ New Features
   * Dump ``rte_flow`` memory consumption.
   * Measure packet per second forwarding.
 
+* **Renamed master lcore to initial lcore.**
+
+  The name given to the first thread in DPDK is changed from master lcore to initial lcore.
 
 Removed Items
 -------------
@@ -122,7 +125,6 @@ Removed Items
 
 * Removed ``RTE_KDRV_NONE`` based PCI device driver probing.
 
-
 API Changes
 -----------
 
@@ -143,6 +145,9 @@ API Changes
 * vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
   queue and not per device, a qid parameter was added to the arguments list.
 
+* ``rte_get_master_lcore`` was renamed to ``rte_get_initial_lcore``.
+  The old function is deprecated and will be removed in a future release.
+
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/bbdev_app.rst b/doc/guides/sample_app_ug/bbdev_app.rst
index 405e706a46e4..5917d52ca199 100644
--- a/doc/guides/sample_app_ug/bbdev_app.rst
+++ b/doc/guides/sample_app_ug/bbdev_app.rst
@@ -94,7 +94,7 @@ device gets linked to a corresponding ethernet port as whitelisted by
 the parameter -w.
 3 cores are allocated to the application, and assigned as:
 
- - core 3 is the master and used to print the stats live on screen,
+ - core 3 is the initial lcore, used to print the stats live on screen,
 
  - core 4 is the encoding lcore performing Rx and Turbo Encode operations
 
diff --git a/doc/guides/sample_app_ug/ethtool.rst b/doc/guides/sample_app_ug/ethtool.rst
index 8f7fc6ca66c0..a4b92255c266 100644
--- a/doc/guides/sample_app_ug/ethtool.rst
+++ b/doc/guides/sample_app_ug/ethtool.rst
@@ -64,8 +64,8 @@ Explanation
 -----------
 
 The sample program has two parts: A background `packet reflector`_
-that runs on a slave core, and a foreground `Ethtool Shell`_ that
-runs on the master core. These are described below.
+that runs on a worker core, and a foreground `Ethtool Shell`_ that
+runs on the initial core. These are described below.
 
 Packet Reflector
 ~~~~~~~~~~~~~~~~
diff --git a/doc/guides/sample_app_ug/hello_world.rst b/doc/guides/sample_app_ug/hello_world.rst
index 46f997a7dce3..f6740b10e385 100644
--- a/doc/guides/sample_app_ug/hello_world.rst
+++ b/doc/guides/sample_app_ug/hello_world.rst
@@ -75,13 +75,13 @@ The code that launches the function on each lcore is as follows:
 
 .. code-block:: c
 
-    /* call lcore_hello() on every slave lcore */
+    /* call lcore_hello() on every worker lcore */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     lcore_hello(NULL);
 
@@ -89,6 +89,6 @@ The following code is equivalent and simpler:
 
 .. code-block:: c
 
-    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_MASTER);
+    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_INITIAL);
 
 Refer to the *DPDK API Reference* for detailed information on the rte_eal_mp_remote_launch() function.
diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
index bab7654b8d4d..c75b91bfa989 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -69,13 +69,13 @@ provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
 updates MAC address and sends the copy. If one lcore per port is used,
 both operations are done sequentially. For each configuration an additional
-lcore is needed since the master lcore does not handle traffic but is
+lcore is needed since the initial lcore does not handle traffic but is
 responsible for configuration, statistics printing and safe shutdown of
 all ports and devices.
 
 The application can use a maximum of 8 ports.
 
-To run the application in a Linux environment with 3 lcores (the master lcore,
+To run the application in a Linux environment with 3 lcores (the initial lcore,
 plus two forwarding cores), a single port (port 0), software copying and MAC
 updating issue the command:
 
@@ -83,7 +83,7 @@ updating issue the command:
 
     $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
 
-To run the application in a Linux environment with 2 lcores (the master lcore,
+To run the application in a Linux environment with 2 lcores (the initial lcore,
 plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
 updating issue the command:
 
@@ -208,7 +208,7 @@ After that each port application assigns resources needed.
     cfg.nb_lcores = rte_lcore_count() - 1;
     if (cfg.nb_lcores < 1)
         rte_exit(EXIT_FAILURE,
-            "There should be at least one slave lcore.\n");
+            "There should be at least one worker lcore.\n");
 
     ret = 0;
 
@@ -310,8 +310,8 @@ If initialization is successful, memory for hardware device
 statistics is allocated.
 
 Finally ``main()`` function starts all packet handling lcores and starts
-printing stats in a loop on the master lcore. The application can be
-interrupted and closed using ``Ctrl-C``. The master lcore waits for
+printing stats in a loop on the initial lcore. The application can be
+interrupted and closed using ``Ctrl-C``. The initial lcore waits for
-all slave processes to finish, deallocates resources and exits.
+all worker lcores to finish, deallocates resources and exits.
 
 The processing lcores launching function are described below.
diff --git a/doc/guides/sample_app_ug/ip_pipeline.rst b/doc/guides/sample_app_ug/ip_pipeline.rst
index 56014be17458..f395027b3498 100644
--- a/doc/guides/sample_app_ug/ip_pipeline.rst
+++ b/doc/guides/sample_app_ug/ip_pipeline.rst
@@ -122,7 +122,7 @@ is displayed and the application is terminated.
 Run-time
 ~~~~~~~~
 
-The master thread is creating and managing all the application objects based on CLI input.
+The initial thread creates and manages all the application objects based on CLI input.
 
 Each data plane thread runs one or several pipelines previously assigned to it in round-robin order. Each data plane thread
 executes two tasks in time-sharing mode:
@@ -130,7 +130,7 @@ executes two tasks in time-sharing mode:
 1. *Packet processing task*: Process bursts of input packets read from the pipeline input ports.
 
 2. *Message handling task*: Periodically, the data plane thread pauses the packet processing task and polls for request
-   messages send by the master thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
+   messages sent by the initial thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
    to/from given table of a specific pipeline owned by the current data plane thread, read statistics, etc.
 
 Examples
diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst
index 865ba69e5c47..bca5df8ba934 100644
--- a/doc/guides/sample_app_ug/keep_alive.rst
+++ b/doc/guides/sample_app_ug/keep_alive.rst
@@ -16,7 +16,7 @@ Overview
 --------
 
 The application demonstrates how to protect against 'silent outages'
-on packet processing cores. A Keep Alive Monitor Agent Core (master)
+on packet processing cores. A Keep Alive Monitor Agent Core (initial)
 monitors the state of packet processing cores (worker cores) by
 dispatching pings at a regular time interval (default is 5ms) and
 monitoring the state of the cores. Cores states are: Alive, MIA, Dead
diff --git a/doc/guides/sample_app_ug/l2_forward_event.rst b/doc/guides/sample_app_ug/l2_forward_event.rst
index d536eee819d0..f384420cf1f0 100644
--- a/doc/guides/sample_app_ug/l2_forward_event.rst
+++ b/doc/guides/sample_app_ug/l2_forward_event.rst
@@ -630,8 +630,8 @@ not many packets to send, however it improves performance:
 
                         /* if timer has reached its timeout */
                         if (unlikely(timer_tsc >= timer_period)) {
-                                /* do this only on master core */
-                                if (lcore_id == rte_get_master_lcore()) {
+                                /* do this only on initial core */
+                                if (lcore_id == rte_get_initial_lcore()) {
                                         print_stats();
                                         /* reset the timer */
                                         timer_tsc = 0;
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index 671d0c7c19d4..615a55c36db9 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -440,9 +440,9 @@ however it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/l3_forward_graph.rst b/doc/guides/sample_app_ug/l3_forward_graph.rst
index df50827bab86..4ac96fc0c2f7 100644
--- a/doc/guides/sample_app_ug/l3_forward_graph.rst
+++ b/doc/guides/sample_app_ug/l3_forward_graph.rst
@@ -22,7 +22,7 @@ Run-time path is main thing that differs from L3 forwarding sample application.
 Difference is that forwarding logic starting from Rx, followed by LPM lookup,
 TTL update and finally Tx is implemented inside graph nodes. These nodes are
 interconnected in graph framework. Application main loop needs to walk over
-graph using ``rte_graph_walk()`` with graph objects created one per slave lcore.
+graph using ``rte_graph_walk()`` with graph objects created one per worker lcore.
 
 The lookup method is as per implementation of ``ip4_lookup`` graph node.
 The ID of the output interface for the input packet is the next hop returned by
@@ -265,7 +265,7 @@ headers will be provided run-time using ``rte_node_ip4_route_add()`` and
     Since currently ``ip4_lookup`` and ``ip4_rewrite`` nodes don't support
     lock-less mechanisms(RCU, etc) to add run-time forwarding data like route and
     rewrite data, forwarding data is added before packet processing loop is
-    launched on slave lcore.
+    launched on worker lcore.
 
 .. code-block:: c
 
@@ -297,7 +297,7 @@ Packet Forwarding using Graph Walk
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Now that all the device configurations are done, graph creations are done and
-forwarding data is updated with nodes, slave lcores will be launched with graph
+forwarding data is updated with nodes, worker lcores will be launched with graph
 main loop. Graph main loop is very simple in the sense that it needs to
 continuously call a non-blocking API ``rte_graph_walk()`` with it's lcore
 specific graph object that was already created.
diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst
index 0cc6f2e62e75..f20502c41a37 100644
--- a/doc/guides/sample_app_ug/l3_forward_power_man.rst
+++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst
@@ -441,7 +441,7 @@ The telemetry mode support for ``l3fwd-power`` is a standalone mode, in this mod
 ``l3fwd-power`` does simple l3fwding along with calculating empty polls, full polls,
 and busy percentage for each forwarding core. The aggregation of these
 values of all cores is reported as application level telemetry to metric
-library for every 500ms from the master core.
+library for every 500ms from the initial core.
 
 The busy percentage is calculated by recording the poll_count
 and when the count reaches a defined value the total
diff --git a/doc/guides/sample_app_ug/link_status_intr.rst b/doc/guides/sample_app_ug/link_status_intr.rst
index 04c40f28540d..e31fd2cc7368 100644
--- a/doc/guides/sample_app_ug/link_status_intr.rst
+++ b/doc/guides/sample_app_ug/link_status_intr.rst
@@ -401,9 +401,9 @@ However, it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/multi_process.rst b/doc/guides/sample_app_ug/multi_process.rst
index f2a79a639763..51b8db5cf75a 100644
--- a/doc/guides/sample_app_ug/multi_process.rst
+++ b/doc/guides/sample_app_ug/multi_process.rst
@@ -66,7 +66,7 @@ The process should start successfully and display a command prompt as follows:
 
     EAL: check igb_uio module
     EAL: check module finished
-    EAL: Master core 0 is ready (tid=54e41820)
+    EAL: Initial core 0 is ready (tid=54e41820)
     EAL: Core 1 is ready (tid=53b32700)
 
     Starting core 1
@@ -92,7 +92,7 @@ At any stage, either process can be terminated using the quit command.
 
 .. code-block:: console
 
-   EAL: Master core 10 is ready (tid=b5f89820)           EAL: Master core 8 is ready (tid=864a3820)
+   EAL: Initial core 10 is ready (tid=b5f89820)           EAL: Initial core 8 is ready (tid=864a3820)
    EAL: Core 11 is ready (tid=84ffe700)                  EAL: Core 9 is ready (tid=85995700)
    Starting core 11                                      Starting core 9
    simple_mp > send hello_secondary                      simple_mp > core 9: Received 'hello_secondary'
@@ -273,7 +273,7 @@ In addition to the EAL parameters, the application- specific parameters are:
 
 .. note::
 
-    In the server process, a single thread, the master thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
+    In the server process, a single thread, the initial thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
     If a coremask/corelist is specified with more than a single lcore bit set in it,
     an additional lcore will be used for a thread to periodically print packet count statistics.
 
diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 1c8ee5d04071..e82938bd7c9c 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -12,14 +12,14 @@ Overview
 
 The application uses at least three CPU cores:
 
-* RX core (maser core) receives traffic from the NIC ports and feeds Worker
+* RX core (initial core) receives traffic from the NIC ports and feeds Worker
   cores with traffic through SW queues.
 
-* Worker core (slave core) basically do some light work on the packet.
+* Worker cores basically do some light work on the packet.
   Currently it modifies the output port of the packet for configurations with
   more than one port enabled.
 
-* TX Core (slave core) receives traffic from Worker cores through software queues,
+* TX Core receives traffic from Worker cores through software queues,
   inserts out-of-order packets into reorder buffer, extracts ordered packets
   from the reorder buffer and sends them to the NIC ports for transmission.
 
@@ -46,7 +46,7 @@ The application execution command line is:
     ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]
 
 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
-The first CPU core in the core mask is the master core and would be assigned to
+The first CPU core in the core mask is the initial core and would be assigned to
 RX core, the last to TX core and the rest to Worker cores.
 
 The PORTMASK parameter must contain either 1 or even enabled port numbers.
diff --git a/doc/guides/sample_app_ug/performance_thread.rst b/doc/guides/sample_app_ug/performance_thread.rst
index b04d0ba444af..29105f9708eb 100644
--- a/doc/guides/sample_app_ug/performance_thread.rst
+++ b/doc/guides/sample_app_ug/performance_thread.rst
@@ -280,8 +280,8 @@ functionality into different threads, and the pairs of RX and TX threads are
 interconnected via software rings.
 
 On initialization an L-thread scheduler is started on every EAL thread. On all
-but the master EAL thread only a dummy L-thread is initially started.
-The L-thread started on the master EAL thread then spawns other L-threads on
+but the initial EAL thread only a dummy L-thread is initially started.
+The L-thread started on the initial EAL thread then spawns other L-threads on
 different L-thread schedulers according the command line parameters.
 
 The RX threads poll the network interface queues and post received packets
@@ -1217,5 +1217,5 @@ Setting ``LTHREAD_DIAG`` also enables counting of statistics about cache and
 queue usage, and these statistics can be displayed by calling the function
 ``lthread_diag_stats_display()``. This function also performs a consistency
 check on the caches and queues. The function should only be called from the
-master EAL thread after all slave threads have stopped and returned to the C
+initial EAL thread after all worker threads have stopped and returned to the C
 main program, otherwise the consistency check will fail.
diff --git a/doc/guides/sample_app_ug/qos_scheduler.rst b/doc/guides/sample_app_ug/qos_scheduler.rst
index b5010657a7d8..345ecbb5905d 100644
--- a/doc/guides/sample_app_ug/qos_scheduler.rst
+++ b/doc/guides/sample_app_ug/qos_scheduler.rst
@@ -71,7 +71,7 @@ Optional application parameters include:
     In this mode, the application shows a command line that can be used for obtaining statistics while
     scheduling is taking place (see interactive mode below for more information).
 
-*   --mst n: Master core index (the default value is 1).
+*   --mst n: Initial core index (the default value is 1).
 
 *   --rsz "A, B, C": Ring sizes:
 
@@ -329,7 +329,7 @@ Another example with 2 packet flow configurations using different ports but shar
 Note that independent cores for the packet flow configurations for each of the RX, WT and TX thread are also supported,
 providing flexibility to balance the work.
 
-The EAL coremask/corelist is constrained to contain the default mastercore 1 and the RX, WT and TX cores only.
+The EAL coremask/corelist is constrained to contain the default initial lcore 1 and the RX, WT and TX cores only.
 
 Explanation
 -----------
diff --git a/doc/guides/sample_app_ug/timer.rst b/doc/guides/sample_app_ug/timer.rst
index 98d762d2388c..59a8ab11e9b6 100644
--- a/doc/guides/sample_app_ug/timer.rst
+++ b/doc/guides/sample_app_ug/timer.rst
@@ -49,17 +49,18 @@ In addition to EAL initialization, the timer subsystem must be initialized, by c
     rte_timer_subsystem_init();
 
 After timer creation (see the next paragraph),
-the main loop is executed on each slave lcore using the well-known rte_eal_remote_launch() and also on the master.
+the main loop is executed on each worker lcore using the well-known rte_eal_remote_launch() and
+also on the initial lcore.
 
 .. code-block:: c
 
-    /* call lcore_mainloop() on every slave lcore  */
+    /* call lcore_mainloop() on every worker lcore  */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
         rte_eal_remote_launch(lcore_mainloop, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     (void) lcore_mainloop(NULL);
 
@@ -105,7 +106,7 @@ This call to rte_timer_init() is necessary before doing any other operation on t
 
 Then, the two timers are configured:
 
-*   The first timer (timer0) is loaded on the master lcore and expires every second.
+*   The first timer (timer0) is loaded on the initial lcore and expires every second.
     Since the PERIODICAL flag is provided, the timer is reloaded automatically by the timer subsystem.
     The callback function is timer0_cb().
 
@@ -115,7 +116,7 @@ Then, the two timers are configured:
 
 .. code-block:: c
 
-    /* load timer0, every second, on master lcore, reloaded automatically */
+    /* load timer0, every second, on initial lcore, reloaded automatically */
 
     hz = rte_get_hpet_hz();
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f169604752b8..7d6b81de7f46 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -71,7 +71,7 @@ The command line options are:
 *   ``--coremask=0xXX``
 
     Set the hexadecimal bitmask of the cores running the packet forwarding test.
-    The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+    The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 *   ``--portmask=0xXX``
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a308f2..7d4db1140092 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -692,7 +692,7 @@ This is equivalent to the ``--coremask`` command-line option.
 
 .. note::
 
-   The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+   The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 set portmask
 ~~~~~~~~~~~~
-- 
2.26.2


^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v4 22/27] doc: update references to master/slave lcore in documentation
  @ 2020-07-01 20:23  1%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-01 20:23 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

New terms are initial and worker lcores.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/guides/contributing/coding_style.rst            |  2 +-
 doc/guides/faq/faq.rst                              |  6 +++---
 doc/guides/howto/debug_troubleshoot.rst             |  2 +-
 doc/guides/linux_gsg/eal_args.include.rst           |  4 ++--
 doc/guides/nics/bnxt.rst                            |  2 +-
 doc/guides/nics/fail_safe.rst                       |  3 ---
 doc/guides/prog_guide/env_abstraction_layer.rst     |  6 +++---
 doc/guides/prog_guide/event_ethernet_rx_adapter.rst |  2 +-
 doc/guides/prog_guide/glossary.rst                  |  8 ++++----
 doc/guides/rel_notes/release_20_08.rst              |  7 ++++++-
 doc/guides/sample_app_ug/bbdev_app.rst              |  2 +-
 doc/guides/sample_app_ug/ethtool.rst                |  4 ++--
 doc/guides/sample_app_ug/hello_world.rst            |  8 ++++----
 doc/guides/sample_app_ug/ioat.rst                   | 12 ++++++------
 doc/guides/sample_app_ug/ip_pipeline.rst            |  4 ++--
 doc/guides/sample_app_ug/keep_alive.rst             |  2 +-
 doc/guides/sample_app_ug/l2_forward_event.rst       |  4 ++--
 .../sample_app_ug/l2_forward_real_virtual.rst       |  4 ++--
 doc/guides/sample_app_ug/l3_forward_graph.rst       |  6 +++---
 doc/guides/sample_app_ug/l3_forward_power_man.rst   |  2 +-
 doc/guides/sample_app_ug/link_status_intr.rst       |  4 ++--
 doc/guides/sample_app_ug/multi_process.rst          |  6 +++---
 doc/guides/sample_app_ug/packet_ordering.rst        |  8 ++++----
 doc/guides/sample_app_ug/performance_thread.rst     |  6 +++---
 doc/guides/sample_app_ug/qos_scheduler.rst          |  4 ++--
 doc/guides/sample_app_ug/timer.rst                  | 13 +++++++------
 doc/guides/testpmd_app_ug/run_app.rst               |  2 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst         |  2 +-
 28 files changed, 69 insertions(+), 66 deletions(-)

diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst
index 4efde93f6af0..321d54438f7d 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -334,7 +334,7 @@ For example:
 	typedef int (lcore_function_t)(void *);
 
 	/* launch a function of lcore_function_t type */
-	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned slave_id);
+	int rte_eal_remote_launch(lcore_function_t *f, void *arg, unsigned id);
 
 
 C Indentation
diff --git a/doc/guides/faq/faq.rst b/doc/guides/faq/faq.rst
index f19c1389b6af..cb5f35923d64 100644
--- a/doc/guides/faq/faq.rst
+++ b/doc/guides/faq/faq.rst
@@ -42,13 +42,13 @@ I am running a 32-bit DPDK application on a NUMA system, and sometimes the appli
 If your system has a lot (>1 GB size) of hugepage memory, not all of it will be allocated.
 Due to hugepages typically being allocated on a local NUMA node, the hugepages allocation the application gets during the initialization depends on which
 NUMA node it is running on (the EAL does not affinitize cores until much later in the initialization process).
-Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK master core and
+Sometimes, the Linux OS runs the DPDK application on a core that is located on a different NUMA node from DPDK initial core and
 therefore all the hugepages are allocated on the wrong socket.
 
 To avoid this scenario, either lower the amount of hugepage memory available to 1 GB size (or less), or run the application with taskset
-affinitizing the application to a would-be master core.
+affinitizing the application to a would-be initial core.
 
-For example, if your EAL coremask is 0xff0, the master core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
+For example, if your EAL coremask is 0xff0, the initial core will usually be the first core in the coremask (0x10); this is what you have to supply to taskset::
 
    taskset 0x10 ./l2fwd -l 4-11 -n 2
 
diff --git a/doc/guides/howto/debug_troubleshoot.rst b/doc/guides/howto/debug_troubleshoot.rst
index cef016b2fef4..fdeaabe62206 100644
--- a/doc/guides/howto/debug_troubleshoot.rst
+++ b/doc/guides/howto/debug_troubleshoot.rst
@@ -311,7 +311,7 @@ Custom worker function :numref:`dtg_distributor_worker`.
      SERVICE. Check performance functions are mapped to run on the cores.
 
    * For high-performance execution logic ensure running it on correct NUMA
-     and non-master core.
+     and worker core.
 
    * Analyze run logic with ``rte_dump_stack``, ``rte_dump_registers`` and
      ``rte_memdump`` for more insights.
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 0fe44579689b..ca7508fb423e 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -33,9 +33,9 @@ Lcore-related options
     At a given instance only one core option ``--lcores``, ``-l`` or ``-c`` can
     be used.
 
-*   ``--master-lcore <core ID>``
+*   ``--initial-lcore <core ID>``
 
-    Core ID that is used as master.
+    Core ID that is used as initial lcore.
 
 *   ``-s <service core mask>``
 
diff --git a/doc/guides/nics/bnxt.rst b/doc/guides/nics/bnxt.rst
index a53cdad21d34..6a7314a91627 100644
--- a/doc/guides/nics/bnxt.rst
+++ b/doc/guides/nics/bnxt.rst
@@ -385,7 +385,7 @@ The application enables multiple TX and RX queues when it is started.
 
 .. code-block:: console
 
-    testpmd -l 1,3,5 --master-lcore 1 --txq=2 –rxq=2 --nb-cores=2
+    testpmd -l 1,3,5 --initial-lcore 1 --txq=2 –rxq=2 --nb-cores=2
 
 **TSS**
 
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index b4a92f663b17..3b15d6f0743d 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -236,9 +236,6 @@ Upkeep round
     (brought down or up accordingly). Additionally, any sub-device marked for
     removal is cleaned-up.
 
-Slave
-    In the context of the fail-safe PMD, synonymous to sub-device.
-
 Sub-device
     A device being utilized by the fail-safe PMD.
     This is another PMD running underneath the fail-safe PMD.
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 48a2fec066db..463245463c52 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -64,7 +64,7 @@ It consist of calls to the pthread library (more specifically, pthread_self(), p
 .. note::
 
     Initialization of objects, such as memory zones, rings, memory pools, lpm tables and hash tables,
-    should be done as part of the overall application initialization on the master lcore.
+    should be done as part of the overall application initialization on the initial lcore.
     The creation and initialization functions for these objects are not multi-thread safe.
     However, once initialized, the objects themselves can safely be used in multiple threads simultaneously.
 
@@ -186,7 +186,7 @@ very dependent on the memory allocation patterns of the application.
 
 Additional restrictions are present when running in 32-bit mode. In dynamic
 memory mode, by default maximum of 2 gigabytes of VA space will be preallocated,
-and all of it will be on master lcore NUMA node unless ``--socket-mem`` flag is
+and all of it will be on initial lcore NUMA node unless ``--socket-mem`` flag is
 used.
 
 In legacy mode, VA space will only be preallocated for segments that were
@@ -603,7 +603,7 @@ controlled with tools like taskset (Linux) or cpuset (FreeBSD),
 - with affinity restricted to 2-4, the Control Threads will end up on
   CPU 4.
 - with affinity restricted to 2-3, the Control Threads will end up on
-  CPU 2 (master lcore, which is the default when no CPU is available).
+  CPU 2 (initial lcore, which is the default when no CPU is available).
 
 .. _known_issue_label:
 
diff --git a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
index c7dda92215ea..5d015fa2d678 100644
--- a/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
+++ b/doc/guides/prog_guide/event_ethernet_rx_adapter.rst
@@ -172,7 +172,7 @@ converts the received packets to events in the same manner as packets
 received on a polled Rx queue. The interrupt thread is affinitized to the same
 CPUs as the lcores of the Rx adapter service function, if the Rx adapter
 service function has not been mapped to any lcores, the interrupt thread
-is mapped to the master lcore.
+is mapped to the initial lcore.
 
 Rx Callback for SW Rx Adapter
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst
index 21063a414729..3716efd13da2 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -124,9 +124,9 @@ LAN
 LPM
    Longest Prefix Match
 
-master lcore
+initial lcore
    The execution unit that executes the main() function and that launches
-   other lcores.
+   other lcores. Described in older versions as master lcore.
 
 mbuf
    An mbuf is a data structure used internally to carry messages (mainly
@@ -184,8 +184,8 @@ RTE
 Rx
    Reception
 
-Slave lcore
-   Any *lcore* that is not the *master lcore*.
+Worker lcore
+   Any *lcore* that is not the *initial lcore*.
 
 Socket
    A physical CPU, that includes several *cores*.
diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 5cbc4ce14446..ecbceb0d05e3 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -107,6 +107,9 @@ New Features
   * Dump ``rte_flow`` memory consumption.
   * Measure packet per second forwarding.
 
+* **Renamed master lcore to initial lcore.**
+
+  The name given to the first thread in DPDK is changed from master lcore to initial lcore.
 
 Removed Items
 -------------
@@ -122,7 +125,6 @@ Removed Items
 
 * Removed ``RTE_KDRV_NONE`` based PCI device driver probing.
 
-
 API Changes
 -----------
 
@@ -143,6 +145,9 @@ API Changes
 * vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
   queue and not per device, a qid parameter was added to the arguments list.
 
+* ``rte_get_master_lcore`` was renamed to ``rte_get_initial_lcore``.
+  The old function is deprecated and will be removed in a future release.
+
 
 ABI Changes
 -----------
diff --git a/doc/guides/sample_app_ug/bbdev_app.rst b/doc/guides/sample_app_ug/bbdev_app.rst
index 405e706a46e4..5917d52ca199 100644
--- a/doc/guides/sample_app_ug/bbdev_app.rst
+++ b/doc/guides/sample_app_ug/bbdev_app.rst
@@ -94,7 +94,7 @@ device gets linked to a corresponding ethernet port as whitelisted by
 the parameter -w.
 3 cores are allocated to the application, and assigned as:
 
- - core 3 is the master and used to print the stats live on screen,
+ - core 3 is the initial lcore and used to print the stats live on screen,
 
  - core 4 is the encoding lcore performing Rx and Turbo Encode operations
 
diff --git a/doc/guides/sample_app_ug/ethtool.rst b/doc/guides/sample_app_ug/ethtool.rst
index 8f7fc6ca66c0..a4b92255c266 100644
--- a/doc/guides/sample_app_ug/ethtool.rst
+++ b/doc/guides/sample_app_ug/ethtool.rst
@@ -64,8 +64,8 @@ Explanation
 -----------
 
 The sample program has two parts: A background `packet reflector`_
-that runs on a slave core, and a foreground `Ethtool Shell`_ that
-runs on the master core. These are described below.
+that runs on a worker core, and a foreground `Ethtool Shell`_ that
+runs on the initial core. These are described below.
 
 Packet Reflector
 ~~~~~~~~~~~~~~~~
diff --git a/doc/guides/sample_app_ug/hello_world.rst b/doc/guides/sample_app_ug/hello_world.rst
index 46f997a7dce3..f6740b10e385 100644
--- a/doc/guides/sample_app_ug/hello_world.rst
+++ b/doc/guides/sample_app_ug/hello_world.rst
@@ -75,13 +75,13 @@ The code that launches the function on each lcore is as follows:
 
 .. code-block:: c
 
-    /* call lcore_hello() on every slave lcore */
+    /* call lcore_hello() on every worker lcore */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     lcore_hello(NULL);
 
@@ -89,6 +89,6 @@ The following code is equivalent and simpler:
 
 .. code-block:: c
 
-    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_MASTER);
+    rte_eal_mp_remote_launch(lcore_hello, NULL, CALL_INITIAL);
 
 Refer to the *DPDK API Reference* for detailed information on the rte_eal_mp_remote_launch() function.
diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst
index bab7654b8d4d..c75b91bfa989 100644
--- a/doc/guides/sample_app_ug/ioat.rst
+++ b/doc/guides/sample_app_ug/ioat.rst
@@ -69,13 +69,13 @@ provided parameters. The app can use up to 2 lcores: one of them receives
 incoming traffic and makes a copy of each packet. The second lcore then
 updates MAC address and sends the copy. If one lcore per port is used,
 both operations are done sequentially. For each configuration an additional
-lcore is needed since the master lcore does not handle traffic but is
+lcore is needed since the initial lcore does not handle traffic but is
 responsible for configuration, statistics printing and safe shutdown of
 all ports and devices.
 
 The application can use a maximum of 8 ports.
 
-To run the application in a Linux environment with 3 lcores (the master lcore,
+To run the application in a Linux environment with 3 lcores (the initial lcore,
 plus two forwarding cores), a single port (port 0), software copying and MAC
 updating issue the command:
 
@@ -83,7 +83,7 @@ updating issue the command:
 
     $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw
 
-To run the application in a Linux environment with 2 lcores (the master lcore,
+To run the application in a Linux environment with 2 lcores (the initial lcore,
 plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC
 updating issue the command:
 
@@ -208,7 +208,7 @@ After that each port application assigns resources needed.
     cfg.nb_lcores = rte_lcore_count() - 1;
     if (cfg.nb_lcores < 1)
         rte_exit(EXIT_FAILURE,
-            "There should be at least one slave lcore.\n");
+            "There should be at least one worker lcore.\n");
 
     ret = 0;
 
@@ -310,8 +310,8 @@ If initialization is successful, memory for hardware device
 statistics is allocated.
 
 Finally ``main()`` function starts all packet handling lcores and starts
-printing stats in a loop on the master lcore. The application can be
-interrupted and closed using ``Ctrl-C``. The master lcore waits for
+printing stats in a loop on the initial lcore. The application can be
+interrupted and closed using ``Ctrl-C``. The initial lcore waits for
 all slave processes to finish, deallocates resources and exits.
 
 The processing lcores launching function are described below.
diff --git a/doc/guides/sample_app_ug/ip_pipeline.rst b/doc/guides/sample_app_ug/ip_pipeline.rst
index 56014be17458..f395027b3498 100644
--- a/doc/guides/sample_app_ug/ip_pipeline.rst
+++ b/doc/guides/sample_app_ug/ip_pipeline.rst
@@ -122,7 +122,7 @@ is displayed and the application is terminated.
 Run-time
 ~~~~~~~~
 
-The master thread is creating and managing all the application objects based on CLI input.
+The initial thread is creating and managing all the application objects based on CLI input.
 
 Each data plane thread runs one or several pipelines previously assigned to it in round-robin order. Each data plane thread
 executes two tasks in time-sharing mode:
@@ -130,7 +130,7 @@ executes two tasks in time-sharing mode:
 1. *Packet processing task*: Process bursts of input packets read from the pipeline input ports.
 
 2. *Message handling task*: Periodically, the data plane thread pauses the packet processing task and polls for request
-   messages send by the master thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
+   messages sent by the initial thread. Examples: add/remove pipeline to/from current data plane thread, add/delete rules
    to/from given table of a specific pipeline owned by the current data plane thread, read statistics, etc.
 
 Examples
diff --git a/doc/guides/sample_app_ug/keep_alive.rst b/doc/guides/sample_app_ug/keep_alive.rst
index 865ba69e5c47..bca5df8ba934 100644
--- a/doc/guides/sample_app_ug/keep_alive.rst
+++ b/doc/guides/sample_app_ug/keep_alive.rst
@@ -16,7 +16,7 @@ Overview
 --------
 
 The application demonstrates how to protect against 'silent outages'
-on packet processing cores. A Keep Alive Monitor Agent Core (master)
+on packet processing cores. A Keep Alive Monitor Agent Core (initial)
 monitors the state of packet processing cores (worker cores) by
 dispatching pings at a regular time interval (default is 5ms) and
 monitoring the state of the cores. Cores states are: Alive, MIA, Dead
diff --git a/doc/guides/sample_app_ug/l2_forward_event.rst b/doc/guides/sample_app_ug/l2_forward_event.rst
index d536eee819d0..f384420cf1f0 100644
--- a/doc/guides/sample_app_ug/l2_forward_event.rst
+++ b/doc/guides/sample_app_ug/l2_forward_event.rst
@@ -630,8 +630,8 @@ not many packets to send, however it improves performance:
 
                         /* if timer has reached its timeout */
                         if (unlikely(timer_tsc >= timer_period)) {
-                                /* do this only on master core */
-                                if (lcore_id == rte_get_master_lcore()) {
+                                /* do this only on initial core */
+                                if (lcore_id == rte_get_initial_lcore()) {
                                         print_stats();
                                         /* reset the timer */
                                         timer_tsc = 0;
diff --git a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
index 671d0c7c19d4..615a55c36db9 100644
--- a/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
+++ b/doc/guides/sample_app_ug/l2_forward_real_virtual.rst
@@ -440,9 +440,9 @@ however it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/l3_forward_graph.rst b/doc/guides/sample_app_ug/l3_forward_graph.rst
index df50827bab86..4ac96fc0c2f7 100644
--- a/doc/guides/sample_app_ug/l3_forward_graph.rst
+++ b/doc/guides/sample_app_ug/l3_forward_graph.rst
@@ -22,7 +22,7 @@ Run-time path is main thing that differs from L3 forwarding sample application.
 Difference is that forwarding logic starting from Rx, followed by LPM lookup,
 TTL update and finally Tx is implemented inside graph nodes. These nodes are
 interconnected in graph framework. Application main loop needs to walk over
-graph using ``rte_graph_walk()`` with graph objects created one per slave lcore.
+graph using ``rte_graph_walk()`` with graph objects created one per worker lcore.
 
 The lookup method is as per implementation of ``ip4_lookup`` graph node.
 The ID of the output interface for the input packet is the next hop returned by
@@ -265,7 +265,7 @@ headers will be provided run-time using ``rte_node_ip4_route_add()`` and
     Since currently ``ip4_lookup`` and ``ip4_rewrite`` nodes don't support
     lock-less mechanisms(RCU, etc) to add run-time forwarding data like route and
     rewrite data, forwarding data is added before packet processing loop is
-    launched on slave lcore.
+    launched on worker lcore.
 
 .. code-block:: c
 
@@ -297,7 +297,7 @@ Packet Forwarding using Graph Walk
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Now that all the device configurations are done, graph creations are done and
-forwarding data is updated with nodes, slave lcores will be launched with graph
+forwarding data is updated with nodes, worker lcores will be launched with graph
 main loop. Graph main loop is very simple in the sense that it needs to
 continuously call a non-blocking API ``rte_graph_walk()`` with it's lcore
 specific graph object that was already created.
diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst
index 0cc6f2e62e75..f20502c41a37 100644
--- a/doc/guides/sample_app_ug/l3_forward_power_man.rst
+++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst
@@ -441,7 +441,7 @@ The telemetry mode support for ``l3fwd-power`` is a standalone mode, in this mod
 ``l3fwd-power`` does simple l3fwding along with calculating empty polls, full polls,
 and busy percentage for each forwarding core. The aggregation of these
 values of all cores is reported as application level telemetry to metric
-library for every 500ms from the master core.
+library for every 500ms from the initial core.
 
 The busy percentage is calculated by recording the poll_count
 and when the count reaches a defined value the total
diff --git a/doc/guides/sample_app_ug/link_status_intr.rst b/doc/guides/sample_app_ug/link_status_intr.rst
index 04c40f28540d..e31fd2cc7368 100644
--- a/doc/guides/sample_app_ug/link_status_intr.rst
+++ b/doc/guides/sample_app_ug/link_status_intr.rst
@@ -401,9 +401,9 @@ However, it improves performance:
             /* if timer has reached its timeout */
 
             if (unlikely(timer_tsc >= (uint64_t) timer_period)) {
-                /* do this only on master core */
+                /* do this only on initial core */
 
-                if (lcore_id == rte_get_master_lcore()) {
+                if (lcore_id == rte_get_initial_lcore()) {
                     print_stats();
 
                     /* reset the timer */
diff --git a/doc/guides/sample_app_ug/multi_process.rst b/doc/guides/sample_app_ug/multi_process.rst
index f2a79a639763..51b8db5cf75a 100644
--- a/doc/guides/sample_app_ug/multi_process.rst
+++ b/doc/guides/sample_app_ug/multi_process.rst
@@ -66,7 +66,7 @@ The process should start successfully and display a command prompt as follows:
 
     EAL: check igb_uio module
     EAL: check module finished
-    EAL: Master core 0 is ready (tid=54e41820)
+    EAL: Initial core 0 is ready (tid=54e41820)
     EAL: Core 1 is ready (tid=53b32700)
 
     Starting core 1
@@ -92,7 +92,7 @@ At any stage, either process can be terminated using the quit command.
 
 .. code-block:: console
 
-   EAL: Master core 10 is ready (tid=b5f89820)           EAL: Master core 8 is ready (tid=864a3820)
+   EAL: Initial core 10 is ready (tid=b5f89820)           EAL: Initial core 8 is ready (tid=864a3820)
    EAL: Core 11 is ready (tid=84ffe700)                  EAL: Core 9 is ready (tid=85995700)
    Starting core 11                                      Starting core 9
    simple_mp > send hello_secondary                      simple_mp > core 9: Received 'hello_secondary'
@@ -273,7 +273,7 @@ In addition to the EAL parameters, the application- specific parameters are:
 
 .. note::
 
-    In the server process, a single thread, the master thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
+    In the server process, a single thread, the initial thread, that is, the lowest numbered lcore in the coremask/corelist, performs all packet I/O.
     If a coremask/corelist is specified with more than a single lcore bit set in it,
     an additional lcore will be used for a thread to periodically print packet count statistics.
 
diff --git a/doc/guides/sample_app_ug/packet_ordering.rst b/doc/guides/sample_app_ug/packet_ordering.rst
index 1c8ee5d04071..e82938bd7c9c 100644
--- a/doc/guides/sample_app_ug/packet_ordering.rst
+++ b/doc/guides/sample_app_ug/packet_ordering.rst
@@ -12,14 +12,14 @@ Overview
 
 The application uses at least three CPU cores:
 
-* RX core (maser core) receives traffic from the NIC ports and feeds Worker
+* RX core (initial core) receives traffic from the NIC ports and feeds Worker
   cores with traffic through SW queues.
 
-* Worker core (slave core) basically do some light work on the packet.
+* Worker cores basically do some light work on the packet.
   Currently it modifies the output port of the packet for configurations with
   more than one port enabled.
 
-* TX Core (slave core) receives traffic from Worker cores through software queues,
+* TX Core receives traffic from Worker cores through software queues,
   inserts out-of-order packets into reorder buffer, extracts ordered packets
   from the reorder buffer and sends them to the NIC ports for transmission.
 
@@ -46,7 +46,7 @@ The application execution command line is:
     ./packet_ordering [EAL options] -- -p PORTMASK [--disable-reorder] [--insight-worker]
 
 The -c EAL CPU_COREMASK option has to contain at least 3 CPU cores.
-The first CPU core in the core mask is the master core and would be assigned to
+The first CPU core in the core mask is the initial core and would be assigned to
 RX core, the last to TX core and the rest to Worker cores.
 
 The PORTMASK parameter must contain either 1 or even enabled port numbers.
diff --git a/doc/guides/sample_app_ug/performance_thread.rst b/doc/guides/sample_app_ug/performance_thread.rst
index b04d0ba444af..29105f9708eb 100644
--- a/doc/guides/sample_app_ug/performance_thread.rst
+++ b/doc/guides/sample_app_ug/performance_thread.rst
@@ -280,8 +280,8 @@ functionality into different threads, and the pairs of RX and TX threads are
 interconnected via software rings.
 
 On initialization an L-thread scheduler is started on every EAL thread. On all
-but the master EAL thread only a dummy L-thread is initially started.
-The L-thread started on the master EAL thread then spawns other L-threads on
+but the initial EAL thread only a dummy L-thread is initially started.
+The L-thread started on the initial EAL thread then spawns other L-threads on
 different L-thread schedulers according the command line parameters.
 
 The RX threads poll the network interface queues and post received packets
@@ -1217,5 +1217,5 @@ Setting ``LTHREAD_DIAG`` also enables counting of statistics about cache and
 queue usage, and these statistics can be displayed by calling the function
 ``lthread_diag_stats_display()``. This function also performs a consistency
 check on the caches and queues. The function should only be called from the
-master EAL thread after all slave threads have stopped and returned to the C
+initial EAL thread after all worker threads have stopped and returned to the C
 main program, otherwise the consistency check will fail.
diff --git a/doc/guides/sample_app_ug/qos_scheduler.rst b/doc/guides/sample_app_ug/qos_scheduler.rst
index b5010657a7d8..345ecbb5905d 100644
--- a/doc/guides/sample_app_ug/qos_scheduler.rst
+++ b/doc/guides/sample_app_ug/qos_scheduler.rst
@@ -71,7 +71,7 @@ Optional application parameters include:
     In this mode, the application shows a command line that can be used for obtaining statistics while
     scheduling is taking place (see interactive mode below for more information).
 
-*   --mst n: Master core index (the default value is 1).
+*   --mst n: Initial core index (the default value is 1).
 
 *   --rsz "A, B, C": Ring sizes:
 
@@ -329,7 +329,7 @@ Another example with 2 packet flow configurations using different ports but shar
 Note that independent cores for the packet flow configurations for each of the RX, WT and TX thread are also supported,
 providing flexibility to balance the work.
 
-The EAL coremask/corelist is constrained to contain the default mastercore 1 and the RX, WT and TX cores only.
+The EAL coremask/corelist is constrained to contain the default initial lcore 1 and the RX, WT and TX cores only.
 
 Explanation
 -----------
diff --git a/doc/guides/sample_app_ug/timer.rst b/doc/guides/sample_app_ug/timer.rst
index 98d762d2388c..59a8ab11e9b6 100644
--- a/doc/guides/sample_app_ug/timer.rst
+++ b/doc/guides/sample_app_ug/timer.rst
@@ -49,17 +49,18 @@ In addition to EAL initialization, the timer subsystem must be initialized, by c
     rte_timer_subsystem_init();
 
 After timer creation (see the next paragraph),
-the main loop is executed on each slave lcore using the well-known rte_eal_remote_launch() and also on the master.
+the main loop is executed on each worker lcore using the well-known rte_eal_remote_launch() and
+also on the initial lcore.
 
 .. code-block:: c
 
-    /* call lcore_mainloop() on every slave lcore  */
+    /* call lcore_mainloop() on every worker lcore  */
 
-    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
+    RTE_LCORE_FOREACH_WORKER(lcore_id) {
         rte_eal_remote_launch(lcore_mainloop, NULL, lcore_id);
     }
 
-    /* call it on master lcore too */
+    /* call it on initial lcore too */
 
     (void) lcore_mainloop(NULL);
 
@@ -105,7 +106,7 @@ This call to rte_timer_init() is necessary before doing any other operation on t
 
 Then, the two timers are configured:
 
-*   The first timer (timer0) is loaded on the master lcore and expires every second.
+*   The first timer (timer0) is loaded on the initial lcore and expires every second.
     Since the PERIODICAL flag is provided, the timer is reloaded automatically by the timer subsystem.
     The callback function is timer0_cb().
 
@@ -115,7 +116,7 @@ Then, the two timers are configured:
 
 .. code-block:: c
 
-    /* load timer0, every second, on master lcore, reloaded automatically */
+    /* load timer0, every second, on initial lcore, reloaded automatically */
 
     hz = rte_get_hpet_hz();
 
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index f169604752b8..7d6b81de7f46 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -71,7 +71,7 @@ The command line options are:
 *   ``--coremask=0xXX``
 
     Set the hexadecimal bitmask of the cores running the packet forwarding test.
-    The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+    The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 *   ``--portmask=0xXX``
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index a808b6a308f2..7d4db1140092 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -692,7 +692,7 @@ This is equivalent to the ``--coremask`` command-line option.
 
 .. note::
 
-   The master lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
+   The initial lcore is reserved for command line parsing only and cannot be masked on for packet forwarding.
 
 set portmask
 ~~~~~~~~~~~~
-- 
2.26.2



* [dpdk-dev] [PATCH (v20.11) 1/2] eventdev: reserve space in config structs for extension
@ 2020-07-02  6:19  4% pbhagavatula
  2020-07-02  6:19  4% ` [dpdk-dev] [PATCH (v20.11) 2/2] eventdev: reserve space in timer " pbhagavatula
  0 siblings, 1 reply; 200+ results
From: pbhagavatula @ 2020-07-02  6:19 UTC (permalink / raw)
  To: jerinj, Abhinandan Gujjar, Nikhil Rao, Erik Gabriel Carrillo
  Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Reserve space in the event device configuration structures, since increasing
their size would break ABI for existing applications. The reserved fields
allow new features to be added later without breaking ABI compatibility.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 lib/librte_eventdev/rte_event_crypto_adapter.h |  4 ++++
 lib/librte_eventdev/rte_event_eth_rx_adapter.h |  8 ++++++++
 lib/librte_eventdev/rte_event_eth_tx_adapter.h |  4 ++++
 lib/librte_eventdev/rte_event_timer_adapter.h  |  8 ++++++++
 lib/librte_eventdev/rte_eventdev.h             | 16 ++++++++++++++++
 5 files changed, 40 insertions(+)

diff --git a/lib/librte_eventdev/rte_event_crypto_adapter.h b/lib/librte_eventdev/rte_event_crypto_adapter.h
index 60630ef66..c471ece79 100644
--- a/lib/librte_eventdev/rte_event_crypto_adapter.h
+++ b/lib/librte_eventdev/rte_event_crypto_adapter.h
@@ -250,6 +250,10 @@ struct rte_event_crypto_adapter_conf {
 	 * max_nb crypto ops. This isn't treated as a requirement; batching
 	 * may cause the adapter to process more than max_nb crypto ops.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_event_eth_rx_adapter.h b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
index 2dd259c27..d10f632e9 100644
--- a/lib/librte_eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_rx_adapter.h
@@ -112,6 +112,10 @@ struct rte_event_eth_rx_adapter_conf {
 	 * max_nb_rx mbufs. This isn't treated as a requirement; batching may
 	 * cause the adapter to process more than max_nb_rx mbufs.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -171,6 +175,10 @@ struct rte_event_eth_rx_adapter_queue_conf {
 	 * The event adapter sets ev.event_type to RTE_EVENT_TYPE_ETHDEV in the
 	 * enqueued event.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_event_eth_tx_adapter.h b/lib/librte_eventdev/rte_event_eth_tx_adapter.h
index 8c5954716..442e54da4 100644
--- a/lib/librte_eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/librte_eventdev/rte_event_eth_tx_adapter.h
@@ -97,6 +97,10 @@ struct rte_event_eth_tx_adapter_conf {
 	 * max_nb_tx mbufs. This isn't treated as a requirement; batching may
 	 * cause the adapter to process more than max_nb_tx mbufs.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.h b/lib/librte_eventdev/rte_event_timer_adapter.h
index d2ebcb090..f83d85f4d 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.h
+++ b/lib/librte_eventdev/rte_event_timer_adapter.h
@@ -171,6 +171,10 @@ struct rte_event_timer_adapter_conf {
 	/**< Total number of timers per adapter */
 	uint64_t flags;
 	/**< Timer adapter config flags (RTE_EVENT_TIMER_ADAPTER_F_*) */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -268,6 +272,10 @@ struct rte_event_timer_adapter_info {
 	/**< Event timer adapter capabilities */
 	int16_t event_dev_port_id;
 	/**< Event device port ID, if applicable */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 7dc832353..1effeed58 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -387,6 +387,10 @@ struct rte_event_dev_info {
 	 */
 	uint32_t event_dev_cap;
 	/**< Event device capabilities(RTE_EVENT_DEV_CAP_)*/
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -494,6 +498,10 @@ struct rte_event_dev_config {
 	 */
 	uint32_t event_dev_cfg;
 	/**< Event device config flags(RTE_EVENT_DEV_CFG_)*/
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -574,6 +582,10 @@ struct rte_event_queue_conf {
 	 * event device supported priority value.
 	 * Valid when the device has RTE_EVENT_DEV_CAP_QUEUE_QOS capability
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
@@ -705,6 +717,10 @@ struct rte_event_port_conf {
 	 * RTE_EVENT_OP_FORWARD. Must be false when the device is not
 	 * RTE_EVENT_DEV_CAP_IMPLICIT_RELEASE_DISABLE capable.
 	 */
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields. */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields. */
 };
 
 /**
-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH (v20.11) 2/2] eventdev: reserve space in timer structs for extension
  2020-07-02  6:19  4% [dpdk-dev] [PATCH (v20.11) 1/2] eventdev: reserve space in config structs for extension pbhagavatula
@ 2020-07-02  6:19  4% ` pbhagavatula
  0 siblings, 0 replies; 200+ results
From: pbhagavatula @ 2020-07-02  6:19 UTC (permalink / raw)
  To: jerinj, Erik Gabriel Carrillo; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

The structs rte_event_timer_adapter and rte_event_timer_adapter_data are
supposed to be used internally only, but there is a chance that
increasing their size would break ABI for some applications.
In order to allow smooth addition of features without breaking
ABI compatibility, reserve some space.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 lib/librte_eventdev/rte_event_timer_adapter.h     | 5 +++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.h b/lib/librte_eventdev/rte_event_timer_adapter.h
index f83d85f4d..ce57a990a 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.h
+++ b/lib/librte_eventdev/rte_event_timer_adapter.h
@@ -529,6 +529,11 @@ struct rte_event_timer_adapter {
 	RTE_STD_C11
 	uint8_t allocated : 1;
 	/**< Flag to indicate that this adapter has been allocated */
+
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields */
 } __rte_cache_aligned;
 
 #define ADAPTER_VALID_OR_ERR_RET(adapter, retval) do {		\
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
index cf3509dc6..0a6682833 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -105,6 +105,11 @@ struct rte_event_timer_adapter_data {
 	RTE_STD_C11
 	uint8_t started : 1;
 	/**< Flag to indicate adapter started. */
+
+	uint64_t reserved_64s[4];
+	/**< Reserved for future fields */
+	void *reserved_ptrs[4];
+	/**< Reserved for future fields */
 } __rte_cache_aligned;
 
 #ifdef __cplusplus
-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 8/9] devtools: support python3 only
  @ 2020-07-02 10:37  4%   ` Louise Kilheeney
  0 siblings, 0 replies; 200+ results
From: Louise Kilheeney @ 2020-07-02 10:37 UTC (permalink / raw)
  To: dev
  Cc: robin.jarry, anatoly.burakov, bruce.richardson, Louise Kilheeney,
	Neil Horman, Ray Kinsella

Changed the script to explicitly use python3 only, to avoid
maintaining python 2 support.

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Ray Kinsella <mdr@ashroe.eu>

Signed-off-by: Louise Kilheeney <louise.kilheeney@intel.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/update_version_map_abi.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/devtools/update_version_map_abi.py b/devtools/update_version_map_abi.py
index e2104e61e..830e6c58c 100755
--- a/devtools/update_version_map_abi.py
+++ b/devtools/update_version_map_abi.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2019 Intel Corporation
 
@@ -9,7 +9,6 @@
 from the devtools/update-abi.sh utility.
 """
 
-from __future__ import print_function
 import argparse
 import sys
 import re
-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] DPDK Release Status Meeting 2/07/2020
@ 2020-07-02 14:58  4% Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-02 14:58 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

Minutes 2 July 2020
-------------------

Agenda:
* Release Dates
* Highlights
* Subtrees
* LTS


Participants:
* Arm
* Debian/Microsoft
* Intel
* Marvell
* Nvidia
* NXP
* Red Hat


Release Dates
-------------

* v20.08 dates:
  * -rc1:           Wednesday, 8 July   2020
  * -rc2:           Monday,   20 July   2020
  * Release:        Tuesday,   4 August 2020

* v20.11 proposal dates, please comment:
  * Proposal/V1:    Wednesday, 2 September 2020
  * -rc1:           Wednesday, 30 September 2020
  * -rc2:           Friday, 16 October 2020
  * Release:        Friday, 6 November 2020


Highlights
----------

* We are close to -rc1 but there are still lots of patches in the backlog
  waiting for review.
  *Please help with code reviews*; missing reviews may lead to some features
  missing the release.
  Please check "call for reviews" email for the list of patches to review:
  https://mails.dpdk.org/archives/announce/2020-June/000329.html

* Please subscribe to patchwork to be able to update the status of your
  patches; not updating them adds overhead for the maintainers.
  * We are observing an issue at Intel where patchwork registration and
    lost-password emails are not being received.
    * If anyone else outside Intel is having the same problem, please reach
      out to help analyze it.
    * Within Intel, please reach out to Ferruh if there are patches whose
      status needs updating in patchwork and you don't have access.


Subtrees
--------

* main
  * Started to merge ring and vfio patches
  * Would like to close the following
    * non-EAL threads as lcore from David
    * rte_log registration usage improvement from Jerin
    * if-proxy
      * Stephen reviewed the patch
    * regex
      * Waiting for PMD implementations. How many PMDs are required for merge?
      * A HW and two SW PMDs were planned
  * Concern that ethdev patches don't get enough review
    * Jerin did review on some rte flow ones

* next-net
  * Pulled from vendor sub-trees
  * Some big base update patches from Intel and bnxt merged
  * Need to get ethdev patches before -rc1, requires more review

* next-crypto
  * Reviewed half of the backlog
  * Will be good for -rc1
  * cryptodev patches have been reviewed

* next-eventdev
  * Almost ready for -rc1
  * The new version of the Intel DLB PMD still has ABI breakage
    * Postponed to next release because of the ABI
    * No controversial issues otherwise

* next-virtio
  * Maxime did a pull request for the majority of the patches
  * Maxime sent a status for remaining ones
    * 2 patches for async datapath, looks good
    * 2 patches for vhost-user protocol features
      * Has a dependency on QEMU
      * Adrian from Red Hat will take over the patches
    * Performance optimization (loops vectorization)
      * Waiting for new version
      * Not critical for this release, may be postponed if needed
  * Chenbo is managing the virtio patches during Maxime's absence

* next-net-intel
  * Qi is actively merging patches
  * Some base code updates already merged
  * DCF datapath merged

* next-net-mlx
  * Some patches already merged
  * Expecting more but not many

* next-net-mrvl
  * A few patches merged
  * Two more patches for -rc1
  * Changes requested for qede patches; can merge when they are ready


LTS
---

* v18.11.9-rc2 is out, please test
  * https://mails.dpdk.org/archives/dev/2020-June/171690.html
  * OvS testing reported an issue
    * A workaround can exist for it
  * Nvidia reported an error
    * Which is not a regression for the 18.11.9 release
  * The release is planned for the end of this week or early next week



DPDK Release Status Meetings
============================

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.

The meeting occurs every Thursday at 8:30 UTC on https://meet.jit.si/DPDK

If you wish to attend just send an email to
"John McNamara <john.mcnamara@intel.com>" for the invite.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-06-30 12:14  0%           ` Jerin Jacob
@ 2020-07-02 15:21  0%             ` Kinsella, Ray
  2020-07-02 16:35  3%               ` McDaniel, Timothy
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-02 15:21 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Tim McDaniel, Neil Horman, Jerin Jacob, Mattias Rönnblom,
	dpdk-dev, Gage Eads, Van Haaren, Harry



On 30/06/2020 13:14, Jerin Jacob wrote:
> On Tue, Jun 30, 2020 at 5:06 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>>
>>
>> On 30/06/2020 12:30, Jerin Jacob wrote:
>>> On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>>>
>>>>
>>>>
>>>> On 27/06/2020 08:44, Jerin Jacob wrote:
>>>>>> +
>>>>>> +/** Event port configuration structure */
>>>>>> +struct rte_event_port_conf_v20 {
>>>>>> +       int32_t new_event_threshold;
>>>>>> +       /**< A backpressure threshold for new event enqueues on this port.
>>>>>> +        * Use for *closed system* event dev where event capacity is limited,
>>>>>> +        * and cannot exceed the capacity of the event dev.
>>>>>> +        * Configuring ports with different thresholds can make higher priority
>>>>>> +        * traffic less likely to  be backpressured.
>>>>>> +        * For example, a port used to inject NIC Rx packets into the event dev
>>>>>> +        * can have a lower threshold so as not to overwhelm the device,
>>>>>> +        * while ports used for worker pools can have a higher threshold.
>>>>>> +        * This value cannot exceed the *nb_events_limit*
>>>>>> +        * which was previously supplied to rte_event_dev_configure().
>>>>>> +        * This should be set to '-1' for *open system*.
>>>>>> +        */
>>>>>> +       uint16_t dequeue_depth;
>>>>>> +       /**< Configure number of bulk dequeues for this event port.
>>>>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>>>> +        */
>>>>>> +       uint16_t enqueue_depth;
>>>>>> +       /**< Configure number of bulk enqueues for this event port.
>>>>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE capable.
>>>>>> +        */
>>>>>>         uint8_t disable_implicit_release;
>>>>>>         /**< Configure the port not to release outstanding events in
>>>>>>          * rte_event_dev_dequeue_burst(). If true, all events received through
>>>>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>>>>                                 struct rte_event_port_conf *port_conf);
>>>>>>
>>>>>> +int
>>>>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>>>> +                               struct rte_event_port_conf_v20 *port_conf);
>>>>>> +
>>>>>> +int
>>>>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>>>> +                                     struct rte_event_port_conf *port_conf);
>>>>>
>>>>> Hi Timothy,
>>>>>
>>>>> + ABI Maintainers (Ray, Neil)
>>>>>
>>>>> # As per my understanding, the structures can not be versioned, only
>>>>> function can be versioned.
>>>>> i.e we can not make any change to " struct rte_event_port_conf"
>>>>
>>>> So the answer is (as always): depends
>>>>
>>>> If the structure is being use in inline functions is when you run into trouble
>>>> - as knowledge of the structure is embedded in the linked application.
>>>>
>>>> However if the structure is _strictly_ being used as a non-inlined function parameter,
>>>> It can be safe to version in this way.
>>>
>>> But based on the optimization applied when building the consumer code
>>> matters. Right?
>>> i.e compiler can "inline" it, based on the optimization even the
>>> source code explicitly mentions it.
>>
>> Well a compiler will typically only inline within the confines of a given object file, or
>> binary, if LTO is enabled.
> 
>>
>> If a function symbol is exported from library however, it won't be inlined in a linked application.
> 
> Yes, With respect to that function.
> But the application can use struct rte_event_port_conf in their code
> and it can be part of other structures.
> Right?

Tim, it looks like you might be inadvertently breaking other symbols also.
For example ... 

int
rte_event_crypto_adapter_create(uint8_t id, uint8_t dev_id,
                                struct rte_event_port_conf *port_config,
                                enum rte_event_crypto_adapter_mode mode);

int
rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
                     const struct rte_event_port_conf *port_conf);

These and other symbols are also using rte_event_port_conf and would need to be updated to use the v20 struct,
if we aren't to break them.

> 
> 
>> The compiler doesn't have enough information to inline it.
>> All the compiler will know about it is it's offset in memory, and it's signature.
>>
>>>
>>>
>>>>
>>>> So just to be clear, it is still the function that is actually being versioned here.
>>>>
>>>>>
>>>>> # We have a similar case with ethdev and it deferred to next release v20.11
>>>>> http://patches.dpdk.org/patch/69113/
>>>>
>>>> Yes - I spent a while looking at this one, but I am struggling to recall
>>>> why, when I looked at it, we didn't suggest function versioning as a potential solution in this case.
>>>>
>>>> Looking back at it now, looks like it would have been ok.
>>>
>>> Ok.
>>>
>>>>
>>>>>
>>>>> Regarding the API changes:
>>>>> # The slow path changes general looks good to me. I will review the
>>>>> next level in the coming days
>>>>> # The following fast path changes bothers to me. Could you share more
>>>>> details on below change?
>>>>>
>>>>> diff --git a/app/test-eventdev/test_order_atq.c
>>>>> b/app/test-eventdev/test_order_atq.c
>>>>> index 3366cfc..8246b96 100644
>>>>> --- a/app/test-eventdev/test_order_atq.c
>>>>> +++ b/app/test-eventdev/test_order_atq.c
>>>>> @@ -34,6 +34,8 @@
>>>>>                         continue;
>>>>>                 }
>>>>>
>>>>> +               ev.flow_id = ev.mbuf->udata64;
>>>>> +
>>>>> # Since RC1 is near, I am not sure how to accommodate the API changes
>>>>> now and sort out ABI stuffs.
>>>>> # Other concern is eventdev spec get bloated with versioning files
>>>>> just for ONE release as 20.11 will be OK to change the ABI.
>>>>> # While we discuss the API change, Please send deprecation notice for
>>>>> ABI change for 20.11,
>>>>> so that there is no ambiguity of this patch for the 20.11 release.
>>>>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
  2020-07-02 15:21  0%             ` Kinsella, Ray
@ 2020-07-02 16:35  3%               ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2020-07-02 16:35 UTC (permalink / raw)
  To: Kinsella, Ray, Jerin Jacob
  Cc: Neil Horman, Jerin Jacob, Mattias Rönnblom, dpdk-dev, Eads,
	Gage, Van Haaren, Harry

>-----Original Message-----
>From: Kinsella, Ray <mdr@ashroe.eu>
>Sent: Thursday, July 2, 2020 10:21 AM
>To: Jerin Jacob <jerinjacobk@gmail.com>
>Cc: McDaniel, Timothy <timothy.mcdaniel@intel.com>; Neil Horman
><nhorman@tuxdriver.com>; Jerin Jacob <jerinj@marvell.com>; Mattias
>Rönnblom <mattias.ronnblom@ericsson.com>; dpdk-dev <dev@dpdk.org>; Eads,
>Gage <gage.eads@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>
>Subject: Re: [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites
>
>
>
>On 30/06/2020 13:14, Jerin Jacob wrote:
>> On Tue, Jun 30, 2020 at 5:06 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>>
>>>
>>>
>>> On 30/06/2020 12:30, Jerin Jacob wrote:
>>>> On Tue, Jun 30, 2020 at 4:52 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 27/06/2020 08:44, Jerin Jacob wrote:
>>>>>>> +
>>>>>>> +/** Event port configuration structure */
>>>>>>> +struct rte_event_port_conf_v20 {
>>>>>>> +       int32_t new_event_threshold;
>>>>>>> +       /**< A backpressure threshold for new event enqueues on this port.
>>>>>>> +        * Use for *closed system* event dev where event capacity is
>limited,
>>>>>>> +        * and cannot exceed the capacity of the event dev.
>>>>>>> +        * Configuring ports with different thresholds can make higher
>priority
>>>>>>> +        * traffic less likely to  be backpressured.
>>>>>>> +        * For example, a port used to inject NIC Rx packets into the event
>dev
>>>>>>> +        * can have a lower threshold so as not to overwhelm the device,
>>>>>>> +        * while ports used for worker pools can have a higher threshold.
>>>>>>> +        * This value cannot exceed the *nb_events_limit*
>>>>>>> +        * which was previously supplied to rte_event_dev_configure().
>>>>>>> +        * This should be set to '-1' for *open system*.
>>>>>>> +        */
>>>>>>> +       uint16_t dequeue_depth;
>>>>>>> +       /**< Configure number of bulk dequeues for this event port.
>>>>>>> +        * This value cannot exceed the *nb_event_port_dequeue_depth*
>>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>>>>>>> +        */
>>>>>>> +       uint16_t enqueue_depth;
>>>>>>> +       /**< Configure number of bulk enqueues for this event port.
>>>>>>> +        * This value cannot exceed the *nb_event_port_enqueue_depth*
>>>>>>> +        * which previously supplied to rte_event_dev_configure().
>>>>>>> +        * Ignored when device is not RTE_EVENT_DEV_CAP_BURST_MODE
>capable.
>>>>>>> +        */
>>>>>>>         uint8_t disable_implicit_release;
>>>>>>>         /**< Configure the port not to release outstanding events in
>>>>>>>          * rte_event_dev_dequeue_burst(). If true, all events received
>through
>>>>>>> @@ -733,6 +911,14 @@ struct rte_event_port_conf {
>>>>>>>  rte_event_port_default_conf_get(uint8_t dev_id, uint8_t port_id,
>>>>>>>                                 struct rte_event_port_conf *port_conf);
>>>>>>>
>>>>>>> +int
>>>>>>> +rte_event_port_default_conf_get_v20(uint8_t dev_id, uint8_t port_id,
>>>>>>> +                               struct rte_event_port_conf_v20 *port_conf);
>>>>>>> +
>>>>>>> +int
>>>>>>> +rte_event_port_default_conf_get_v21(uint8_t dev_id, uint8_t port_id,
>>>>>>> +                                     struct rte_event_port_conf *port_conf);
>>>>>>
>>>>>> Hi Timothy,
>>>>>>
>>>>>> + ABI Maintainers (Ray, Neil)
>>>>>>
>>>>>> # As per my understanding, the structures can not be versioned, only
>>>>>> function can be versioned.
>>>>>> i.e we can not make any change to " struct rte_event_port_conf"
>>>>>
>>>>> So the answer is (as always): depends
>>>>>
>>>>> If the structure is being use in inline functions is when you run into trouble
>>>>> - as knowledge of the structure is embedded in the linked application.
>>>>>
>>>>> However if the structure is _strictly_ being used as a non-inlined function
>parameter,
>>>>> It can be safe to version in this way.
>>>>
>>>> But based on the optimization applied when building the consumer code
>>>> matters. Right?
>>>> i.e compiler can "inline" it, based on the optimization even the
>>>> source code explicitly mentions it.
>>>
>>> Well a compiler will typically only inline within the confines of a given object
>file, or
>>> binary, if LTO is enabled.
>>
>>>
>>> If a function symbol is exported from library however, it won't be inlined in a
>linked application.
>>
>> Yes, With respect to that function.
>> But the application can use struct rte_event_port_conf in their code
>> and it can be part of other structures.
>> Right?
>
>Tim, it looks like you might be inadvertently breaking other symbols also.
>For example ...
>
>int
>rte_event_crypto_adapter_create(uint8_t id, uint8_t dev_id,
>                                struct rte_event_port_conf *port_config,
>                                enum rte_event_crypto_adapter_mode mode);
>
>int
>rte_event_port_setup(uint8_t dev_id, uint8_t port_id,
>                     const struct rte_event_port_conf *port_conf);
>
>These and others symbols are also using rte_event_port_conf and would need to
>be updated to use the v20 struct,
>if we aren't to break them .
>

Yes, we just discovered that after successfully running the ABI checker. I will address those in the v3
patch set.  Thanks.

>>
>>
>>> The compiler doesn't have enough information to inline it.
>>> All the compiler will know about it is it's offset in memory, and it's signature.
>>>
>>>>
>>>>
>>>>>
>>>>> So just to be clear, it is still the function that is actually being versioned
>here.
>>>>>
>>>>>>
>>>>>> # We have a similar case with ethdev and it deferred to next release v20.11
>>>>>> http://patches.dpdk.org/patch/69113/
>>>>>
>>>>> Yes - I spent a while looking at this one, but I am struggling to recall
>>>>> why, when I looked at it, we didn't suggest function versioning as a potential
>solution in this case.
>>>>>
>>>>> Looking back at it now, looks like it would have been ok.
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>>>
>>>>>> Regarding the API changes:
>>>>>> # The slow path changes general looks good to me. I will review the
>>>>>> next level in the coming days
>>>>>> # The following fast path changes bothers to me. Could you share more
>>>>>> details on below change?
>>>>>>
>>>>>> diff --git a/app/test-eventdev/test_order_atq.c
>>>>>> b/app/test-eventdev/test_order_atq.c
>>>>>> index 3366cfc..8246b96 100644
>>>>>> --- a/app/test-eventdev/test_order_atq.c
>>>>>> +++ b/app/test-eventdev/test_order_atq.c
>>>>>> @@ -34,6 +34,8 @@
>>>>>>                         continue;
>>>>>>                 }
>>>>>>
>>>>>> +               ev.flow_id = ev.mbuf->udata64;
>>>>>> +
>>>>>> # Since RC1 is near, I am not sure how to accommodate the API changes
>>>>>> now and sort out ABI stuffs.
>>>>>> # Other concern is eventdev spec get bloated with versioning files
>>>>>> just for ONE release as 20.11 will be OK to change the ABI.
>>>>>> # While we discuss the API change, Please send deprecation notice for
>>>>>> ABI change for 20.11,
>>>>>> so that there is no ambiguity of this patch for the 20.11 release.
>>>>>>

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-06-30  7:35  3%   ` Akhil Goyal
@ 2020-07-02 17:54  0%     ` Akhil Goyal
  2020-07-02 18:02  3%       ` Chautru, Nicolas
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2020-07-02 17:54 UTC (permalink / raw)
  To: David Marchand, Nicolas Chautru; +Cc: dev, Thomas Monjalon


> 
> >
> > Hello Nicolas,
> >
> > On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> > <nicolas.chautru@intel.com> wrote:
> > >
> > > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > > and remove experimental tag.
> > > Sending now to advertise and get any feedback.
> > > Some manual rebase will be required later on notably as the
> > > actual release note which is not there yet.
> >
> > Cool that we want to stabilize this API.
> > My concern is that we have drivers from a single vendor.
> > I would hate to see a new vendor unable to submit a driver (or having
> > to wait until the next ABI breakage window) because of the current
> > API/ABI.
> >
> >
> 
> +1 from my side. I am not sure how much it is acceptable for all the
> vendors/customers.
> It is not reviewed by most of the vendors who may support in future.
> It is not good to remove experimental tag as we have a long 1 year cycle to
> break the API/ABI.
> 
Moving the patch to deferred in patchwork.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-07-02 17:54  0%     ` Akhil Goyal
@ 2020-07-02 18:02  3%       ` Chautru, Nicolas
  2020-07-02 18:09  4%         ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Chautru, Nicolas @ 2020-07-02 18:02 UTC (permalink / raw)
  To: Akhil Goyal, David Marchand; +Cc: dev, Thomas Monjalon

> From: Akhil Goyal <akhil.goyal@nxp.com>
> > > Hello Nicolas,
> > >
> > > On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> > > <nicolas.chautru@intel.com> wrote:
> > > >
> > > > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > > > and remove experimental tag.
> > > > Sending now to advertise and get any feedback.
> > > > Some manual rebase will be required later on notably as the actual
> > > > release note which is not there yet.
> > >
> > > Cool that we want to stabilize this API.
> > > My concern is that we have drivers from a single vendor.
> > > I would hate to see a new vendor unable to submit a driver (or
> > > having to wait until the next ABI breakage window) because of the
> > > current API/ABI.
> > >
> > >
> >
> > +1 from my side. I am not sure how much it is acceptable for all the
> > vendors/customers.
> > It is not reviewed by most of the vendors who may support in future.
> > It is not good to remove experimental tag as we have a long 1 year
> > cycle to break the API/ABI.
> >
> Moving the patch to deferred in patchwork.

That is fine, and this is all good discussion.
We know of another vendor who plans to release a bbdev driver, but probably after 20.11.
There is one extra capability they will need exposed; we will aim to have the API updated prior to that.
Assuming the API gets updated between now and 20.11, is there still room to remove the experimental tag in 20.11, or is the expectation to wait regardless for a full stable cycle and only intercept ABI v22 in 21.11?

Thanks
Nic


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API
  2020-07-02 18:02  3%       ` Chautru, Nicolas
@ 2020-07-02 18:09  4%         ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2020-07-02 18:09 UTC (permalink / raw)
  To: Chautru, Nicolas, David Marchand; +Cc: dev, Thomas Monjalon

> 
> > From: Akhil Goyal <akhil.goyal@nxp.com>
> > > > Hello Nicolas,
> > > >
> > > > On Sat, Jun 27, 2020 at 1:14 AM Nicolas Chautru
> > > > <nicolas.chautru@intel.com> wrote:
> > > > >
> > > > > Planning to move bbdev API to stable from 20.11 (ABI version 21)
> > > > > and remove experimental tag.
> > > > > Sending now to advertise and get any feedback.
> > > > > Some manual rebase will be required later on notably as the actual
> > > > > release note which is not there yet.
> > > >
> > > > Cool that we want to stabilize this API.
> > > > My concern is that we have drivers from a single vendor.
> > > > I would hate to see a new vendor unable to submit a driver (or
> > > > having to wait until the next ABI breakage window) because of the
> > > > current API/ABI.
> > > >
> > > >
> > >
> > > +1 from my side. I am not sure how much it is acceptable for all the
> > > vendors/customers.
> > > It is not reviewed by most of the vendors who may support in future.
> > > It is not good to remove experimental tag as we have a long 1 year
> > > cycle to break the API/ABI.
> > >
> > Moving the patch to deferred in patchwork.
> 
> That is fine and all good discussion.
> We know of another vendor who plan to release a bbdev driver but probably
> after 20.11.
> There is one extra capability they will need exposed, we will aim to have the API
> is updated prior to that.
> Assuming the API get updated between now and 20.11, is there still room to
> remove experimental tag in 20.11 or the expectation is to wait regardless for a
> full stable cycle and only intercept ABI v22 in  21.11?
> 
I think ABI v22 in 21.11 would be a good point to move this to stable, so that if there are changes
in the ABI when a new vendor PMD comes up, they can be incorporated.
And as the world is evolving towards 5G, there may be multiple vendors and the ABI may change.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] devtools: remove useless files from ABI reference
  @ 2020-07-03  9:08  4%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-07-03  9:08 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Bruce Richardson

On Thu, May 28, 2020 at 3:16 PM David Marchand
<david.marchand@redhat.com> wrote:
> On Sun, May 24, 2020 at 7:43 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > When building an ABI reference with meson, some static libraries
> > are built and linked in apps. They are useless and take a lot of space.
> > Those binaries, and other useless files (examples and doc files)
> > in the share/ directory, are removed after being installed.
> >
> > In order to save time when building the ABI reference,
> > the examples (which are not installed anyway) are not compiled.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> Acked-by: David Marchand <david.marchand@redhat.com>

Applied, thanks.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v17 0/2] support for VFIO-PCI VF token interface
  @ 2020-07-03 14:57  4% ` Haiyue Wang
  0 siblings, 0 replies; 200+ results
From: Haiyue Wang @ 2020-07-03 14:57 UTC (permalink / raw)
  To: dev, anatoly.burakov, thomas, jerinj, david.marchand, arybchenko
  Cc: Haiyue Wang

v17: Rebase for new EAL config API, update the commit message and doc.

v16: Rebase the patch for 20.08 release note.

v15: Add the missing EXPERIMENTAL warning for the API doxygen.

v14: Rebase the patch for 20.08 release note.

v13: Rename the EAL get VF token function, and leave the freebsd type as empty.

v12: Support vfio devices both with a VF token and with no token.

v11: Use the eal parameter to pass the VF token, then not every PCI
     device needs to be specified with this token. Also no ABI issue
     now.

v10: Use the __rte_internal to mark the internal API changing.

v9: Rewrite the document.

v8: Update the document.

v7: Add the Fixes tag in uuid, the release note and help
    document.

v6: Drop the Fixes tag in uuid, since the file has been
    moved to another place, not suitable to apply on stable.
    And this is not a bug, just some kind of enhancement.

v5: 1. Add the VF token parse error handling.
    2. Split into two patches for different logic module.
    3. Add more comments into the code for explaining the design.
    4. Drop the ABI change workaround, this patch set focuses on code review.

v4: 1. Ignore rte_vfio_setup_device ABI check since it is
       for Linux driver use.

v3: Fix the Travis build failed:
           (1). rte_uuid.h:97:55: error: unknown type name ‘size_t’
           (2). rte_uuid.h:58:2: error: implicit declaration of function ‘memcpy’

v2: Fix the FreeBSD build error.

v1: Update the commit message.

RFC v2:
         Based on Vamsi's RFC v1, and Alex's patch for Qemu
        [https://lore.kernel.org/lkml/20200204161737.34696b91@w520.home/]: 
       Use the devarg to pass-down the VF token.

RFC v1: https://patchwork.dpdk.org/patch/66281/ by Vamsi.

Haiyue Wang (2):
  eal: add uuid dependent header files explicitly
  eal: support for VFIO-PCI VF token

 doc/guides/linux_gsg/linux_drivers.rst        | 35 ++++++++++++++++++-
 doc/guides/linux_gsg/linux_eal_parameters.rst |  4 +++
 doc/guides/rel_notes/release_20_08.rst        |  6 ++++
 lib/librte_eal/common/eal_common_options.c    |  3 ++
 lib/librte_eal/common/eal_internal_cfg.h      |  2 ++
 lib/librte_eal/common/eal_options.h           |  2 ++
 lib/librte_eal/freebsd/eal.c                  |  5 +++
 lib/librte_eal/include/rte_eal.h              | 14 ++++++++
 lib/librte_eal/include/rte_uuid.h             |  2 ++
 lib/librte_eal/linux/eal.c                    | 33 +++++++++++++++++
 lib/librte_eal/linux/eal_vfio.c               | 19 ++++++++++
 lib/librte_eal/rte_eal_version.map            |  3 ++
 12 files changed, 127 insertions(+), 1 deletion(-)

-- 
2.27.0


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations
  @ 2020-07-03 15:38  3% ` David Marchand
  2020-07-06  8:03  3%   ` Phil Yang
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-07-03 15:38 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, Olivier Matz, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd

On Thu, Jun 11, 2020 at 12:26 PM Phil Yang <phil.yang@arm.com> wrote:
>
> Use c11 atomics with explicit ordering instead of rte_atomic ops which
> enforce unnecessary barriers on aarch64.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

I did not look at the details, but this patch is refused by the ABI
check in Travis.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  @ 2020-07-03 16:16  4%   ` Kinsella, Ray
  2020-07-03 18:46  3%     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-03 16:16 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 11:26, Feifei Wang wrote:
> Remove the experimental tag for rte_ring_reset API that have been around
> for 4 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h           | 3 ---
>  lib/librte_ring/rte_ring_version.map | 4 +---
>  2 files changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index f67141482..7181c33b4 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>   *
>   * This function flush all the elements in a ring
>   *
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * @warning
>   * Make sure the ring is not in use while calling this function.
>   *
>   * @param r
>   *   A pointer to the ring structure.
>   */
> -__rte_experimental
>  void
>  rte_ring_reset(struct rte_ring *r);
>  
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index e88c143cf..aec6f3820 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -8,6 +8,7 @@ DPDK_20.0 {
>  	rte_ring_init;
>  	rte_ring_list_dump;
>  	rte_ring_lookup;
> +	rte_ring_reset;
>  
>  	local: *;
>  };
> @@ -15,9 +16,6 @@ DPDK_20.0 {
>  EXPERIMENTAL {
>  	global:
>  
> -	# added in 19.08
> -	rte_ring_reset;
> -
>  	# added in 20.02
>  	rte_ring_create_elem;
>  	rte_ring_get_memsize_elem;

So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0 ABI.

The way to solve this is to add it to the DPDK_21 ABI in the map file.
Then use VERSION_SYMBOL_EXPERIMENTAL to alias it to the experimental version if necessary.

https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-macros
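
As a sketch, the map-file change suggested above could look like the
following (section naming follows the DPDK convention; the exact form
should be checked against the ABI versioning guide linked above):

```
DPDK_21 {
	global:

	rte_ring_reset;
} DPDK_20.0;
```

If an experimental alias is still needed for a transition period, the
VERSION_SYMBOL_EXPERIMENTAL macro from rte_function_versioning.h can
map rte_ring_reset@EXPERIMENTAL onto the now-versioned implementation.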

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs
  @ 2020-07-03 16:17  3%   ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-03 16:17 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 11:26, Feifei Wang wrote:
> Remove the experimental tag for rte_ring_xxx_elem APIs that have been
> around for 2 releases.
> 
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_ring/rte_ring.h           | 5 +----
>  lib/librte_ring/rte_ring_elem.h      | 8 --------
>  lib/librte_ring/rte_ring_version.map | 9 ++-------
>  3 files changed, 3 insertions(+), 19 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 7181c33b4..35f3f8c42 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -40,6 +40,7 @@ extern "C" {
>  #endif
>  
>  #include <rte_ring_core.h>
> +#include <rte_ring_elem.h>
>  
>  /**
>   * Calculate the memory size needed for a ring
> @@ -401,10 +402,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			RTE_RING_SYNC_ST, free_space);
>  }
>  
> -#ifdef ALLOW_EXPERIMENTAL_API
> -#include <rte_ring_elem.h>
> -#endif
> -
>  /**
>   * Enqueue several objects on a ring.
>   *
> diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> index 9e5192ae6..69dc51746 100644
> --- a/lib/librte_ring/rte_ring_elem.h
> +++ b/lib/librte_ring/rte_ring_elem.h
> @@ -23,9 +23,6 @@ extern "C" {
>  #include <rte_ring_core.h>
>  
>  /**
> - * @warning
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * Calculate the memory size needed for a ring with given element size
>   *
>   * This function returns the number of bytes needed for a ring, given
> @@ -43,13 +40,9 @@ extern "C" {
>   *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
>   *		 power of 2.
>   */
> -__rte_experimental
>  ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
>  
>  /**
> - * @warning
> - * @b EXPERIMENTAL: this API may change without prior notice
> - *
>   * Create a new ring named *name* that stores elements with given size.
>   *
>   * This function uses ``memzone_reserve()`` to allocate memory. Then it
> @@ -109,7 +102,6 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
>   *    - EEXIST - a memzone with the same name already exists
>   *    - ENOMEM - no appropriate memory area found in which to create memzone
>   */
> -__rte_experimental
>  struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
>  			unsigned int count, int socket_id, unsigned int flags);
>  
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index aec6f3820..3030e8edb 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -2,9 +2,11 @@ DPDK_20.0 {
>  	global:
>  
>  	rte_ring_create;
> +	rte_ring_create_elem;
>  	rte_ring_dump;
>  	rte_ring_free;
>  	rte_ring_get_memsize;
> +	rte_ring_get_memsize_elem;
>  	rte_ring_init;
>  	rte_ring_list_dump;
>  	rte_ring_lookup;
> @@ -13,10 +15,3 @@ DPDK_20.0 {
>  	local: *;
>  };
>  
> -EXPERIMENTAL {
> -	global:
> -
> -	# added in 20.02
> -	rte_ring_create_elem;
> -	rte_ring_get_memsize_elem;
> -};
> 

Same as the last comment:
rte_ring_get_memsize_elem and rte_ring_create_elem are part of the DPDK_21 ABI, not the v20.0 ABI.

Ray K

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] doc: add sample for ABI checks in contribution guide
@ 2020-07-03 17:15  4% Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-03 17:15 UTC (permalink / raw)
  To: John McNamara, Marko Kovacevic; +Cc: dev, Ferruh Yigit

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 doc/guides/contributing/patches.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/contributing/patches.rst b/doc/guides/contributing/patches.rst
index 25d97b85b..39ec64ec8 100644
--- a/doc/guides/contributing/patches.rst
+++ b/doc/guides/contributing/patches.rst
@@ -550,6 +550,10 @@ results in a subfolder of the current working directory.
 The environment variable ``DPDK_ABI_REF_DIR`` can be set so that the results go
 to a different location.
 
+Sample::
+
+        DPDK_ABI_REF_VERSION=v19.11 DPDK_ABI_REF_DIR=/tmp ./devtools/test-meson-builds.sh
+
 
 Sending Patches
 ---------------
-- 
2.25.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 16:16  4%   ` Kinsella, Ray
@ 2020-07-03 18:46  3%     ` Honnappa Nagarahalli
  2020-07-06  6:23  3%       ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-07-03 18:46 UTC (permalink / raw)
  To: Kinsella, Ray, Feifei Wang, Konstantin Ananyev, Neil Horman
  Cc: dev, nd, Honnappa Nagarahalli, nd

<snip>

> 
> On 03/07/2020 11:26, Feifei Wang wrote:
> > Remove the experimental tag for rte_ring_reset API that have been
> > around for 4 releases.
> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >  lib/librte_ring/rte_ring.h           | 3 ---
> >  lib/librte_ring/rte_ring_version.map | 4 +---
> >  2 files changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index f67141482..7181c33b4 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
> >   *
> >   * This function flush all the elements in a ring
> >   *
> > - * @b EXPERIMENTAL: this API may change without prior notice
> > - *
> >   * @warning
> >   * Make sure the ring is not in use while calling this function.
> >   *
> >   * @param r
> >   *   A pointer to the ring structure.
> >   */
> > -__rte_experimental
> >  void
> >  rte_ring_reset(struct rte_ring *r);
> >
> > diff --git a/lib/librte_ring/rte_ring_version.map
> > b/lib/librte_ring/rte_ring_version.map
> > index e88c143cf..aec6f3820 100644
> > --- a/lib/librte_ring/rte_ring_version.map
> > +++ b/lib/librte_ring/rte_ring_version.map
> > @@ -8,6 +8,7 @@ DPDK_20.0 {
> >  	rte_ring_init;
> >  	rte_ring_list_dump;
> >  	rte_ring_lookup;
> > +	rte_ring_reset;
> >
> >  	local: *;
> >  };
> > @@ -15,9 +16,6 @@ DPDK_20.0 {
> >  EXPERIMENTAL {
> >  	global:
> >
> > -	# added in 19.08
> > -	rte_ring_reset;
> > -
> >  	# added in 20.02
> >  	rte_ring_create_elem;
> >  	rte_ring_get_memsize_elem;
> 
> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0
> ABI.
Thanks Ray for clarifying this.

> 
> The way to solve is to add it the DPDK_21 ABI in the map file.
> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental
> if necessary.
Is using VERSION_SYMBOL_EXPERIMENTAL a must? The documentation also seems to be vague. It says " The macro is used when a symbol matures to become part of the stable ABI, to provide an alias to experimental for some time". What does 'some time' mean?

> 
> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-
> macros

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR
    @ 2020-07-04 17:00  3%       ` Ruifeng Wang
  1 sibling, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-04 17:00 UTC (permalink / raw)
  To: David Marchand, Vladimir Medvedkin, Bruce Richardson
  Cc: John McNamara, Marko Kovacevic, Ray Kinsella, Neil Horman, dev,
	Ananyev, Konstantin, Honnappa Nagarahalli, nd, nd

Hi David,

Sorry, I lost track of this thread.

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Monday, June 29, 2020 7:56 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Vladimir Medvedkin
> <vladimir.medvedkin@intel.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: John McNamara <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman
> <nhorman@tuxdriver.com>; dev <dev@dpdk.org>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR
> 
> On Mon, Jun 29, 2020 at 10:03 AM Ruifeng Wang <ruifeng.wang@arm.com>
> wrote:
> >
> > Currently, the tbl8 group is freed even though the readers might be
> > using the tbl8 group entries. The freed tbl8 group can be reallocated
> > quickly. This results in incorrect lookup results.
> >
> > RCU QSBR process is integrated for safe tbl8 group reclaim.
> > Refer to RCU documentation to understand various aspects of
> > integrating RCU library into other libraries.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > ---
> >  doc/guides/prog_guide/lpm_lib.rst  |  32 +++++++
> >  lib/librte_lpm/Makefile            |   2 +-
> >  lib/librte_lpm/meson.build         |   1 +
> >  lib/librte_lpm/rte_lpm.c           | 129 ++++++++++++++++++++++++++---
> >  lib/librte_lpm/rte_lpm.h           |  59 +++++++++++++
> >  lib/librte_lpm/rte_lpm_version.map |   6 ++
> >  6 files changed, 216 insertions(+), 13 deletions(-)
> >
> > diff --git a/doc/guides/prog_guide/lpm_lib.rst
> > b/doc/guides/prog_guide/lpm_lib.rst
> > index 1609a57d0..7cc99044a 100644
> > --- a/doc/guides/prog_guide/lpm_lib.rst
> > +++ b/doc/guides/prog_guide/lpm_lib.rst
> > @@ -145,6 +145,38 @@ depending on whether we need to move to the
> next table or not.
> >  Prefix expansion is one of the keys of this algorithm,  since it
> > improves the speed dramatically by adding redundancy.
> >
> > +Deletion
> > +~~~~~~~~
> > +
> > +When deleting a rule, a replacement rule is searched for. Replacement
> > +rule is an existing rule that has the longest prefix match with the rule to be
> deleted, but has smaller depth.
> > +
> > +If a replacement rule is found, target tbl24 and tbl8 entries are
> > +updated to have the same depth and next hop value with the
> replacement rule.
> > +
> > +If no replacement rule can be found, target tbl24 and tbl8 entries will be
> cleared.
> > +
> > +Prefix expansion is performed if the rule's depth is not exactly 24 bits or
> 32 bits.
> > +
> > +After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry
> are freed in following cases:
> > +
> > +*   All tbl8s in the group are empty .
> > +
> > +*   All tbl8s in the group have the same values and with depth no greater
> than 24.
> > +
> > +Free of tbl8s have different behaviors:
> > +
> > +*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
> > +
> > +*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
> > +
> > +When the LPM is not using RCU, tbl8 group can be freed immediately
> > +even though the readers might be using the tbl8 group entries. This might
> result in incorrect lookup results.
> > +
> > +RCU QSBR process is integrated for safe tbl8 group reclaimation.
> > +Application has certain responsibilities while using this feature.
> > +Please refer to resource reclaimation framework of :ref:`RCU library
> <RCU_Library>` for more details.
> > +
> 
> Would the lpm6 library benefit from the same?
> Asking as I do not see much code shared between lpm and lpm6.
> 
I didn't look into lpm6. It may need a separate RCU integration since, as you mentioned, there is no shared code between lpm and lpm6.

> [...]
> 
> > diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> > 38ab512a4..41e9c49b8 100644
> > --- a/lib/librte_lpm/rte_lpm.c
> > +++ b/lib/librte_lpm/rte_lpm.c
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2020 Arm Limited
> >   */
> >
> >  #include <string.h>
> > @@ -245,13 +246,84 @@ rte_lpm_free(struct rte_lpm *lpm)
> >                 TAILQ_REMOVE(lpm_list, te, next);
> >
> >         rte_mcfg_tailq_write_unlock();
> > -
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +       if (lpm->dq)
> > +               rte_rcu_qsbr_dq_delete(lpm->dq); #endif
> 
> All DPDK code under lib/ is compiled with the ALLOW_EXPERIMENTAL_API
> flag set.
> There is no need to protect against this flag in rte_lpm.c.
> 
OK, I see. So DPDK libraries will always be compiled with ALLOW_EXPERIMENTAL_API; it is the
application's choice whether to use experimental APIs.
I will update the next version to remove the ALLOW_EXPERIMENTAL_API flag from rte_lpm.c and only keep the one in rte_lpm.h.

> [...]
> 
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > b9d49ac87..7889f21b3 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> 
> > @@ -130,6 +143,28 @@ struct rte_lpm {
> >                         __rte_cache_aligned; /**< LPM tbl24 table. */
> >         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +       /* RCU config. */
> > +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> > +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> > +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> > +#endif
> > +};
> 
> This is more a comment/question for the lpm maintainers.
> 
> Afaics, the rte_lpm structure is exported/public because of lookup which is
> inlined.
> But most of the structure can be hidden and stored in a private structure that
> would embed the exposed rte_lpm.
> The slowpath functions would only have to translate from publicly exposed
> to internal representation (via container_of).
> 
> This patch could do this and be the first step to hide the unneeded exposure
> of other fields (later/in 20.11 ?).
> 
Hiding most of the structure is reasonable.
Since it will break the ABI, I can do that in 20.11.

> Thoughts?
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item
    @ 2020-07-05 13:13  0%       ` Andrew Rybchenko
  1 sibling, 0 replies; 200+ results
From: Andrew Rybchenko @ 2020-07-05 13:13 UTC (permalink / raw)
  To: Adrien Mazarguil, Ori Kam
  Cc: Dekel Peled, ferruh.yigit, john.mcnamara, marko.kovacevic,
	Asaf Penso, Matan Azrad, Eli Britstein, dev, Ivan Malov

On 6/2/20 10:04 PM, Adrien Mazarguil wrote:
> Hi Ori, Andrew, Delek,
> 
> (been a while eh?)
> 
> On Tue, Jun 02, 2020 at 06:28:41PM +0000, Ori Kam wrote:
>> Hi Andrew,
>>
>> PSB,
> [...]
>>>> diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
>>>> index b0e4199..3bc8ce1 100644
>>>> --- a/lib/librte_ethdev/rte_flow.h
>>>> +++ b/lib/librte_ethdev/rte_flow.h
>>>> @@ -787,6 +787,8 @@ struct rte_flow_item_ipv4 {
>>>>   */
>>>>  struct rte_flow_item_ipv6 {
>>>>  	struct rte_ipv6_hdr hdr; /**< IPv6 header definition. */
>>>> +	uint32_t is_frag:1; /**< Is IPv6 packet fragmented/non-fragmented. */
>>>> +	uint32_t reserved:31; /**< Reserved, must be zero. */
>>>
>>> The solution is simple, but hardly generic and adds an
>>> example for the future extensions. I doubt that it is a
>>> right way to go.
>>>
>> I agree with you that this is not the most generic way possible,
>> but the IPV6 extensions are very unique. So the solution is also unique.
>> In general, I'm always in favor of finding the most generic way, but sometimes
>> it is better to keep things simple, and see how it goes.
> 
> Same feeling here, it doesn't look right.
> 
>>> May be we should add 256-bit string with one bit for each
>>> IP protocol number and apply it to extension headers only?
>>> If bit A is set in the mask:
>>>  - if bit A is set in spec as well, extension header with
>>>    IP protocol (1 << A) number must present
>>>  - if bit A is clear in spec, extension header with
>>>    IP protocol (1 << A) number must absent
>>> If bit is clear in the mask, corresponding extension header
>>> may present and may absent (i.e. don't care).
>>>
>> There are only 12 possible extension headers and currently none of them
>> are supported in rte_flow. So adding a logic to parse the 256 just to get a max of 12 
>> possible values is an overkill. Also, if we disregard the case of the extension, 
>> the application must select only one next proto. For example, the application
>> can't select udp + tcp. There is the option to add a flag for each of the
>> possible extensions, does it makes more sense to you?
> 
> Each of these extension headers has its own structure, we first need the
> ability to match them properly by adding the necessary pattern items.
> 
>>> The RFC indirectly touches IPv6 proto (next header) matching
>>> logic.
>>>
>>> If logic used in ETH+VLAN is applied on IPv6 as well, it would
>>> make pattern specification and handling complicated. E.g.:
>>>   eth / ipv6 / udp / end
>>> should match UDP over IPv6 without any extension headers only.
>>>
>> The issue with VLAN I agree is different since by definition VLAN is 
>> layer 2.5. We can add the same logic also to the VLAN case, maybe it will
>> be easier. 
>> In any case, in your example above and according to the RFC we will
>> get all ipv6 udp traffic with and without extensions.
>>
>>> And how to specify UPD over IPv6 regardless extension headers?
>>
>> Please see above the rule will be eth / ipv6 /udp.
>>
>>>   eth / ipv6 / ipv6_ext / udp / end
>>> with a convention that ipv6_ext is optional if spec and mask
>>> are NULL (or mask is empty).
>>>
>> I would guess that this flow should match all ipv6 that has one ext and the next 
>> proto is udp.
> 
> In my opinion RTE_FLOW_ITEM_TYPE_IPV6_EXT is a bit useless on its own. It's
> only for matching packets that contain some kind of extension header, not a
> specific one, more about that below.
> 
>>> I'm wondering if any driver treats it this way?
>>>
>> I'm not sure, we can support only the frag ext by default, but if required we can support other 
>> ext.
>>  
>>> I agree that the problem really comes when we'd like match
>>> IPv6 frags or even worse not fragments.
>>>
>>> Two patterns for fragments:
>>>   eth / ipv6 (proto=FRAGMENT) / end
>>>   eth / ipv6 / ipv6_ext (next_hdr=FRAGMENT) / end
>>>
>>> Any sensible solution for not-fragments with any other
>>> extension headers?
>>>
>> The one propose in this mail 😊 
>>
>>> INVERT exists, but hardly useful, since it simply says
>>> that patches which do not match pattern without INVERT
>>> matches the pattern with INVERT and
>>>   invert / eth / ipv6 (proto=FRAGMENT) / end
>>> will match ARP, IPv4, IPv6 with an extension header before
>>> fragment header and so on.
>>>
>> I agree with you, INVERT in this doesn’t help.
>> We were considering adding some kind of not mask / item per item.
>> some think around this line:
>> user request ipv6 unfragmented udp packets. The flow would look something
>> like this:
>> Eth / ipv6 / Not (Ipv6.proto = frag_proto) / udp
>> But it makes the rules much harder to use, and I don't think that there
>> is any HW that support not, and adding such feature to all items is overkill.
>>
>>  
>>> Bit string suggested above will allow to match:
>>>  - UDP over IPv6 with any extension headers:
>>>     eth / ipv6 (ext_hdrs mask empty) / udp / end
>>>  - UDP over IPv6 without any extension headers:
>>>     eth / ipv6 (ext_hdrs mask full, spec empty) / udp / end
>>>  - UDP over IPv6 without fragment header:
>>>     eth / ipv6 (ext.spec & ~FRAGMENT, ext.mask | FRAGMENT) / udp / end
>>>  - UDP over IPv6 with fragment header
>>>     eth / ipv6 (ext.spec | FRAGMENT, ext.mask | FRAGMENT) / udp / end
>>>
>>> where FRAGMENT is 1 << IPPROTO_FRAGMENT.
>>>
>> Please see my response regarding this above.
>>
>>> Above I intentionally keep 'proto' unspecified in ipv6
>>> since otherwise it would specify the next header after IPv6
>>> header.
>>>
>>> Extension headers mask should be empty by default.
> 
> This is a deliberate design choice/issue with rte_flow: an empty pattern
> matches everything; adding items only narrows the selection. As Andrew said
> there is currently no way to provide a specific item to reject, it can only
> be done globally on a pattern through INVERT that no PMD implements so far.
> 
> So we have two requirements here: the ability to specifically match IPv6
> fragment headers and the ability to reject them.
> 

I think that one of the key requirements here is the ability to say
that an extension header may appear anywhere (or that no extension
header may appear anywhere), since specifying it with a pattern item
implies a fixed order of items, while other extension headers could
appear before the fragment header, or between it and the UDP protocol
header.

> To match IPv6 fragment headers, we need a dedicated pattern item. The
> generic RTE_FLOW_ITEM_TYPE_IPV6_EXT is useless for that on its own, it must
> be completed with RTE_FLOW_ITEM_TYPE_IPV6_EXT_FRAG and associated object
> to match individual fields if needed (like all the others
> protocols/headers).
> 

Yes, I agree, but it is strictly required only if we want to match
on the fragment header content, or to see it at an exact place in the
chain of next protocols.

> Then to reject a pattern item... My preference goes to a new "NOT" meta item
> affecting the meaning of the item coming immediately after in the pattern
> list. That would be ultra generic, wouldn't break any ABI/API and like
> INVERT, wouldn't even require a new object associated with it.
> 

Yes, that's true, but I'm not sure it is easy to do in HW.
Also, the *NOT* scope could in fact be per item field, not the whole
item. It sounds like it is getting more and more complicated.

> To match UDPv6 traffic when there is no fragment header, one could then do
> something like:
> 
>  eth / ipv6 / not / ipv6_ext_frag / udp
> 
> PMD support would be trivial to implement (I'm sure!)
> 

The problem is the interpretation of the above pattern.
Strictly speaking, only UDP packets with exactly one
non-fragment extension header match the pattern.
What about packets without any extension headers?
Or packets with two (or more) extension headers where the first
one is not a fragment header?

> We may later implement other kinds of "operator" items as Andrew suggested,
> for bit-wise stuff and so on. Let's keep adding features on a needed basis
> though.
> 

Thanks,
Andrew.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup
  @ 2020-07-05 19:55  3%   ` Thomas Monjalon
  2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
  2020-07-06 16:57  0%     ` [dpdk-dev] " Medvedkin, Vladimir
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2020-07-05 19:55 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, honnappa.nagarahalli, techboard, Jiayu Hu, Yipeng Wang,
	Sameh Gobriel, Vladimir Medvedkin, Nipun Gupta, Hemant Agrawal

+Cc maintainers of the problematic libraries:
	- librte_fib
	- librte_rib
	- librte_gro
	- librte_member
	- librte_rawdev

26/06/2020 10:16, David Marchand:
> Following discussions on the mailing list and the 05/20 TB meeting, here
> is a series that drops the special versioning for non stable libraries.
> 
> Two notes:
> 
> - RIB/FIB library is not referenced in the API doxygen index, is this
>   intentional?

Vladimir please, could you fix the miss in the doxygen index?

> - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
>   announced as experimental while their functions are part of the 20
>   stable ABI (in .map files + no __rte_experimental marking).
>   Their fate must be discussed.

I would suggest removing the EXPERIMENTAL flag for gro, member and rawdev.
They are probably already considered stable for a lot of users.
Maintainers, are you OK to follow the ABI compatibility rules
for these libraries? Do you feel these libraries are mature enough?




^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning on Windows
  @ 2020-07-05 20:23  4%     ` Thomas Monjalon
  2020-07-06  7:02  0%       ` Fady Bader
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-05 20:23 UTC (permalink / raw)
  To: Fady Bader
  Cc: dev, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

05/07/2020 15:47, Fady Bader:
> Function versioning implementation is not supported by Windows.
> Function versioning was disabled on Windows.

was -> is

> Signed-off-by: Fady Bader <fady@mellanox.com>
> ---
>  lib/librte_eal/include/rte_function_versioning.h | 2 +-
>  lib/meson.build                                  | 5 +++++
>  2 files changed, 6 insertions(+), 1 deletion(-)

As suggested by Ray, we should add a note in the documentation
about the ABI compatibility. Because we have no function versioning,
we cannot ensure ABI compatibility on Windows.

I recommend adding this text in doc/guides/windows_gsg/intro.rst
under "Limitations":
"
The :doc:`../contributing/abi_policy` cannot be respected for Windows.
Minor ABI versions may be incompatible
because function versioning is not supported on Windows.
"



^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-03 18:46  3%     ` Honnappa Nagarahalli
@ 2020-07-06  6:23  3%       ` Kinsella, Ray
  2020-07-07  3:19  3%         ` Feifei Wang
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-06  6:23 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Feifei Wang, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>> On 03/07/2020 11:26, Feifei Wang wrote:
>>> Remove the experimental tag for rte_ring_reset API that have been
>>> around for 4 releases.
>>>
>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>> ---
>>>  lib/librte_ring/rte_ring.h           | 3 ---
>>>  lib/librte_ring/rte_ring_version.map | 4 +---
>>>  2 files changed, 1 insertion(+), 6 deletions(-)
>>>
>>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>>> index f67141482..7181c33b4 100644
>>> --- a/lib/librte_ring/rte_ring.h
>>> +++ b/lib/librte_ring/rte_ring.h
>>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>>>   *
>>>   * This function flush all the elements in a ring
>>>   *
>>> - * @b EXPERIMENTAL: this API may change without prior notice
>>> - *
>>>   * @warning
>>>   * Make sure the ring is not in use while calling this function.
>>>   *
>>>   * @param r
>>>   *   A pointer to the ring structure.
>>>   */
>>> -__rte_experimental
>>>  void
>>>  rte_ring_reset(struct rte_ring *r);
>>>
>>> diff --git a/lib/librte_ring/rte_ring_version.map
>>> b/lib/librte_ring/rte_ring_version.map
>>> index e88c143cf..aec6f3820 100644
>>> --- a/lib/librte_ring/rte_ring_version.map
>>> +++ b/lib/librte_ring/rte_ring_version.map
>>> @@ -8,6 +8,7 @@ DPDK_20.0 {
>>>  	rte_ring_init;
>>>  	rte_ring_list_dump;
>>>  	rte_ring_lookup;
>>> +	rte_ring_reset;
>>>
>>>  	local: *;
>>>  };
>>> @@ -15,9 +16,6 @@ DPDK_20.0 {
>>>  EXPERIMENTAL {
>>>  	global:
>>>
>>> -	# added in 19.08
>>> -	rte_ring_reset;
>>> -
>>>  	# added in 20.02
>>>  	rte_ring_create_elem;
>>>  	rte_ring_get_memsize_elem;
>>
>> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not the v20.0
>> ABI.
> Thanks Ray for clarifying this.
> 
>>
>> The way to solve it is to add it to the DPDK_21 ABI in the map file.
>> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to experimental
>> if necessary.
> Is using VERSION_SYMBOL_EXPERIMENTAL a must? 

Purely at the discretion of the contributor and maintainer. 
If it has been around for a while, applications are using it and changing the symbol will break them.

You may choose to provide the alias or not. 

> The documentation also seems to be vague. It says " The macro is used when a symbol matures to become part of the stable ABI, to provide an alias to experimental for some time". What does 'some time' mean?

"Some time" is a bit vague alright; it should be "until the next major ABI version" - I will fix it. 

> 
>>
>> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioning-
>> macros
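
A condensed sketch of that approach for the case discussed here (a hypothetical fragment; the exact node names, and whether to keep the alias at all, are per the versioning guide linked above):

```
# rte_ring_version.map (sketch): the matured symbol joins the next
# major ABI node instead of the original DPDK_20.0 one.
DPDK_21 {
	global:

	rte_ring_reset;
} DPDK_20.0;
```

If the alias is provided, `VERSION_SYMBOL_EXPERIMENTAL(rte_ring_reset, _e);` in the C file would (per rte_function_versioning.h) bind `rte_ring_reset@EXPERIMENTAL` to an internal `rte_ring_reset_e` wrapper, so binaries built against the experimental symbol keep resolving until the alias is dropped at the next major ABI version.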


* Re: [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning on Windows
  2020-07-05 20:23  4%     ` Thomas Monjalon
@ 2020-07-06  7:02  0%       ` Fady Bader
  0 siblings, 0 replies; 200+ results
From: Fady Bader @ 2020-07-06  7:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Tasnim Bashar, Tal Shnaiderman, Yohad Tor, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Sunday, July 5, 2020 11:24 PM
> To: Fady Bader <fady@mellanox.com>
> Cc: dev@dpdk.org; Tasnim Bashar <tbashar@mellanox.com>; Tal Shnaiderman
> <talshn@mellanox.com>; Yohad Tor <yohadt@mellanox.com>;
> dmitry.kozliuk@gmail.com; harini.ramakrishnan@microsoft.com;
> ocardona@microsoft.com; pallavi.kadam@intel.com; ranjit.menon@intel.com;
> olivier.matz@6wind.com; arybchenko@solarflare.com; mdr@ashroe.eu;
> nhorman@tuxdriver.com
> Subject: Re: [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning on
> Windows
> 
> 05/07/2020 15:47, Fady Bader:
> > Function versioning implementation is not supported by Windows.
> > Function versioning was disabled on Windows.
> 
> was -> is
> 
> > Signed-off-by: Fady Bader <fady@mellanox.com>
> > ---
> >  lib/librte_eal/include/rte_function_versioning.h | 2 +-
> >  lib/meson.build                                  | 5 +++++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> As suggested by Ray, we should add a note in the documentation about the ABI
> compatibility. Because we have no function versioning, we cannot ensure ABI
> compatibility on Windows.
> 
> I recommend adding this text in doc/guides/windows_gsg/intro.rst under
> "Limitations":
> "
> The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> Minor ABI versions may be incompatible
> because function versioning is not supported on Windows.
> "

Ok, I'll send a new patch with the changes soon.

> 



* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-07-05 19:55  3%   ` Thomas Monjalon
@ 2020-07-06  8:02  3%     ` Bruce Richardson
  2020-07-06  8:12  0%       ` Thomas Monjalon
  2020-07-06 16:57  0%     ` [dpdk-dev] " Medvedkin, Vladimir
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-06  8:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: David Marchand, dev, honnappa.nagarahalli, techboard, Jiayu Hu,
	Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin, Nipun Gupta,
	Hemant Agrawal

On Sun, Jul 05, 2020 at 09:55:41PM +0200, Thomas Monjalon wrote:
> +Cc maintainers of the problematic libraries:
> 	- librte_fib
> 	- librte_rib
> 	- librte_gro
> 	- librte_member
> 	- librte_rawdev
> 
> 26/06/2020 10:16, David Marchand:
> > Following discussions on the mailing list and the 05/20 TB meeting, here
> > is a series that drops the special versioning for non stable libraries.
> > 
> > Two notes:
> > 
> > - RIB/FIB library is not referenced in the API doxygen index, is this
> >   intentional?
> 
> Vladimir please, could you fix the miss in the doxygen index?
> 
> > - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
> >   announced as experimental while their functions are part of the 20
> >   stable ABI (in .map files + no __rte_experimental marking).
> >   Their fate must be discussed.
> 
> I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
> They are probably already considered stable for a lot of users.
> Maintainers, are you OK to follow the ABI compatibility rules
> for these libraries? Do you feel these libraries are mature enough?
>

I think it is good that things are being added to the official ABI. For these,
I wonder whether waiting until the 20.11 release would be a better time to
officially mark them as stable, rather than doing so now? 


* Re: [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations
  2020-07-03 15:38  3% ` David Marchand
@ 2020-07-06  8:03  3%   ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-06  8:03 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Olivier Matz, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang, nd

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, July 3, 2020 11:39 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: dev <dev@dpdk.org>; Olivier Matz <olivier.matz@6wind.com>; David
> Christensen <drc@linux.vnet.ibm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations
> 
> On Thu, Jun 11, 2020 at 12:26 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > Use c11 atomics with explicit ordering instead of rte_atomic ops which
> > enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> 
> I did not look at the details, but this patch is refused by the ABI
> check in Travis.

Thanks, David.
The ABI issue is that the name of 'rte_mbuf_ext_shared_info::refcnt_atomic' was changed to 'rte_mbuf_ext_shared_info::refcnt' in rte_mbuf_core.h.
I made this change just to simplify the name of the variable.

Reverting 'rte_mbuf_ext_shared_info::refcnt' back to refcnt_atomic can fix this issue.
I will update it in v2.

Thanks,
Phil

> 
> 
> --
> David Marchand



* Re: [dpdk-dev] [dpdk-techboard] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
@ 2020-07-06  8:12  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06  8:12 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: David Marchand, dev, honnappa.nagarahalli, techboard, Jiayu Hu,
	Yipeng Wang, Sameh Gobriel, Vladimir Medvedkin, Nipun Gupta,
	Hemant Agrawal

06/07/2020 10:02, Bruce Richardson:
> On Sun, Jul 05, 2020 at 09:55:41PM +0200, Thomas Monjalon wrote:
> > +Cc maintainers of the problematic libraries:
> > 	- librte_fib
> > 	- librte_rib
> > 	- librte_gro
> > 	- librte_member
> > 	- librte_rawdev
> > 
> > 26/06/2020 10:16, David Marchand:
> > > Following discussions on the mailing list and the 05/20 TB meeting, here
> > > is a series that drops the special versioning for non stable libraries.
> > > 
> > > Two notes:
> > > 
> > > - RIB/FIB library is not referenced in the API doxygen index, is this
> > >   intentional?
> > 
> > Vladimir please, could you fix the miss in the doxygen index?
> > 
> > > - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
> > >   announced as experimental while their functions are part of the 20
> > >   stable ABI (in .map files + no __rte_experimental marking).
> > >   Their fate must be discussed.
> > 
> > I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
> > They are probably already considered stable for a lot of users.
> > Maintainers, are you OK to follow the ABI compatibility rules
> > for these libraries? Do you feel these libraries are mature enough?
> >
> 
> I think things being added to the official ABI is good. For these, I wonder
> if waiting till the 20.11 release is the best time to officially mark them
> as stable, rather than doing so now? 

They are already not marked as experimental symbols...
I think we should remove the confusion in the MAINTAINERS file.




* Re: [dpdk-dev] [pull-request] next-eventdev 20.08 RC1
  @ 2020-07-06  9:57  3% ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06  9:57 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, phil.yang

05/07/2020 05:41, Jerin Jacob Kollanukkaran:
>   http://dpdk.org/git/next/dpdk-next-eventdev
> 
> ----------------------------------------------------------------
> Harman Kalra (1):
>       event/octeontx: fix memory corruption
> 
> Harry van Haaren (1):
>       examples/eventdev_pipeline: fix 32-bit coremask logic
> 
> Pavan Nikhilesh (3):
>       event/octeontx2: fix device reconfigure
>       event/octeontx2: fix sub event type violation
>       event/octeontx2: improve datapath memory locality

Pulled patches above.

> Phil Yang (4):
>       eventdev: fix race condition on timer list counter
>       eventdev: use c11 atomics for lcore timer armed flag
>       eventdev: remove redundant code
>       eventdev: relax smp barriers with c11 atomics

I cannot merge this C11 series because of an ABI breakage:
	http://mails.dpdk.org/archives/test-report/2020-July/140440.html




* Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics
  @ 2020-07-06 10:04  4%     ` Thomas Monjalon
  2020-07-06 15:32  0%       ` Phil Yang
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-06 10:04 UTC (permalink / raw)
  To: Phil Yang
  Cc: erik.g.carrillo, dev, jerinj, Honnappa.Nagarahalli, drc,
	Ruifeng.Wang, Dharmik.Thakkar, nd, david.marchand, mdr,
	Neil Horman, Dodji Seketeli

02/07/2020 07:26, Phil Yang:
> The implementation-specific opaque data is shared between arm and cancel
> operations. The state flag acts as a guard variable to make sure the
> update of opaque data is synchronized. This patch uses c11 atomics with
> explicit one way memory barrier instead of full barriers rte_smp_w/rmb()
> to synchronize the opaque data between timer arm and cancel threads.

I think we should write C11 (uppercase).

Please, in your explanations, try to be more specific.
Naming fields may help to make things clear.

[...]
> --- a/lib/librte_eventdev/rte_event_timer_adapter.h
> +++ b/lib/librte_eventdev/rte_event_timer_adapter.h
> @@ -467,7 +467,7 @@ struct rte_event_timer {
>  	 *  - op: RTE_EVENT_OP_NEW
>  	 *  - event_type: RTE_EVENT_TYPE_TIMER
>  	 */
> -	volatile enum rte_event_timer_state state;
> +	enum rte_event_timer_state state;
>  	/**< State of the event timer. */

Why do you remove the volatile keyword?
It is not explained in the commit log.

This change is triggering a warning in the ABI check:
http://mails.dpdk.org/archives/test-report/2020-July/140440.html
Moving from volatile to non-volatile is probably not an issue.
I expect the code generated for the volatile case to work the same
in non-volatile case. Do you confirm?

In any case, we need an explanation and an ABI check exception.




* [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning on Windows
  @ 2020-07-06 11:32  5%     ` Fady Bader
  2020-07-06 12:22  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Fady Bader @ 2020-07-06 11:32 UTC (permalink / raw)
  To: dev
  Cc: thomas, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

Function versioning implementation is not supported by Windows.
Function versioning is disabled on Windows.

Signed-off-by: Fady Bader <fady@mellanox.com>
---
 doc/guides/windows_gsg/intro.rst | 4 ++++
 lib/meson.build                  | 6 +++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
index a0285732df..58c6246404 100644
--- a/doc/guides/windows_gsg/intro.rst
+++ b/doc/guides/windows_gsg/intro.rst
@@ -18,3 +18,7 @@ DPDK for Windows is currently a work in progress. Not all DPDK source files
 compile. Support is being added in pieces so as to limit the overall scope
 of any individual patch series. The goal is to be able to run any DPDK
 application natively on Windows.
+
+The :doc:`../contributing/abi_policy` cannot be respected for Windows.
+Minor ABI versions may be incompatible
+because function versioning is not supported on Windows.
diff --git a/lib/meson.build b/lib/meson.build
index c1b9e1633f..dadf151f78 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -107,6 +107,10 @@ foreach l:libraries
 			shared_dep = declare_dependency(include_directories: includes)
 			static_dep = shared_dep
 		else
+			if is_windows and use_function_versioning
+				message('@0@: Function versioning is not supported by Windows.'
+				.format(name))
+			endif
 
 			if use_function_versioning
 				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
@@ -138,7 +142,7 @@ foreach l:libraries
 					include_directories: includes,
 					dependencies: static_deps)
 
-			if not use_function_versioning
+			if not use_function_versioning or is_windows
 				# use pre-build objects to build shared lib
 				sources = []
 				objs += static_lib.extract_all_objects(recursive: false)
-- 
2.16.1.windows.4



* Re: [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning on Windows
  2020-07-06 11:32  5%     ` [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning " Fady Bader
@ 2020-07-06 12:22  0%       ` Bruce Richardson
  2020-07-06 23:16  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-06 12:22 UTC (permalink / raw)
  To: Fady Bader
  Cc: dev, thomas, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

On Mon, Jul 06, 2020 at 02:32:39PM +0300, Fady Bader wrote:
> Function versioning implementation is not supported by Windows.
> Function versioning is disabled on Windows.
> 
> Signed-off-by: Fady Bader <fady@mellanox.com>
> ---
>  doc/guides/windows_gsg/intro.rst | 4 ++++
>  lib/meson.build                  | 6 +++++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
> index a0285732df..58c6246404 100644
> --- a/doc/guides/windows_gsg/intro.rst
> +++ b/doc/guides/windows_gsg/intro.rst
> @@ -18,3 +18,7 @@ DPDK for Windows is currently a work in progress. Not all DPDK source files
>  compile. Support is being added in pieces so as to limit the overall scope
>  of any individual patch series. The goal is to be able to run any DPDK
>  application natively on Windows.
> +
> +The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> +Minor ABI versions may be incompatible
> +because function versioning is not supported on Windows.
> diff --git a/lib/meson.build b/lib/meson.build
> index c1b9e1633f..dadf151f78 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -107,6 +107,10 @@ foreach l:libraries
>  			shared_dep = declare_dependency(include_directories: includes)
>  			static_dep = shared_dep
>  		else
> +			if is_windows and use_function_versioning
> +				message('@0@: Function versioning is not supported by Windows.'
> +				.format(name))
> +			endif
>  

This is ok here, but I think it might be better just moved to somewhere
like config/meson.build, so that it is always just printed once for each
build. I don't see an issue with having it printed even if there is no
function versioning in the build itself.

>  			if use_function_versioning
>  				cflags += '-DRTE_USE_FUNCTION_VERSIONING'
> @@ -138,7 +142,7 @@ foreach l:libraries
>  					include_directories: includes,
>  					dependencies: static_deps)
>  
> -			if not use_function_versioning
> +			if not use_function_versioning or is_windows
>  				# use pre-build objects to build shared lib
>  				sources = []
>  				objs += static_lib.extract_all_objects(recursive: false)
> -- 
> 2.16.1.windows.4
> 

With or without the code move above, which is just a suggestion,

Acked-by: Bruce Richardson <bruce.richardson@intel.com>


* [dpdk-dev] [PATCH v5 02/10] eal: fix multiple definition of per lcore thread id
  @ 2020-07-06 14:15  3%   ` David Marchand
  2020-07-06 14:16  3%   ` [dpdk-dev] [PATCH v5 04/10] eal: introduce thread uninit helper David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 14:15 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Neil Horman, Cunming Liang

Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.

Fixes: ef76436c6834 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_eal/common/eal_common_thread.c | 1 +
 lib/librte_eal/include/rte_eal.h          | 3 ++-
 lib/librte_eal/rte_eal_version.map        | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 7be80c292e..fd13453fee 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -22,6 +22,7 @@
 #include "eal_thread.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
 	(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
diff --git a/lib/librte_eal/include/rte_eal.h b/lib/librte_eal/include/rte_eal.h
index 2f9ed298de..2edf8c6556 100644
--- a/lib/librte_eal/include/rte_eal.h
+++ b/lib/librte_eal/include/rte_eal.h
@@ -447,6 +447,8 @@ enum rte_intr_mode rte_eal_vfio_intr_mode(void);
  */
 int rte_sys_gettid(void);
 
+RTE_DECLARE_PER_LCORE(int, _thread_id);
+
 /**
  * Get system unique thread id.
  *
@@ -456,7 +458,6 @@ int rte_sys_gettid(void);
  */
 static inline int rte_gettid(void)
 {
-	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 	if (RTE_PER_LCORE(_thread_id) == -1)
 		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
 	return RTE_PER_LCORE(_thread_id);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..0d42d44ce9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,6 +221,13 @@ DPDK_20.0 {
 	local: *;
 };
 
+DPDK_21 {
+	global:
+
+	per_lcore__thread_id;
+
+} DPDK_20.0;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.23.0



* [dpdk-dev] [PATCH v5 04/10] eal: introduce thread uninit helper
    2020-07-06 14:15  3%   ` [dpdk-dev] [PATCH v5 02/10] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-07-06 14:16  3%   ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 14:16 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Jerin Jacob, Sunil Kumar Kori, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon

This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
__rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v4:
- renamed rte_thread_uninit and moved to eal_private.h,
- hid freeing helper,

Changes since v2:
- added missing stub for windows tracing support,
- moved free symbol to exported (experimental) ABI as a counterpart of
  the alloc symbol we already had,

Changes since v1:
- rebased on master, removed Windows workaround wrt traces support,

---
 lib/librte_eal/common/eal_common_thread.c |  9 +++++
 lib/librte_eal/common/eal_common_trace.c  | 49 +++++++++++++++++++----
 lib/librte_eal/common/eal_private.h       |  5 +++
 lib/librte_eal/common/eal_trace.h         |  1 +
 lib/librte_eal/windows/eal.c              |  5 +++
 5 files changed, 62 insertions(+), 7 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index fb06f8f802..6d1c87b1c2 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -20,6 +20,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "eal_thread.h"
+#include "eal_trace.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
@@ -161,6 +162,14 @@ __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
 	__rte_trace_mem_per_thread_alloc();
 }
 
+void
+__rte_thread_uninit(void)
+{
+	trace_mem_per_thread_free();
+
+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
+}
+
 struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
index 875553d7e5..b6da5537fe 100644
--- a/lib/librte_eal/common/eal_common_trace.c
+++ b/lib/librte_eal/common/eal_common_trace.c
@@ -101,7 +101,7 @@ eal_trace_fini(void)
 {
 	if (!rte_trace_is_enabled())
 		return;
-	trace_mem_per_thread_free();
+	trace_mem_free();
 	trace_metadata_destroy();
 	eal_trace_args_free();
 }
@@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
 	rte_spinlock_unlock(&trace->lock);
 }
 
+static void
+trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
+{
+	if (meta->area == TRACE_AREA_HUGEPAGE)
+		eal_free_no_trace(meta->mem);
+	else if (meta->area == TRACE_AREA_HEAP)
+		free(meta->mem);
+}
+
 void
 trace_mem_per_thread_free(void)
+{
+	struct trace *trace = trace_obj_get();
+	struct __rte_trace_header *header;
+	uint32_t count;
+
+	header = RTE_PER_LCORE(trace_mem);
+	if (header == NULL)
+		return;
+
+	rte_spinlock_lock(&trace->lock);
+	for (count = 0; count < trace->nb_trace_mem_list; count++) {
+		if (trace->lcore_meta[count].mem == header)
+			break;
+	}
+	if (count != trace->nb_trace_mem_list) {
+		struct thread_mem_meta *meta = &trace->lcore_meta[count];
+
+		trace_mem_per_thread_free_unlocked(meta);
+		if (count != trace->nb_trace_mem_list - 1) {
+			memmove(meta, meta + 1,
+				sizeof(*meta) *
+				 (trace->nb_trace_mem_list - count - 1));
+		}
+		trace->nb_trace_mem_list--;
+	}
+	rte_spinlock_unlock(&trace->lock);
+}
+
+void
+trace_mem_free(void)
 {
 	struct trace *trace = trace_obj_get();
 	uint32_t count;
-	void *mem;
 
 	if (!rte_trace_is_enabled())
 		return;
 
 	rte_spinlock_lock(&trace->lock);
 	for (count = 0; count < trace->nb_trace_mem_list; count++) {
-		mem = trace->lcore_meta[count].mem;
-		if (trace->lcore_meta[count].area == TRACE_AREA_HUGEPAGE)
-			eal_free_no_trace(mem);
-		else if (trace->lcore_meta[count].area == TRACE_AREA_HEAP)
-			free(mem);
+		trace_mem_per_thread_free_unlocked(&trace->lcore_meta[count]);
 	}
+	trace->nb_trace_mem_list = 0;
 	rte_spinlock_unlock(&trace->lock);
 }
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 5d8b53882d..a77ac7a963 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -709,4 +709,9 @@ eal_get_application_usage_hook(void);
  */
 void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
 
+/**
+ * Uninitialize per-lcore info for current thread.
+ */
+void __rte_thread_uninit(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/eal_trace.h b/lib/librte_eal/common/eal_trace.h
index 8f60616156..92c5951c3a 100644
--- a/lib/librte_eal/common/eal_trace.h
+++ b/lib/librte_eal/common/eal_trace.h
@@ -106,6 +106,7 @@ int trace_metadata_create(void);
 void trace_metadata_destroy(void);
 int trace_mkdir(void);
 int trace_epoch_time_save(void);
+void trace_mem_free(void);
 void trace_mem_per_thread_free(void);
 
 /* EAL interface */
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index 9f5d019e64..a11daee68b 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -215,6 +215,11 @@ __rte_trace_mem_per_thread_alloc(void)
 {
 }
 
+void
+trace_mem_per_thread_free(void)
+{
+}
+
 void
 __rte_trace_point_emit_field(size_t sz, const char *field,
 	const char *type)
-- 
2.23.0



* Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics
  2020-07-06 10:04  4%     ` Thomas Monjalon
@ 2020-07-06 15:32  0%       ` Phil Yang
  2020-07-06 15:40  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Phil Yang @ 2020-07-06 15:32 UTC (permalink / raw)
  To: thomas
  Cc: erik.g.carrillo, dev, jerinj, Honnappa Nagarahalli, drc,
	Ruifeng Wang, Dharmik Thakkar, nd, david.marchand, mdr,
	Neil Horman, Dodji Seketeli

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, July 6, 2020 6:04 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: erik.g.carrillo@intel.com; dev@dpdk.org; jerinj@marvell.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; drc@linux.vnet.ibm.com;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>;
> david.marchand@redhat.com; mdr@ashroe.eu; Neil Horman
> <nhorman@tuxdriver.com>; Dodji Seketeli <dodji@redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11
> atomics
> 
> 02/07/2020 07:26, Phil Yang:
> > The implementation-specific opaque data is shared between arm and
> cancel
> > operations. The state flag acts as a guard variable to make sure the
> > update of opaque data is synchronized. This patch uses c11 atomics with
> > explicit one way memory barrier instead of full barriers rte_smp_w/rmb()
> > to synchronize the opaque data between timer arm and cancel threads.
> 
> I think we should write C11 (uppercase).
Agreed. 
I will change it in the next version.

> 
> Please, in your explanations, try to be more specific.
> Naming fields may help to make things clear.
OK. Thanks.

> 
> [...]
> > --- a/lib/librte_eventdev/rte_event_timer_adapter.h
> > +++ b/lib/librte_eventdev/rte_event_timer_adapter.h
> > @@ -467,7 +467,7 @@ struct rte_event_timer {
> >  	 *  - op: RTE_EVENT_OP_NEW
> >  	 *  - event_type: RTE_EVENT_TYPE_TIMER
> >  	 */
> > -	volatile enum rte_event_timer_state state;
> > +	enum rte_event_timer_state state;
> >  	/**< State of the event timer. */
> 
> Why do you remove the volatile keyword?
> It is not explained in the commit log.
By using the C11 atomic operations, the compiler generates the same instructions for the non-volatile and volatile versions.
Please check the sample code here: https://gcc.godbolt.org/z/8x5rWs

> 
> This change is triggering a warning in the ABI check:
> http://mails.dpdk.org/archives/test-report/2020-July/140440.html
> Moving from volatile to non-volatile is probably not an issue.
> I expect the code generated for the volatile case to work the same
> in non-volatile case. Do you confirm?
They generate the same instructions, so either way will work.
Do I need to revert it to the volatile version?


Thanks,
Phil
> 
> In any case, we need an explanation and an ABI check exception.
> 



* Re: [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics
  2020-07-06 15:32  0%       ` Phil Yang
@ 2020-07-06 15:40  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06 15:40 UTC (permalink / raw)
  To: Phil Yang
  Cc: erik.g.carrillo, dev, jerinj, Honnappa Nagarahalli, drc,
	Ruifeng Wang, Dharmik Thakkar, nd, david.marchand, mdr,
	Neil Horman, Dodji Seketeli

06/07/2020 17:32, Phil Yang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 02/07/2020 07:26, Phil Yang:
> > > --- a/lib/librte_eventdev/rte_event_timer_adapter.h
> > > +++ b/lib/librte_eventdev/rte_event_timer_adapter.h
> > > @@ -467,7 +467,7 @@ struct rte_event_timer {
> > >  	 *  - op: RTE_EVENT_OP_NEW
> > >  	 *  - event_type: RTE_EVENT_TYPE_TIMER
> > >  	 */
> > > -	volatile enum rte_event_timer_state state;
> > > +	enum rte_event_timer_state state;
> > >  	/**< State of the event timer. */
> > 
> > Why do you remove the volatile keyword?
> > It is not explained in the commit log.
> By using the C11 atomic operations, it will generate the same instructions for non-volatile and volatile version.
> Please check the sample code here: https://gcc.godbolt.org/z/8x5rWs
> 
> > This change is triggering a warning in the ABI check:
> > http://mails.dpdk.org/archives/test-report/2020-July/140440.html
> > Moving from volatile to non-volatile is probably not an issue.
> > I expect the code generated for the volatile case to work the same
> > in non-volatile case. Do you confirm?
> They generate the same instructions, so either way will work.
> Do I need to revert it to the volatile version?

Either you revert, or you add explanation in the commit log
+ exception in libabigail.abignore



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 0/3] Experimental/internal libraries cleanup
  2020-07-05 19:55  3%   ` Thomas Monjalon
  2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
@ 2020-07-06 16:57  0%     ` Medvedkin, Vladimir
  1 sibling, 0 replies; 200+ results
From: Medvedkin, Vladimir @ 2020-07-06 16:57 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand
  Cc: dev, honnappa.nagarahalli, techboard, Jiayu Hu, Yipeng Wang,
	Sameh Gobriel, Nipun Gupta, Hemant Agrawal


On 05/07/2020 20:55, Thomas Monjalon wrote:
> +Cc maintainers of the problematic libraries:
> 	- librte_fib
> 	- librte_rib
> 	- librte_gro
> 	- librte_member
> 	- librte_rawdev
>
> 26/06/2020 10:16, David Marchand:
>> Following discussions on the mailing list and the 05/20 TB meeting, here
>> is a series that drops the special versioning for non stable libraries.
>>
>> Two notes:
>>
>> - RIB/FIB library is not referenced in the API doxygen index, is this
>>    intentional?
> Vladimir please, could you fix the miss in the doxygen index?


Sure, I'll send a patch.


>
>> - I inspected MAINTAINERS: librte_gro, librte_member and librte_rawdev are
>>    announced as experimental while their functions are part of the 20
>>    stable ABI (in .map files + no __rte_experimental marking).
>>    Their fate must be discussed.
> I would suggest removing EXPERIMENTAL flag for gro, member and rawdev.
> They are probably already considered stable for a lot of users.
> Maintainers, are you OK to follow the ABI compatibility rules
> for these libraries? Do you feel these libraries are mature enough?
>
>
>
-- 
Regards,
Vladimir


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v6 02/10] eal: fix multiple definition of per lcore thread id
  @ 2020-07-06 20:52  3%   ` David Marchand
  2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 04/10] eal: introduce thread uninit helper David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 20:52 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Neil Horman, Cunming Liang

Because of the inline accessor + static declaration in rte_gettid(),
we end up with multiple symbols for RTE_PER_LCORE(_thread_id).
Each compilation unit will pay a cost when accessing this information
for the first time.

$ nm build/app/dpdk-testpmd | grep per_lcore__thread_id
0000000000000054 d per_lcore__thread_id.5037
0000000000000040 d per_lcore__thread_id.5103
0000000000000048 d per_lcore__thread_id.5259
000000000000004c d per_lcore__thread_id.5259
0000000000000044 d per_lcore__thread_id.5933
0000000000000058 d per_lcore__thread_id.6261
0000000000000050 d per_lcore__thread_id.7378
000000000000005c d per_lcore__thread_id.7496
000000000000000c d per_lcore__thread_id.8016
0000000000000010 d per_lcore__thread_id.8431

Make it global as part of the DPDK_21 stable ABI.

Fixes: ef76436c6834 ("eal: get unique thread id")

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/librte_eal/common/eal_common_thread.c | 1 +
 lib/librte_eal/include/rte_eal.h          | 3 ++-
 lib/librte_eal/rte_eal_version.map        | 7 +++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 7be80c292e..fd13453fee 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -22,6 +22,7 @@
 #include "eal_thread.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
+RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 static RTE_DEFINE_PER_LCORE(unsigned int, _socket_id) =
 	(unsigned int)SOCKET_ID_ANY;
 static RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
diff --git a/lib/librte_eal/include/rte_eal.h b/lib/librte_eal/include/rte_eal.h
index 2f9ed298de..2edf8c6556 100644
--- a/lib/librte_eal/include/rte_eal.h
+++ b/lib/librte_eal/include/rte_eal.h
@@ -447,6 +447,8 @@ enum rte_intr_mode rte_eal_vfio_intr_mode(void);
  */
 int rte_sys_gettid(void);
 
+RTE_DECLARE_PER_LCORE(int, _thread_id);
+
 /**
  * Get system unique thread id.
  *
@@ -456,7 +458,6 @@ int rte_sys_gettid(void);
  */
 static inline int rte_gettid(void)
 {
-	static RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
 	if (RTE_PER_LCORE(_thread_id) == -1)
 		RTE_PER_LCORE(_thread_id) = rte_sys_gettid();
 	return RTE_PER_LCORE(_thread_id);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 196eef5afa..0d42d44ce9 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,6 +221,13 @@ DPDK_20.0 {
 	local: *;
 };
 
+DPDK_21 {
+	global:
+
+	per_lcore__thread_id;
+
+} DPDK_20.0;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 04/10] eal: introduce thread uninit helper
    2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 02/10] eal: fix multiple definition of per lcore thread id David Marchand
@ 2020-07-06 20:52  3%   ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2020-07-06 20:52 UTC (permalink / raw)
  To: dev
  Cc: jerinjacobk, bruce.richardson, mdr, thomas, arybchenko, ktraynor,
	ian.stokes, i.maximets, olivier.matz, konstantin.ananyev,
	Jerin Jacob, Sunil Kumar Kori, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon

This is a preparation step for dynamically unregistering threads.

Since we explicitly allocate a per thread trace buffer in
__rte_thread_init, add an internal helper to free this buffer.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changes since v5:
- fixed windows build,

Changes since v4:
- renamed rte_thread_uninit and moved to eal_private.h,
- hid freeing helper,

Changes since v2:
- added missing stub for windows tracing support,
- moved free symbol to exported (experimental) ABI as a counterpart of
  the alloc symbol we already had,

Changes since v1:
- rebased on master, removed Windows workaround wrt traces support,

---
 lib/librte_eal/common/eal_common_thread.c |  9 +++++
 lib/librte_eal/common/eal_common_trace.c  | 49 +++++++++++++++++++----
 lib/librte_eal/common/eal_private.h       |  5 +++
 lib/librte_eal/common/eal_trace.h         |  1 +
 lib/librte_eal/windows/eal.c              |  7 +++-
 5 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index fb06f8f802..6d1c87b1c2 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -20,6 +20,7 @@
 #include "eal_internal_cfg.h"
 #include "eal_private.h"
 #include "eal_thread.h"
+#include "eal_trace.h"
 
 RTE_DEFINE_PER_LCORE(unsigned int, _lcore_id) = LCORE_ID_ANY;
 RTE_DEFINE_PER_LCORE(int, _thread_id) = -1;
@@ -161,6 +162,14 @@ __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset)
 	__rte_trace_mem_per_thread_alloc();
 }
 
+void
+__rte_thread_uninit(void)
+{
+	trace_mem_per_thread_free();
+
+	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
+}
+
 struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
diff --git a/lib/librte_eal/common/eal_common_trace.c b/lib/librte_eal/common/eal_common_trace.c
index 875553d7e5..b6da5537fe 100644
--- a/lib/librte_eal/common/eal_common_trace.c
+++ b/lib/librte_eal/common/eal_common_trace.c
@@ -101,7 +101,7 @@ eal_trace_fini(void)
 {
 	if (!rte_trace_is_enabled())
 		return;
-	trace_mem_per_thread_free();
+	trace_mem_free();
 	trace_metadata_destroy();
 	eal_trace_args_free();
 }
@@ -370,24 +370,59 @@ __rte_trace_mem_per_thread_alloc(void)
 	rte_spinlock_unlock(&trace->lock);
 }
 
+static void
+trace_mem_per_thread_free_unlocked(struct thread_mem_meta *meta)
+{
+	if (meta->area == TRACE_AREA_HUGEPAGE)
+		eal_free_no_trace(meta->mem);
+	else if (meta->area == TRACE_AREA_HEAP)
+		free(meta->mem);
+}
+
 void
 trace_mem_per_thread_free(void)
+{
+	struct trace *trace = trace_obj_get();
+	struct __rte_trace_header *header;
+	uint32_t count;
+
+	header = RTE_PER_LCORE(trace_mem);
+	if (header == NULL)
+		return;
+
+	rte_spinlock_lock(&trace->lock);
+	for (count = 0; count < trace->nb_trace_mem_list; count++) {
+		if (trace->lcore_meta[count].mem == header)
+			break;
+	}
+	if (count != trace->nb_trace_mem_list) {
+		struct thread_mem_meta *meta = &trace->lcore_meta[count];
+
+		trace_mem_per_thread_free_unlocked(meta);
+		if (count != trace->nb_trace_mem_list - 1) {
+			memmove(meta, meta + 1,
+				sizeof(*meta) *
+				 (trace->nb_trace_mem_list - count - 1));
+		}
+		trace->nb_trace_mem_list--;
+	}
+	rte_spinlock_unlock(&trace->lock);
+}
+
+void
+trace_mem_free(void)
 {
 	struct trace *trace = trace_obj_get();
 	uint32_t count;
-	void *mem;
 
 	if (!rte_trace_is_enabled())
 		return;
 
 	rte_spinlock_lock(&trace->lock);
 	for (count = 0; count < trace->nb_trace_mem_list; count++) {
-		mem = trace->lcore_meta[count].mem;
-		if (trace->lcore_meta[count].area == TRACE_AREA_HUGEPAGE)
-			eal_free_no_trace(mem);
-		else if (trace->lcore_meta[count].area == TRACE_AREA_HEAP)
-			free(mem);
+		trace_mem_per_thread_free_unlocked(&trace->lcore_meta[count]);
 	}
+	trace->nb_trace_mem_list = 0;
 	rte_spinlock_unlock(&trace->lock);
 }
 
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 5d8b53882d..a77ac7a963 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -709,4 +709,9 @@ eal_get_application_usage_hook(void);
  */
 void __rte_thread_init(unsigned int lcore_id, rte_cpuset_t *cpuset);
 
+/**
+ * Uninitialize per-lcore info for current thread.
+ */
+void __rte_thread_uninit(void);
+
 #endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/eal_trace.h b/lib/librte_eal/common/eal_trace.h
index 8f60616156..92c5951c3a 100644
--- a/lib/librte_eal/common/eal_trace.h
+++ b/lib/librte_eal/common/eal_trace.h
@@ -106,6 +106,7 @@ int trace_metadata_create(void);
 void trace_metadata_destroy(void);
 int trace_mkdir(void);
 int trace_epoch_time_save(void);
+void trace_mem_free(void);
 void trace_mem_per_thread_free(void);
 
 /* EAL interface */
diff --git a/lib/librte_eal/windows/eal.c b/lib/librte_eal/windows/eal.c
index 9f5d019e64..addac62ae5 100644
--- a/lib/librte_eal/windows/eal.c
+++ b/lib/librte_eal/windows/eal.c
@@ -17,10 +17,10 @@
 #include <eal_filesystem.h>
 #include <eal_options.h>
 #include <eal_private.h>
-#include <rte_trace_point.h>
 #include <rte_vfio.h>
 
 #include "eal_hugepages.h"
+#include "eal_trace.h"
 #include "eal_windows.h"
 
 #define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
@@ -215,6 +215,11 @@ __rte_trace_mem_per_thread_alloc(void)
 {
 }
 
+void
+trace_mem_per_thread_free(void)
+{
+}
+
 void
 __rte_trace_point_emit_field(size_t sz, const char *field,
 	const char *type)
-- 
2.23.0


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7] sched: make RED scaling configurable
  @ 2020-07-06 23:09  3%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06 23:09 UTC (permalink / raw)
  To: Alan Dewar, Alan Dewar
  Cc: dev, Yigit, Ferruh, Kantecki, Tomasz, Stephen Hemminger, dev,
	Dumitrescu, Cristian, jasvinder.singh, david.marchand,
	bruce.richardson

08/04/2019 15:29, Dumitrescu, Cristian:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 08/04/2019 10:24, Alan Dewar:
> > > On Fri, Apr 5, 2019 at 4:36 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > > > On 1/16/2018 4:07 PM, alangordondewar@gmail.com wrote:
> > > > > From: Alan Dewar <alan.dewar@att.com>
> > > > >
> > > > > The RED code stores the weighted moving average in a 32-bit integer as
> > > > > a pseudo fixed-point floating number with 10 fractional bits.  Twelve
> > > > > other bits are used to encode the filter weight, leaving just 10 bits
> > > > > for the queue length.  This limits the maximum queue length supported
> > > > > by RED queues to 1024 packets.
> > > > >
> > > > > Introduce a new API to allow the RED scaling factor to be configured
> > > > > based upon maximum queue length.  If this API is not called, the RED
> > > > > scaling factor remains at its default value.
> > > > >
> > > > > Added some new RED scaling unit-tests to test with RED queue-lengths
> > > > > up to 8192 packets long.
> > > > >
> > > > > Signed-off-by: Alan Dewar <alan.dewar@att.com>
> > > >
> > > > Hi Cristian, Alan,
> > > >
> > > > The v7 of this patch is sitting without any comment for more than a year.
> > > > What is the status of this patch? Is it still valid? What is blocking it?
> > > >
> > > > For reference patch:
> > > > https://patches.dpdk.org/patch/33837/
> > >
> > > We are still using this patch against DPDK 17.11 and 18.11 as part of
> > > the AT&T Vyatta NOS.   It is needed to make WRED queues longer than
> > > 1024 packets work correctly.  I'm afraid that I have no idea what is
> > > holding it up from being merged.
> > 
> > It will be in a release when it will be merged in the git tree
> > dpdk-next-qos, managed by Cristian.
> 
> I was hoping to get a review & ACK from Tomasz Kantecki, the author of the WRED code in DPDK, hence the lack of progress on this patch.

It seems nobody was able to provide any feedback after two years,
and it was never merged in the QoS git tree.
The handling of this patch is really a shame.

Alan, please rebase this patch.
If nothing is wrong in CI (including ABI check),
I will merge the next version.



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning on Windows
  2020-07-06 12:22  0%       ` Bruce Richardson
@ 2020-07-06 23:16  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-06 23:16 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Fady Bader, dev, tbashar, talshn, yohadt, dmitry.kozliuk,
	harini.ramakrishnan, ocardona, pallavi.kadam, ranjit.menon,
	olivier.matz, arybchenko, mdr, nhorman

06/07/2020 14:22, Bruce Richardson:
> On Mon, Jul 06, 2020 at 02:32:39PM +0300, Fady Bader wrote:
> > Function versioning implementation is not supported by Windows.
> > Function versioning is disabled on Windows.
> > 
> > Signed-off-by: Fady Bader <fady@mellanox.com>
> > ---
> >  doc/guides/windows_gsg/intro.rst | 4 ++++
> >  lib/meson.build                  | 6 +++++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
> > index a0285732df..58c6246404 100644
> > --- a/doc/guides/windows_gsg/intro.rst
> > +++ b/doc/guides/windows_gsg/intro.rst
> > @@ -18,3 +18,7 @@ DPDK for Windows is currently a work in progress. Not all DPDK source files
> >  compile. Support is being added in pieces so as to limit the overall scope
> >  of any individual patch series. The goal is to be able to run any DPDK
> >  application natively on Windows.
> > +
> > +The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> > +Minor ABI versions may be incompatible
> > +because function versioning is not supported on Windows.
> > diff --git a/lib/meson.build b/lib/meson.build
> > index c1b9e1633f..dadf151f78 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -107,6 +107,10 @@ foreach l:libraries
> >  			shared_dep = declare_dependency(include_directories: includes)
> >  			static_dep = shared_dep
> >  		else
> > +			if is_windows and use_function_versioning
> > +				message('@0@: Function versioning is not supported by Windows.'
> > +				.format(name))
> > +			endif
> >  
> 
> This is ok here, but I think it might be better just moved to somewhere
> like config/meson.build, so that it is always just printed once for each
> build. I don't see an issue with having it printed even if there is no
> function versioning in the build itself.

Moving such a message to config/meson.build is the same
as moving it to the doc.
I prefer having a message each time library compatibility
is required but not possible.

> With or without the code move above, which is just a suggestion,
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

OK thanks, I'll merge as is.



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-06  6:23  3%       ` Kinsella, Ray
@ 2020-07-07  3:19  3%         ` Feifei Wang
  2020-07-07  7:40  0%           ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Feifei Wang @ 2020-07-07  3:19 UTC (permalink / raw)
  To: Kinsella, Ray, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd, nd



> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: 2020年7月6日 14:23
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>; Neil Horman <nhorman@tuxdriver.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/3] ring: remove experimental tag for ring reset API
> 
> 
> 
> On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>
> >> On 03/07/2020 11:26, Feifei Wang wrote:
> >>> Remove the experimental tag for rte_ring_reset API that have been
> >>> around for 4 releases.
> >>>
> >>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> >>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >>> ---
> >>>  lib/librte_ring/rte_ring.h           | 3 ---
> >>>  lib/librte_ring/rte_ring_version.map | 4 +---
> >>>  2 files changed, 1 insertion(+), 6 deletions(-)
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> >>> index f67141482..7181c33b4 100644
> >>> --- a/lib/librte_ring/rte_ring.h
> >>> +++ b/lib/librte_ring/rte_ring.h
> >>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void
> **obj_p)
> >>>   *
> >>>   * This function flush all the elements in a ring
> >>>   *
> >>> - * @b EXPERIMENTAL: this API may change without prior notice
> >>> - *
> >>>   * @warning
> >>>   * Make sure the ring is not in use while calling this function.
> >>>   *
> >>>   * @param r
> >>>   *   A pointer to the ring structure.
> >>>   */
> >>> -__rte_experimental
> >>>  void
> >>>  rte_ring_reset(struct rte_ring *r);
> >>>
> >>> diff --git a/lib/librte_ring/rte_ring_version.map
> >>> b/lib/librte_ring/rte_ring_version.map
> >>> index e88c143cf..aec6f3820 100644
> >>> --- a/lib/librte_ring/rte_ring_version.map
> >>> +++ b/lib/librte_ring/rte_ring_version.map
> >>> @@ -8,6 +8,7 @@ DPDK_20.0 {
> >>>  	rte_ring_init;
> >>>  	rte_ring_list_dump;
> >>>  	rte_ring_lookup;
> >>> +	rte_ring_reset;
> >>>
> >>>  	local: *;
> >>>  };
> >>> @@ -15,9 +16,6 @@ DPDK_20.0 {
> >>>  EXPERIMENTAL {
> >>>  	global:
> >>>
> >>> -	# added in 19.08
> >>> -	rte_ring_reset;
> >>> -
> >>>  	# added in 20.02
> >>>  	rte_ring_create_elem;
> >>>  	rte_ring_get_memsize_elem;
> >>
> >> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not
> >> the v20.0 ABI.
> > Thanks Ray for clarifying this.
> >
Thanks very much for pointing this out.
> >>
> >> The way to solve is to add it the DPDK_21 ABI in the map file.
> >> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to
> experimental
> >> if necessary.
> > Is using VERSION_SYMBOL_EXPERIMENTAL a must?
> 
> Purely at the discretion of the contributor and maintainer.
> If it has been around for a while, applications are using it and changing the
> symbol will break them.
> 
> You may choose to provide the alias or not.
Ok, in the new patch version, I will add it to the DPDK_21 ABI, but
VERSION_SYMBOL_EXPERIMENTAL will not be added, because if it were added in this
version, it would still need to be removed in the near future.

Thanks very much for your review.
> 
> > The documentation also seems to be vague. It says " The macro is used
> when a symbol matures to become part of the stable ABI, to provide an alias
> to experimental for some time". What does 'some time' mean?
> 
> "Some time" is a bit vague alright, should be "until the next major ABI
> version" - I will fix.
> 
> >
> >>
> >> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioni
> >> ng-
> >> macros

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API
  2020-07-07  3:19  3%         ` Feifei Wang
@ 2020-07-07  7:40  0%           ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07  7:40 UTC (permalink / raw)
  To: Feifei Wang, Honnappa Nagarahalli, Konstantin Ananyev, Neil Horman
  Cc: dev, nd



On 07/07/2020 04:19, Feifei Wang wrote:
> 
> 
>> -----Original Message-----
>> From: Kinsella, Ray <mdr@ashroe.eu>
>> Sent: 2020年7月6日 14:23
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Feifei Wang
>> <Feifei.Wang2@arm.com>; Konstantin Ananyev
>> <konstantin.ananyev@intel.com>; Neil Horman <nhorman@tuxdriver.com>
>> Cc: dev@dpdk.org; nd <nd@arm.com>
>> Subject: Re: [PATCH 1/3] ring: remove experimental tag for ring reset API
>>
>>
>>
>> On 03/07/2020 19:46, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>
>>>> On 03/07/2020 11:26, Feifei Wang wrote:
>>>>> Remove the experimental tag for rte_ring_reset API that have been
>>>>> around for 4 releases.
>>>>>
>>>>> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
>>>>> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>>>>> ---
>>>>>  lib/librte_ring/rte_ring.h           | 3 ---
>>>>>  lib/librte_ring/rte_ring_version.map | 4 +---
>>>>>  2 files changed, 1 insertion(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
>>>>> index f67141482..7181c33b4 100644
>>>>> --- a/lib/librte_ring/rte_ring.h
>>>>> +++ b/lib/librte_ring/rte_ring.h
>>>>> @@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void
>> **obj_p)
>>>>>   *
>>>>>   * This function flush all the elements in a ring
>>>>>   *
>>>>> - * @b EXPERIMENTAL: this API may change without prior notice
>>>>> - *
>>>>>   * @warning
>>>>>   * Make sure the ring is not in use while calling this function.
>>>>>   *
>>>>>   * @param r
>>>>>   *   A pointer to the ring structure.
>>>>>   */
>>>>> -__rte_experimental
>>>>>  void
>>>>>  rte_ring_reset(struct rte_ring *r);
>>>>>
>>>>> diff --git a/lib/librte_ring/rte_ring_version.map
>>>>> b/lib/librte_ring/rte_ring_version.map
>>>>> index e88c143cf..aec6f3820 100644
>>>>> --- a/lib/librte_ring/rte_ring_version.map
>>>>> +++ b/lib/librte_ring/rte_ring_version.map
>>>>> @@ -8,6 +8,7 @@ DPDK_20.0 {
>>>>>  	rte_ring_init;
>>>>>  	rte_ring_list_dump;
>>>>>  	rte_ring_lookup;
>>>>> +	rte_ring_reset;
>>>>>
>>>>>  	local: *;
>>>>>  };
>>>>> @@ -15,9 +16,6 @@ DPDK_20.0 {
>>>>>  EXPERIMENTAL {
>>>>>  	global:
>>>>>
>>>>> -	# added in 19.08
>>>>> -	rte_ring_reset;
>>>>> -
>>>>>  	# added in 20.02
>>>>>  	rte_ring_create_elem;
>>>>>  	rte_ring_get_memsize_elem;
>>>>
>>>> So strictly speaking, rte_ring_reset is part of the DPDK_21 ABI, not
>>>> the v20.0 ABI.
>>> Thanks Ray for clarifying this.
>>>
> Thanks very much for pointing this.
>>>>
>>>> The way to solve is to add it the DPDK_21 ABI in the map file.
>>>> And then use the VERSION_SYMBOL_EXPERIMENTAL to alias to
>> experimental
>>>> if necessary.
>>> Is using VERSION_SYMBOL_EXPERIMENTAL a must?
>>
>> Purely at the discretion of the contributor and maintainer.
>> If it has been around for a while, applications are using it and changing the
>> symbol will break them.
>>
>> You may choose to provide the alias or not.
> Ok, in the new patch version, I will add it to the DPDK_21 ABI, but
> VERSION_SYMBOL_EXPERIMENTAL will not be added, because if it were added in this
> version, it would still need to be removed in the near future.
> 
> Thanks very much for your review.

That is 100%

>>
>>> The documentation also seems to be vague. It says " The macro is used
>> when a symbol matures to become part of the stable ABI, to provide an alias
>> to experimental for some time". What does 'some time' mean?
>>
>> "Some time" is a bit vague alright, should be "until the next major ABI
>> version" - I will fix.
>>
>>>
>>>>
>>>> https://doc.dpdk.org/guides/contributing/abi_versioning.html#versioni
>>>> ng-
>>>> macros

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
    2020-07-03 15:38  3% ` David Marchand
@ 2020-07-07 10:10  3% ` Phil Yang
  2020-07-08  5:11  3%   ` Phil Yang
                     ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: Phil Yang @ 2020-07-07 10:10 UTC (permalink / raw)
  To: david.marchand, dev
  Cc: drc, Honnappa.Nagarahalli, olivier.matz, ruifeng.wang, nd

Use C11 atomics with explicit ordering instead of rte_atomic ops which
enforce unnecessary barriers on aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v2:
Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
to refcnt_atomic.

 lib/librte_mbuf/rte_mbuf.c      |  1 -
 lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
 lib/librte_mbuf/rte_mbuf_core.h | 11 +++--------
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index ae91ae2..8a456e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f8e492e..4a7a98c 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -37,7 +37,6 @@
 #include <rte_config.h>
 #include <rte_mempool.h>
 #include <rte_memory.h>
-#include <rte_atomic.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
@@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
+	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
+	return (uint16_t)(__atomic_add_fetch((int16_t *)&m->refcnt, value,
+					__ATOMIC_ACQ_REL));
 }
 
 /**
@@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
+	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
 }
 
 /**
@@ -481,7 +481,7 @@ static inline void
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
 }
 
 /**
@@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 		return (uint16_t)value;
 	}
 
-	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
+	return (uint16_t)(__atomic_add_fetch((int16_t *)&shinfo->refcnt_atomic,
+					    value, __ATOMIC_ACQ_REL));
 }
 
 /** Mbuf prefetch */
@@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(rte_atomic16_add_return
-			(&shinfo->refcnt_atomic, -1)))
+	if (likely(__atomic_add_fetch((int *)&shinfo->refcnt_atomic, -1,
+				     __ATOMIC_ACQ_REL)))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 16600f1..806313a 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -18,7 +18,6 @@
 
 #include <stdint.h>
 #include <rte_compat.h>
-#include <generic/rte_atomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -495,12 +494,8 @@ struct rte_mbuf {
 	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
 	 * config option.
 	 */
-	RTE_STD_C11
-	union {
-		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
-		/** Non-atomically accessed refcnt */
-		uint16_t refcnt;
-	};
+	uint16_t refcnt;
+
 	uint16_t nb_segs;         /**< Number of segments. */
 
 	/** Input port (16 bits to support more than 256 virtual ports).
@@ -679,7 +674,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
+	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
 };
 
 /**< Maximum number of nb_segs allowed. */
-- 
2.7.4


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics
  @ 2020-07-07 11:13  4%     ` Phil Yang
  2020-07-07 14:29  0%       ` Jerin Jacob
    1 sibling, 1 reply; 200+ results
From: Phil Yang @ 2020-07-07 11:13 UTC (permalink / raw)
  To: thomas, erik.g.carrillo, dev
  Cc: jerinj, Honnappa.Nagarahalli, drc, Ruifeng.Wang, Dharmik.Thakkar,
	nd, david.marchand, mdr, nhorman, dodji

The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with an explicit one-way memory barrier instead of full
barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.

Since compilers can generate the same instructions for volatile and
non-volatile variables with C11 __atomic built-ins, retain the volatile
keyword in front of the state enum to avoid an ABI break.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
v3:
Fix ABI issue: revert to 'volatile enum rte_event_timer_state type state'.

v2:
1. Removed implementation-specific opaque data cleanup code.
2. Replaced thread fence with atomic ACQURE/RELEASE ordering on state access.

 lib/librte_eventdev/rte_event_timer_adapter.c | 55 ++++++++++++++++++---------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index d75415c..eb2c93a 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -629,7 +629,8 @@ swtim_callback(struct rte_timer *tim)
 		sw->expired_timers[sw->n_expired_timers++] = tim;
 		sw->stats.evtim_exp_count++;
 
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -1020,6 +1021,7 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	int n_lcores;
 	/* Timer list for this lcore is not in use. */
 	uint16_t exp_state = 0;
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1060,30 +1062,36 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 
 	for (i = 0; i < nb_evtims; i++) {
-		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+		} else if (!(n_state == RTE_EVENT_TIMER_NOT_ARMED ||
+			     n_state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
 		if (unlikely(ret == -1)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOLATE,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOEARLY,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1099,13 +1107,18 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 					  SINGLE, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELEASE);
 			break;
 		}
 
-		rte_smp_wmb();
 		EVTIM_LOG_DBG("armed an event timer");
-		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
+		/* RELEASE ordering guarantees the adapter specific value
+		 * changes are observed before the update of state.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (i < nb_evtims)
@@ -1132,6 +1145,7 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	struct rte_timer *timp;
 	uint64_t opaque;
 	struct swtim *sw = swtim_pmd_priv(adapter);
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1143,16 +1157,18 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
+		/* ACQUIRE ordering guarantees that the implementation
+		 * specific opaque data is accessed under the correct state.
+		 */
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (evtims[i]->state != RTE_EVENT_TIMER_ARMED) {
+		} else if (n_state != RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EINVAL;
 			break;
 		}
 
-		rte_smp_rmb();
-
 		opaque = evtims[i]->impl_opaque[0];
 		timp = (struct rte_timer *)(uintptr_t)opaque;
 		RTE_ASSERT(timp != NULL);
@@ -1166,9 +1182,12 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 		rte_mempool_put(sw->tim_pool, (void **)timp);
 
-		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-
-		rte_smp_wmb();
+		/* The RELEASE ordering here pairs with the ACQUIRE
+		 * ordering above to make sure the state update is
+		 * observed across threads.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				__ATOMIC_RELEASE);
 	}
 
 	return i;
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-01 15:36  2% ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
@ 2020-07-07 11:50  0%   ` Olivier Matz
  2020-07-07 12:46  0%     ` Slava Ovsiienko
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-07 11:50 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

Hi Slava,

Few question/comments below.

On Wed, Jul 01, 2020 at 03:36:26PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.

Do you mean the same flag will be used for both Rx and Tx?  I wonder if
it's a good idea: if you enable the timestamp on Rx, the packets will be
flagged and it will impact Tx, except if the application explicitly
resets the flag in all mbufs. Wouldn't it be safer to have an Rx flag
and a Tx flag?

> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. It the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.

I think what to do with packets to be sent in the "past" could be
configurable through an ethdev API in the future (drop or send).

> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.

This constraint makes sense. At first glance, it looks it is imposed by
a PMD or hw limitation, but thinking more about it, I think it is the
correct behavior to have. Packets are ordered within a PMD queue, and
the ability to set the timestamp for one packet to delay the subsequent
ones looks useful.

Should this behavior be documented somewhere? Maybe in the API comment
documenting the dynamic flag?

> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
>  3 files changed, 21 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..fb5477c 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/*
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, these dynamic flag and field will be used to manage
> + * the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME "rte_dynflag_timestamp"
> +

I realize that's not the case for rte_flow_dynfield_metadata, but
I think it would be good to have a doxygen-like comment.



Regards,
Olivier

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 11:50  0%   ` Olivier Matz
@ 2020-07-07 12:46  0%     ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-07 12:46 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Matan Azrad, Raslan Darawsheh, bernard.iremonger, thomas

Hi, Olivier

Thanks a lot for the review.

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, July 7, 2020 14:51
> To: Slava Ovsiienko <viacheslavo@mellanox.com>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Raslan
> Darawsheh <rasland@mellanox.com>; bernard.iremonger@intel.com;
> thomas@mellanox.net
> Subject: Re: [PATCH 1/2] mbuf: introduce accurate packet Tx scheduling
> 
> Hi Slava,
> 
> Few question/comments below.
> 
> On Wed, Jul 01, 2020 at 03:36:26PM +0000, Viacheslav Ovsiienko wrote:
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> 
> Do you mean the same flag will be used for both Rx and Tx?  I wonder if it's a
> good idea: if you enable the timestamp on Rx, the packets will be flagged
> and it will impact Tx, except if the application explicitly resets the flag in all
> mbufs. Wouldn't it be safer to have an Rx flag and a Tx flag?

It is a little bit difficult to say unambiguously; I thought about it and did not make a strong decision.
We have the flag sharing for the Rx/Tx metadata and just follow the same approach.
OK, I will promote ones to:
RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
RTE_MBUF_DYNFIELD_TX_TIMESTAMP_NAME

And, possible, we should reconsider metadata dynamic flags.
> 
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. It the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> 
> I think what to do with packets to be send in the "past" could be configurable
> through an ethdev API in the future (drop or send).
Yes, currently there is no complete understanding of how to handle packets out of the time slot.
In 20.11 we are going to introduce a time-based rte flow API to manage out-of-window packets.
 
> 
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.

"can" should be replaced with "might". Current mlx5 implementation
checks each packet in the burst for the timestamp presence.

> 
> This constraint makes sense. At first glance, it looks it is imposed by a PMD or
> hw limitation, but thinking more about it, I think it is the correct behavior to
> have. Packets are ordered within a PMD queue, and the ability to set the
> timestamp for one packet to delay the subsequent ones looks useful.
> 
> Should this behavior be documented somewhere? Maybe in the API
> comment documenting the dynamic flag?

It is documented in mlx5.rst (coming soon); do you think it should be
in a more common place? OK, will update.

> 
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > current device clock value and provide the reference for the timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  lib/librte_ethdev/rte_ethdev.c |  1 +  lib/librte_ethdev/rte_ethdev.h
> > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 16 ++++++++++++++++
> >  3 files changed, 21 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**<
> > Device supports Rx queue setup after device started*/  #define
> > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..fb5477c 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,20 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> >
> > +/*
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > +defined
> > + * but are maintained always the same for a given port. Some devices
> > +allow
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic timestamp flag tells whether the field
> > +contains
> > + * actual timestamp value. For the packets being sent this value can
> > +be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, these dynamic flag and field will be used to
> > +manage
> > + * the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> > +#define RTE_MBUF_DYNFLAG_TIMESTAMP_NAME
> "rte_dynflag_timestamp"
> > +
> 
> I realize that's not the case for rte_flow_dynfield_metadata, but I think it
> would be good to have a doxygen-like comment.
OK, will extend the comment with expected PMD behavior and replace "/*" with "/**".
> 
> 
> 
> Regards,
> Olivier

With best regards, Slava


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/2] mbuf: introduce accurate packet Tx scheduling
      2020-07-01 15:36  2% ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
@ 2020-07-07 12:59  2% ` Viacheslav Ovsiienko
  2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-07 12:59 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

Some networks require precise traffic timing management. The ability
to send (and, generally speaking, receive) packets at a precisely
specified moment of time makes it possible to support Time Division
Multiplexing connections with a contemporary general-purpose NIC,
without involving auxiliary hardware. For example, supporting the
O-RAN Fronthaul interface is one promising use case for precise time
management of egress packets.

The main objective of this RFC is to specify the way applications can
provide the moment of time at which packet transmission must start,
and to give a preliminary description of the support for this feature
on the mlx5 PMD side.

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time references (initial phase) are not
explicitly defined but are always maintained the same for a given
port. Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For the packets being
sent, this value can be used by the PMD to schedule packet sending.

After the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage the timestamps on the receiving datapath as well.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time at which the packet appears on the
wire with the specified packet timestamp. If the specified timestamp
is in the past, it should be ignored; if it is in the distant future,
it should be capped with some reasonable value (in the range of
seconds). These specific cases ("too late" and "distant future") can
optionally be reported via device xstats to assist applications in
detecting time-related problems.

No packet reordering according to timestamps is assumed, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first
packet of the burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on timestamp
with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative and is going to be replaced with a new eth_dev
API for reporting/managing the supported dynamic flags and their
related features. That API would break ABI compatibility and can't be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within a burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the
timestamps.

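The timestamp pattern configured by "set tx_times" can be read as
plain arithmetic (a self-contained illustration of one plausible
reading; tx_time() and its parameters are hypothetical names, not
testpmd code):

```c
#include <assert.h>
#include <stdint.h>

/* Device-clock timestamp for packet 'pkt' of burst 'burst' under the
 * "set tx_times <burst_gap>,<intra_gap>" pattern: bursts are spaced
 * by burst_gap, packets within a burst by intra_gap. 'start' is the
 * reference clock value, e.g. obtained via rte_eth_read_clock().
 */
static uint64_t
tx_time(uint64_t start, uint64_t burst_gap, uint64_t intra_gap,
	unsigned int burst, unsigned int pkt)
{
	return start + (uint64_t)burst * burst_gap +
	       (uint64_t)pkt * intra_gap;
}

int main(void)
{
	/* burst_gap = 1000 ticks, intra_gap = 10 ticks, txburst = 4 */
	assert(tx_time(5000, 1000, 10, 0, 0) == 5000);
	assert(tx_time(5000, 1000, 10, 0, 3) == 5030);
	assert(tx_time(5000, 1000, 10, 2, 0) == 7000);
	return 0;
}
```

Whether only the first packet of each burst carries the timestamp, or
every packet does, is up to the application; the PMD checks each
packet for the flag.
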
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..834acdd 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are always maintained the same for a given port. Some devices allow
+ * querying rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared timestamp field will be used to handle the
+ * timestamps on receiving datapath as well. Having the dedicated flags
+ * for Rx/Tx timestamps allows applications not to perform explicit flags
+ * reset on forwarding and not to promote received timestamps to the
+ * transmitting datapath by default.
+ *
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * No packet reordering according to timestamps is assumed, neither
+ * within a packet burst, nor between packets; it is entirely the
+ * application's responsibility to generate packets and their timestamps
+ * in the desired order. The timestamps might be put only in the first
+ * packet in the burst, providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (2 preceding siblings ...)
  2020-07-07 12:59  2% ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
@ 2020-07-07 13:08  2% ` Viacheslav Ovsiienko
  2020-07-07 14:32  0%   ` Olivier Matz
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-07 13:08 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

Some networks require precise traffic timing management. The ability
to send (and, generally speaking, receive) packets at a precisely
specified moment of time makes it possible to support Time Division
Multiplexing connections with a contemporary general-purpose NIC,
without involving auxiliary hardware. For example, supporting the
O-RAN Fronthaul interface is one promising use case for precise time
management of egress packets.

The main objective of this RFC is to specify the way applications can
provide the moment of time at which packet transmission must start,
and to give a preliminary description of the support for this feature
on the mlx5 PMD side.

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time references (initial phase) are not
explicitly defined but are always maintained the same for a given
port. Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For the packets being
sent, this value can be used by the PMD to schedule packet sending.

After the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage the timestamps on the receiving datapath as well.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time at which the packet appears on the
wire with the specified packet timestamp. If the specified timestamp
is in the past, it should be ignored; if it is in the distant future,
it should be capped with some reasonable value (in the range of
seconds). These specific cases ("too late" and "distant future") can
optionally be reported via device xstats to assist applications in
detecting time-related problems.

No packet reordering according to timestamps is assumed, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first
packet of the burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on timestamp
with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative and is going to be replaced with a new eth_dev
API for reporting/managing the supported dynamic flags and their
related features. That API would break ABI compatibility and can't be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with the
specified delays between the packets within a burst and the specified
delay between the bursts. rte_eth_read_clock() is supposed to be engaged
to get the current device clock value and provide the reference for the
timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..5b2f3da 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared timestamp field will be used to handle the
+ * timestamps on receiving datapath as well. Having the dedicated flags
+ * for Rx/Tx timestamps allows applications not to perform explicit flags
+ * reset on forwarding and not to promote received timestamps to the
+ * transmitting datapath by default.
+ *
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according timestamps is supposed,
+ * neither within packet burst, nor between packets, it is an entirely
+ * application responsibility to generate packets and its timestamps in
+ * desired order. The timestamps might be put only in the first packet in
+ * the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics
  2020-07-07 11:13  4%     ` [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
@ 2020-07-07 14:29  0%       ` Jerin Jacob
  2020-07-07 15:56  0%         ` Phil Yang
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2020-07-07 14:29 UTC (permalink / raw)
  To: Phil Yang
  Cc: Thomas Monjalon, Erik Gabriel Carrillo, dpdk-dev, Jerin Jacob,
	Honnappa Nagarahalli, David Christensen,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji

On Tue, Jul 7, 2020 at 4:45 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The impl_opaque field is shared between the timer arm and cancel
> operations. Meanwhile, the state flag acts as a guard variable to
> make sure the update of impl_opaque is synchronized. The original
> code uses rte_smp barriers to achieve that. This patch uses C11
> atomics with an explicit one-way memory barrier instead of full
> barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.
>
> Since compilers can generate the same instructions for volatile and
> non-volatile variable in C11 __atomics built-ins, so remain the volatile
> keyword in front of state enum to avoid the ABI break issue.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>


Could you fix the following:

WARNING:TYPO_SPELLING: 'opague' may be misspelled - perhaps 'opaque'?
#184: FILE: lib/librte_eventdev/rte_event_timer_adapter.c:1161:
+ * specific opague data under the correct state.

total: 0 errors, 1 warnings, 124 lines checked


> ---
> v3:
> Fix ABI issue: revert to 'volatile enum rte_event_timer_state type state'.
>
> v2:
> 1. Removed implementation-specific opaque data cleanup code.
> 2. Replaced thread fence with atomic ACQURE/RELEASE ordering on state access.
>
>  lib/librte_eventdev/rte_event_timer_adapter.c | 55 ++++++++++++++++++---------
>  1 file changed, 37 insertions(+), 18 deletions(-)
>
> diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
> index d75415c..eb2c93a 100644
> --- a/lib/librte_eventdev/rte_event_timer_adapter.c
> +++ b/lib/librte_eventdev/rte_event_timer_adapter.c
> @@ -629,7 +629,8 @@ swtim_callback(struct rte_timer *tim)
>                 sw->expired_timers[sw->n_expired_timers++] = tim;
>                 sw->stats.evtim_exp_count++;
>
> -               evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
> +               __atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
> +                               __ATOMIC_RELEASE);
>         }
>
>         if (event_buffer_batch_ready(&sw->buffer)) {
> @@ -1020,6 +1021,7 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
>         int n_lcores;
>         /* Timer list for this lcore is not in use. */
>         uint16_t exp_state = 0;
> +       enum rte_event_timer_state n_state;
>
>  #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>         /* Check that the service is running. */
> @@ -1060,30 +1062,36 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
>         }
>
>         for (i = 0; i < nb_evtims; i++) {
> -               /* Don't modify the event timer state in these cases */
> -               if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
> +               n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
> +               if (n_state == RTE_EVENT_TIMER_ARMED) {
>                         rte_errno = EALREADY;
>                         break;
> -               } else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
> -                            evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
> +               } else if (!(n_state == RTE_EVENT_TIMER_NOT_ARMED ||
> +                            n_state == RTE_EVENT_TIMER_CANCELED)) {
>                         rte_errno = EINVAL;
>                         break;
>                 }
>
>                 ret = check_timeout(evtims[i], adapter);
>                 if (unlikely(ret == -1)) {
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR_TOOLATE,
> +                                       __ATOMIC_RELAXED);
>                         rte_errno = EINVAL;
>                         break;
>                 } else if (unlikely(ret == -2)) {
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR_TOOEARLY,
> +                                       __ATOMIC_RELAXED);
>                         rte_errno = EINVAL;
>                         break;
>                 }
>
>                 if (unlikely(check_destination_event_queue(evtims[i],
>                                                            adapter) < 0)) {
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR,
> +                                       __ATOMIC_RELAXED);
>                         rte_errno = EINVAL;
>                         break;
>                 }
> @@ -1099,13 +1107,18 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
>                                           SINGLE, lcore_id, NULL, evtims[i]);
>                 if (ret < 0) {
>                         /* tim was in RUNNING or CONFIG state */
> -                       evtims[i]->state = RTE_EVENT_TIMER_ERROR;
> +                       __atomic_store_n(&evtims[i]->state,
> +                                       RTE_EVENT_TIMER_ERROR,
> +                                       __ATOMIC_RELEASE);
>                         break;
>                 }
>
> -               rte_smp_wmb();
>                 EVTIM_LOG_DBG("armed an event timer");
> -               evtims[i]->state = RTE_EVENT_TIMER_ARMED;
> +               /* RELEASE ordering guarantees the adapter specific value
> +                * changes observed before the update of state.
> +                */
> +               __atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
> +                               __ATOMIC_RELEASE);
>         }
>
>         if (i < nb_evtims)
> @@ -1132,6 +1145,7 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
>         struct rte_timer *timp;
>         uint64_t opaque;
>         struct swtim *sw = swtim_pmd_priv(adapter);
> +       enum rte_event_timer_state n_state;
>
>  #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
>         /* Check that the service is running. */
> @@ -1143,16 +1157,18 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
>
>         for (i = 0; i < nb_evtims; i++) {
>                 /* Don't modify the event timer state in these cases */
> -               if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
> +               /* ACQUIRE ordering guarantees the access of implementation
> +                * specific opague data under the correct state.
> +                */
> +               n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
> +               if (n_state == RTE_EVENT_TIMER_CANCELED) {
>                         rte_errno = EALREADY;
>                         break;
> -               } else if (evtims[i]->state != RTE_EVENT_TIMER_ARMED) {
> +               } else if (n_state != RTE_EVENT_TIMER_ARMED) {
>                         rte_errno = EINVAL;
>                         break;
>                 }
>
> -               rte_smp_rmb();
> -
>                 opaque = evtims[i]->impl_opaque[0];
>                 timp = (struct rte_timer *)(uintptr_t)opaque;
>                 RTE_ASSERT(timp != NULL);
> @@ -1166,9 +1182,12 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
>
>                 rte_mempool_put(sw->tim_pool, (void **)timp);
>
> -               evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
> -
> -               rte_smp_wmb();
> +               /* The RELEASE ordering here pairs with atomic ordering
> +                * to make sure the state update data observed between
> +                * threads.
> +                */
> +               __atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
> +                               __ATOMIC_RELEASE);
>         }
>
>         return i;
> --
> 2.7.4
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
@ 2020-07-07 14:32  0%   ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-07-07 14:32 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

On Tue, Jul 07, 2020 at 01:08:02PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 32 ++++++++++++++++++++++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..5b2f3da 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,36 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared timestamp field will be used to handle the
> + * timestamps on receiving datapath as well. Having the dedicated flags
> + * for Rx/Tx timstamps allows applications not to perform explicit flags
> + * reset on forwarding and not to promote received timestamps to the
> + * transmitting datapath by default.
> + *
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
> + * packet being sent it tries to synchronize the time of packet appearing
> + * on the wire with the specified packet timestamp. If the specified one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according timestamps is supposed,

I think there is a typo here

> + * neither within packet burst, nor between packets, it is an entirely
> + * application responsibility to generate packets and its timestamps in
> + * desired order. The timestamps might be put only in the first packet in
> + * the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"

Is it possible to split the comment, to document both
RTE_MBUF_DYNFIELD_TIMESTAMP_NAME and RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME ?  I
didn't try to generate the documentation, but I think, like this, that
RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME will be undocumented.

Apart from that, it looks good to me.


> +
>  #endif
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library
    2020-06-29  8:02  3% ` [dpdk-dev] [PATCH v5 0/3] " Ruifeng Wang
@ 2020-07-07 14:40  3% ` Ruifeng Wang
  2020-07-07 15:15  3% ` [dpdk-dev] [PATCH v7 " Ruifeng Wang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-07 14:40 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. The series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for an application to
register an RCU variable that the LPM library will use. This provides
the user a handle to enable the RCU support integrated in the LPM
library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return defer queue handler so user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 120 ++++++-
 lib/librte_lpm/rte_lpm.h           |  59 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 987 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes
@ 2020-07-07 14:45  8% Ray Kinsella
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Ray Kinsella @ 2020-07-07 14:45 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

A few documentation fixes, clarifying the Windows ABI policy and aliases
to experimental mode.

Ray Kinsella (2):
  doc: reword abi policy for windows
  doc: clarify alias to experimental period

 doc/guides/contributing/abi_policy.rst     | 4 +++-
 doc/guides/contributing/abi_versioning.rst | 7 ++++---
 doc/guides/windows_gsg/intro.rst           | 6 +++---
 3 files changed, 10 insertions(+), 7 deletions(-)

--
2.7.4

^ permalink raw reply	[relevance 8%]

* [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows
  2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
@ 2020-07-07 14:45 24% ` Ray Kinsella
  2020-07-07 15:23  7%   ` Thomas Monjalon
  2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2020-07-07 14:45 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Minor changes to the abi policy for windows.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_policy.rst | 4 +++-
 doc/guides/windows_gsg/intro.rst       | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index d0affa9..8e70b45 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -40,7 +40,9 @@ General Guidelines
    maintaining ABI stability through one year of DPDK releases starting from
    DPDK 19.11. This policy will be reviewed in 2020, with intention of
    lengthening the stability period. Additional implementation detail can be
-   found in the :ref:`release notes <20_02_abi_changes>`.
+   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
+   policy does not currently apply to the :doc:`Windows build
+   <../windows_gsg/intro>`.
 
 What is an ABI?
 ~~~~~~~~~~~~~~~
diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
index 58c6246..707afd3 100644
--- a/doc/guides/windows_gsg/intro.rst
+++ b/doc/guides/windows_gsg/intro.rst
@@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
 of any individual patch series. The goal is to be able to run any DPDK
 application natively on Windows.
 
-The :doc:`../contributing/abi_policy` cannot be respected for Windows.
-Minor ABI versions may be incompatible
-because function versioning is not supported on Windows.
+The :doc:`../contributing/abi_policy` does not apply to the Windows build, as
+function versioning is not supported on Windows, therefore minor ABI versions
+may be incompatible.
-- 
2.7.4


^ permalink raw reply	[relevance 24%]

* [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
@ 2020-07-07 14:45 12% ` Ray Kinsella
  2020-07-07 15:26  0%   ` Thomas Monjalon
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2020-07-07 14:45 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Clarify retention period for aliases to experimental.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_versioning.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
index 31a9205..e00dfa8 100644
--- a/doc/guides/contributing/abi_versioning.rst
+++ b/doc/guides/contributing/abi_versioning.rst
@@ -158,7 +158,7 @@ The macros exported are:
 * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
   binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
   The macro is used when a symbol matures to become part of the stable ABI, to
-  provide an alias to experimental for some time.
+  provide an alias to experimental until the next major ABI version.
 
 .. _example_abi_macro_usage:
 
@@ -428,8 +428,9 @@ _____________________________
 
 In situations in which an ``experimental`` symbol has been stable for some time,
 and it becomes a candidate for promotion to the stable ABI. At this time, when
-promoting the symbol, maintainer may choose to provide an alias to the
-``experimental`` symbol version, so as not to break consuming applications.
+promoting the symbol, the maintainer may choose to provide an alias to the
+``experimental`` symbol version, so as not to break consuming applications. This
+alias will then typically be dropped in the next major ABI version.
 
 The process to provide an alias to ``experimental`` is similar to that, of
 :ref:`symbol versioning <example_abi_macro_usage>` described above.
-- 
2.7.4


^ permalink raw reply	[relevance 12%]

* [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (3 preceding siblings ...)
  2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
@ 2020-07-07 14:57  2% ` Viacheslav Ovsiienko
  2020-07-07 15:23  0%   ` Olivier Matz
  2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
  2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-07 14:57 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

There is a requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
packets at a very precisely specified moment of time provides
the opportunity to support connections with Time Division
Multiplexing using a contemporary general-purpose NIC without involving
auxiliary hardware. For example, support for the O-RAN Fronthaul
interface is one of the promising features potentially using
precise time management for egress packets.

The main objective of this RFC is to specify the way applications
can provide the moment of time at which packet transmission must be
started, and to give a preliminary description of the support for this
feature on the mlx5 PMD side.

The new dynamic timestamp field is proposed; it provides some timing
information, and the units and time references (initial phase) are not
explicitly defined but are always maintained the same for a given port.
Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For packets being sent,
this value can be used by the PMD to schedule packet transmission.

After the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage timestamps on the receive datapath as well.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time the packet appears on the wire with
the specified packet timestamp. If the specified timestamp is in the
past it should be ignored; if it is in the distant future it should be
capped to some reasonable value (in the range of seconds). These
specific cases ("too late" and "distant future") can optionally be
reported via device xstats to assist applications in detecting
time-related problems.

No packet reordering according to timestamps is supposed, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamp can be put only in the first packet
of a burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a
timestamp with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative and is going to be replaced with a new eth_dev
API for reporting/managing the supported dynamic flags and their
related features. That API would break ABI compatibility and cannot be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

A new testpmd command is proposed to configure the sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within a burst,
              specified in device clock units. The number
              of packets in the burst is defined by the
              txburst parameter

<burst_gap> - the delay between the bursts, in device clock units

As a result, the bursts of packets will be transmitted with the
specified delays between the packets within a burst and the specified
delay between bursts. rte_eth_read_clock() is supposed to be used to
get the current device clock value and provide the reference for the
timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 v1->v4:
    - dedicated dynamic Tx timestamp flag instead of shared with Rx
    - Doxygen-style comment
    - comments update

---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..7e9f7d2 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow4
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value. For the packets being sent this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v7 0/3] RCU integration with LPM library
    2020-06-29  8:02  3% ` [dpdk-dev] [PATCH v5 0/3] " Ruifeng Wang
  2020-07-07 14:40  3% ` [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-07 15:15  3% ` Ruifeng Wang
    2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-07 15:15 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. The series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This gives the
user a handle to enable the RCU support integrated into the LPM
library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return the defer queue handle so the user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 120 ++++++-
 lib/librte_lpm/rte_lpm.h           |  59 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 987 insertions(+), 16 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
@ 2020-07-07 15:23  0%   ` Olivier Matz
  2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Olivier Matz @ 2020-07-07 15:23 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, matan, rasland, bernard.iremonger, thomas

On Tue, Jul 07, 2020 at 02:57:11PM +0000, Viacheslav Ovsiienko wrote:
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  v1->v4:
>     - dedicated dynamic Tx timestamp flag instead of shared with Rx
>     - Doxygen-style comment
>     - comments update
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..7e9f7d2 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
>  
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices allow4

allow4 -> allow

> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field contains
> + * actual timestamp value. For the packets being sent this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
> + * packet being sent it tries to synchronize the time of packet appearing
> + * on the wire with the specified packet timestamp. If the specified one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according to timestamps is supposed,
> + * neither for packet within the burst, nor for the whole bursts, it is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> +

Acked-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
@ 2020-07-07 15:23  7%   ` Thomas Monjalon
  2020-07-07 16:33  4%     ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 15:23 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, fady, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon, talshn

07/07/2020 16:45, Ray Kinsella:
> Minor changes to the abi policy for windows.

It looks like you were not fast enough to comment
in the original thread :)
Please add a Fixes line to reference the original commit.

> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>  doc/guides/contributing/abi_policy.rst | 4 +++-
>  doc/guides/windows_gsg/intro.rst       | 6 +++---
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
> index d0affa9..8e70b45 100644
> --- a/doc/guides/contributing/abi_policy.rst
> +++ b/doc/guides/contributing/abi_policy.rst
> @@ -40,7 +40,9 @@ General Guidelines
>     maintaining ABI stability through one year of DPDK releases starting from
>     DPDK 19.11. This policy will be reviewed in 2020, with intention of
>     lengthening the stability period. Additional implementation detail can be
> -   found in the :ref:`release notes <20_02_abi_changes>`.
> +   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
> +   policy does not currently apply to the :doc:`Window build

Window -> Windows

> +   <../windows_gsg/intro>`.
>  
>  What is an ABI?
>  ~~~~~~~~~~~~~~~
> diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
> index 58c6246..707afd3 100644
> --- a/doc/guides/windows_gsg/intro.rst
> +++ b/doc/guides/windows_gsg/intro.rst
> @@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
>  of any individual patch series. The goal is to be able to run any DPDK
>  application natively on Windows.
>  
> -The :doc:`../contributing/abi_policy` cannot be respected for Windows.
> -Minor ABI versions may be incompatible
> -because function versioning is not supported on Windows.
> +The :doc:`../contributing/abi_policy` does not apply to the Windows build, as
> +function versioning is not supported on Windows, therefore minor ABI versions
> +may be incompatible.

Please I really prefer we split lines logically rather than filling the space:
The :doc:`../contributing/abi_policy` does not apply to the Windows build,
as function versioning is not supported on Windows,
therefore minor ABI versions may be incompatible.




^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-07 15:26  0%   ` Thomas Monjalon
  2020-07-07 16:35  3%     ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 15:26 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, fady, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon, david.marchand,
	nhorman, bruce.richardson

07/07/2020 16:45, Ray Kinsella:
> Clarify retention period for aliases to experimental.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
> --- a/doc/guides/contributing/abi_versioning.rst
> +++ b/doc/guides/contributing/abi_versioning.rst
> @@ -158,7 +158,7 @@ The macros exported are:
>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>    The macro is used when a symbol matures to become part of the stable ABI, to
> -  provide an alias to experimental for some time.
> +  provide an alias to experimental until the next major ABI version.

Why limiting the period for experimental status?
Some API want to remain experimental longer.

[...]
>  In situations in which an ``experimental`` symbol has been stable for some time,
>  and it becomes a candidate for promotion to the stable ABI. At this time, when
> -promoting the symbol, maintainer may choose to provide an alias to the
> -``experimental`` symbol version, so as not to break consuming applications.
> +promoting the symbol, the maintainer may choose to provide an alias to the
> +``experimental`` symbol version, so as not to break consuming applications. This

Please start a sentence on a new line.

> +alias will then typically be dropped in the next major ABI version.

I don't see the need for the time estimation.



^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 4/4] eventdev: relax smp barriers with C11 atomics
  @ 2020-07-07 15:54  4%       ` Phil Yang
  2020-07-08 13:30  4%       ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Jerin Jacob
  1 sibling, 0 replies; 200+ results
From: Phil Yang @ 2020-07-07 15:54 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, erik.g.carrillo, Honnappa.Nagarahalli, drc, Ruifeng.Wang,
	Dharmik.Thakkar, nd, david.marchand, mdr, nhorman, dodji, stable

The impl_opaque field is shared between the timer arm and cancel
operations. Meanwhile, the state flag acts as a guard variable to
make sure the update of impl_opaque is synchronized. The original
code uses rte_smp barriers to achieve that. This patch uses C11
atomics with explicit one-way memory ordering instead of the full
barriers rte_smp_w/rmb() to avoid unnecessary barriers on aarch64.

Since compilers can generate the same instructions for volatile and
non-volatile variables with C11 __atomic built-ins, the volatile
keyword is retained in front of the state enum to avoid an ABI break.

Cc: stable@dpdk.org

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
v4:
1. Fix typo.
2. Cc to stable release. (Honnappa)

v3:
Fix ABI issue: revert to 'volatile enum rte_event_timer_state type state'.

v2:
1. Removed implementation-specific opaque data cleanup code.
2. Replaced thread fence with atomic ACQURE/RELEASE ordering on state access.

 lib/librte_eventdev/rte_event_timer_adapter.c | 55 ++++++++++++++++++---------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
index aa01b4d..4c5e49e 100644
--- a/lib/librte_eventdev/rte_event_timer_adapter.c
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -629,7 +629,8 @@ swtim_callback(struct rte_timer *tim)
 		sw->expired_timers[sw->n_expired_timers++] = tim;
 		sw->stats.evtim_exp_count++;
 
-		evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+		__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -1020,6 +1021,7 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	int n_lcores;
 	/* Timer list for this lcore is not in use. */
 	uint16_t exp_state = 0;
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1060,30 +1062,36 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 	}
 
 	for (i = 0; i < nb_evtims; i++) {
-		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_ARMED) {
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (!(evtims[i]->state == RTE_EVENT_TIMER_NOT_ARMED ||
-			     evtims[i]->state == RTE_EVENT_TIMER_CANCELED)) {
+		} else if (!(n_state == RTE_EVENT_TIMER_NOT_ARMED ||
+			     n_state == RTE_EVENT_TIMER_CANCELED)) {
 			rte_errno = EINVAL;
 			break;
 		}
 
 		ret = check_timeout(evtims[i], adapter);
 		if (unlikely(ret == -1)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOLATE;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOLATE,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR_TOOEARLY;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR_TOOEARLY,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELAXED);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1099,13 +1107,18 @@ __swtim_arm_burst(const struct rte_event_timer_adapter *adapter,
 					  SINGLE, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			evtims[i]->state = RTE_EVENT_TIMER_ERROR;
+			__atomic_store_n(&evtims[i]->state,
+					RTE_EVENT_TIMER_ERROR,
+					__ATOMIC_RELEASE);
 			break;
 		}
 
-		rte_smp_wmb();
 		EVTIM_LOG_DBG("armed an event timer");
-		evtims[i]->state = RTE_EVENT_TIMER_ARMED;
+		/* RELEASE ordering guarantees the adapter specific value
+		 * changes observed before the update of state.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				__ATOMIC_RELEASE);
 	}
 
 	if (i < nb_evtims)
@@ -1132,6 +1145,7 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 	struct rte_timer *timp;
 	uint64_t opaque;
 	struct swtim *sw = swtim_pmd_priv(adapter);
+	enum rte_event_timer_state n_state;
 
 #ifdef RTE_LIBRTE_EVENTDEV_DEBUG
 	/* Check that the service is running. */
@@ -1143,16 +1157,18 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 	for (i = 0; i < nb_evtims; i++) {
 		/* Don't modify the event timer state in these cases */
-		if (evtims[i]->state == RTE_EVENT_TIMER_CANCELED) {
+		/* ACQUIRE ordering guarantees the access of implementation
+		 * specific opaque data under the correct state.
+		 */
+		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
-		} else if (evtims[i]->state != RTE_EVENT_TIMER_ARMED) {
+		} else if (n_state != RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EINVAL;
 			break;
 		}
 
-		rte_smp_rmb();
-
 		opaque = evtims[i]->impl_opaque[0];
 		timp = (struct rte_timer *)(uintptr_t)opaque;
 		RTE_ASSERT(timp != NULL);
@@ -1166,9 +1182,12 @@ swtim_cancel_burst(const struct rte_event_timer_adapter *adapter,
 
 		rte_mempool_put(sw->tim_pool, (void **)timp);
 
-		evtims[i]->state = RTE_EVENT_TIMER_CANCELED;
-
-		rte_smp_wmb();
+		/* The RELEASE ordering here pairs with atomic ordering
+		 * to make sure the state update data observed between
+		 * threads.
+		 */
+		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				__ATOMIC_RELEASE);
 	}
 
 	return i;
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics
  2020-07-07 14:29  0%       ` Jerin Jacob
@ 2020-07-07 15:56  0%         ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-07 15:56 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: thomas, Erik Gabriel Carrillo, dpdk-dev, jerinj,
	Honnappa Nagarahalli, David Christensen, Ruifeng Wang,
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji

> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Tuesday, July 7, 2020 10:30 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: thomas@monjalon.net; Erik Gabriel Carrillo <erik.g.carrillo@intel.com>;
> dpdk-dev <dev@dpdk.org>; jerinj@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; David Christensen
> <drc@linux.vnet.ibm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>;
> Dharmik Thakkar <Dharmik.Thakkar@arm.com>; nd <nd@arm.com>; David
> Marchand <david.marchand@redhat.com>; Ray Kinsella <mdr@ashroe.eu>;
> Neil Horman <nhorman@tuxdriver.com>; dodji@redhat.com
> Subject: Re: [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with
> C11 atomics
> 
> On Tue, Jul 7, 2020 at 4:45 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > The impl_opaque field is shared between the timer arm and cancel
> > operations. Meanwhile, the state flag acts as a guard variable to
> > make sure the update of impl_opaque is synchronized. The original
> > code uses rte_smp barriers to achieve that. This patch uses C11
> > atomics with an explicit one-way memory barrier instead of full
> > barriers rte_smp_w/rmb() to avoid the unnecessary barrier on aarch64.
> >
> > Since compilers can generate the same instructions for volatile and
> > non-volatile variable in C11 __atomics built-ins, so remain the volatile
> > keyword in front of state enum to avoid the ABI break issue.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> 
> Could you fix the following:
> 
> WARNING:TYPO_SPELLING: 'opague' may be misspelled - perhaps 'opaque'?
> #184: FILE: lib/librte_eventdev/rte_event_timer_adapter.c:1161:
> + * specific opague data under the correct state.
Done. 

Thanks,
Phil

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows
  2020-07-07 15:23  7%   ` Thomas Monjalon
@ 2020-07-07 16:33  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07 16:33 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, talshn



On 07/07/2020 16:23, Thomas Monjalon wrote:
> 07/07/2020 16:45, Ray Kinsella:
>> Minor changes to the abi policy for windows.
> 
> It looks like you were not fast enough to comment
> in the original thread :)
> Please add a Fixes line to reference the original commit.
> 
>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>> ---
>>  doc/guides/contributing/abi_policy.rst | 4 +++-
>>  doc/guides/windows_gsg/intro.rst       | 6 +++---
>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
>> index d0affa9..8e70b45 100644
>> --- a/doc/guides/contributing/abi_policy.rst
>> +++ b/doc/guides/contributing/abi_policy.rst
>> @@ -40,7 +40,9 @@ General Guidelines
>>     maintaining ABI stability through one year of DPDK releases starting from
>>     DPDK 19.11. This policy will be reviewed in 2020, with intention of
>>     lengthening the stability period. Additional implementation detail can be
>> -   found in the :ref:`release notes <20_02_abi_changes>`.
>> +   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
>> +   policy does not currently apply to the :doc:`Window build
> 
> Window -> Windows

ACK

> 
>> +   <../windows_gsg/intro>`.
>>  
>>  What is an ABI?
>>  ~~~~~~~~~~~~~~~
>> diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
>> index 58c6246..707afd3 100644
>> --- a/doc/guides/windows_gsg/intro.rst
>> +++ b/doc/guides/windows_gsg/intro.rst
>> @@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
>>  of any individual patch series. The goal is to be able to run any DPDK
>>  application natively on Windows.
>>  
>> -The :doc:`../contributing/abi_policy` cannot be respected for Windows.
>> -Minor ABI versions may be incompatible
>> -because function versioning is not supported on Windows.
>> +The :doc:`../contributing/abi_policy` does not apply to the Windows build, as
>> +function versioning is not supported on Windows, therefore minor ABI versions
>> +may be incompatible.
> 
> Please I really prefer we split lines logically rather than filling the space:
> The :doc:`../contributing/abi_policy` does not apply to the Windows build,
> as function versioning is not supported on Windows,
> therefore minor ABI versions may be incompatible.
> 
That is a single line though :-)
 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 15:26  0%   ` Thomas Monjalon
@ 2020-07-07 16:35  3%     ` Kinsella, Ray
  2020-07-07 16:36  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-07 16:35 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson



On 07/07/2020 16:26, Thomas Monjalon wrote:
> 07/07/2020 16:45, Ray Kinsella:
>> Clarify retention period for aliases to experimental.
>>
>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>> ---
>> --- a/doc/guides/contributing/abi_versioning.rst
>> +++ b/doc/guides/contributing/abi_versioning.rst
>> @@ -158,7 +158,7 @@ The macros exported are:
>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>>    The macro is used when a symbol matures to become part of the stable ABI, to
>> -  provide an alias to experimental for some time.
>> +  provide an alias to experimental until the next major ABI version.
> 
> Why limiting the period for experimental status?
> Some API want to remain experimental longer.
> 
> [...]
>>  In situations in which an ``experimental`` symbol has been stable for some time,
>>  and it becomes a candidate for promotion to the stable ABI. At this time, when
>> -promoting the symbol, maintainer may choose to provide an alias to the
>> -``experimental`` symbol version, so as not to break consuming applications.
>> +promoting the symbol, the maintainer may choose to provide an alias to the
>> +``experimental`` symbol version, so as not to break consuming applications. This
> 
> Please start a sentence on a new line.

ACK

> 
>> +alias will then typically be dropped in the next major ABI version.
> 
> I don't see the need for the time estimation.
> 
> 

Will reword to ...

"This alias will then be dropped in the next major ABI version."


* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:35  3%     ` Kinsella, Ray
@ 2020-07-07 16:36  0%       ` Thomas Monjalon
  2020-07-07 16:37  0%         ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 16:36 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson

07/07/2020 18:35, Kinsella, Ray:
> On 07/07/2020 16:26, Thomas Monjalon wrote:
> > 07/07/2020 16:45, Ray Kinsella:
> >> Clarify retention period for aliases to experimental.
> >>
> >> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >> ---
> >> --- a/doc/guides/contributing/abi_versioning.rst
> >> +++ b/doc/guides/contributing/abi_versioning.rst
> >> @@ -158,7 +158,7 @@ The macros exported are:
> >>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
> >>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
> >>    The macro is used when a symbol matures to become part of the stable ABI, to
> >> -  provide an alias to experimental for some time.
> >> +  provide an alias to experimental until the next major ABI version.
> > 
> > Why limiting the period for experimental status?
> > Some API want to remain experimental longer.
> > 
> > [...]
> >>  In situations in which an ``experimental`` symbol has been stable for some time,
> >>  and it becomes a candidate for promotion to the stable ABI. At this time, when
> >> -promoting the symbol, maintainer may choose to provide an alias to the
> >> -``experimental`` symbol version, so as not to break consuming applications.
> >> +promoting the symbol, the maintainer may choose to provide an alias to the
> >> +``experimental`` symbol version, so as not to break consuming applications. This
> > 
> > Please start a sentence on a new line.
> 
> ACK
> 
> > 
> >> +alias will then typically be dropped in the next major ABI version.
> > 
> > I don't see the need for the time estimation.
> > 
> > 
> 
> Will reword to ...
> 
> "This alias will then be dropped in the next major ABI version."

It is not addressing my first comment. Please see above.





* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:36  0%       ` Thomas Monjalon
@ 2020-07-07 16:37  0%         ` Kinsella, Ray
  2020-07-07 16:55  0%           ` Honnappa Nagarahalli
  2020-07-07 16:57  0%           ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07 16:37 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson



On 07/07/2020 17:36, Thomas Monjalon wrote:
> 07/07/2020 18:35, Kinsella, Ray:
>> On 07/07/2020 16:26, Thomas Monjalon wrote:
>>> 07/07/2020 16:45, Ray Kinsella:
>>>> Clarify retention period for aliases to experimental.
>>>>
>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>>>> ---
>>>> --- a/doc/guides/contributing/abi_versioning.rst
>>>> +++ b/doc/guides/contributing/abi_versioning.rst
>>>> @@ -158,7 +158,7 @@ The macros exported are:
>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>>>>    The macro is used when a symbol matures to become part of the stable ABI, to
>>>> -  provide an alias to experimental for some time.
>>>> +  provide an alias to experimental until the next major ABI version.
>>>
>>> Why limiting the period for experimental status?
>>> Some API want to remain experimental longer.
>>>
>>> [...]
>>>>  In situations in which an ``experimental`` symbol has been stable for some time,
>>>>  and it becomes a candidate for promotion to the stable ABI. At this time, when
>>>> -promoting the symbol, maintainer may choose to provide an alias to the
>>>> -``experimental`` symbol version, so as not to break consuming applications.
>>>> +promoting the symbol, the maintainer may choose to provide an alias to the
>>>> +``experimental`` symbol version, so as not to break consuming applications. This
>>>
>>> Please start a sentence on a new line.
>>
>> ACK
>>
>>>
>>>> +alias will then typically be dropped in the next major ABI version.
>>>
>>> I don't see the need for the time estimation.
>>>
>>>
>>
>> Will reword to ...
>>
>> "This alias will then be dropped in the next major ABI version."
> 
> It is not addressing my first comment. Please see above.
> 

Thank you, I don't necessarily agree with the first comment :-)
We need to say when the alias should be dropped no?


* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:37  0%         ` Kinsella, Ray
@ 2020-07-07 16:55  0%           ` Honnappa Nagarahalli
  2020-07-07 17:00  0%             ` Thomas Monjalon
  2020-07-07 16:57  0%           ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-07-07 16:55 UTC (permalink / raw)
  To: Kinsella, Ray, thomas
  Cc: dev, fady, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	david.marchand, bruce.richardson, Honnappa Nagarahalli, nd, nd

<snip>

> Subject: Re: [PATCH v1 2/2] doc: clarify alias to experimental period
> 
> 
> 
> On 07/07/2020 17:36, Thomas Monjalon wrote:
> > 07/07/2020 18:35, Kinsella, Ray:
> >> On 07/07/2020 16:26, Thomas Monjalon wrote:
> >>> 07/07/2020 16:45, Ray Kinsella:
> >>>> Clarify retention period for aliases to experimental.
> >>>>
> >>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >>>> ---
> >>>> --- a/doc/guides/contributing/abi_versioning.rst
> >>>> +++ b/doc/guides/contributing/abi_versioning.rst
> >>>> @@ -158,7 +158,7 @@ The macros exported are:
> >>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version
> table entry
> >>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal
> function ``be``.
> >>>>    The macro is used when a symbol matures to become part of the
> >>>> stable ABI, to
> >>>> -  provide an alias to experimental for some time.
> >>>> +  provide an alias to experimental until the next major ABI version.
> >>>
> >>> Why limiting the period for experimental status?
> >>> Some API want to remain experimental longer.
This is not limiting the period. This is about how long VERSION_SYMBOL_EXPERIMENTAL should remain in place after the experimental tag is removed from the symbol.

> >>>
> >>> [...]
> >>>>  In situations in which an ``experimental`` symbol has been stable
> >>>> for some time,  and it becomes a candidate for promotion to the
> >>>> stable ABI. At this time, when -promoting the symbol, maintainer
> >>>> may choose to provide an alias to the -``experimental`` symbol version,
> so as not to break consuming applications.
> >>>> +promoting the symbol, the maintainer may choose to provide an
> >>>> +alias to the ``experimental`` symbol version, so as not to break
> >>>> +consuming applications. This
> >>>
> >>> Please start a sentence on a new line.
> >>
> >> ACK
> >>
> >>>
> >>>> +alias will then typically be dropped in the next major ABI version.
> >>>
> >>> I don't see the need for the time estimation.
I prefer this wording as it clarifies what should be done while creating a patch.

> >>>
> >>>
> >>
> >> Will reword to ...
> >>
> >> "This alias will then be dropped in the next major ABI version."
> >
> > It is not addressing my first comment. Please see above.
> >
> 
> Thank you, I don't necessarily agree with the first comment :-) We need to say
> when the alias should be dropped no?

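The lifecycle Honnappa describes can be illustrated with a linker version script fragment. This is only a sketch: rte_foo and the DPDK_21 node are hypothetical names chosen for illustration, and real DPDK version maps may differ in detail. While the alias is kept, the symbol appears in both the stable node and the EXPERIMENTAL node:

```
/* Stable ABI node: rte_foo has been promoted. */
DPDK_21 {
	global:
	rte_foo;
	local: *;
};

/* Alias node, kept only until the next major ABI version, so that
 * binaries built against rte_foo@EXPERIMENTAL keep resolving. */
EXPERIMENTAL {
	global:
	rte_foo;
};
```

Dropping the alias at the next major ABI bump then amounts to removing rte_foo from the EXPERIMENTAL node.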

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:37  0%         ` Kinsella, Ray
  2020-07-07 16:55  0%           ` Honnappa Nagarahalli
@ 2020-07-07 16:57  0%           ` Thomas Monjalon
  2020-07-07 17:01  4%             ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 16:57 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson

07/07/2020 18:37, Kinsella, Ray:
> 
> On 07/07/2020 17:36, Thomas Monjalon wrote:
> > 07/07/2020 18:35, Kinsella, Ray:
> >> On 07/07/2020 16:26, Thomas Monjalon wrote:
> >>> 07/07/2020 16:45, Ray Kinsella:
> >>>> Clarify retention period for aliases to experimental.
> >>>>
> >>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >>>> ---
> >>>> --- a/doc/guides/contributing/abi_versioning.rst
> >>>> +++ b/doc/guides/contributing/abi_versioning.rst
> >>>> @@ -158,7 +158,7 @@ The macros exported are:
> >>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
> >>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
> >>>>    The macro is used when a symbol matures to become part of the stable ABI, to
> >>>> -  provide an alias to experimental for some time.
> >>>> +  provide an alias to experimental until the next major ABI version.
> >>>
> >>> Why limiting the period for experimental status?
> >>> Some API want to remain experimental longer.
> >>>
> >>> [...]
> >>>> +alias will then typically be dropped in the next major ABI version.
> >>>
> >>> I don't see the need for the time estimation.
> >>
> >> Will reword to ...
> >>
> >> "This alias will then be dropped in the next major ABI version."
> > 
> > It is not addressing my first comment. Please see above.
> 
> Thank you, I don't necessarily agree with the first comment :-)

You don't have to agree. But in this case we must discuss :-)

> We need to say when the alias should be dropped no?

I don't think so.
Until now, it is let to the appreciation of the maintainer.
If we want to change the rule, especially for experimental period,
it must be said clearly and debated.




* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:55  0%           ` Honnappa Nagarahalli
@ 2020-07-07 17:00  0%             ` Thomas Monjalon
  2020-07-07 17:01  0%               ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-07 17:00 UTC (permalink / raw)
  To: Kinsella, Ray, Honnappa Nagarahalli
  Cc: dev, fady, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	david.marchand, bruce.richardson, nd

07/07/2020 18:55, Honnappa Nagarahalli:
> > On 07/07/2020 17:36, Thomas Monjalon wrote:
> > > 07/07/2020 18:35, Kinsella, Ray:
> > >> On 07/07/2020 16:26, Thomas Monjalon wrote:
> > >>> 07/07/2020 16:45, Ray Kinsella:
> > >>>> Clarify retention period for aliases to experimental.
> > >>>>
> > >>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> > >>>> ---
> > >>>> --- a/doc/guides/contributing/abi_versioning.rst
> > >>>> +++ b/doc/guides/contributing/abi_versioning.rst
> > >>>> @@ -158,7 +158,7 @@ The macros exported are:
> > >>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version
> > table entry
> > >>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal
> > function ``be``.
> > >>>>    The macro is used when a symbol matures to become part of the
> > >>>> stable ABI, to
> > >>>> -  provide an alias to experimental for some time.
> > >>>> +  provide an alias to experimental until the next major ABI version.
> > >>>
> > >>> Why limiting the period for experimental status?
> > >>> Some API want to remain experimental longer.
> 
> This is not limiting the period.
> This is about how long VERSION_SYMBOL_EXPERIMENTAL should be in place
> for a symbol after the experimental tag is removed for the symbol.

Oh wait, I was wrong. It is only about the alias which is set
AFTER the experimental period.

> > >>> [...]
> > >>>>  In situations in which an ``experimental`` symbol has been stable
> > >>>> for some time,  and it becomes a candidate for promotion to the
> > >>>> stable ABI. At this time, when -promoting the symbol, maintainer
> > >>>> may choose to provide an alias to the -``experimental`` symbol version,
> > so as not to break consuming applications.
> > >>>> +promoting the symbol, the maintainer may choose to provide an
> > >>>> +alias to the ``experimental`` symbol version, so as not to break
> > >>>> +consuming applications. This
> > >>>
> > >>> Please start a sentence on a new line.
> > >>
> > >> ACK
> > >>
> > >>>
> > >>>> +alias will then typically be dropped in the next major ABI version.
> > >>>
> > >>> I don't see the need for the time estimation.
> 
> I prefer this wording as it clarifying what should be done while creating a patch.

Yes, after a second read, I am OK.





* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 16:57  0%           ` Thomas Monjalon
@ 2020-07-07 17:01  4%             ` Kinsella, Ray
  2020-07-07 17:08  0%               ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-07 17:01 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson



On 07/07/2020 17:57, Thomas Monjalon wrote:
> 07/07/2020 18:37, Kinsella, Ray:
>>
>> On 07/07/2020 17:36, Thomas Monjalon wrote:
>>> 07/07/2020 18:35, Kinsella, Ray:
>>>> On 07/07/2020 16:26, Thomas Monjalon wrote:
>>>>> 07/07/2020 16:45, Ray Kinsella:
>>>>>> Clarify retention period for aliases to experimental.
>>>>>>
>>>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>>>>>> ---
>>>>>> --- a/doc/guides/contributing/abi_versioning.rst
>>>>>> +++ b/doc/guides/contributing/abi_versioning.rst
>>>>>> @@ -158,7 +158,7 @@ The macros exported are:
>>>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
>>>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
>>>>>>    The macro is used when a symbol matures to become part of the stable ABI, to
>>>>>> -  provide an alias to experimental for some time.
>>>>>> +  provide an alias to experimental until the next major ABI version.
>>>>>
>>>>> Why limiting the period for experimental status?
>>>>> Some API want to remain experimental longer.
>>>>>
>>>>> [...]
>>>>>> +alias will then typically be dropped in the next major ABI version.
>>>>>
>>>>> I don't see the need for the time estimation.
>>>>
>>>> Will reword to ...
>>>>
>>>> "This alias will then be dropped in the next major ABI version."
>>>
>>> It is not addressing my first comment. Please see above.
>>
>> Thank you, I don't necessarily agree with the first comment :-)
> 
> You don't have to agree. But in this case we must discuss :-)
> 
>> We need to say when the alias should be dropped no?
> 
> I don't think so.
> Until now, it is let to the appreciation of the maintainer.
> If we want to change the rule, especially for experimental period,
> it must be said clearly and debated.

It doesn't make _any_ sense to maintain an alias after the new ABI.

The alias is there to maintain ABI compatibility;
there is no reason to maintain compatibility in the new ABI, so it should be dropped.

 

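As a sketch of the macro usage being discussed (this assumes the DPDK build environment and rte_function_versioning.h, so it is not standalone-buildable; rte_foo and the _v21/_e suffixes are hypothetical names), promoting a symbol while providing the temporary experimental alias looks roughly like:

```c
/* Sketch only: rte_foo is a hypothetical symbol used for illustration. */
#include <rte_function_versioning.h>

/* Stable implementation, bound as the default rte_foo@DPDK_21. */
__vsym int
rte_foo_v21(int arg)
{
	return arg;
}
BIND_DEFAULT_SYMBOL(rte_foo, _v21, 21);

/* Binds rte_foo@EXPERIMENTAL to the same implementation, so existing
 * consumers are not broken. Per the discussion above, this alias is
 * dropped at the next major ABI version. */
__vsym int
rte_foo_e(int arg)
{
	return rte_foo_v21(arg);
}
VERSION_SYMBOL_EXPERIMENTAL(rte_foo, _e);
```

Dropping the alias later means deleting rte_foo_e() and its VERSION_SYMBOL_EXPERIMENTAL() line, plus the matching entry in the version map.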

* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 17:00  0%             ` Thomas Monjalon
@ 2020-07-07 17:01  0%               ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-07 17:01 UTC (permalink / raw)
  To: Thomas Monjalon, Honnappa Nagarahalli
  Cc: dev, fady, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	david.marchand, bruce.richardson, nd



On 07/07/2020 18:00, Thomas Monjalon wrote:
> 07/07/2020 18:55, Honnappa Nagarahalli:
>>> On 07/07/2020 17:36, Thomas Monjalon wrote:
>>>> 07/07/2020 18:35, Kinsella, Ray:
>>>>> On 07/07/2020 16:26, Thomas Monjalon wrote:
>>>>>> 07/07/2020 16:45, Ray Kinsella:
>>>>>>> Clarify retention period for aliases to experimental.
>>>>>>>
>>>>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>>>>>>> ---
>>>>>>> --- a/doc/guides/contributing/abi_versioning.rst
>>>>>>> +++ b/doc/guides/contributing/abi_versioning.rst
>>>>>>> @@ -158,7 +158,7 @@ The macros exported are:
>>>>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version
>>> table entry
>>>>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal
>>> function ``be``.
>>>>>>>    The macro is used when a symbol matures to become part of the
>>>>>>> stable ABI, to
>>>>>>> -  provide an alias to experimental for some time.
>>>>>>> +  provide an alias to experimental until the next major ABI version.
>>>>>>
>>>>>> Why limiting the period for experimental status?
>>>>>> Some API want to remain experimental longer.
>>
>> This is not limiting the period.
>> This is about how long VERSION_SYMBOL_EXPERIMENTAL should be in place
>> for a symbol after the experimental tag is removed for the symbol.
> 
> Oh wait, I was wrong. It is only about the alias which is set
> AFTER the experimental period.
> 
>>>>>> [...]
>>>>>>>  In situations in which an ``experimental`` symbol has been stable
>>>>>>> for some time,  and it becomes a candidate for promotion to the
>>>>>>> stable ABI. At this time, when -promoting the symbol, maintainer
>>>>>>> may choose to provide an alias to the -``experimental`` symbol version,
>>> so as not to break consuming applications.
>>>>>>> +promoting the symbol, the maintainer may choose to provide an
>>>>>>> +alias to the ``experimental`` symbol version, so as not to break
>>>>>>> +consuming applications. This
>>>>>>
>>>>>> Please start a sentence on a new line.
>>>>>
>>>>> ACK
>>>>>
>>>>>>
>>>>>>> +alias will then typically be dropped in the next major ABI version.
>>>>>>
>>>>>> I don't see the need for the time estimation.
>>
>> I prefer this wording as it clarifying what should be done while creating a patch.
> 
> Yes, after a second read, I am OK.
> 
perfect, I will sort out the other bits. 


* Re: [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period
  2020-07-07 17:01  4%             ` Kinsella, Ray
@ 2020-07-07 17:08  0%               ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-07 17:08 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon, david.marchand, bruce.richardson

07/07/2020 19:01, Kinsella, Ray:
> On 07/07/2020 17:57, Thomas Monjalon wrote:
> > 07/07/2020 18:37, Kinsella, Ray:
> >> On 07/07/2020 17:36, Thomas Monjalon wrote:
> >>> 07/07/2020 18:35, Kinsella, Ray:
> >>>> On 07/07/2020 16:26, Thomas Monjalon wrote:
> >>>>> 07/07/2020 16:45, Ray Kinsella:
> >>>>>> Clarify retention period for aliases to experimental.
> >>>>>>
> >>>>>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> >>>>>> ---
> >>>>>> --- a/doc/guides/contributing/abi_versioning.rst
> >>>>>> +++ b/doc/guides/contributing/abi_versioning.rst
> >>>>>> @@ -158,7 +158,7 @@ The macros exported are:
> >>>>>>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
> >>>>>>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
> >>>>>>    The macro is used when a symbol matures to become part of the stable ABI, to
> >>>>>> -  provide an alias to experimental for some time.
> >>>>>> +  provide an alias to experimental until the next major ABI version.
> >>>>>
> >>>>> Why limiting the period for experimental status?
> >>>>> Some API want to remain experimental longer.
> >>>>>
> >>>>> [...]
> >>>>>> +alias will then typically be dropped in the next major ABI version.
> >>>>>
> >>>>> I don't see the need for the time estimation.
> >>>>
> >>>> Will reword to ...
> >>>>
> >>>> "This alias will then be dropped in the next major ABI version."
> >>>
> >>> It is not addressing my first comment. Please see above.
> >>
> >> Thank you, I don't necessarily agree with the first comment :-)
> > 
> > You don't have to agree. But in this case we must discuss :-)
> > 
> >> We need to say when the alias should be dropped no?
> > 
> > I don't think so.
> > Until now, it is let to the appreciation of the maintainer.
> > If we want to change the rule, especially for experimental period,
> > it must be said clearly and debated.
> 
> It doesn't make _any_ sense to maintain an alias after the new ABI.
> 
> The alias is there to maintain ABI compatibility, 
> there is no reason to maintain compatibility in the new ABI - so it should be dropped

Yes I was wrong, sorry.




* [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
@ 2020-07-07 17:51 24%   ` Ray Kinsella
  2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
  2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2020-07-07 17:51 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Minor changes to the abi policy for windows.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_policy.rst | 4 +++-
 doc/guides/windows_gsg/intro.rst       | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index d0affa9..4452362 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -40,7 +40,9 @@ General Guidelines
    maintaining ABI stability through one year of DPDK releases starting from
    DPDK 19.11. This policy will be reviewed in 2020, with intention of
    lengthening the stability period. Additional implementation detail can be
-   found in the :ref:`release notes <20_02_abi_changes>`.
+   found in the :ref:`release notes <20_02_abi_changes>`. Please note that this
+   policy does not currently apply to the :doc:`Windows build
+   <../windows_gsg/intro>`.
 
 What is an ABI?
 ~~~~~~~~~~~~~~~
diff --git a/doc/guides/windows_gsg/intro.rst b/doc/guides/windows_gsg/intro.rst
index 58c6246..4ac7f97 100644
--- a/doc/guides/windows_gsg/intro.rst
+++ b/doc/guides/windows_gsg/intro.rst
@@ -19,6 +19,6 @@ compile. Support is being added in pieces so as to limit the overall scope
 of any individual patch series. The goal is to be able to run any DPDK
 application natively on Windows.
 
-The :doc:`../contributing/abi_policy` cannot be respected for Windows.
-Minor ABI versions may be incompatible
-because function versioning is not supported on Windows.
+The :doc:`../contributing/abi_policy` does not apply to the Windows build,
+as function versioning is not supported on Windows,
+therefore minor ABI versions may be incompatible.
-- 
2.7.4



* [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes
  2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
  2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-07 17:50  8% ` Ray Kinsella
  2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
                     ` (2 more replies)
  2 siblings, 3 replies; 200+ results
From: Ray Kinsella @ 2020-07-07 17:50 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

A few documentation fixes, clarifying the Windows ABI policy and aliases to
experimental mode.

Ray Kinsella (2):
  doc: reword abi policy for windows
  doc: clarify alias to experimental period

v2:
  Addressed feedback from Thomas Monjalon.

 doc/guides/contributing/abi_policy.rst     | 4 +++-
 doc/guides/contributing/abi_versioning.rst | 5 +++--
 doc/guides/windows_gsg/intro.rst           | 6 +++---
 3 files changed, 9 insertions(+), 6 deletions(-)

--
2.7.4


* [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
@ 2020-07-07 17:51 12%   ` Ray Kinsella
  2020-07-07 18:44  0%     ` Honnappa Nagarahalli
  2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2020-07-07 17:51 UTC (permalink / raw)
  To: dev
  Cc: fady, thomas, Honnappa.Nagarahalli, Ray Kinsella, Neil Horman,
	John McNamara, Marko Kovacevic, Harini Ramakrishnan,
	Omar Cardona, Pallavi Kadam, Ranjit Menon

Clarify retention period for aliases to experimental.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_versioning.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/guides/contributing/abi_versioning.rst b/doc/guides/contributing/abi_versioning.rst
index 31a9205..b1d09c7 100644
--- a/doc/guides/contributing/abi_versioning.rst
+++ b/doc/guides/contributing/abi_versioning.rst
@@ -158,7 +158,7 @@ The macros exported are:
 * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table entry
   binding versioned symbol ``b@EXPERIMENTAL`` to the internal function ``be``.
   The macro is used when a symbol matures to become part of the stable ABI, to
-  provide an alias to experimental for some time.
+  provide an alias to experimental until the next major ABI version.
 
 .. _example_abi_macro_usage:
 
@@ -428,8 +428,9 @@ _____________________________
 
 In situations in which an ``experimental`` symbol has been stable for some time,
 and it becomes a candidate for promotion to the stable ABI. At this time, when
-promoting the symbol, maintainer may choose to provide an alias to the
+promoting the symbol, the maintainer may choose to provide an alias to the
 ``experimental`` symbol version, so as not to break consuming applications.
+This alias is then dropped in the next major ABI version.
 
 The process to provide an alias to ``experimental`` is similar to that, of
 :ref:`symbol versioning <example_abi_macro_usage>` described above.
-- 
2.7.4



* Re: [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period
  2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-07 18:44  0%     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2020-07-07 18:44 UTC (permalink / raw)
  To: Ray Kinsella, dev
  Cc: fady, thomas, Neil Horman, John McNamara, Marko Kovacevic,
	Harini Ramakrishnan, Omar Cardona, Pallavi Kadam, Ranjit Menon,
	Honnappa Nagarahalli, nd, nd

<snip>

> Subject: [PATCH v2 2/2] doc: clarify alias to experimental period
> 
> Clarify retention period for aliases to experimental.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  doc/guides/contributing/abi_versioning.rst | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/doc/guides/contributing/abi_versioning.rst
> b/doc/guides/contributing/abi_versioning.rst
> index 31a9205..b1d09c7 100644
> --- a/doc/guides/contributing/abi_versioning.rst
> +++ b/doc/guides/contributing/abi_versioning.rst
> @@ -158,7 +158,7 @@ The macros exported are:
>  * ``VERSION_SYMBOL_EXPERIMENTAL(b, e)``: Creates a symbol version table
> entry
>    binding versioned symbol ``b@EXPERIMENTAL`` to the internal function
> ``be``.
>    The macro is used when a symbol matures to become part of the stable ABI,
> to
> -  provide an alias to experimental for some time.
> +  provide an alias to experimental until the next major ABI version.
> 
>  .. _example_abi_macro_usage:
> 
> @@ -428,8 +428,9 @@ _____________________________
> 
>  In situations in which an ``experimental`` symbol has been stable for some
> time,  and it becomes a candidate for promotion to the stable ABI. At this
> time, when -promoting the symbol, maintainer may choose to provide an
> alias to the
> +promoting the symbol, the maintainer may choose to provide an alias to
> +the
>  ``experimental`` symbol version, so as not to break consuming applications.
> +This alias is then dropped in the next major ABI version.
> 
>  The process to provide an alias to ``experimental`` is similar to that,
> of  :ref:`symbol versioning <example_abi_macro_usage>` described above.
> --
> 2.7.4



* Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
  2020-06-29 23:56  0%           ` Dmitry Kozlyuk
@ 2020-07-08  1:09  0%             ` Dmitry Kozlyuk
  0 siblings, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2020-07-08  1:09 UTC (permalink / raw)
  To: Tal Shnaiderman
  Cc: Ranjit Menon, Fady Bader, dev, Dmitry Malloy,
	Narcisa Ana Maria Vasile, Thomas Monjalon, Olivier Matz

On Tue, 30 Jun 2020 02:56:20 +0300, Dmitry Kozlyuk wrote:
> On Mon, 29 Jun 2020 08:12:51 +0000, Tal Shnaiderman wrote:
> > > From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > > Subject: Re: [dpdk-dev] [PATCH 6/7] cmdline: support Windows
> > > 
> > > On Sun, 28 Jun 2020 23:23:11 -0700, Ranjit Menon wrote:    
[snip]
> > > > The issue is that UINT8, UINT16, INT32, INT64 etc. are reserved types
> > > > in Windows headers for integer types. We found that it is easier to
> > > > change the enum in cmdline_parse_num.h than try to play with the
> > > > include order of headers. AFAIK, the enums were only used to determine
> > > > the type in a series of switch() statements in librte_cmdline, so we
> > > > simply renamed the enums. Not sure, if that will be acceptable here.    
> > > 
> > > +1 for renaming enum values. It's not a problem of librte_cmdline itself
> > > but a problem of its consumption on Windows; however, renaming enum values
> > > doesn't break ABI and will make the librte_cmdline API "namespaced".
> > > 
[snip]
> > 
> > test_pmd redefines BOOLEAN and PATTERN in the index enum; I'm not sure how many more conflicts we will face because of this huge include.
> >
> > Also, DPDK applications will inherit it unknowingly, not sure if this is common for windows libraries.  
> 
> I never hit these particular conflicts, but you're right that there will be
> more, e.g. I remember particularly nasty clashes in failsafe PMD, unrelated
> to cmdline token names.

Still, I'd go for renaming, with or without additional steps to hide
<windows.h>. Although I wouldn't include it in this series: renaming will
touch numerous places and require many more reviewers.

> We could take the same approach as with networking headers: copy required
> declarations instead of including them from SDK. Here's a list of what
> pthread.h uses:

While this will resolve the issue for DPDK code, applications using DPDK
headers can easily hit it by including <windows.h> on their own. On the other
hand, they can always split translation units and I don't know how practical
it is to use system and DPDK networking headers at the same time.

-- 
Dmitry Kozlyuk

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
@ 2020-07-08  5:11  3%   ` Phil Yang
  2020-07-08 11:44  0%   ` Olivier Matz
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
  2 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-08  5:11 UTC (permalink / raw)
  To: Phil Yang, david.marchand, dev
  Cc: drc, Honnappa Nagarahalli, olivier.matz, Ruifeng Wang, nd

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Phil Yang
> Sent: Tuesday, July 7, 2020 6:11 PM
> To: david.marchand@redhat.com; dev@dpdk.org
> Cc: drc@linux.vnet.ibm.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; olivier.matz@6wind.com; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
> 
> Use C11 atomics with explicit ordering instead of rte_atomic ops which
> enforce unnecessary barriers on aarch64.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v2:
> Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> to refcnt_atomic.
> 

<snip>

> diff --git a/lib/librte_mbuf/rte_mbuf_core.h
> b/lib/librte_mbuf/rte_mbuf_core.h
> index 16600f1..806313a 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -18,7 +18,6 @@
> 
>  #include <stdint.h>
>  #include <rte_compat.h>
> -#include <generic/rte_atomic.h>
> 
>  #ifdef __cplusplus
>  extern "C" {
> @@ -495,12 +494,8 @@ struct rte_mbuf {
>  	 * or non-atomic) is controlled by the
> CONFIG_RTE_MBUF_REFCNT_ATOMIC
>  	 * config option.
>  	 */
> -	RTE_STD_C11
> -	union {
> -		rte_atomic16_t refcnt_atomic; /**< Atomically accessed
> refcnt */
> -		/** Non-atomically accessed refcnt */
> -		uint16_t refcnt;
> -	};
> +	uint16_t refcnt;
> +
>  	uint16_t nb_segs;         /**< Number of segments. */
> 
>  	/** Input port (16 bits to support more than 256 virtual ports).
> @@ -679,7 +674,7 @@ typedef void
> (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
>  struct rte_mbuf_ext_shared_info {
>  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> function */
>  	void *fcb_opaque;                        /**< Free callback argument */
> -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */

It still causes an ABI check failure in Travis CI on this type change.
I think we need an exception in libabigail.abignore for this. 

Thanks,
Phil
>  };
> 
>  /**< Maximum number of nb_segs allowed. */
> --
> 2.7.4


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
@ 2020-07-08 10:22 25% David Marchand
  2020-07-08 13:09  7% ` Kinsella, Ray
                   ` (4 more replies)
  0 siblings, 5 replies; 200+ results
From: David Marchand @ 2020-07-08 10:22 UTC (permalink / raw)
  To: dev; +Cc: thomas, dodji, Ray Kinsella, Neil Horman

abidiff can provide some more information about the ABI difference it
detected.
In all cases, a discussion on the mailing must happen but we can give
some hints to know if this is a problem with the script calling abidiff,
a potential ABI breakage or an unambiguous ABI breakage.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 devtools/check-abi.sh | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index e17fedbd9f..521e2cce7c 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
 		error=1
 		continue
 	fi
-	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
+	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
+		abiret=$?
 		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
 		error=1
-	fi
+		echo
+		if [ $(($abiret & 3)) != 0 ]; then
+			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
+		fi
+		if [ $(($abiret & 4)) != 0 ]; then
+			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
+		fi
+		if [ $(($abiret & 8)) != 0 ]; then
+			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
+		fi
+		echo
+	}
 done
 
 [ -z "$error" ] || [ -n "$warnonly" ]
-- 
2.23.0


^ permalink raw reply	[relevance 25%]
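The bitmask tests added to check-abi.sh above follow libabigail's documented exit-status flags. As a standalone illustration (a sketch, not part of the patch; unlike the script, which prints every matching hint, this helper returns only the most severe one), the same decoding can be written as:

```c
#include <stdio.h>

/* Bit values as documented by libabigail for abidiff's exit status. */
enum {
	ABIDIFF_ERROR                   = 1 << 0,
	ABIDIFF_USAGE_ERROR             = 1 << 1,
	ABIDIFF_ABI_CHANGE              = 1 << 2,
	ABIDIFF_ABI_INCOMPATIBLE_CHANGE = 1 << 3,
};

/* Classify an abidiff return code into a single hint string,
 * most severe condition first. */
const char *
abidiff_hint(int abiret)
{
	if (abiret & (ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR))
		return "tool or usage error, please report it";
	if (abiret & ABIDIFF_ABI_INCOMPATIBLE_CHANGE)
		return "breaks the ABI";
	if (abiret & ABIDIFF_ABI_CHANGE)
		return "potential issue, needs review";
	return "no ABI difference";
}
```

The `& 3` test in the shell script is exactly the `ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR` mask above.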

* Re: [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes
  2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
  2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
  2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
@ 2020-07-08 10:32  7%   ` Thomas Monjalon
  2020-07-08 12:02  4%     ` Kinsella, Ray
  2 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-08 10:32 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon

07/07/2020 19:50, Ray Kinsella:
> Few documentation fixes, clarifying the Windows ABI policy and aliases to
> experimental mode.
> 
> Ray Kinsella (2):
>   doc: reword abi policy for windows
>   doc: clarify alias to experimental period
> 
> v2:
>   Addressed feedback from Thomas Monjalon.

One more sentence needs to start on its line,
avoiding to split a link on two lines.

Reworded titles with uppercases as well:
	doc: reword ABI policy for Windows
	doc: clarify period of alias to experimental symbol

Applied with above changes, thanks



^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
  2020-07-08  5:11  3%   ` Phil Yang
@ 2020-07-08 11:44  0%   ` Olivier Matz
  2020-07-09 10:00  3%     ` Phil Yang
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
  2 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-08 11:44 UTC (permalink / raw)
  To: Phil Yang
  Cc: david.marchand, dev, drc, Honnappa.Nagarahalli, ruifeng.wang, nd

Hi,

On Tue, Jul 07, 2020 at 06:10:33PM +0800, Phil Yang wrote:
> Use C11 atomics with explicit ordering instead of rte_atomic ops which
> enforce unnecessary barriers on aarch64.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v2:
> Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> to refcnt_atomic.
> 
>  lib/librte_mbuf/rte_mbuf.c      |  1 -
>  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
>  lib/librte_mbuf/rte_mbuf_core.h | 11 +++--------
>  3 files changed, 13 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index ae91ae2..8a456e5 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -22,7 +22,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_mbuf.h>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f8e492e..4a7a98c 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -37,7 +37,6 @@
>  #include <rte_config.h>
>  #include <rte_mempool.h>
>  #include <rte_memory.h>
> -#include <rte_atomic.h>
>  #include <rte_prefetch.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_byteorder.h>
> @@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
>  static inline uint16_t
>  rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  {
> -	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
> +	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  static inline void
>  rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  {
> -	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /* internal */
>  static inline uint16_t
>  __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
>  {
> -	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
> +	return (uint16_t)(__atomic_add_fetch((int16_t *)&m->refcnt, value,
> +					__ATOMIC_ACQ_REL));
>  }
>  
>  /**
> @@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  static inline uint16_t
>  rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
>  {
> -	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
> +	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -481,7 +481,7 @@ static inline void
>  rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
>  	uint16_t new_value)
>  {
> -	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
>  		return (uint16_t)value;
>  	}
>  
> -	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
> +	return (uint16_t)(__atomic_add_fetch((int16_t *)&shinfo->refcnt_atomic,
> +					    value, __ATOMIC_ACQ_REL));
>  }
>  
>  /** Mbuf prefetch */
> @@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>  	 * Direct usage of add primitive to avoid
>  	 * duplication of comparing with one.
>  	 */
> -	if (likely(rte_atomic16_add_return
> -			(&shinfo->refcnt_atomic, -1)))
> +	if (likely(__atomic_add_fetch((int *)&shinfo->refcnt_atomic, -1,
> +				     __ATOMIC_ACQ_REL)))
>  		return 1;
>  
>  	/* Reinitialize counter before mbuf freeing. */
> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index 16600f1..806313a 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -18,7 +18,6 @@
>  
>  #include <stdint.h>
>  #include <rte_compat.h>
> -#include <generic/rte_atomic.h>
>  
>  #ifdef __cplusplus
>  extern "C" {
> @@ -495,12 +494,8 @@ struct rte_mbuf {
>  	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
>  	 * config option.
>  	 */
> -	RTE_STD_C11
> -	union {
> -		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
> -		/** Non-atomically accessed refcnt */
> -		uint16_t refcnt;
> -	};
> +	uint16_t refcnt;
> +

It seems this patch does 2 things:
- remove refcnt_atomic
- use C11 atomics

The first change is an API break. I think it should be announced in a deprecation
notice. The one about atomic does not talk about it.

So I suggest to keep refcnt_atomic until next version.


>  	uint16_t nb_segs;         /**< Number of segments. */
>  
>  	/** Input port (16 bits to support more than 256 virtual ports).
> @@ -679,7 +674,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
>  struct rte_mbuf_ext_shared_info {
>  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
>  	void *fcb_opaque;                        /**< Free callback argument */
> -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
>  };
>  
>  /**< Maximum number of nb_segs allowed. */
> -- 
> 2.7.4
> 

^ permalink raw reply	[relevance 0%]
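The refcnt conversion debated in this thread follows a common pattern. A minimal standalone sketch (not DPDK code; the struct and function names here are illustrative only) of a plain `uint16_t` counter accessed through GCC/Clang `__atomic` built-ins, as the patch does, looks like:

```c
#include <stdint.h>

/* Sketch of the refcnt pattern: a plain uint16_t field accessed through
 * C11-style __atomic built-ins instead of rte_atomic16_t, avoiding the
 * extra full barriers rte_atomic ops imply on aarch64. */
struct sketch_mbuf {
	uint16_t refcnt;
};

static inline void
sketch_refcnt_set(struct sketch_mbuf *m, uint16_t v)
{
	__atomic_store_n(&m->refcnt, v, __ATOMIC_RELAXED);
}

static inline uint16_t
sketch_refcnt_read(const struct sketch_mbuf *m)
{
	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
}

static inline uint16_t
sketch_refcnt_update(struct sketch_mbuf *m, int16_t value)
{
	/* ACQ_REL on the RMW: the last owner's prior writes become
	 * visible to whichever thread ends up freeing the buffer. */
	return (uint16_t)__atomic_add_fetch((int16_t *)&m->refcnt, value,
					    __ATOMIC_ACQ_REL);
}
```

The relaxed load/store for read/set mirror the patch: those helpers are documented as safe only when no other thread is concurrently updating the counter.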

* Re: [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes
  2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
@ 2020-07-08 12:02  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 12:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, fady, Honnappa.Nagarahalli, Neil Horman, John McNamara,
	Marko Kovacevic, Harini Ramakrishnan, Omar Cardona,
	Pallavi Kadam, Ranjit Menon



On 08/07/2020 11:32, Thomas Monjalon wrote:
> 07/07/2020 19:50, Ray Kinsella:
>> Few documentation fixes, clarifying the Windows ABI policy and aliases to
>> experimental mode.
>>
>> Ray Kinsella (2):
>>   doc: reword abi policy for windows
>>   doc: clarify alias to experimental period
>>
>> v2:
>>   Addressed feedback from Thomas Monjalon.
> 
> One more sentence needs to start on its line,
> avoiding to split a link on two lines.

ah yes, missed that one sorry.
> 
> Reworded titles with uppercases as well:
> 	doc: reword ABI policy for Windows
> 	doc: clarify period of alias to experimental symbol
> 
> Applied with above changes, thanks
> 
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  @ 2020-07-08 12:29  3%   ` David Marchand
  2020-07-08 13:43  0%     ` Aaron Conole
  2020-07-08 15:04  0%     ` Kinsella, Ray
  0 siblings, 2 replies; 200+ results
From: David Marchand @ 2020-07-08 12:29 UTC (permalink / raw)
  To: Phil Yang, Aaron Conole
  Cc: dev, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman, Ray Kinsella, Harman Kalra

On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The event status is defined as a volatile variable and shared
> between threads. Use c11 atomics with explicit ordering instead
> of rte_atomic ops which enforce unnecessary barriers on aarch64.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++---------
>  2 files changed, 34 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
> index 773a34a..b1e8a29 100644
> --- a/lib/librte_eal/include/rte_eal_interrupts.h
> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
> @@ -59,7 +59,7 @@ enum {
>
>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>  struct rte_epoll_event {
> -       volatile uint32_t status;  /**< OUT: event status */
> +       uint32_t status;           /**< OUT: event status */
>         int fd;                    /**< OUT: event fd */
>         int epfd;       /**< OUT: epoll instance the ev associated with */
>         struct rte_epoll_data epdata;

I got a reject from the ABI check in my env.

1 function with some indirect sub-type change:

  [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_pci_device*' has sub-type changes:
      in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
        type size hasn't changed
        1 data member changes (2 filtered):
         type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
           type size hasn't changed
           1 data member change:
            type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
              array element type 'struct rte_epoll_event' changed:
                type size hasn't changed
                1 data member change:
                 type of 'volatile uint32_t rte_epoll_event::status' changed:
                   entity changed from 'volatile uint32_t' to 'typedef
uint32_t' at stdint-uintn.h:26:1
                   type size hasn't changed

              type size hasn't changed


This is probably harmless in our case (going from volatile to
non-volatile), but it won't pass the check in the CI without an exception
rule.

Note: checking on the test-report ml, I saw nothing, but ovsrobot did
catch the issue with this change too, Aaron?


-- 
David Marchand


^ permalink raw reply	[relevance 3%]
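The `volatile uint32_t status` removal that trips the ABI check here is the usual volatile-to-atomics migration. A hedged standalone sketch (simplified; the names below are illustrative, not the DPDK definitions) of what the field access looks like after the change:

```c
#include <stdint.h>

/* Sketch: the status flag is no longer declared volatile; inter-thread
 * visibility and ordering come from explicit atomic accesses instead. */
struct epoll_event_sketch {
	uint32_t status;	/* accessed only via __atomic built-ins */
};

void
event_mark(struct epoll_event_sketch *ev, uint32_t state)
{
	/* release: data the writer prepared before this store is
	 * guaranteed visible to a reader that observes `state` */
	__atomic_store_n(&ev->status, state, __ATOMIC_RELEASE);
}

uint32_t
event_read_status(const struct epoll_event_sketch *ev)
{
	/* acquire pairs with the release store above */
	return __atomic_load_n(&ev->status, __ATOMIC_ACQUIRE);
}
```

Unlike `volatile`, which only prevents compiler caching, the acquire/release pair gives well-defined cross-thread ordering, which is why the conversion is considered behaviorally harmless even though libabigail flags the type change.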

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
@ 2020-07-08 13:09  7% ` Kinsella, Ray
  2020-07-08 13:15  4%   ` David Marchand
  2020-07-08 13:45  7%   ` Aaron Conole
  2020-07-09 15:52  4% ` Dodji Seketeli
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 13:09 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: thomas, dodji, Neil Horman, Aaron Conole

+ Aaron

On 08/07/2020 11:22, David Marchand wrote:
> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing must happen but we can give
> some hints to know if this is a problem with the script calling abidiff,
> a potential ABI breakage or an unambiguous ABI breakage.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>  devtools/check-abi.sh | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> index e17fedbd9f..521e2cce7c 100755
> --- a/devtools/check-abi.sh
> +++ b/devtools/check-abi.sh
> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>  		error=1
>  		continue
>  	fi
> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
> +		abiret=$?
>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>  		error=1
> -	fi
> +		echo
> +		if [ $(($abiret & 3)) != 0 ]; then
> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
> +		fi
> +		if [ $(($abiret & 4)) != 0 ]; then
> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
> +		fi
> +		if [ $(($abiret & 8)) != 0 ]; then
> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
> +		fi
> +		echo
> +	}
>  done
>  
>  [ -z "$error" ] || [ -n "$warnonly" ]
> 

This looks good to me; my only thought was whether we can do anything to help the ABI checks play nicely with Travis.
At the moment it takes time to find the failure reason in the Travis log.

Ray K

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:09  7% ` Kinsella, Ray
@ 2020-07-08 13:15  4%   ` David Marchand
  2020-07-08 13:22  4%     ` Kinsella, Ray
  2020-07-08 13:45  7%   ` Aaron Conole
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-07-08 13:15 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: dev, Thomas Monjalon, Dodji Seketeli, Neil Horman, Aaron Conole

On Wed, Jul 8, 2020 at 3:09 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
> + Aaron
>
> On 08/07/2020 11:22, David Marchand wrote:
> > abidiff can provide some more information about the ABI difference it
> > detected.
> > In all cases, a discussion on the mailing must happen but we can give
> > some hints to know if this is a problem with the script calling abidiff,
> > a potential ABI breakage or an unambiguous ABI breakage.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > ---
> >  devtools/check-abi.sh | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> > index e17fedbd9f..521e2cce7c 100755
> > --- a/devtools/check-abi.sh
> > +++ b/devtools/check-abi.sh
> > @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
> >               error=1
> >               continue
> >       fi
> > -     if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
> > +     abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
> > +             abiret=$?
> >               echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
> >               error=1
> > -     fi
> > +             echo
> > +             if [ $(($abiret & 3)) != 0 ]; then
> > +                     echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."

Forgot to --amend.
Hopefully yes, this will be reported to dev@dpdk.org... I wanted to
highlight this could be a script or env issue.


> > +             fi
> > +             if [ $(($abiret & 4)) != 0 ]; then
> > +                     echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
> > +             fi
> > +             if [ $(($abiret & 8)) != 0 ]; then
> > +                     echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
> > +             fi
> > +             echo
> > +     }
> >  done
> >
> >  [ -z "$error" ] || [ -n "$warnonly" ]
> >
>
> This look good to me, my only thought was can we do anything to help the ABI checks play nice with Travis.
> At the moment it takes time to find the failure reason in the Travis log.

I usually look for "FILES_TO" to get to the last error.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:15  4%   ` David Marchand
@ 2020-07-08 13:22  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 13:22 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Dodji Seketeli, Neil Horman, Aaron Conole



On 08/07/2020 14:15, David Marchand wrote:
> On Wed, Jul 8, 2020 at 3:09 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>> + Aaron
>>
>> On 08/07/2020 11:22, David Marchand wrote:
>>> abidiff can provide some more information about the ABI difference it
>>> detected.
>>> In all cases, a discussion on the mailing must happen but we can give
>>> some hints to know if this is a problem with the script calling abidiff,
>>> a potential ABI breakage or an unambiguous ABI breakage.
>>>
>>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>>> ---
>>>  devtools/check-abi.sh | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
>>> index e17fedbd9f..521e2cce7c 100755
>>> --- a/devtools/check-abi.sh
>>> +++ b/devtools/check-abi.sh
>>> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>>>               error=1
>>>               continue
>>>       fi
>>> -     if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
>>> +     abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
>>> +             abiret=$?
>>>               echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>>>               error=1
>>> -     fi
>>> +             echo
>>> +             if [ $(($abiret & 3)) != 0 ]; then
>>> +                     echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
> 
> Forgot to --amend.
> Hopefully yes, this will be reported to dev@dpdk.org... I wanted to
> highlight this could be a script or env issue.
> 
> 
>>> +             fi
>>> +             if [ $(($abiret & 4)) != 0 ]; then
>>> +                     echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
>>> +             fi
>>> +             if [ $(($abiret & 8)) != 0 ]; then
>>> +                     echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
>>> +             fi
>>> +             echo
>>> +     }
>>>  done
>>>
>>>  [ -z "$error" ] || [ -n "$warnonly" ]
>>>
>>
>> This look good to me, my only thought was can we do anything to help the ABI checks play nice with Travis.
>> At the moment it takes time to find the failure reason in the Travis log.
> 
> I usually look for "FILES_TO" to get to the last error.
> 
Right, but there is hopefully a better way to give Travis some clues ...
 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter
    2020-07-07 15:54  4%       ` [dpdk-dev] [PATCH v4 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
@ 2020-07-08 13:30  4%       ` Jerin Jacob
  2020-07-08 15:01  0%         ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: Jerin Jacob @ 2020-07-08 13:30 UTC (permalink / raw)
  To: Phil Yang
  Cc: Jerin Jacob, dpdk-dev, Thomas Monjalon, Erik Gabriel Carrillo,
	Honnappa Nagarahalli, David Christensen,
	Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji, dpdk stable

On Tue, Jul 7, 2020 at 9:25 PM Phil Yang <phil.yang@arm.com> wrote:
>
> The n_poll_lcores counter and poll_lcore array are shared between lcores
> and the update of these variables are out of the protection of spinlock
> on each lcore timer list. The read-modify-write operations of the counter
> are not atomic, so it has the potential of race condition between lcores.
>
> Use C11 atomics with RELAXED ordering to prevent such conflicts.
>
> Fixes: cc7b73ea9e3b ("eventdev: add new software timer adapter")
> Cc: erik.g.carrillo@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

Hi Thomas,

The latest version does not have ABI breakage issue.

I have added the ABI verifier to my local patch verification setup.

Series applied to dpdk-next-eventdev/master.

Please pull this series from dpdk-next-eventdev/master. Thanks.

I am marking this patch series as "Awaiting Upstream" in patchwork
status to reflect the actual status.

^ permalink raw reply	[relevance 4%]
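The race described in the commit message above (a non-atomic read-modify-write on a counter shared across lcores) has a compact fix. A standalone sketch under the same reasoning (not the DPDK source; the counter and helper names are illustrative):

```c
#include <stdint.h>

/* Sketch: a counter shared between lcores, updated with an atomic RMW so
 * concurrent increments cannot be lost. RELAXED ordering suffices here
 * because only the counter value itself must be race-free; the data it
 * indexes is protected separately (per-lcore spinlocks in the patch). */
static uint16_t n_poll_lcores;

uint16_t
claim_poll_slot(void)
{
	/* atomic post-increment: each caller receives a unique slot index */
	return __atomic_fetch_add(&n_poll_lcores, 1, __ATOMIC_RELAXED);
}
```

With the previous plain `n_poll_lcores++`, two lcores could read the same old value and both claim the same slot; the atomic fetch-add makes the claim unique.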

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  2020-07-08 12:29  3%   ` David Marchand
@ 2020-07-08 13:43  0%     ` Aaron Conole
  2020-07-08 15:04  0%     ` Kinsella, Ray
  1 sibling, 0 replies; 200+ results
From: Aaron Conole @ 2020-07-08 13:43 UTC (permalink / raw)
  To: David Marchand
  Cc: Phil Yang, dev, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman, Ray Kinsella, Harman Kalra

David Marchand <david.marchand@redhat.com> writes:

> On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
>>
>> The event status is defined as a volatile variable and shared
>> between threads. Use c11 atomics with explicit ordering instead
>> of rte_atomic ops which enforce unnecessary barriers on aarch64.
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++---------
>>  2 files changed, 34 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
>> index 773a34a..b1e8a29 100644
>> --- a/lib/librte_eal/include/rte_eal_interrupts.h
>> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
>> @@ -59,7 +59,7 @@ enum {
>>
>>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>>  struct rte_epoll_event {
>> -       volatile uint32_t status;  /**< OUT: event status */
>> +       uint32_t status;           /**< OUT: event status */
>>         int fd;                    /**< OUT: event fd */
>>         int epfd;       /**< OUT: epoll instance the ev associated with */
>>         struct rte_epoll_data epdata;
>
> I got a reject from the ABI check in my env.
>
> 1 function with some indirect sub-type change:
>
>   [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
> rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_pci_device*' has sub-type changes:
>       in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
>         type size hasn't changed
>         1 data member changes (2 filtered):
>          type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
>            type size hasn't changed
>            1 data member change:
>             type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
>               array element type 'struct rte_epoll_event' changed:
>                 type size hasn't changed
>                 1 data member change:
>                  type of 'volatile uint32_t rte_epoll_event::status' changed:
>                    entity changed from 'volatile uint32_t' to 'typedef
> uint32_t' at stdint-uintn.h:26:1
>                    type size hasn't changed
>
>               type size hasn't changed
>
>
> This is probably harmless in our case (going from volatile to non
> volatile), but it won't pass the check in the CI without an exception
> rule.
>
> Note: checking on the test-report ml, I saw nothing, but ovsrobot did
> catch the issue with this change too, Aaron?

I don't have archives back to Jun 11 on the robot server.  I think it
doesn't preserve them forever (the archives seem to go back only to
Jul 03).  I will update it.

I do see that we have a failed travis job:

https://travis-ci.org/github/ovsrobot/dpdk/builds/697180855

I'm surprised this didn't go out.  Have we seen other cases recently of the
ovs robot failing to report?  I can double-check the job config.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:09  7% ` Kinsella, Ray
  2020-07-08 13:15  4%   ` David Marchand
@ 2020-07-08 13:45  7%   ` Aaron Conole
  2020-07-08 14:01  4%     ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: Aaron Conole @ 2020-07-08 13:45 UTC (permalink / raw)
  To: Kinsella, Ray; +Cc: David Marchand, dev, thomas, dodji, Neil Horman

"Kinsella, Ray" <mdr@ashroe.eu> writes:

> + Aaron
>
> On 08/07/2020 11:22, David Marchand wrote:
>> abidiff can provide some more information about the ABI difference it
>> detected.
>> In all cases, a discussion on the mailing must happen but we can give
>> some hints to know if this is a problem with the script calling abidiff,
>> a potential ABI breakage or an unambiguous ABI breakage.
>> 
>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>> ---
>>  devtools/check-abi.sh | 16 ++++++++++++++--
>>  1 file changed, 14 insertions(+), 2 deletions(-)
>> 
>> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
>> index e17fedbd9f..521e2cce7c 100755
>> --- a/devtools/check-abi.sh
>> +++ b/devtools/check-abi.sh
>> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>>  		error=1
>>  		continue
>>  	fi
>> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
>> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
>> +		abiret=$?
>>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>>  		error=1
>> -	fi
>> +		echo
>> +		if [ $(($abiret & 3)) != 0 ]; then
>> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
>> +		fi
>> +		if [ $(($abiret & 4)) != 0 ]; then
>> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
>> +		fi
>> +		if [ $(($abiret & 8)) != 0 ]; then
>> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
>> +		fi
>> +		echo
>> +	}
>>  done
>>  
>>  [ -z "$error" ] || [ -n "$warnonly" ]
>> 
>
> This looks good to me, my only thought was whether we can do anything to help the ABI checks play nicely with Travis.
> At the moment it takes time to find the failure reason in the Travis log.

That's a problem even for non-ABI failures.  I was considering pulling
the travis log for each failed build and attaching it, but even that
isn't a great solution (very large emails aren't much easier to search).

I'm open to suggestions.

> Ray K
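[Editorial aside: the exit-status decoding in the patch above can be exercised outside the script. The ABIDIFF_* bit values below match libabigail's documented abidiff exit codes (ERROR=1, USAGE_ERROR=2, ABI_CHANGE=4, ABI_INCOMPATIBLE_CHANGE=8); the severity helper itself is only an illustrative sketch, not part of the patch, which prints a hint for every bit that is set rather than picking one.]

```c
/* Bit values from libabigail's abidiff exit status. */
#define ABIDIFF_ERROR                   1
#define ABIDIFF_USAGE_ERROR             2
#define ABIDIFF_ABI_CHANGE              4
#define ABIDIFF_ABI_INCOMPATIBLE_CHANGE 8

/* Map an abidiff exit status to the worst problem it encodes:
 * 0 = clean, 1 = tool/usage error, 2 = change needing review,
 * 3 = hard ABI break. */
static int abidiff_severity(int status)
{
	if (status & ABIDIFF_ABI_INCOMPATIBLE_CHANGE)
		return 3;
	if (status & ABIDIFF_ABI_CHANGE)
		return 2;
	if (status & (ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR))
		return 1;
	return 0;
}
```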


^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 13:45  7%   ` Aaron Conole
@ 2020-07-08 14:01  4%     ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-08 14:01 UTC (permalink / raw)
  To: Aaron Conole; +Cc: David Marchand, dev, thomas, dodji, Neil Horman



On 08/07/2020 14:45, Aaron Conole wrote:
> "Kinsella, Ray" <mdr@ashroe.eu> writes:
> 
>> + Aaron
>>
>> On 08/07/2020 11:22, David Marchand wrote:
>>> abidiff can provide some more information about the ABI difference it
>>> detected.
>>> In all cases, a discussion on the mailing must happen but we can give
>>> some hints to know if this is a problem with the script calling abidiff,
>>> a potential ABI breakage or an unambiguous ABI breakage.
>>>
>>> Signed-off-by: David Marchand <david.marchand@redhat.com>
>>> ---
>>>  devtools/check-abi.sh | 16 ++++++++++++++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
>>> index e17fedbd9f..521e2cce7c 100755
>>> --- a/devtools/check-abi.sh
>>> +++ b/devtools/check-abi.sh
>>> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>>>  		error=1
>>>  		continue
>>>  	fi
>>> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
>>> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
>>> +		abiret=$?
>>>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>>>  		error=1
>>> -	fi
>>> +		echo
>>> +		if [ $(($abiret & 3)) != 0 ]; then
>>> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
>>> +		fi
>>> +		if [ $(($abiret & 4)) != 0 ]; then
>>> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
>>> +		fi
>>> +		if [ $(($abiret & 8)) != 0 ]; then
>>> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
>>> +		fi
>>> +		echo
>>> +	}
>>>  done
>>>  
>>>  [ -z "$error" ] || [ -n "$warnonly" ]
>>>
>>
>> This looks good to me, my only thought was whether we can do anything to help the ABI checks play nicely with Travis.
>> At the moment it takes time to find the failure reason in the Travis log.
> 
> That's a problem even for non-ABI failures.  I was considering pulling
> the travis log for each failed build and attaching it, but even that
> isn't a great solution (very large emails aren't much easier to search).
> 
> I'm open to suggestions.

For me the problem arises when you log on to the Travis interface:
you need to search for ERROR etc. ... there must be a better way.

> 
>> Ray K
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
  2020-07-07 15:23  0%   ` Olivier Matz
@ 2020-07-08 14:16  0%   ` Morten Brørup
  2020-07-08 14:54  0%     ` Slava Ovsiienko
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2020-07-08 14:16 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> Ovsiienko
> Sent: Tuesday, July 7, 2020 4:57 PM
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> the packets at the very precisely specified moment of time provides
> the opportunity to support the connections with Time Division
> Multiplexing using the contemporary general purpose NIC without
> involving
> an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> interface is one of the promising features for potentially usage of the
> precise time management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
> it tries to synchronize the time of packet appearing on the wire with
> the specified packet timestamp. If the specified one is in the past it
> should be ignored, if one is in the distant future it should be capped
> with some reasonable value (in range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> There is no any packet reordering according timestamps is supposed,
> neither within packet burst, nor between packets, it is an entirely
> application responsibility to generate packets and its timestamps
> in desired order. The timestamps can be put only in the first packet
> in the burst providing the entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be
> introduced
> at the moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> current device clock value and provide the reference for the
> timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  v1->v4:
>     - dedicated dynamic Tx timestamp flag instead of shared with Rx

The detailed description above should be updated to reflect that it is now two flags.

>     - Doxygen-style comment
>     - comments update
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c
> b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
> 
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h
> b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> 
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
>  /**< Device supports Rx queue setup after device started*/
>  #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
> diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h
> b/lib/librte_mbuf/rte_mbuf_dyn.h
> index 96c3631..7e9f7d2 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
>  #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
> 
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly
> defined
> + * but are maintained always the same for a given port. Some devices
> allow4
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field
> contains
> + * actual timestamp value. For the packets being sent this value can
> be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"

The description above should not say anything about the dynamic TX timestamp flag.

Please elaborate "some timing information", e.g. add "... about when the packet was received".

> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on
> the
> + * packet being sent it tries to synchronize the time of packet
> appearing
> + * on the wire with the specified packet timestamp. If the specified
> one
> + * is in the past it should be ignored, if one is in the distant
> future
> + * it should be capped with some reasonable value (in range of
> seconds).
> + *
> + * There is no any packet reordering according to timestamps is
> supposed,
> + * neither for packet within the burst, nor for the whole bursts, it
> is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst
> scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
> +
>  #endif
> --
> 1.8.3.1
> 

It may be worth adding some documentation about how the clocks of the NICs are out of sync with the clock of the CPU, and are all drifting relatively.

And those clocks are also out of sync with the actual time (NTP clock).

Preferably, some sort of cookbook for handling this should be provided. PCAP could be used as an example.
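[Editorial aside: the dynamic field mechanism under discussion stores the timestamp at a runtime-registered byte offset inside the mbuf, with a registered flag bit marking validity. A minimal self-contained sketch of that offset/flag pattern follows; the struct layout, offset, and flag bit are illustrative stand-ins, not DPDK's real rte_mbuf layout or the values rte_mbuf_dynfield_register()/rte_mbuf_dynflag_register() would return.]

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for an mbuf with a dynamic field area. */
struct fake_mbuf {
	uint64_t ol_flags;    /* dynamic flag bits live here */
	uint8_t dynfield[64]; /* dynamic fields are carved from this */
};

/* Hypothetical values a registration call would hand back at runtime. */
static const int timestamp_offset = 0;                /* byte offset */
static const uint64_t tx_timestamp_flag = 1ULL << 40; /* validity bit */

static void set_tx_timestamp(struct fake_mbuf *m, uint64_t ts)
{
	memcpy(&m->dynfield[timestamp_offset], &ts, sizeof(ts));
	m->ol_flags |= tx_timestamp_flag;
}

static uint64_t get_tx_timestamp(const struct fake_mbuf *m, int *valid)
{
	uint64_t ts;

	*valid = (m->ol_flags & tx_timestamp_flag) != 0;
	memcpy(&ts, &m->dynfield[timestamp_offset], sizeof(ts));
	return ts;
}
```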



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
  @ 2020-07-08 14:30  2%     ` David Marchand
  2020-07-08 15:34  5%       ` Ruifeng Wang
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-07-08 14:30 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman, dev, Ananyev,
	Konstantin, Honnappa Nagarahalli, nd

On Tue, Jul 7, 2020 at 5:16 PM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index b9d49ac87..7889f21b3 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2020 Arm Limited
>   */
>
>  #ifndef _RTE_LPM_H_
> @@ -20,6 +21,7 @@
>  #include <rte_memory.h>
>  #include <rte_common.h>
>  #include <rte_vect.h>
> +#include <rte_rcu_qsbr.h>
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -62,6 +64,17 @@ extern "C" {
>  /** Bitmask used to indicate successful lookup */
>  #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
>
> +/** @internal Default RCU defer queue entries to reclaim in one go. */
> +#define RTE_LPM_RCU_DQ_RECLAIM_MAX     16
> +
> +/** RCU reclamation modes */
> +enum rte_lpm_qsbr_mode {
> +       /** Create defer queue for reclaim. */
> +       RTE_LPM_QSBR_MODE_DQ = 0,
> +       /** Use blocking mode reclaim. No defer queue created. */
> +       RTE_LPM_QSBR_MODE_SYNC
> +};
> +
>  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
>  /** @internal Tbl24 entry structure. */
>  __extension__
> @@ -130,6 +143,28 @@ struct rte_lpm {
>                         __rte_cache_aligned; /**< LPM tbl24 table. */
>         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +#ifdef ALLOW_EXPERIMENTAL_API
> +       /* RCU config. */
> +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> +#endif
> +};

I can see failures in travis reports for v7 and v6.
I reproduced them in my env.

1 function with some indirect sub-type change:

  [C]'function int rte_lpm_add(rte_lpm*, uint32_t, uint8_t, uint32_t)'
at rte_lpm.c:764:1 has some indirect sub-type changes:
    parameter 1 of type 'rte_lpm*' has sub-type changes:
      in pointed to type 'struct rte_lpm' at rte_lpm.h:134:1:
        type size hasn't changed
        3 data member insertions:
          'rte_rcu_qsbr* rte_lpm::v', at offset 536873600 (in bits) at
rte_lpm.h:148:1
          'rte_lpm_qsbr_mode rte_lpm::rcu_mode', at offset 536873664
(in bits) at rte_lpm.h:149:1
          'rte_rcu_qsbr_dq* rte_lpm::dq', at offset 536873728 (in
bits) at rte_lpm.h:150:1


Going back to my proposal of hiding what does not need to be seen.

Disclaimer, *this is quick & dirty* but it builds and passes ABI check:

$ git diff
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index d498ba761..7109aef6a 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -115,6 +115,15 @@ rte_lpm_find_existing(const char *name)
        return l;
 }

+struct internal_lpm {
+       /* Public object */
+       struct rte_lpm lpm;
+       /* RCU config. */
+       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
+       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
+};
+
 /*
  * Allocates memory for LPM object
  */
@@ -123,6 +132,7 @@ rte_lpm_create(const char *name, int socket_id,
                const struct rte_lpm_config *config)
 {
        char mem_name[RTE_LPM_NAMESIZE];
+       struct internal_lpm *internal = NULL;
        struct rte_lpm *lpm = NULL;
        struct rte_tailq_entry *te;
        uint32_t mem_size, rules_size, tbl8s_size;
@@ -141,12 +151,6 @@ rte_lpm_create(const char *name, int socket_id,

        snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);

-       /* Determine the amount of memory to allocate. */
-       mem_size = sizeof(*lpm);
-       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
        rte_mcfg_tailq_write_lock();

        /* guarantee there's no existing */
@@ -170,16 +174,23 @@ rte_lpm_create(const char *name, int socket_id,
                goto exit;
        }

+       /* Determine the amount of memory to allocate. */
+       mem_size = sizeof(*internal);
+       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
        /* Allocate memory to store the LPM data structures. */
-       lpm = rte_zmalloc_socket(mem_name, mem_size,
+       internal = rte_zmalloc_socket(mem_name, mem_size,
                        RTE_CACHE_LINE_SIZE, socket_id);
-       if (lpm == NULL) {
+       if (internal == NULL) {
                RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
                rte_free(te);
                rte_errno = ENOMEM;
                goto exit;
        }

+       lpm = &internal->lpm;
        lpm->rules_tbl = rte_zmalloc_socket(NULL,
                        (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);

@@ -226,6 +237,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+       struct internal_lpm *internal;
        struct rte_lpm_list *lpm_list;
        struct rte_tailq_entry *te;

@@ -247,8 +259,9 @@ rte_lpm_free(struct rte_lpm *lpm)

        rte_mcfg_tailq_write_unlock();

-       if (lpm->dq)
-               rte_rcu_qsbr_dq_delete(lpm->dq);
+       internal = container_of(lpm, struct internal_lpm, lpm);
+       if (internal->dq != NULL)
+               rte_rcu_qsbr_dq_delete(internal->dq);
        rte_free(lpm->tbl8);
        rte_free(lpm->rules_tbl);
        rte_free(lpm);
@@ -276,13 +289,15 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct
rte_lpm_rcu_config *cfg,
 {
        char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
        struct rte_rcu_qsbr_dq_parameters params = {0};
+       struct internal_lpm *internal;

-       if ((lpm == NULL) || (cfg == NULL)) {
+       if (lpm == NULL || cfg == NULL) {
                rte_errno = EINVAL;
                return 1;
        }

-       if (lpm->v) {
+       internal = container_of(lpm, struct internal_lpm, lpm);
+       if (internal->v != NULL) {
                rte_errno = EEXIST;
                return 1;
        }
@@ -305,20 +320,19 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct
rte_lpm_rcu_config *cfg,
                params.free_fn = __lpm_rcu_qsbr_free_resource;
                params.p = lpm;
                params.v = cfg->v;
-               lpm->dq = rte_rcu_qsbr_dq_create(&params);
-               if (lpm->dq == NULL) {
-                       RTE_LOG(ERR, LPM,
-                                       "LPM QS defer queue creation failed\n");
+               internal->dq = rte_rcu_qsbr_dq_create(&params);
+               if (internal->dq == NULL) {
+                       RTE_LOG(ERR, LPM, "LPM QS defer queue creation
failed\n");
                        return 1;
                }
                if (dq)
-                       *dq = lpm->dq;
+                       *dq = internal->dq;
        } else {
                rte_errno = EINVAL;
                return 1;
        }
-       lpm->rcu_mode = cfg->mode;
-       lpm->v = cfg->v;
+       internal->rcu_mode = cfg->mode;
+       internal->v = cfg->v;

        return 0;
 }
@@ -502,12 +516,13 @@ _tbl8_alloc(struct rte_lpm *lpm)
 static int32_t
 tbl8_alloc(struct rte_lpm *lpm)
 {
+       struct internal_lpm *internal = container_of(lpm, struct
internal_lpm, lpm);
        int32_t group_idx; /* tbl8 group index. */

        group_idx = _tbl8_alloc(lpm);
-       if ((group_idx == -ENOSPC) && (lpm->dq != NULL)) {
+       if (group_idx == -ENOSPC && internal->dq != NULL) {
                /* If there are no tbl8 groups try to reclaim one. */
-               if (rte_rcu_qsbr_dq_reclaim(lpm->dq, 1, NULL, NULL, NULL) == 0)
+               if (rte_rcu_qsbr_dq_reclaim(internal->dq, 1, NULL,
NULL, NULL) == 0)
                        group_idx = _tbl8_alloc(lpm);
        }

@@ -518,20 +533,21 @@ static void
 tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
        struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+       struct internal_lpm *internal = container_of(lpm, struct
internal_lpm, lpm);

-       if (!lpm->v) {
+       if (internal->v == NULL) {
                /* Set tbl8 group invalid*/
                __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
                                __ATOMIC_RELAXED);
-       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
                /* Wait for quiescent state change. */
-               rte_rcu_qsbr_synchronize(lpm->v, RTE_QSBR_THRID_INVALID);
+               rte_rcu_qsbr_synchronize(internal->v, RTE_QSBR_THRID_INVALID);
                /* Set tbl8 group invalid*/
                __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
                                __ATOMIC_RELAXED);
-       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
                /* Push into QSBR defer queue. */
-               rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&tbl8_group_start);
+               rte_rcu_qsbr_dq_enqueue(internal->dq, (void
*)&tbl8_group_start);
        }
 }

diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index 7889f21b3..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -143,12 +143,6 @@ struct rte_lpm {
                        __rte_cache_aligned; /**< LPM tbl24 table. */
        struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
        struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
-#ifdef ALLOW_EXPERIMENTAL_API
-       /* RCU config. */
-       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
-       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
-       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
-#endif
 };

 /** LPM RCU QSBR configuration structure. */




-- 
David Marchand
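[Editorial aside: the pattern applied in the quick-and-dirty patch above - keep the public struct ABI-stable and recover the private wrapper via container_of() - can be reduced to a few self-contained lines. The type names below are illustrative, not the rte_lpm ones:]

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Public, ABI-stable object handed to users. */
struct public_obj {
	int public_state;
};

/* Private wrapper; new members can be added here without changing
 * the size or layout of struct public_obj.  Placing the public part
 * first keeps the offset at zero, though container_of() works for
 * any position. */
struct internal_obj {
	struct public_obj pub;
	int private_state;
};

static int get_private_state(struct public_obj *p)
{
	struct internal_obj *i = container_of(p, struct internal_obj, pub);

	return i->private_state;
}
```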


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling Morten Brørup
@ 2020-07-08 14:54  0%     ` Slava Ovsiienko
  2020-07-08 15:27  0%       ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Slava Ovsiienko @ 2020-07-08 14:54 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi, Morten

Thank you for the comments. Please, see below.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, July 8, 2020 17:16
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@mellanox.net
> Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet
> Txscheduling
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > Ovsiienko
> > Sent: Tuesday, July 7, 2020 4:57 PM
> >
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> >
> > The main objective of this RFC is to specify the way how applications
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well.
> >
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. If the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> >
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.
> >
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> > current device clock value and provide the reference for the
> > timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > ---
> >  v1->v4:
> >     - dedicated dynamic Tx timestamp flag instead of shared with Rx
> 
> The detailed description above should be updated to reflect that it is now
> two flags.
OK

> 
> >     - Doxygen-style comment
> >     - comments update
> >
> > ---
> >  lib/librte_ethdev/rte_ethdev.c |  1 +  lib/librte_ethdev/rte_ethdev.h
> > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > +++++++++++++++++++++++++++++++
> >  3 files changed, 36 insertions(+)
> >
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > +
> > +
> >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**<
> > Device supports Rx queue setup after device started*/  #define
> > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > index 96c3631..7e9f7d2 100644
> > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
> >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> >
> > +/**
> > + * The timestamp dynamic field provides some timing information, the
> > + * units and time references (initial phase) are not explicitly
> > defined
> > + * but are maintained always the same for a given port. Some devices
> > allow4
> > + * to query rte_eth_read_clock() that will return the current device
> > + * timestamp. The dynamic Tx timestamp flag tells whether the field
> > contains
> > + * actual timestamp value. For the packets being sent this value can
> > be
> > + * used by PMD to schedule packet sending.
> > + *
> > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> > + * introduced and the shared dynamic timestamp field will be used
> > + * to handle the timestamps on receiving datapath as well.
> > + */
> > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> 
> The description above should not say anything about the dynamic TX
> timestamp flag.
It does not. Or do you mean Rx?
I'm not sure - the field and flag are tightly coupled, so it is
worth mentioning this relation for better understanding.
And mentioning Rx explains why the name is not
RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME.

> 
> Please elaborate "some timing information", e.g. add "... about when the
> packet was received".

Sorry, I do not follow - currently the dynamic field is not
"about when the packet was received". For now it is introduced for Tx
only; the opportunity to share it with an Rx one in coming releases
is merely mentioned. "Some" means - not specified exactly (herein).
The comment then elaborates on what is not specified and how the
Tx timestamp is supposed to be used.

> 
> > +
> > +/**
> > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> set on
> > the
> > + * packet being sent it tries to synchronize the time of packet
> > appearing
> > + * on the wire with the specified packet timestamp. If the specified
> > one
> > + * is in the past it should be ignored, if one is in the distant
> > future
> > + * it should be capped with some reasonable value (in range of
> > seconds).
> > + *
> > + * There is no any packet reordering according to timestamps is
> > supposed,
> > + * neither for packet within the burst, nor for the whole bursts, it
> > is
> > + * an entirely application responsibility to generate packets and its
> > + * timestamps in desired order. The timestamps might be put only in
> > + * the first packet in the burst providing the entire burst
> > scheduling.
> > + */
> > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> "rte_dynflag_tx_timestamp"
> > +
> >  #endif
> > --
> > 1.8.3.1
> >
> 
> It may be worth adding some documentation about how the clocks of the
> NICs are out of sync with the clock of the CPU, and are all drifting relatively.
> 
> And those clocks are also out of sync with the actual time (NTP clock).

IMO, it is out of scope of this very generic patch. As for mlx NICs, the internal device
clock might (or might not) be synchronized with PTP; it can provide timestamps
in real nanoseconds in various formats, or just some free-running counter.
On some systems the NIC and CPU might share the same clock source (for their
PLL inputs, for example) and there will be no drift at all. As we can see, it is
a wide and interesting topic to discuss, but, IMO, a comment in a header file
might not be the most relevant place to do so. The mlx5 device clock specifics
will be documented in the PMD chapter.

OK, I will add a few generic words, just a few in order not to make the comment
wordy, to point the direction for further thinking.

> 
> Preferably, some sort of cookbook for handling this should be provided.
> PCAP could be used as an example.
> 
testpmd example is included in series, mlx5 PMD patch is prepared and coming soon.

With best regards, Slava


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter
  2020-07-08 13:30  4%       ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Jerin Jacob
@ 2020-07-08 15:01  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2020-07-08 15:01 UTC (permalink / raw)
  To: Phil Yang, Jerin Jacob
  Cc: dev, Erik Gabriel Carrillo, Honnappa Nagarahalli,
	David Christensen, Ruifeng Wang (Arm Technology China),
	Dharmik Thakkar, nd, David Marchand, Ray Kinsella, Neil Horman,
	dodji, dpdk stable, Jerin Jacob

08/07/2020 15:30, Jerin Jacob:
> On Tue, Jul 7, 2020 at 9:25 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > The n_poll_lcores counter and poll_lcore array are shared between lcores
> > and the update of these variables are out of the protection of spinlock
> > on each lcore timer list. The read-modify-write operations of the counter
> > are not atomic, so it has the potential of race condition between lcores.
> >
> > Use c11 atomics with RELAXED ordering to prevent confliction.
> >
> > Fixes: cc7b73ea9e3b ("eventdev: add new software timer adapter")
> > Cc: erik.g.carrillo@intel.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
> 
> Hi Thomas,
> 
> The latest version does not have ABI breakage issue.
> 
> I have added the ABI verifier in my local patch verification setup.
> 
> Series applied to dpdk-next-eventdev/master.
> 
> Please pull this series from dpdk-next-eventdev/master. Thanks.
> 
> I am marking this patch series as "Awaiting Upstream" in patchwork
> status to reflect the actual status.

OK, pulled and marked as Accepted in patchwork.




* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  2020-07-08 12:29  3%   ` David Marchand
  2020-07-08 13:43  0%     ` Aaron Conole
@ 2020-07-08 15:04  0%     ` Kinsella, Ray
  2020-07-09  5:21  0%       ` Phil Yang
  1 sibling, 1 reply; 200+ results
From: Kinsella, Ray @ 2020-07-08 15:04 UTC (permalink / raw)
  To: David Marchand, Phil Yang, Aaron Conole
  Cc: dev, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman, Harman Kalra



On 08/07/2020 13:29, David Marchand wrote:
> On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
>>
>> The event status is defined as a volatile variable and shared
>> between threads. Use c11 atomics with explicit ordering instead
>> of rte_atomic ops which enforce unnecessary barriers on aarch64.
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++---------
>>  2 files changed, 34 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
>> index 773a34a..b1e8a29 100644
>> --- a/lib/librte_eal/include/rte_eal_interrupts.h
>> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
>> @@ -59,7 +59,7 @@ enum {
>>
>>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>>  struct rte_epoll_event {
>> -       volatile uint32_t status;  /**< OUT: event status */
>> +       uint32_t status;           /**< OUT: event status */
>>         int fd;                    /**< OUT: event fd */
>>         int epfd;       /**< OUT: epoll instance the ev associated with */
>>         struct rte_epoll_data epdata;
> 
> I got a reject from the ABI check in my env.
> 
> 1 function with some indirect sub-type change:
> 
>   [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
> rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_pci_device*' has sub-type changes:
>       in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
>         type size hasn't changed
>         1 data member changes (2 filtered):
>          type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
>            type size hasn't changed
>            1 data member change:
>             type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
>               array element type 'struct rte_epoll_event' changed:
>                 type size hasn't changed
>                 1 data member change:
>                  type of 'volatile uint32_t rte_epoll_event::status' changed:
>                    entity changed from 'volatile uint32_t' to 'typedef
> uint32_t' at stdint-uintn.h:26:1
>                    type size hasn't changed
> 
>               type size hasn't changed
> 
> 
> This is probably harmless in our case (going from volatile to non
> volatile), but it won't pass the check in the CI without an exception
> rule.
> 
> Note: checking on the test-report ml, I saw nothing, but ovsrobot did
> catch the issue with this change too, Aaron?
> 
> 
Agreed, probably harmless and requires something in libagigail.ignore. 


* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-08 14:54  0%     ` Slava Ovsiienko
@ 2020-07-08 15:27  0%       ` Morten Brørup
  2020-07-08 15:51  0%         ` Slava Ovsiienko
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2020-07-08 15:27 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> Sent: Wednesday, July 8, 2020 4:54 PM
> 
> Hi, Morten
> 
> Thank you for the comments. Please, see below.
> 
> > -----Original Message-----
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, July 8, 2020 17:16
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > <rasland@mellanox.com>; olivier.matz@6wind.com;
> > bernard.iremonger@intel.com; thomas@mellanox.net
> > Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate
> packet
> > Txscheduling
> >
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > > Ovsiienko
> > > Sent: Tuesday, July 7, 2020 4:57 PM
> > >
> > > There is the requirement on some networks for precise traffic
> timing
> > > management. The ability to send (and, generally speaking, receive)
> the
> > > packets at the very precisely specified moment of time provides the
> > > opportunity to support the connections with Time Division
> Multiplexing
> > > using the contemporary general purpose NIC without involving an
> > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > interface is one of the promising features for potentially usage of
> > > the precise time management for the egress packets.
> > >
> > > The main objective of this RFC is to specify the way how
> applications
> > > can provide the moment of time at what the packet transmission must
> be
> > > started and to describe in preliminary the supporting this feature
> > > from
> > > mlx5 PMD side.
> > >
> > > The new dynamic timestamp field is proposed, it provides some
> timing
> > > information, the units and time references (initial phase) are not
> > > explicitly defined but are maintained always the same for a given
> port.
> > > Some devices allow to query rte_eth_read_clock() that will return
> the
> > > current device timestamp. The dynamic timestamp flag tells whether
> the
> > > field contains actual timestamp value. For the packets being sent
> this
> > > value can be used by PMD to schedule packet sending.
> > >
> > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and
> > > obsoleting, these dynamic flag and field will be used to manage the
> > > timestamps on receiving datapath as well.
> > >
> > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > sent it tries to synchronize the time of packet appearing on the
> wire
> > > with the specified packet timestamp. If the specified one is in the
> > > past it should be ignored, if one is in the distant future it
> should
> > > be capped with some reasonable value (in range of seconds). These
> > > specific cases ("too late" and "distant future") can be optionally
> > > reported via device xstats to assist applications to detect the
> > > time-related problems.
> > >
> > > There is no any packet reordering according timestamps is supposed,
> > > neither within packet burst, nor between packets, it is an entirely
> > > application responsibility to generate packets and its timestamps
> in
> > > desired order. The timestamps can be put only in the first packet
> in
> > > the burst providing the entire burst scheduling.
> > >
> > > PMD reports the ability to synchronize packet sending on timestamp
> > > with new offload flag:
> > >
> > > This is palliative and is going to be replaced with new eth_dev API
> > > about reporting/managing the supported dynamic flags and its
> related
> > > features. This API would break ABI compatibility and can't be
> > > introduced at the moment, so is postponed to 20.11.
> > >
> > > For testing purposes it is proposed to update testpmd "txonly"
> > > forwarding mode routine. With this update testpmd application
> > > generates the packets and sets the dynamic timestamps according to
> > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> registered.
> > >
> > > The new testpmd command is proposed to configure sending pattern:
> > >
> > > set tx_times <burst_gap>,<intra_gap>
> > >
> > > <intra_gap> - the delay between the packets within the burst
> > >               specified in the device clock units. The number
> > >               of packets in the burst is defined by txburst
> parameter
> > >
> > > <burst_gap> - the delay between the bursts in the device clock
> units
> > >
> > > As the result the bursts of packet will be transmitted with
> specific
> > > delays between the packets within the burst and specific delay
> between
> > > the bursts. The rte_eth_get_clock is supposed to be engaged to get
> the
> > > current device clock value and provide the reference for the
> > > timestamps.
> > >
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > >  v1->v4:
> > >     - dedicated dynamic Tx timestamp flag instead of shared with Rx
> >
> > The detailed description above should be updated to reflect that it
> is now
> > two flags.
> OK
> 
> >
> > >     - Doxygen-style comment
> > >     - comments update
> > >
> > > ---
> > >  lib/librte_ethdev/rte_ethdev.c |  1 +
> lib/librte_ethdev/rte_ethdev.h
> > > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > > +++++++++++++++++++++++++++++++
> > >  3 files changed, 36 insertions(+)
> > >
> > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> > >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> > >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> > >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> > >  };
> > >
> > >  #undef RTE_TX_OFFLOAD_BIT2STR
> > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > >  /** Device supports outer UDP checksum */  #define
> > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > >
> > > +/** Device supports send on timestamp */ #define
> > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > +
> > > +
> > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> > /**<
> > > Device supports Rx queue setup after device started*/  #define
> > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > index 96c3631..7e9f7d2 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > "rte_flow_dynfield_metadata"
> > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > "rte_flow_dynflag_metadata"
> > >
> > > +/**
> > > + * The timestamp dynamic field provides some timing information,
> the
> > > + * units and time references (initial phase) are not explicitly
> > > defined
> > > + * but are maintained always the same for a given port. Some
> devices
> > > allow
> > > + * to query rte_eth_read_clock() that will return the current
> device
> > > + * timestamp. The dynamic Tx timestamp flag tells whether the
> field
> > > contains
> > > + * actual timestamp value. For the packets being sent this value
> can
> > > be
> > > + * used by PMD to schedule packet sending.
> > > + *
> > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field
> deprecation
> > > + * and obsoleting, the dedicated Rx timestamp flag is supposed to
> be
> > > + * introduced and the shared dynamic timestamp field will be used
> > > + * to handle the timestamps on receiving datapath as well.
> > > + */
> > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > "rte_dynfield_timestamp"
> >
> > The description above should not say anything about the dynamic TX
> > timestamp flag.
> It does not. Or do you mean RX?
> Not sure, field and flag are tightly coupled,
> it is nice to mention this relation for better understanding.
> And mentioning the RX explains why it is not like this:
> RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME

Sorry. I misunderstood its purpose!
It's the name of the field, and the field will not only be used for TX, but in the future also for RX.
(I thought it was the name of the RX flag, reserved for future use.)

> 
> >
> > Please elaborate "some timing information", e.g. add "... about when
> the
> > packet was received".
> 
> Sorry, I do not follow,  currently the dynamic field is not
> "about when the packet was received". Now it is introduced for Tx
> only and just the opportunity to be shared with Rx one in coming
> releases
> is mentioned. "Some" means - not specified (herein) exactly.
> And it is elaborated what Is not specified and how it is supposed
> to use Tx timestamp.

It should be described when it is valid, and how it is being used, e.g. by adding a reference to the "rte_dynflag_tx_timestamp" flag.

> >
> > > +
> > > +/**
> > > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> > set on
> > > the
> > > + * packet being sent it tries to synchronize the time of packet
> > > appearing
> > > + * on the wire with the specified packet timestamp. If the
> specified
> > > one
> > > + * is in the past it should be ignored, if one is in the distant
> > > future
> > > + * it should be capped with some reasonable value (in range of
> > > seconds).
> > > + *
> > > + * There is no any packet reordering according to timestamps is
> > > supposed,
> > > + * neither for packet within the burst, nor for the whole bursts,
> it
> > > is
> > > + * an entirely application responsibility to generate packets and
> its
> > > + * timestamps in desired order. The timestamps might be put only
> in
> > > + * the first packet in the burst providing the entire burst
> > > scheduling.
> > > + */
> > > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> > "rte_dynflag_tx_timestamp"
> > > +
> > >  #endif
> > > --
> > > 1.8.3.1
> > >
> >
> > It may be worth adding some documentation about how the clocks of the
> > NICs are out of sync with the clock of the CPU, and are all drifting
> relatively.
> >
> > And those clocks are also out of sync with the actual time (NTP
> clock).
> 
> IMO, It is out of scope of this very generic patch.  As for mlx NICs -
> the internal device
> clock might be (or might be not) synchronized with PTP, it can provide
> timestamps
> in real nanoseconds in various formats or just some free running
> counter.

Cool!

> On some systems the NIC and CPU might share the same clock source (for
> their PLL inputs
> for example) and there will be no any drifts. As we can see - it is a
> wide and interesting
> opic to discuss, but, IMO,  the comment in header file might be not the
> most relevant
> place to do. As for mlx5 devices clock specifics - it will be
> documented in PMD chapter.
> 
> OK, will add few generic words, the few ones - in order not to make
> comment wordy, just
> point the direction for further thinking.

I agree - we don't want cookbooks in the header files. Only enough description to avoid the worst misunderstandings.

> 
> >
> > Preferably, some sort of cookbook for handling this should be
> provided.
> > PCAP could be used as an example.
> >
> testpmd example is included in series, mlx5 PMD patch is prepared and
> coming soon.

Great.

And I suppose that the more detailed cookbook/example - regarding offset and drift of various clocks - is probably more relevant for the RX side (for various PCAP applications), and thus completely unrelated to this patch.

> 
> With best regards, Slava



* Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
  2020-07-08 14:30  2%     ` David Marchand
@ 2020-07-08 15:34  5%       ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-08 15:34 UTC (permalink / raw)
  To: David Marchand
  Cc: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman, dev, Ananyev,
	Konstantin, Honnappa Nagarahalli, nd, nd


> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, July 8, 2020 10:30 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin
> <vladimir.medvedkin@intel.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman
> <nhorman@tuxdriver.com>; dev <dev@dpdk.org>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
> 
> On Tue, Jul 7, 2020 at 5:16 PM Ruifeng Wang <ruifeng.wang@arm.com>
> wrote:
> > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> > b9d49ac87..7889f21b3 100644
> > --- a/lib/librte_lpm/rte_lpm.h
> > +++ b/lib/librte_lpm/rte_lpm.h
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2020 Arm Limited
> >   */
> >
> >  #ifndef _RTE_LPM_H_
> > @@ -20,6 +21,7 @@
> >  #include <rte_memory.h>
> >  #include <rte_common.h>
> >  #include <rte_vect.h>
> > +#include <rte_rcu_qsbr.h>
> >
> >  #ifdef __cplusplus
> >  extern "C" {
> > @@ -62,6 +64,17 @@ extern "C" {
> >  /** Bitmask used to indicate successful lookup */
> >  #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
> >
> > +/** @internal Default RCU defer queue entries to reclaim in one go. */
> > +#define RTE_LPM_RCU_DQ_RECLAIM_MAX     16
> > +
> > +/** RCU reclamation modes */
> > +enum rte_lpm_qsbr_mode {
> > +       /** Create defer queue for reclaim. */
> > +       RTE_LPM_QSBR_MODE_DQ = 0,
> > +       /** Use blocking mode reclaim. No defer queue created. */
> > +       RTE_LPM_QSBR_MODE_SYNC
> > +};
> > +
> >  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> >  /** @internal Tbl24 entry structure. */  __extension__ @@ -130,6
> > +143,28 @@ struct rte_lpm {
> >                         __rte_cache_aligned; /**< LPM tbl24 table. */
> >         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
> >         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> > +#ifdef ALLOW_EXPERIMENTAL_API
> > +       /* RCU config. */
> > +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> > +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> > +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> > +#endif
> > +};
> 
> I can see failures in travis reports for v7 and v6.
> I reproduced them in my env.
> 
> 1 function with some indirect sub-type change:
> 
>   [C]'function int rte_lpm_add(rte_lpm*, uint32_t, uint8_t, uint32_t)'
> at rte_lpm.c:764:1 has some indirect sub-type changes:
>     parameter 1 of type 'rte_lpm*' has sub-type changes:
>       in pointed to type 'struct rte_lpm' at rte_lpm.h:134:1:
>         type size hasn't changed
>         3 data member insertions:
>           'rte_rcu_qsbr* rte_lpm::v', at offset 536873600 (in bits) at
> rte_lpm.h:148:1
>           'rte_lpm_qsbr_mode rte_lpm::rcu_mode', at offset 536873664 (in bits)
> at rte_lpm.h:149:1
>           'rte_rcu_qsbr_dq* rte_lpm::dq', at offset 536873728 (in
> bits) at rte_lpm.h:150:1
> 
Sorry, I thought that guarding the new members with ALLOW_EXPERIMENTAL_API
would keep the ABI when the user does not enable experimental APIs.
ABI compatibility and ALLOW_EXPERIMENTAL_API are indeed two different things.

> 
> Going back to my proposal of hiding what does not need to be seen.
> 
> Disclaimer, *this is quick & dirty* but it builds and passes ABI check:
> 
> $ git diff
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c index
> d498ba761..7109aef6a 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
I understand your proposal in v5 now. A new data structure encloses rte_lpm
together with the new members used for RCU.
In this way, the rte_lpm ABI is kept. And in the 20.11 release we can also
move out the other rte_lpm members that need not be exposed.
I will fix the ABI issue in the next version.

> @@ -115,6 +115,15 @@ rte_lpm_find_existing(const char *name)
>         return l;
>  }
> 
> +struct internal_lpm {
> +       /* Public object */
> +       struct rte_lpm lpm;
> +       /* RCU config. */
> +       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> +       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> +       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> +};
> +
>  /*
>   * Allocates memory for LPM object
>   */
> @@ -123,6 +132,7 @@ rte_lpm_create(const char *name, int socket_id,
>                 const struct rte_lpm_config *config)  {
>         char mem_name[RTE_LPM_NAMESIZE];
> +       struct internal_lpm *internal = NULL;
>         struct rte_lpm *lpm = NULL;
>         struct rte_tailq_entry *te;
>         uint32_t mem_size, rules_size, tbl8s_size; @@ -141,12 +151,6 @@
> rte_lpm_create(const char *name, int socket_id,
> 
>         snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
> 
> -       /* Determine the amount of memory to allocate. */
> -       mem_size = sizeof(*lpm);
> -       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
> -       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
> -                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config-
> >number_tbl8s);
> -
>         rte_mcfg_tailq_write_lock();
> 
>         /* guarantee there's no existing */ @@ -170,16 +174,23 @@
> rte_lpm_create(const char *name, int socket_id,
>                 goto exit;
>         }
> 
> +       /* Determine the amount of memory to allocate. */
> +       mem_size = sizeof(*internal);
> +       rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
> +       tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
> +                       RTE_LPM_TBL8_GROUP_NUM_ENTRIES *
> + config->number_tbl8s);
> +
>         /* Allocate memory to store the LPM data structures. */
> -       lpm = rte_zmalloc_socket(mem_name, mem_size,
> +       internal = rte_zmalloc_socket(mem_name, mem_size,
>                         RTE_CACHE_LINE_SIZE, socket_id);
> -       if (lpm == NULL) {
> +       if (internal == NULL) {
>                 RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
>                 rte_free(te);
>                 rte_errno = ENOMEM;
>                 goto exit;
>         }
> 
> +       lpm = &internal->lpm;
>         lpm->rules_tbl = rte_zmalloc_socket(NULL,
>                         (size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
> 
> @@ -226,6 +237,7 @@ rte_lpm_create(const char *name, int socket_id,
> void  rte_lpm_free(struct rte_lpm *lpm)  {
> +       struct internal_lpm *internal;
>         struct rte_lpm_list *lpm_list;
>         struct rte_tailq_entry *te;
> 
> @@ -247,8 +259,9 @@ rte_lpm_free(struct rte_lpm *lpm)
> 
>         rte_mcfg_tailq_write_unlock();
> 
> -       if (lpm->dq)
> -               rte_rcu_qsbr_dq_delete(lpm->dq);
> +       internal = container_of(lpm, struct internal_lpm, lpm);
> +       if (internal->dq != NULL)
> +               rte_rcu_qsbr_dq_delete(internal->dq);
>         rte_free(lpm->tbl8);
>         rte_free(lpm->rules_tbl);
>         rte_free(lpm);
> @@ -276,13 +289,15 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct
> rte_lpm_rcu_config *cfg,  {
>         char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
>         struct rte_rcu_qsbr_dq_parameters params = {0};
> +       struct internal_lpm *internal;
> 
> -       if ((lpm == NULL) || (cfg == NULL)) {
> +       if (lpm == NULL || cfg == NULL) {
>                 rte_errno = EINVAL;
>                 return 1;
>         }
> 
> -       if (lpm->v) {
> +       internal = container_of(lpm, struct internal_lpm, lpm);
> +       if (internal->v != NULL) {
>                 rte_errno = EEXIST;
>                 return 1;
>         }
> @@ -305,20 +320,19 @@ rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct
> rte_lpm_rcu_config *cfg,
>                 params.free_fn = __lpm_rcu_qsbr_free_resource;
>                 params.p = lpm;
>                 params.v = cfg->v;
> -               lpm->dq = rte_rcu_qsbr_dq_create(&params);
> -               if (lpm->dq == NULL) {
> -                       RTE_LOG(ERR, LPM,
> -                                       "LPM QS defer queue creation failed\n");
> +               internal->dq = rte_rcu_qsbr_dq_create(&params);
> +               if (internal->dq == NULL) {
> +                       RTE_LOG(ERR, LPM, "LPM QS defer queue creation
> failed\n");
>                         return 1;
>                 }
>                 if (dq)
> -                       *dq = lpm->dq;
> +                       *dq = internal->dq;
>         } else {
>                 rte_errno = EINVAL;
>                 return 1;
>         }
> -       lpm->rcu_mode = cfg->mode;
> -       lpm->v = cfg->v;
> +       internal->rcu_mode = cfg->mode;
> +       internal->v = cfg->v;
> 
>         return 0;
>  }
> @@ -502,12 +516,13 @@ _tbl8_alloc(struct rte_lpm *lpm)  static int32_t
> tbl8_alloc(struct rte_lpm *lpm)  {
> +       struct internal_lpm *internal = container_of(lpm, struct
> internal_lpm, lpm);
>         int32_t group_idx; /* tbl8 group index. */
> 
>         group_idx = _tbl8_alloc(lpm);
> -       if ((group_idx == -ENOSPC) && (lpm->dq != NULL)) {
> +       if (group_idx == -ENOSPC && internal->dq != NULL) {
>                 /* If there are no tbl8 groups try to reclaim one. */
> -               if (rte_rcu_qsbr_dq_reclaim(lpm->dq, 1, NULL, NULL, NULL) == 0)
> +               if (rte_rcu_qsbr_dq_reclaim(internal->dq, 1, NULL,
> NULL, NULL) == 0)
>                         group_idx = _tbl8_alloc(lpm);
>         }
> 
> @@ -518,20 +533,21 @@ static void
>  tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)  {
>         struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +       struct internal_lpm *internal = container_of(lpm, struct
> internal_lpm, lpm);
> 
> -       if (!lpm->v) {
> +       if (internal->v == NULL) {
>                 /* Set tbl8 group invalid*/
>                 __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
>                                 __ATOMIC_RELAXED);
> -       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
> +       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
>                 /* Wait for quiescent state change. */
> -               rte_rcu_qsbr_synchronize(lpm->v, RTE_QSBR_THRID_INVALID);
> +               rte_rcu_qsbr_synchronize(internal->v,
> + RTE_QSBR_THRID_INVALID);
>                 /* Set tbl8 group invalid*/
>                 __atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
>                                 __ATOMIC_RELAXED);
> -       } else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
> +       } else if (internal->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
>                 /* Push into QSBR defer queue. */
> -               rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&tbl8_group_start);
> +               rte_rcu_qsbr_dq_enqueue(internal->dq, (void
> *)&tbl8_group_start);
>         }
>  }
> 
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index
> 7889f21b3..a9568fcdd 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -143,12 +143,6 @@ struct rte_lpm {
>                         __rte_cache_aligned; /**< LPM tbl24 table. */
>         struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>         struct rte_lpm_rule *rules_tbl; /**< LPM rules. */ -#ifdef
> ALLOW_EXPERIMENTAL_API
> -       /* RCU config. */
> -       struct rte_rcu_qsbr *v;         /* RCU QSBR variable. */
> -       enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> -       struct rte_rcu_qsbr_dq *dq;     /* RCU QSBR defer queue. */
> -#endif
>  };
> 
>  /** LPM RCU QSBR configuration structure. */
> 
> 
> 
> 
> --
> David Marchand



* [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (4 preceding siblings ...)
  2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
@ 2020-07-08 15:47  2% ` Viacheslav Ovsiienko
  2020-07-08 16:05  0%   ` Slava Ovsiienko
  2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
  2020-07-10 12:39  2% ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
  7 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-08 15:47 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas, mb

Some networks require precise traffic timing management. The ability
to send (and, generally speaking, receive) packets at a precisely
specified moment in time makes it possible to support Time Division
Multiplexing connections on a contemporary general-purpose NIC without
involving auxiliary hardware. For example, support for the O-RAN
Fronthaul interface is one promising use of precise time management
for egress packets.

The main objective of this RFC is to specify how applications can
provide the moment in time at which packet transmission must start,
and to give a preliminary description of how this feature will be
supported on the mlx5 PMD side.

A new dynamic timestamp field is proposed. It provides timing
information whose units and time reference (initial phase) are not
explicitly defined but are always kept the same for a given port.
Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For packets being sent,
this value can be used by the PMD to schedule packet transmission.

The device clock is an opaque entity; its units and frequency are
vendor-specific and may depend on hardware capabilities and
configuration. It may (or may not) be synchronized with real time
via PTP, and may (or may not) be synchronous with the CPU clock
(for example, if the NIC and CPU share the same clock source there
may be no drift between the NIC and CPU clocks), etc.

Once the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage timestamps on the receive datapath as well. Having dedicated
flags for Rx/Tx timestamps allows applications not to perform explicit
flag resets on forwarding and not to propagate received timestamps
to the transmit datapath by default. The static PKT_RX_TIMESTAMP
is considered a candidate to become the dynamic flag.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time the packet appears on the wire with
the specified packet timestamp. If the specified timestamp is in the
past it should be ignored; if it is in the distant future it should be
capped to some reasonable value (in the range of seconds). These
specific cases ("too late" and "distant future") can optionally be
reported via device xstats to help applications detect time-related
problems.
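The "ignore the past, cap the distant future" policy above can be
sketched as a small helper. This is a hypothetical illustration only:
the actual clamping is PMD-specific, and the cap value is an assumed
example, not a defined constant.

```c
#include <stdint.h>

/* Illustrative sketch of the scheduling policy described above:
 * a timestamp in the past is ignored (send immediately), a timestamp
 * too far ahead is capped. `max_ahead` is an assumed example limit;
 * a real PMD chooses its own bound (in the range of seconds).
 */
static uint64_t
schedule_time(uint64_t now, uint64_t ts, uint64_t max_ahead)
{
	if (ts <= now)
		return now;             /* "too late": ignore, send now */
	if (ts - now > max_ahead)
		return now + max_ahead; /* "distant future": cap */
	return ts;                      /* schedulable as requested */
}
```

All values are in opaque device clock units, matching the commit
message's note that units and frequency are vendor-specific.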

No packet reordering according to timestamps is assumed, neither
within a packet burst nor between bursts; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. A timestamp may be put only in the first packet
of a burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a
timestamp with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative and is going to be replaced with a new eth_dev
API for reporting/managing the supported dynamic flags and their
related features. That API would break ABI compatibility and cannot be
introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.
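As a rough sketch of what such a registration and stamping looks like
from the application side (illustrative code only, not the actual
testpmd change; error handling is minimal and the helper names are
invented for this example):

```c
#include <rte_mbuf.h>
#include <rte_mbuf_dyn.h>

/* Illustrative sketch: register the dynamic timestamp field and the
 * Tx timestamp flag defined by this patch, then stamp an mbuf. */
static const struct rte_mbuf_dynfield dynfield_desc = {
	.name = RTE_MBUF_DYNFIELD_TIMESTAMP_NAME,
	.size = sizeof(uint64_t),
	.align = __alignof__(uint64_t),
};
static const struct rte_mbuf_dynflag dynflag_desc = {
	.name = RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME,
};

static int ts_offset;    /* byte offset of the field in the mbuf */
static uint64_t ts_flag; /* ol_flags bit: "timestamp is valid" */

static int
register_tx_timestamp(void)
{
	int bit;

	ts_offset = rte_mbuf_dynfield_register(&dynfield_desc);
	bit = rte_mbuf_dynflag_register(&dynflag_desc);
	if (ts_offset < 0 || bit < 0)
		return -1;
	ts_flag = 1ULL << bit;
	return 0;
}

static void
stamp(struct rte_mbuf *m, uint64_t device_ts)
{
	*RTE_MBUF_DYNFIELD(m, ts_offset, uint64_t *) = device_ts;
	m->ol_flags |= ts_flag;
}
```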

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst,
              specified in device clock units. The number
              of packets in the burst is defined by the txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with specific
delays between the packets within a burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.
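The sending pattern described above amounts to simple arithmetic in
device clock units; a self-contained sketch (the function name and
indexing scheme are illustrative assumptions, not the actual testpmd
code):

```c
#include <stdint.h>

/* Illustrative sketch of the "set tx_times <burst_gap>,<intra_gap>"
 * pattern: compute the timestamp of packet `pkt_idx` within burst
 * `burst_idx`, given a starting device clock value. Bursts are spaced
 * by burst_gap, packets within a burst by intra_gap, all in device
 * clock units. */
static uint64_t
tx_time_of(uint64_t start, uint64_t burst_gap, uint64_t intra_gap,
	   uint32_t burst_idx, uint32_t pkt_idx)
{
	return start + (uint64_t)burst_idx * burst_gap +
	       (uint64_t)pkt_idx * intra_gap;
}
```

In practice only the first packet of each burst would need to carry
the timestamp, per the burst-scheduling note above.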

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added

---
 lib/librte_ethdev/rte_ethdev.c |  1 +
 lib/librte_ethdev/rte_ethdev.h |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h | 31 +++++++++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 8e10a6f..02157d5 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index a49242b..6f6454c 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Txscheduling
  2020-07-08 15:27  0%       ` Morten Brørup
@ 2020-07-08 15:51  0%         ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-08 15:51 UTC (permalink / raw)
  To: Morten Brørup, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger, thomas

Hi, Morten

Addressed most of your comments in the v5 commit message.
The header file comments are close to becoming too wordy,
so I did not dare to elaborate them further.

With best regards, Slava

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, July 8, 2020 18:27
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@mellanox.net
> Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet
> Txscheduling
> 
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > Sent: Wednesday, July 8, 2020 4:54 PM
> >
> > Hi, Morten
> >
> > Thank you for the comments. Please, see below.
> >
> > > -----Original Message-----
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, July 8, 2020 17:16
> > > To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> > > Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> > > <rasland@mellanox.com>; olivier.matz@6wind.com;
> > > bernard.iremonger@intel.com; thomas@mellanox.net
> > > Subject: RE: [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate
> > packet
> > > Txscheduling
> > >
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Viacheslav
> > > > Ovsiienko
> > > > Sent: Tuesday, July 7, 2020 4:57 PM
> > > >
> > > > There is the requirement on some networks for precise traffic
> > timing
> > > > management. The ability to send (and, generally speaking, receive)
> > the
> > > > packets at the very precisely specified moment of time provides
> > > > the opportunity to support the connections with Time Division
> > Multiplexing
> > > > using the contemporary general purpose NIC without involving an
> > > > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > > > interface is one of the promising features for potentially usage
> > > > of the precise time management for the egress packets.
> > > >
> > > > The main objective of this RFC is to specify the way how
> > applications
> > > > can provide the moment of time at what the packet transmission
> > > > must
> > be
> > > > started and to describe in preliminary the supporting this feature
> > > > from
> > > > mlx5 PMD side.
> > > >
> > > > The new dynamic timestamp field is proposed, it provides some
> > timing
> > > > information, the units and time references (initial phase) are not
> > > > explicitly defined but are maintained always the same for a given
> > port.
> > > > Some devices allow to query rte_eth_read_clock() that will return
> > the
> > > > current device timestamp. The dynamic timestamp flag tells whether
> > the
> > > > field contains actual timestamp value. For the packets being sent
> > this
> > > > value can be used by PMD to schedule packet sending.
> > > >
> > > > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> > and
> > > > obsoleting, these dynamic flag and field will be used to manage
> > > > the timestamps on receiving datapath as well.
> > > >
> > > > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > > > sent it tries to synchronize the time of packet appearing on the
> > wire
> > > > with the specified packet timestamp. If the specified one is in
> > > > the past it should be ignored, if one is in the distant future it
> > should
> > > > be capped with some reasonable value (in range of seconds). These
> > > > specific cases ("too late" and "distant future") can be optionally
> > > > reported via device xstats to assist applications to detect the
> > > > time-related problems.
> > > >
> > > > There is no any packet reordering according timestamps is
> > > > supposed, neither within packet burst, nor between packets, it is
> > > > an entirely application responsibility to generate packets and its
> > > > timestamps
> > in
> > > > desired order. The timestamps can be put only in the first packet
> > in
> > > > the burst providing the entire burst scheduling.
> > > >
> > > > PMD reports the ability to synchronize packet sending on timestamp
> > > > with new offload flag:
> > > >
> > > > This is palliative and is going to be replaced with new eth_dev
> > > > API about reporting/managing the supported dynamic flags and its
> > related
> > > > features. This API would break ABI compatibility and can't be
> > > > introduced at the moment, so is postponed to 20.11.
> > > >
> > > > For testing purposes it is proposed to update testpmd "txonly"
> > > > forwarding mode routine. With this update testpmd application
> > > > generates the packets and sets the dynamic timestamps according to
> > > > specified time pattern if it sees the "rte_dynfield_timestamp" is
> > registered.
> > > >
> > > > The new testpmd command is proposed to configure sending pattern:
> > > >
> > > > set tx_times <burst_gap>,<intra_gap>
> > > >
> > > > <intra_gap> - the delay between the packets within the burst
> > > >               specified in the device clock units. The number
> > > >               of packets in the burst is defined by txburst
> > parameter
> > > >
> > > > <burst_gap> - the delay between the bursts in the device clock
> > units
> > > >
> > > > As the result the bursts of packet will be transmitted with
> > specific
> > > > delays between the packets within the burst and specific delay
> > between
> > > > the bursts. The rte_eth_get_clock is supposed to be engaged to get
> > the
> > > > current device clock value and provide the reference for the
> > > > timestamps.
> > > >
> > > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > > ---
> > > >  v1->v4:
> > > >     - dedicated dynamic Tx timestamp flag instead of shared with
> > > > Rx
> > >
> > > The detailed description above should be updated to reflect that it
> > is now
> > > two flags.
> > OK
> >
> > >
> > > >     - Doxygen-style comment
> > > >     - comments update
> > > >
> > > > ---
> > > >  lib/librte_ethdev/rte_ethdev.c |  1 +
> > lib/librte_ethdev/rte_ethdev.h
> > > > |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h | 31
> > > > +++++++++++++++++++++++++++++++
> > > >  3 files changed, 36 insertions(+)
> > > >
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > > > b/lib/librte_ethdev/rte_ethdev.c index 8e10a6f..02157d5 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.c
> > > > +++ b/lib/librte_ethdev/rte_ethdev.c
> > > > @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
> > > >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> > > >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> > > >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > > > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> > > >  };
> > > >
> > > >  #undef RTE_TX_OFFLOAD_BIT2STR
> > > > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > > > b/lib/librte_ethdev/rte_ethdev.h index a49242b..6f6454c 100644
> > > > --- a/lib/librte_ethdev/rte_ethdev.h
> > > > +++ b/lib/librte_ethdev/rte_ethdev.h
> > > > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> > > >  /** Device supports outer UDP checksum */  #define
> > > > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> > > >
> > > > +/** Device supports send on timestamp */ #define
> > > > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> > > > +
> > > > +
> > > >  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP
> 0x00000001
> > > /**<
> > > > Device supports Rx queue setup after device started*/  #define
> > > > RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --
> git
> > > > a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > index 96c3631..7e9f7d2 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> > > > @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char
> *name,
> > > > #define RTE_MBUF_DYNFIELD_METADATA_NAME
> > > "rte_flow_dynfield_metadata"
> > > >  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> > > "rte_flow_dynflag_metadata"
> > > >
> > > > +/**
> > > > + * The timestamp dynamic field provides some timing information,
> > the
> > > > + * units and time references (initial phase) are not explicitly
> > > > defined
> > > > + * but are maintained always the same for a given port. Some
> > devices
> > > > allow4
> > > > + * to query rte_eth_read_clock() that will return the current
> > device
> > > > + * timestamp. The dynamic Tx timestamp flag tells whether the
> > field
> > > > contains
> > > > + * actual timestamp value. For the packets being sent this value
> > can
> > > > be
> > > > + * used by PMD to schedule packet sending.
> > > > + *
> > > > + * After PKT_RX_TIMESTAMP flag and fixed timestamp field
> > deprecation
> > > > + * and obsoleting, the dedicated Rx timestamp flag is supposed to
> > be
> > > > + * introduced and the shared dynamic timestamp field will be used
> > > > + * to handle the timestamps on receiving datapath as well.
> > > > + */
> > > > +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> > > "rte_dynfield_timestamp"
> > >
> > > The description above should not say anything about the dynamic TX
> > > timestamp flag.
> > It does not. Or do you mean RX?
> > Not sure, field and flag are tightly coupled, it is nice to mention
> > this relation for better understanding.
> > And mentioning the RX explains why it is not like this:
> > RTE_MBUF_DYNFIELD_[TX]_TIMESTAMP_NAME
> 
> Sorry. I misunderstood its purpose!
> It's the name of the field, and the field will not only be used for RX, but in the
> future also for RX.
> (I thought it was the name of the RX flag, reserved for future use.)
> 
> >
> > >
> > > Please elaborate "some timing information", e.g. add "... about when
> > the
> > > packet was received".
> >
> > Sorry, I do not follow,  currently the dynamic field is not "about
> > when the packet was received". Now it is introduced for Tx only and
> > just the opportunity to be shared with Rx one in coming releases is
> > mentioned. "Some" means - not specified (herein) exactly.
> > And it is elaborated what Is not specified and how it is supposed to
> > use Tx timestamp.
> 
> It should be described when it is valid, and how it is being used, e.g. by
> adding a reference to the "rte_dynflag_tx_timestamp" flag.
> 
> > >
> > > > +
> > > > +/**
> > > > + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> flag
> > > set on
> > > > the
> > > > + * packet being sent it tries to synchronize the time of packet
> > > > appearing
> > > > + * on the wire with the specified packet timestamp. If the
> > specified
> > > > one
> > > > + * is in the past it should be ignored, if one is in the distant
> > > > future
> > > > + * it should be capped with some reasonable value (in range of
> > > > seconds).
> > > > + *
> > > > + * There is no any packet reordering according to timestamps is
> > > > supposed,
> > > > + * neither for packet within the burst, nor for the whole bursts,
> > it
> > > > is
> > > > + * an entirely application responsibility to generate packets and
> > its
> > > > + * timestamps in desired order. The timestamps might be put only
> > in
> > > > + * the first packet in the burst providing the entire burst
> > > > scheduling.
> > > > + */
> > > > +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> > > "rte_dynflag_tx_timestamp"
> > > > +
> > > >  #endif
> > > > --
> > > > 1.8.3.1
> > > >
> > >
> > > It may be worth adding some documentation about how the clocks of
> > > the NICs are out of sync with the clock of the CPU, and are all
> > > drifting
> > relatively.
> > >
> > > And those clocks are also out of sync with the actual time (NTP
> > clock).
> >
> > IMO, It is out of scope of this very generic patch.  As for mlx NICs -
> > the internal device clock might be (or might be not) synchronized with
> > PTP, it can provide timestamps in real nanoseconds in various formats
> > or just some free running counter.
> 
> Cool!
> 
> > On some systems the NIC and CPU might share the same clock source (for
> > their PLL inputs for example) and there will be no any drifts. As we
> > can see - it is a wide and interesting opic to discuss, but, IMO,  the
> > comment in header file might be not the most relevant place to do. As
> > for mlx5 devices clock specifics - it will be documented in PMD
> > chapter.
> >
> > OK, will add few generic words, the few ones - in order not to make
> > comment wordy, just point the direction for further thinking.
> 
> I agree - we don't want cookbooks in the header files. Only enough
> description to avoid the worst misunderstandings.
> 
> >
> > >
> > > Preferably, some sort of cookbook for handling this should be
> > provided.
> > > PCAP could be used as an example.
> > >
> > testpmd example is included in series, mlx5 PMD patch is prepared and
> > coming soon.
> 
> Great.
> 
> And I suppose that the more detailed cookbook/example - regarding offset
> and drift of various clocks - is probably more relevant for the RX side (for
> various PCAP applications), and thus completely unrelated to this patch.
> 
> >
> > With best regards, Slava


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-08 16:05  0%   ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-08 16:05 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger,
	thomas, mb

> promote Acked-by from previous patch version to maintain patchwork status accordingly

Acked-by: Olivier Matz <olivier.matz@6wind.com>

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Wednesday, July 8, 2020 18:47
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@monjalon.com;
> mb@smartsharesystems.com
> Subject: [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive) the
> packets at the very precisely specified moment of time provides the
> opportunity to support the connections with Time Division Multiplexing using
> the contemporary general purpose NIC without involving an auxiliary
> hardware. For example, the supporting of O-RAN Fronthaul interface is one
> of the promising features for potentially usage of the precise time
> management for the egress packets.
> 
> The main objective of this RFC is to specify the way how applications can
> provide the moment of time at what the packet transmission must be started
> and to describe in preliminary the supporting this feature from
> mlx5 PMD side.
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not explicitly
> defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return the current
> device timestamp. The dynamic timestamp flag tells whether the field
> contains actual timestamp value. For the packets being sent this value can be
> used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are vendor specific
> and might depend on hardware capabilities and configurations. If might (or
> not) be synchronized with real time via PTP, might (or not) be synchronous
> with CPU clock (for example if NIC and CPU share the same clock source
> there might be no any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> obsoleting, these dynamic flag and field will be used to manage the
> timestamps on receiving datapath as well. Having the dedicated flags for
> Rx/Tx timestamps allows applications not to perform explicit flags reset on
> forwarding and not to promote received timestamps to the transmitting
> datapath by default. The static PKT_RX_TIMESTAMP is considered as
> candidate to become the dynamic flag.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
> tries to synchronize the time of packet appearing on the wire with the
> specified packet timestamp. If the specified one is in the past it should be
> ignored, if one is in the distant future it should be capped with some
> reasonable value (in range of seconds). These specific cases ("too late" and
> "distant future") can be optionally reported via device xstats to assist
> applications to detect the time-related problems.
> 
> There is no any packet reordering according timestamps is supposed, neither
> within packet burst, nor between packets, it is an entirely application
> responsibility to generate packets and its timestamps in desired order. The
> timestamps can be put only in the first packet in the burst providing the
> entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp with
> new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API about
> reporting/managing the supported dynamic flags and its related features.
> This API would break ABI compatibility and can't be introduced at the
> moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific delays
> between the packets within the burst and specific delay between the bursts.
> The rte_eth_get_clock is supposed to be engaged to get the current device
> clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
> 
> ---
>  lib/librte_ethdev/rte_ethdev.c |  1 +
>  lib/librte_ethdev/rte_ethdev.h |  4 ++++  lib/librte_mbuf/rte_mbuf_dyn.h |
> 31 +++++++++++++++++++++++++++++++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6f..02157d5 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -162,6 +162,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
> 
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242b..6f6454c 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */  #define
> DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> 
> +/** Device supports send on timestamp */ #define
> +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> +
> +
>  #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
> /**< Device supports Rx queue setup after device started*/  #define
> RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002 diff --git
> a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h index
> 96c3631..8407230 100644
> --- a/lib/librte_mbuf/rte_mbuf_dyn.h
> +++ b/lib/librte_mbuf/rte_mbuf_dyn.h
> @@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
> #define RTE_MBUF_DYNFIELD_METADATA_NAME
> "rte_flow_dynfield_metadata"
>  #define RTE_MBUF_DYNFLAG_METADATA_NAME
> "rte_flow_dynflag_metadata"
> 
> +/**
> + * The timestamp dynamic field provides some timing information, the
> + * units and time references (initial phase) are not explicitly defined
> + * but are maintained always the same for a given port. Some devices
> +allow
> + * to query rte_eth_read_clock() that will return the current device
> + * timestamp. The dynamic Tx timestamp flag tells whether the field
> +contains
> + * actual timestamp value for the packets being sent, this value can be
> + * used by PMD to schedule packet sending.
> + *
> + * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> + * and obsoleting, the dedicated Rx timestamp flag is supposed to be
> + * introduced and the shared dynamic timestamp field will be used
> + * to handle the timestamps on receiving datapath as well.
> + */
> +#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME
> "rte_dynfield_timestamp"
> +
> +/**
> + * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag
> set on the
> + * packet being sent it tries to synchronize the time of packet
> +appearing
> + * on the wire with the specified packet timestamp. If the specified
> +one
> + * is in the past it should be ignored, if one is in the distant future
> + * it should be capped with some reasonable value (in range of seconds).
> + *
> + * There is no any packet reordering according to timestamps is
> +supposed,
> + * neither for packet within the burst, nor for the whole bursts, it is
> + * an entirely application responsibility to generate packets and its
> + * timestamps in desired order. The timestamps might be put only in
> + * the first packet in the burst providing the entire burst scheduling.
> + */
> +#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME
> "rte_dynflag_tx_timestamp"
> +
>  #endif
> --
> 1.8.3.1


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
  2020-07-08 15:04  0%     ` Kinsella, Ray
@ 2020-07-09  5:21  0%       ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09  5:21 UTC (permalink / raw)
  To: Kinsella, Ray, David Marchand, Aaron Conole
  Cc: dev, David Christensen, Honnappa Nagarahalli, Ruifeng Wang, nd,
	Dodji Seketeli, Neil Horman, Harman Kalra

> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Wednesday, July 8, 2020 11:05 PM
> To: David Marchand <david.marchand@redhat.com>; Phil Yang
> <Phil.Yang@arm.com>; Aaron Conole <aconole@redhat.com>
> Cc: dev <dev@dpdk.org>; David Christensen <drc@linux.vnet.ibm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; Dodji Seketeli
> <dodji@redhat.com>; Neil Horman <nhorman@tuxdriver.com>; Harman
> Kalra <hkalra@marvell.com>
> Subject: Re: [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status
> 
> 
> 
> On 08/07/2020 13:29, David Marchand wrote:
> > On Thu, Jun 11, 2020 at 12:25 PM Phil Yang <phil.yang@arm.com> wrote:
> >>
> >> The event status is defined as a volatile variable and shared
> >> between threads. Use c11 atomics with explicit ordering instead
> >> of rte_atomic ops which enforce unnecessary barriers on aarch64.
> >>
> >> Signed-off-by: Phil Yang <phil.yang@arm.com>
> >> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> ---
> >>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
> >>  lib/librte_eal/linux/eal_interrupts.c       | 47 ++++++++++++++++++++----
> -----
> >>  2 files changed, 34 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h
> b/lib/librte_eal/include/rte_eal_interrupts.h
> >> index 773a34a..b1e8a29 100644
> >> --- a/lib/librte_eal/include/rte_eal_interrupts.h
> >> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
> >> @@ -59,7 +59,7 @@ enum {
> >>
> >>  /** interrupt epoll event obj, taken by epoll_event.ptr */
> >>  struct rte_epoll_event {
> >> -       volatile uint32_t status;  /**< OUT: event status */
> >> +       uint32_t status;           /**< OUT: event status */
> >>         int fd;                    /**< OUT: event fd */
> >>         int epfd;       /**< OUT: epoll instance the ev associated with */
> >>         struct rte_epoll_data epdata;
> >
> > I got a reject from the ABI check in my env.
> >
> > 1 function with some indirect sub-type change:
> >
> >   [C]'function int rte_pci_ioport_map(rte_pci_device*, int,
> > rte_pci_ioport*)' at pci.c:756:1 has some indirect sub-type changes:
> >     parameter 1 of type 'rte_pci_device*' has sub-type changes:
> >       in pointed to type 'struct rte_pci_device' at rte_bus_pci.h:57:1:
> >         type size hasn't changed
> >         1 data member changes (2 filtered):
> >          type of 'rte_intr_handle rte_pci_device::intr_handle' changed:
> >            type size hasn't changed
> >            1 data member change:
> >             type of 'rte_epoll_event rte_intr_handle::elist[512]' changed:
> >               array element type 'struct rte_epoll_event' changed:
> >                 type size hasn't changed
> >                 1 data member change:
> >                  type of 'volatile uint32_t rte_epoll_event::status' changed:
> >                    entity changed from 'volatile uint32_t' to 'typedef
> > uint32_t' at stdint-uintn.h:26:1
> >                    type size hasn't changed
> >
> >               type size hasn't changed
> >
> >
> > This is probably harmless in our case (going from volatile to non
> > volatile), but it won't pass the check in the CI without an exception
> > rule.
> >
> > Note: checking on the test-report ml, I saw nothing, but ovsrobot did
> > catch the issue with this change too, Aaron?
> >
> >
> Agreed, probably harmless and requires something in libabigail.abignore.

OK. Will update libabigail.abignore in the next version.

Thanks,
Phil


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/3] ring: remove experimental tag for ring reset API
  @ 2020-07-09  6:12  3%   ` Feifei Wang
  2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
  1 sibling, 0 replies; 200+ results
From: Feifei Wang @ 2020-07-09  6:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ray Kinsella, Neil Horman
  Cc: dev, nd, Ruifeng.wang, Feifei Wang

Remove the experimental tag from the rte_ring_reset API, which has been
around for 4 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v2:
1. add the changed API into DPDK_21 ABI in the map file. (Ray)

 lib/librte_ring/rte_ring.h           | 3 ---
 lib/librte_ring/rte_ring_version.map | 7 +++++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index f67141482..7181c33b4 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -663,15 +663,12 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
  *
  * This function flush all the elements in a ring
  *
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * @warning
  * Make sure the ring is not in use while calling this function.
  *
  * @param r
  *   A pointer to the ring structure.
  */
-__rte_experimental
 void
 rte_ring_reset(struct rte_ring *r);
 
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index e88c143cf..9a6ce4d32 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -12,11 +12,14 @@ DPDK_20.0 {
 	local: *;
 };
 
-EXPERIMENTAL {
+DPDK_21 {
 	global:
 
-	# added in 19.08
 	rte_ring_reset;
+} DPDK_20.0;
+
+EXPERIMENTAL {
+	global:
 
 	# added in 20.02
 	rte_ring_create_elem;
-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 2/3] ring: remove experimental tag for ring element APIs
    2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 1/3] ring: remove experimental tag for ring reset API Feifei Wang
@ 2020-07-09  6:12  3%   ` Feifei Wang
  1 sibling, 0 replies; 200+ results
From: Feifei Wang @ 2020-07-09  6:12 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Konstantin Ananyev, Ray Kinsella, Neil Horman
  Cc: dev, nd, Ruifeng.wang, Feifei Wang

Remove the experimental tag from the rte_ring_xxx_elem APIs, which have
been around for 2 releases.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
v2:
1. add the changed API into DPDK_21 ABI in the map file. (Ray)

 lib/librte_ring/rte_ring.h           |  5 +----
 lib/librte_ring/rte_ring_elem.h      |  8 --------
 lib/librte_ring/rte_ring_version.map | 10 ++--------
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 7181c33b4..35f3f8c42 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -40,6 +40,7 @@ extern "C" {
 #endif
 
 #include <rte_ring_core.h>
+#include <rte_ring_elem.h>
 
 /**
  * Calculate the memory size needed for a ring
@@ -401,10 +402,6 @@ rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			RTE_RING_SYNC_ST, free_space);
 }
 
-#ifdef ALLOW_EXPERIMENTAL_API
-#include <rte_ring_elem.h>
-#endif
-
 /**
  * Enqueue several objects on a ring.
  *
diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
index 9e5192ae6..69dc51746 100644
--- a/lib/librte_ring/rte_ring_elem.h
+++ b/lib/librte_ring/rte_ring_elem.h
@@ -23,9 +23,6 @@ extern "C" {
 #include <rte_ring_core.h>
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Calculate the memory size needed for a ring with given element size
  *
  * This function returns the number of bytes needed for a ring, given
@@ -43,13 +40,9 @@ extern "C" {
  *   - -EINVAL - esize is not a multiple of 4 or count provided is not a
  *		 power of 2.
  */
-__rte_experimental
 ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Create a new ring named *name* that stores elements with given size.
  *
  * This function uses ``memzone_reserve()`` to allocate memory. Then it
@@ -109,7 +102,6 @@ ssize_t rte_ring_get_memsize_elem(unsigned int esize, unsigned int count);
  *    - EEXIST - a memzone with the same name already exists
  *    - ENOMEM - no appropriate memory area found in which to create memzone
  */
-__rte_experimental
 struct rte_ring *rte_ring_create_elem(const char *name, unsigned int esize,
 			unsigned int count, int socket_id, unsigned int flags);
 
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index 9a6ce4d32..ac392f3ca 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -15,13 +15,7 @@ DPDK_20.0 {
 DPDK_21 {
 	global:
 
-	rte_ring_reset;
-} DPDK_20.0;
-
-EXPERIMENTAL {
-	global:
-
-	# added in 20.02
 	rte_ring_create_elem;
 	rte_ring_get_memsize_elem;
-};
+	rte_ring_reset;
+} DPDK_20.0;
-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins for interrupt status
    @ 2020-07-09  6:46  3% ` Phil Yang
  2020-07-09  8:02  0%   ` Stefan Puiu
  2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
  1 sibling, 2 replies; 200+ results
From: Phil Yang @ 2020-07-09  6:46 UTC (permalink / raw)
  To: david.marchand, dev
  Cc: mdr, aconole, drc, Honnappa.Nagarahalli, Ruifeng.Wang, nd, dodji,
	nhorman, hkalra

The event status is defined as a volatile variable and shared between
threads. Use c11 atomic built-ins with explicit ordering instead of
rte_atomic ops which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
---
v2:
1. Fixed typo.
2. Updated libabigail.abignore to pass ABI check.
3. Merged v1 two patches into one patch.

 devtools/libabigail.abignore                |  4 +++
 lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
 lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 0133f75..daa4631 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -48,6 +48,10 @@
         changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
 [suppress_variable]
         name = rte_crypto_aead_algorithm_strings
+; Ignore updates of epoll event
+[suppress_type]
+        type_kind = struct
+        name = rte_epoll_event
 
 ;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till DPDK 20.11
diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
index 773a34a..b1e8a29 100644
--- a/lib/librte_eal/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/include/rte_eal_interrupts.h
@@ -59,7 +59,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	volatile uint32_t status;  /**< OUT: event status */
+	uint32_t status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
index 84eeaa1..7a50869 100644
--- a/lib/librte_eal/linux/eal_interrupts.c
+++ b/lib/librte_eal/linux/eal_interrupts.c
@@ -26,7 +26,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_log.h>
@@ -1221,11 +1220,18 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 {
 	unsigned int i, count = 0;
 	struct rte_epoll_event *rev;
+	uint32_t valid_status;
 
 	for (i = 0; i < n; i++) {
 		rev = evs[i].data.ptr;
-		if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
-						 RTE_EPOLL_EXEC))
+		valid_status =  RTE_EPOLL_VALID;
+		/* ACQUIRE memory ordering here pairs with RELEASE
+		 * ordering bellow acting as a lock to synchronize
+		 * the event data updating.
+		 */
+		if (!rev || !__atomic_compare_exchange_n(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC, 0,
+				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1237,8 +1243,11 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 			rev->epdata.cb_fun(rev->fd,
 					   rev->epdata.cb_arg);
 
-		rte_compiler_barrier();
-		rev->status = RTE_EPOLL_VALID;
+		/* the status update should be observed after
+		 * the other fields changes.
+		 */
+		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELEASE);
 		count++;
 	}
 	return count;
@@ -1308,10 +1317,14 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
 static inline void
 eal_epoll_data_safe_free(struct rte_epoll_event *ev)
 {
-	while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
-				    RTE_EPOLL_INVALID))
-		while (ev->status != RTE_EPOLL_VALID)
+	uint32_t valid_status = RTE_EPOLL_VALID;
+	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&ev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
 			rte_pause();
+		valid_status = RTE_EPOLL_VALID;
+	}
 	memset(&ev->epdata, 0, sizeof(ev->epdata));
 	ev->fd = -1;
 	ev->epfd = -1;
@@ -1333,7 +1346,8 @@ rte_epoll_ctl(int epfd, int op, int fd,
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		event->status = RTE_EPOLL_VALID;
+		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELAXED);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1345,11 +1359,13 @@ rte_epoll_ctl(int epfd, int op, int fd,
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			event->status = RTE_EPOLL_INVALID;
+			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
+					__ATOMIC_RELAXED);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && event->status != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
+			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1378,7 +1394,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status != RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1401,7 +1418,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status == RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1426,12 +1444,12 @@ rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
 
 	for (i = 0; i < intr_handle->nb_efd; i++) {
 		rev = &intr_handle->elist[i];
-		if (rev->status == RTE_EPOLL_INVALID)
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
 			eal_epoll_data_safe_free(rev);
-			rev->status = RTE_EPOLL_INVALID;
 		}
 	}
 }
-- 
2.7.4


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] devtools: fix ninja break under default DESTDIR path
@ 2020-07-09  6:53  4% Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09  6:53 UTC (permalink / raw)
  To: david.marchand, dev; +Cc: Honnappa.Nagarahalli, Ruifeng.Wang, nd

If DPDK_ABI_REF_DIR is not set, the default DESTDIR is a relative path.
This will break ninja in the ABI check test.
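
The root cause is plain shell parameter expansion: `${DPDK_ABI_REF_DIR:-reference}` yields a relative path whenever the variable is unset, and ninja later runs from a different directory. A minimal sketch of the before/after expansion (the version string below is a hypothetical stand-in for `$DPDK_ABI_REF_VERSION`):

```shell
unset DPDK_ABI_REF_DIR
ver=v20.02   # hypothetical reference version

# Before the fix: the default expands to a relative path.
abirefdir=${DPDK_ABI_REF_DIR:-reference}/$ver
echo "$abirefdir"          # prints "reference/v20.02"

# After the fix: the default is anchored to the invocation directory.
abirefdir=${DPDK_ABI_REF_DIR:-$(pwd)/reference}/$ver
case $abirefdir in
/*) echo "absolute" ;;     # prints "absolute"
 *) echo "relative" ;;
esac
```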

Fixes: 777014e56d07 ("devtools: add ABI checks")

Signed-off-by: Phil Yang <phil.yang@arm.com>
---
 devtools/test-meson-builds.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
index a87de63..2bfcaca 100755
--- a/devtools/test-meson-builds.sh
+++ b/devtools/test-meson-builds.sh
@@ -143,7 +143,7 @@ build () # <directory> <target compiler | cross file> <meson options>
 	config $srcdir $builds_dir/$targetdir $cross --werror $*
 	compile $builds_dir/$targetdir
 	if [ -n "$DPDK_ABI_REF_VERSION" ]; then
-		abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
+		abirefdir=${DPDK_ABI_REF_DIR:-$(pwd)/reference}/$DPDK_ABI_REF_VERSION
 		if [ ! -d $abirefdir/$targetdir ]; then
 			# clone current sources
 			if [ ! -d $abirefdir/src ]; then
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins for interrupt status
  2020-07-09  6:46  3% ` [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins " Phil Yang
@ 2020-07-09  8:02  0%   ` Stefan Puiu
  2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
  1 sibling, 0 replies; 200+ results
From: Stefan Puiu @ 2020-07-09  8:02 UTC (permalink / raw)
  To: Phil Yang
  Cc: david.marchand, dev, mdr, aconole, drc, Honnappa.Nagarahalli,
	Ruifeng.Wang, nd, dodji, Neil Horman, hkalra

Hi,

Noticed 2 typos:

On Thu, Jul 9, 2020 at 9:46 AM Phil Yang <phil.yang@arm.com> wrote:
>
> The event status is defined as a volatile variable and shared between
> threads. Use c11 atomic built-ins with explicit ordering instead of
> rte_atomic ops which enforce unnecessary barriers on aarch64.
>
> The event status has been cleaned up by the compare-and-swap operation
> when we free the event data, so there is no need to set it to invalid
> after that.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Harman Kalra <hkalra@marvell.com>
> ---
> v2:
> 1. Fixed typo.
> 2. Updated libabigail.abignore to pass ABI check.
> 3. Merged v1 two patches into one patch.
>
>  devtools/libabigail.abignore                |  4 +++
>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>  lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
>  3 files changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 0133f75..daa4631 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -48,6 +48,10 @@
>          changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
>  [suppress_variable]
>          name = rte_crypto_aead_algorithm_strings
> +; Ignore updates of epoll event
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_epoll_event
>
>  ;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till DPDK 20.11
> diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
> index 773a34a..b1e8a29 100644
> --- a/lib/librte_eal/include/rte_eal_interrupts.h
> +++ b/lib/librte_eal/include/rte_eal_interrupts.h
> @@ -59,7 +59,7 @@ enum {
>
>  /** interrupt epoll event obj, taken by epoll_event.ptr */
>  struct rte_epoll_event {
> -       volatile uint32_t status;  /**< OUT: event status */
> +       uint32_t status;           /**< OUT: event status */
>         int fd;                    /**< OUT: event fd */
>         int epfd;       /**< OUT: epoll instance the ev associated with */
>         struct rte_epoll_data epdata;
> diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
> index 84eeaa1..7a50869 100644
> --- a/lib/librte_eal/linux/eal_interrupts.c
> +++ b/lib/librte_eal/linux/eal_interrupts.c
> @@ -26,7 +26,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_debug.h>
>  #include <rte_log.h>
> @@ -1221,11 +1220,18 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
>  {
>         unsigned int i, count = 0;
>         struct rte_epoll_event *rev;
> +       uint32_t valid_status;
>
>         for (i = 0; i < n; i++) {
>                 rev = evs[i].data.ptr;
> -               if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
> -                                                RTE_EPOLL_EXEC))
> +               valid_status =  RTE_EPOLL_VALID;
> +               /* ACQUIRE memory ordering here pairs with RELEASE
> +                * ordering bellow acting as a lock to synchronize
s/bellow/below

> +                * the event data updating.
> +                */
> +               if (!rev || !__atomic_compare_exchange_n(&rev->status,
> +                                   &valid_status, RTE_EPOLL_EXEC, 0,
> +                                   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
>                         continue;
>
>                 events[count].status        = RTE_EPOLL_VALID;
> @@ -1237,8 +1243,11 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
>                         rev->epdata.cb_fun(rev->fd,
>                                            rev->epdata.cb_arg);
>
> -               rte_compiler_barrier();
> -               rev->status = RTE_EPOLL_VALID;
> +               /* the status update should be observed after
> +                * the other fields changes.
s/fields changes/fields change/

Thanks,
Stefan.

> +                */
> +               __atomic_store_n(&rev->status, RTE_EPOLL_VALID,
> +                               __ATOMIC_RELEASE);
>                 count++;
>         }
>         return count;
> @@ -1308,10 +1317,14 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
>  static inline void
>  eal_epoll_data_safe_free(struct rte_epoll_event *ev)
>  {
> -       while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
> -                                   RTE_EPOLL_INVALID))
> -               while (ev->status != RTE_EPOLL_VALID)
> +       uint32_t valid_status = RTE_EPOLL_VALID;
> +       while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
> +                   RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
> +               while (__atomic_load_n(&ev->status,
> +                               __ATOMIC_RELAXED) != RTE_EPOLL_VALID)
>                         rte_pause();
> +               valid_status = RTE_EPOLL_VALID;
> +       }
>         memset(&ev->epdata, 0, sizeof(ev->epdata));
>         ev->fd = -1;
>         ev->epfd = -1;
> @@ -1333,7 +1346,8 @@ rte_epoll_ctl(int epfd, int op, int fd,
>                 epfd = rte_intr_tls_epfd();
>
>         if (op == EPOLL_CTL_ADD) {
> -               event->status = RTE_EPOLL_VALID;
> +               __atomic_store_n(&event->status, RTE_EPOLL_VALID,
> +                               __ATOMIC_RELAXED);
>                 event->fd = fd;  /* ignore fd in event */
>                 event->epfd = epfd;
>                 ev.data.ptr = (void *)event;
> @@ -1345,11 +1359,13 @@ rte_epoll_ctl(int epfd, int op, int fd,
>                         op, fd, strerror(errno));
>                 if (op == EPOLL_CTL_ADD)
>                         /* rollback status when CTL_ADD fail */
> -                       event->status = RTE_EPOLL_INVALID;
> +                       __atomic_store_n(&event->status, RTE_EPOLL_INVALID,
> +                                       __ATOMIC_RELAXED);
>                 return -1;
>         }
>
> -       if (op == EPOLL_CTL_DEL && event->status != RTE_EPOLL_INVALID)
> +       if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
> +                       __ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
>                 eal_epoll_data_safe_free(event);
>
>         return 0;
> @@ -1378,7 +1394,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
>         case RTE_INTR_EVENT_ADD:
>                 epfd_op = EPOLL_CTL_ADD;
>                 rev = &intr_handle->elist[efd_idx];
> -               if (rev->status != RTE_EPOLL_INVALID) {
> +               if (__atomic_load_n(&rev->status,
> +                               __ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
>                         RTE_LOG(INFO, EAL, "Event already been added.\n");
>                         return -EEXIST;
>                 }
> @@ -1401,7 +1418,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
>         case RTE_INTR_EVENT_DEL:
>                 epfd_op = EPOLL_CTL_DEL;
>                 rev = &intr_handle->elist[efd_idx];
> -               if (rev->status == RTE_EPOLL_INVALID) {
> +               if (__atomic_load_n(&rev->status,
> +                               __ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
>                         RTE_LOG(INFO, EAL, "Event does not exist.\n");
>                         return -EPERM;
>                 }
> @@ -1426,12 +1444,12 @@ rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
>
>         for (i = 0; i < intr_handle->nb_efd; i++) {
>                 rev = &intr_handle->elist[i];
> -               if (rev->status == RTE_EPOLL_INVALID)
> +               if (__atomic_load_n(&rev->status,
> +                               __ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
>                         continue;
>                 if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
>                         /* force free if the entry valid */
>                         eal_epoll_data_safe_free(rev);
> -                       rev->status = RTE_EPOLL_INVALID;
>                 }
>         }
>  }
> --
> 2.7.4
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library
                     ` (2 preceding siblings ...)
  2020-07-07 15:15  3% ` [dpdk-dev] [PATCH v7 " Ruifeng Wang
@ 2020-07-09  8:02  4% ` Ruifeng Wang
  2020-07-09  8:02  2%   ` [dpdk-dev] [PATCH v8 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
  2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
  2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-09  8:02 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. This series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This gives the
user a handle to enable the RCU support integrated in the LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v8:
Fixed ABI issue by adding internal LPM control structure. (David)
Changed to use RFC5737 address in unit test. (Vladimir)

v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation to be triggered on every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return the defer queue handle so the user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 167 ++++++++--
 lib/librte_lpm/rte_lpm.h           |  53 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 1016 insertions(+), 28 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v8 1/3] lib/lpm: integrate RCU QSBR
  2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-09  8:02  2%   ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-09  8:02 UTC (permalink / raw)
  To: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for LPM library
internal use. This struct wraps the exposed rte_lpm and also holds
members that don't need to be exposed, such as the RCU-related
config.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 167 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  53 +++++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 6 files changed, 237 insertions(+), 24 deletions(-)

diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 1609a57d0..03945904b 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
 Prefix expansion is one of the keys of this algorithm,
 since it improves the speed dramatically by adding redundancy.
 
+Deletion
+~~~~~~~~
+
+When deleting a rule, a replacement rule is searched for. Replacement rule is an existing rule that has
+the longest prefix match with the rule to be deleted, but has shorter prefix.
+
+If a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
+value with the replacement rule.
+
+If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
+
+Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
+
+After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry are freed in following cases:
+
+*   All tbl8s in the group are empty .
+
+*   All tbl8s in the group have the same values and with depth no greater than 24.
+
+Free of tbl8s have different behaviors:
+
+*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
+
+*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
+
+When the LPM is not using RCU, tbl8 group can be freed immediately even though the readers might be using
+the tbl8 group entries. This might result in incorrect lookup results.
+
+RCU QSBR process is integrated for safe tbl8 group reclamation. Application has certain responsibilities
+while using this feature. Please refer to resource reclamation framework of :ref:`RCU library <RCU_Library>`
+for more details.
+
 Lookup
 ~~~~~~
 
diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index d682785b6..6f06c5c03 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -8,7 +8,7 @@ LIB = librte_lpm.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index 021ac6d8d..6cfc083c5 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -7,3 +7,4 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 38ab512a4..4fbf5b6df 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #include <string.h>
@@ -39,6 +40,17 @@ enum valid_flag {
 	VALID
 };
 
+/** @internal LPM structure. */
+struct __rte_lpm {
+	/* LPM metadata. */
+	struct rte_lpm lpm;
+
+	/* RCU config. */
+	struct rte_rcu_qsbr *v;		/* RCU QSBR variable. */
+	enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+	struct rte_rcu_qsbr_dq *dq;	/* RCU QSBR defer queue. */
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -122,6 +134,7 @@ rte_lpm_create(const char *name, int socket_id,
 		const struct rte_lpm_config *config)
 {
 	char mem_name[RTE_LPM_NAMESIZE];
+	struct __rte_lpm *internal_lpm = NULL;
 	struct rte_lpm *lpm = NULL;
 	struct rte_tailq_entry *te;
 	uint32_t mem_size, rules_size, tbl8s_size;
@@ -140,12 +153,6 @@ rte_lpm_create(const char *name, int socket_id,
 
 	snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
 
-	/* Determine the amount of memory to allocate. */
-	mem_size = sizeof(*lpm);
-	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
 	rte_mcfg_tailq_write_lock();
 
 	/* guarantee there's no existing */
@@ -161,6 +168,12 @@ rte_lpm_create(const char *name, int socket_id,
 		goto exit;
 	}
 
+	/* Determine the amount of memory to allocate. */
+	mem_size = sizeof(*internal_lpm);
+	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
 	/* allocate tailq entry */
 	te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
 	if (te == NULL) {
@@ -170,22 +183,23 @@ rte_lpm_create(const char *name, int socket_id,
 	}
 
 	/* Allocate memory to store the LPM data structures. */
-	lpm = rte_zmalloc_socket(mem_name, mem_size,
+	internal_lpm = rte_zmalloc_socket(mem_name, mem_size,
 			RTE_CACHE_LINE_SIZE, socket_id);
-	if (lpm == NULL) {
+	if (internal_lpm == NULL) {
 		RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
 
+	lpm = &internal_lpm->lpm;
 	lpm->rules_tbl = rte_zmalloc_socket(NULL,
 			(size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
 
 	if (lpm->rules_tbl == NULL) {
 		RTE_LOG(ERR, LPM, "LPM rules_tbl memory allocation failed\n");
-		rte_free(lpm);
-		lpm = NULL;
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
@@ -197,8 +211,8 @@ rte_lpm_create(const char *name, int socket_id,
 	if (lpm->tbl8 == NULL) {
 		RTE_LOG(ERR, LPM, "LPM tbl8 memory allocation failed\n");
 		rte_free(lpm->rules_tbl);
-		rte_free(lpm);
-		lpm = NULL;
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
@@ -225,6 +239,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm_list *lpm_list;
 	struct rte_tailq_entry *te;
 
@@ -246,12 +261,84 @@ rte_lpm_free(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->dq)
+		rte_rcu_qsbr_dq_delete(internal_lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
 	rte_free(te);
 }
 
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned int n)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	uint32_t tbl8_group_index = *(uint32_t *)data;
+	struct rte_lpm_tbl_entry *tbl8 = ((struct rte_lpm *)p)->tbl8;
+
+	RTE_SET_USED(n);
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq)
+{
+	struct __rte_lpm *internal_lpm;
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params = {0};
+
+	if (lpm == NULL || cfg == NULL) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->v != NULL) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	if (cfg->mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* No other things to do. */
+	} else if (cfg->mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Init QSBR defer queue. */
+		snprintf(rcu_dq_name, sizeof(rcu_dq_name),
+				"LPM_RCU_%s", lpm->name);
+		params.name = rcu_dq_name;
+		params.size = cfg->dq_size;
+		if (params.size == 0)
+			params.size = lpm->number_tbl8s;
+		params.trigger_reclaim_limit = cfg->reclaim_thd;
+		params.max_reclaim_size = cfg->reclaim_max;
+		if (params.max_reclaim_size == 0)
+			params.max_reclaim_size = RTE_LPM_RCU_DQ_RECLAIM_MAX;
+		params.esize = sizeof(uint32_t);	/* tbl8 group index */
+		params.free_fn = __lpm_rcu_qsbr_free_resource;
+		params.p = lpm;
+		params.v = cfg->v;
+		internal_lpm->dq = rte_rcu_qsbr_dq_create(&params);
+		if (internal_lpm->dq == NULL) {
+			RTE_LOG(ERR, LPM, "LPM defer queue creation failed\n");
+			return 1;
+		}
+		if (dq)
+			*dq = internal_lpm->dq;
+	} else {
+		rte_errno = EINVAL;
+		return 1;
+	}
+	internal_lpm->rcu_mode = cfg->mode;
+	internal_lpm->v = cfg->v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -394,14 +481,15 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static int32_t
-tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+_tbl8_alloc(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -427,14 +515,47 @@ tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc(struct rte_lpm *lpm)
+{
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = _tbl8_alloc(lpm);
+	if (group_idx == -ENOSPC && internal_lpm->dq != NULL) {
+		/* If there are no tbl8 groups try to reclaim one. */
+		if (rte_rcu_qsbr_dq_reclaim(internal_lpm->dq, 1,
+				NULL, NULL, NULL) == 0)
+			group_idx = _tbl8_alloc(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
-tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (internal_lpm->v == NULL) {
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* Wait for quiescent state change. */
+		rte_rcu_qsbr_synchronize(internal_lpm->v,
+			RTE_QSBR_THRID_INVALID);
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(internal_lpm->dq,
+				(void *)&tbl8_group_start);
+	}
 }
 
 static __rte_noinline int32_t
@@ -523,7 +644,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -569,7 +690,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -977,7 +1098,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -993,7 +1114,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index b9d49ac87..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -20,6 +21,7 @@
 #include <rte_memory.h>
 #include <rte_common.h>
 #include <rte_vect.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -62,6 +64,17 @@ extern "C" {
 /** Bitmask used to indicate successful lookup */
 #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
 
+/** @internal Default RCU defer queue entries to reclaim in one go. */
+#define RTE_LPM_RCU_DQ_RECLAIM_MAX	16
+
+/** RCU reclamation modes */
+enum rte_lpm_qsbr_mode {
+	/** Create defer queue for reclaim. */
+	RTE_LPM_QSBR_MODE_DQ = 0,
+	/** Use blocking mode reclaim. No defer queue created. */
+	RTE_LPM_QSBR_MODE_SYNC
+};
+
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 /** @internal Tbl24 entry structure. */
 __extension__
@@ -132,6 +145,22 @@ struct rte_lpm {
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
 };
 
+/** LPM RCU QSBR configuration structure. */
+struct rte_lpm_rcu_config {
+	struct rte_rcu_qsbr *v;	/* RCU QSBR variable. */
+	/* Mode of RCU QSBR. RTE_LPM_QSBR_MODE_xxx
+	 * '0' for default: create defer queue for reclaim.
+	 */
+	enum rte_lpm_qsbr_mode mode;
+	uint32_t dq_size;	/* RCU defer queue size.
+				 * default: lpm->number_tbl8s.
+				 */
+	uint32_t reclaim_thd;	/* Threshold to trigger auto reclaim. */
+	uint32_t reclaim_max;	/* Max entries to reclaim in one go.
+				 * default: RTE_LPM_RCU_DQ_RECLAIM_MAX.
+				 */
+};
+
 /**
  * Create an LPM object.
  *
@@ -179,6 +208,30 @@ rte_lpm_find_existing(const char *name);
 void
 rte_lpm_free(struct rte_lpm *lpm);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the lpm object to add RCU QSBR
+ * @param cfg
+ *   RCU QSBR configuration
+ * @param dq
+ *   handler of created RCU QSBR defer queue
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 500f58b80..bfccd7eac 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -21,3 +21,9 @@ DPDK_20.0 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1


^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3] eal: use c11 atomic built-ins for interrupt status
  2020-07-09  6:46  3% ` [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins " Phil Yang
  2020-07-09  8:02  0%   ` Stefan Puiu
@ 2020-07-09  8:34  2%   ` Phil Yang
  2020-07-09 10:30  0%     ` David Marchand
  1 sibling, 1 reply; 200+ results
From: Phil Yang @ 2020-07-09  8:34 UTC (permalink / raw)
  To: david.marchand, dev
  Cc: stefan.puiu, mdr, aconole, drc, Honnappa.Nagarahalli,
	Ruifeng.Wang, nd, dodji, nhorman, hkalra

The event status is defined as a volatile variable and shared between
threads. Use C11 atomic built-ins with explicit ordering instead of
rte_atomic ops, which enforce unnecessary barriers on aarch64.

The event status has been cleaned up by the compare-and-swap operation
when we free the event data, so there is no need to set it to invalid
after that.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Harman Kalra <hkalra@marvell.com>
---
v3:
Fixed typo.

v2:
1. Fixed typo.
2. Updated libabigail.abignore to pass ABI check.
3. Merged v1 two patches into one patch.

 devtools/libabigail.abignore                |  4 +++
 lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
 lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 0133f75..daa4631 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -48,6 +48,10 @@
         changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
 [suppress_variable]
         name = rte_crypto_aead_algorithm_strings
+; Ignore updates of epoll event
+[suppress_type]
+        type_kind = struct
+        name = rte_epoll_event
 
 ;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till DPDK 20.11
diff --git a/lib/librte_eal/include/rte_eal_interrupts.h b/lib/librte_eal/include/rte_eal_interrupts.h
index 773a34a..b1e8a29 100644
--- a/lib/librte_eal/include/rte_eal_interrupts.h
+++ b/lib/librte_eal/include/rte_eal_interrupts.h
@@ -59,7 +59,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	volatile uint32_t status;  /**< OUT: event status */
+	uint32_t status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/librte_eal/linux/eal_interrupts.c b/lib/librte_eal/linux/eal_interrupts.c
index 84eeaa1..ad09049 100644
--- a/lib/librte_eal/linux/eal_interrupts.c
+++ b/lib/librte_eal/linux/eal_interrupts.c
@@ -26,7 +26,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 #include <rte_log.h>
@@ -1221,11 +1220,18 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 {
 	unsigned int i, count = 0;
 	struct rte_epoll_event *rev;
+	uint32_t valid_status;
 
 	for (i = 0; i < n; i++) {
 		rev = evs[i].data.ptr;
-		if (!rev || !rte_atomic32_cmpset(&rev->status, RTE_EPOLL_VALID,
-						 RTE_EPOLL_EXEC))
+		valid_status =  RTE_EPOLL_VALID;
+		/* ACQUIRE memory ordering here pairs with RELEASE
+		 * ordering below acting as a lock to synchronize
+		 * the event data updating.
+		 */
+		if (!rev || !__atomic_compare_exchange_n(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC, 0,
+				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1237,8 +1243,11 @@ eal_epoll_process_event(struct epoll_event *evs, unsigned int n,
 			rev->epdata.cb_fun(rev->fd,
 					   rev->epdata.cb_arg);
 
-		rte_compiler_barrier();
-		rev->status = RTE_EPOLL_VALID;
+		/* the status update should be observed after
+		 * the other fields change.
+		 */
+		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELEASE);
 		count++;
 	}
 	return count;
@@ -1308,10 +1317,14 @@ rte_epoll_wait(int epfd, struct rte_epoll_event *events,
 static inline void
 eal_epoll_data_safe_free(struct rte_epoll_event *ev)
 {
-	while (!rte_atomic32_cmpset(&ev->status, RTE_EPOLL_VALID,
-				    RTE_EPOLL_INVALID))
-		while (ev->status != RTE_EPOLL_VALID)
+	uint32_t valid_status = RTE_EPOLL_VALID;
+	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+		while (__atomic_load_n(&ev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
 			rte_pause();
+		valid_status = RTE_EPOLL_VALID;
+	}
 	memset(&ev->epdata, 0, sizeof(ev->epdata));
 	ev->fd = -1;
 	ev->epfd = -1;
@@ -1333,7 +1346,8 @@ rte_epoll_ctl(int epfd, int op, int fd,
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		event->status = RTE_EPOLL_VALID;
+		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
+				__ATOMIC_RELAXED);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1345,11 +1359,13 @@ rte_epoll_ctl(int epfd, int op, int fd,
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			event->status = RTE_EPOLL_INVALID;
+			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
+					__ATOMIC_RELAXED);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && event->status != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
+			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1378,7 +1394,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status != RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1401,7 +1418,8 @@ rte_intr_rx_ctl(struct rte_intr_handle *intr_handle, int epfd,
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = &intr_handle->elist[efd_idx];
-		if (rev->status == RTE_EPOLL_INVALID) {
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1426,12 +1444,12 @@ rte_intr_free_epoll_fd(struct rte_intr_handle *intr_handle)
 
 	for (i = 0; i < intr_handle->nb_efd; i++) {
 		rev = &intr_handle->elist[i];
-		if (rev->status == RTE_EPOLL_INVALID)
+		if (__atomic_load_n(&rev->status,
+				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
 			eal_epoll_data_safe_free(rev);
-			rev->status = RTE_EPOLL_INVALID;
 		}
 	}
 }
-- 
2.7.4


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v2] mbuf: use C11 atomics for refcnt operations
  2020-07-08 11:44  0%   ` Olivier Matz
@ 2020-07-09 10:00  3%     ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09 10:00 UTC (permalink / raw)
  To: Olivier Matz
  Cc: david.marchand, dev, drc, Honnappa Nagarahalli, Ruifeng Wang, nd

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Wednesday, July 8, 2020 7:44 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: david.marchand@redhat.com; dev@dpdk.org; drc@linux.vnet.ibm.com;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v2] mbuf: use C11 atomics for refcnt operations
> 
> Hi,
> 
> On Tue, Jul 07, 2020 at 06:10:33PM +0800, Phil Yang wrote:
> > Use C11 atomics with explicit ordering instead of rte_atomic ops which
> > enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > v2:
> > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> > to refcnt_atomic.
> >
> >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> >  lib/librte_mbuf/rte_mbuf_core.h | 11 +++--------
> >  3 files changed, 13 insertions(+), 18 deletions(-)
> >

<snip>

> 
> It seems this patch does 2 things:
> - remove refcnt_atomic
> - use C11 atomics
> 
> The first change is an API break. I think it should be announced in a
> deprecation
> notice. The one about atomic does not talk about it.
> 
> So I suggest to keep refcnt_atomic until next version.

Agreed.
I ran a local test; this approach doesn't introduce any ABI breakage.
I will update in the next version. 

Thanks,
Phil

> 
> 
> >  	uint16_t nb_segs;         /**< Number of segments. */
> >
> >  	/** Input port (16 bits to support more than 256 virtual ports).
> > @@ -679,7 +674,7 @@ typedef void
> (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> >  struct rte_mbuf_ext_shared_info {
> >  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> function */
> >  	void *fcb_opaque;                        /**< Free callback argument */
> > -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> >  };
> >
> >  /**< Maximum number of nb_segs allowed. */
> > --
> > 2.7.4
> >

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
  2020-07-08  5:11  3%   ` Phil Yang
  2020-07-08 11:44  0%   ` Olivier Matz
@ 2020-07-09 10:10  4%   ` Phil Yang
  2020-07-09 11:03  3%     ` Olivier Matz
  2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
  2 siblings, 2 replies; 200+ results
From: Phil Yang @ 2020-07-09 10:10 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: stephen, david.marchand, drc, Honnappa.Nagarahalli, Ruifeng.Wang, nd

Use C11 atomic built-ins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v3:
1.Fix ABI breakage.
2.Simplify data type cast.

v2:
Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
to refcnt_atomic.

 lib/librte_mbuf/rte_mbuf.c      |  1 -
 lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
 lib/librte_mbuf/rte_mbuf_core.h |  2 +-
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index ae91ae2..8a456e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f8e492e..c1c0956 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -37,7 +37,6 @@
 #include <rte_config.h>
 #include <rte_mempool.h>
 #include <rte_memory.h>
-#include <rte_atomic.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
@@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
+	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
+	return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /**
@@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
+	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
 }
 
 /**
@@ -481,7 +481,7 @@ static inline void
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
 }
 
 /**
@@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 		return (uint16_t)value;
 	}
 
-	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
+	return __atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /** Mbuf prefetch */
@@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(rte_atomic16_add_return
-			(&shinfo->refcnt_atomic, -1)))
+	if (likely(__atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)-1,
+				     __ATOMIC_ACQ_REL)))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 16600f1..d65d1c8 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -679,7 +679,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
+	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
 };
 
 /**< Maximum number of nb_segs allowed. */
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3] eal: use c11 atomic built-ins for interrupt status
  2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
@ 2020-07-09 10:30  0%     ` David Marchand
  2020-07-10  7:18  3%       ` Dodji Seketeli
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-07-09 10:30 UTC (permalink / raw)
  To: Phil Yang, Ray Kinsella, Harman Kalra
  Cc: dev, stefan.puiu, Aaron Conole, David Christensen,
	Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Neil Horman

On Thu, Jul 9, 2020 at 10:35 AM Phil Yang <phil.yang@arm.com> wrote:
>
> The event status is defined as a volatile variable and shared between
> threads. Use c11 atomic built-ins with explicit ordering instead of
> rte_atomic ops which enforce unnecessary barriers on aarch64.
>
> The event status has been cleaned up by the compare-and-swap operation
> when we free the event data, so there is no need to set it to invalid
> after that.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Harman Kalra <hkalra@marvell.com>
> ---
> v3:
> Fixed typo.
>
> v2:
> 1. Fixed typo.
> 2. Updated libabigail.abignore to pass ABI check.
> 3. Merged v1 two patches into one patch.
>
>  devtools/libabigail.abignore                |  4 +++
>  lib/librte_eal/include/rte_eal_interrupts.h |  2 +-
>  lib/librte_eal/linux/eal_interrupts.c       | 48 ++++++++++++++++++++---------
>  3 files changed, 38 insertions(+), 16 deletions(-)
>
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 0133f75..daa4631 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -48,6 +48,10 @@
>          changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
>  [suppress_variable]
>          name = rte_crypto_aead_algorithm_strings
> +; Ignore updates of epoll event
> +[suppress_type]
> +        type_kind = struct
> +        name = rte_epoll_event

In general, ignoring all changes on a structure is risky.
But the risk is acceptable as long as we remember this for the rest of
the 20.08 release (and we will start from scratch for 20.11).


Without any comment from others, I'll merge this by the end of (my) day.

Thanks.

-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
@ 2020-07-09 11:03  3%     ` Olivier Matz
  2020-07-09 13:00  3%       ` Phil Yang
  2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
  1 sibling, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-09 11:03 UTC (permalink / raw)
  To: Phil Yang
  Cc: dev, stephen, david.marchand, drc, Honnappa.Nagarahalli,
	Ruifeng.Wang, nd

Hi Phil,

On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> ops which enforce unnecessary barriers on aarch64.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v3:
> 1.Fix ABI breakage.
> 2.Simplify data type cast.
> 
> v2:
> Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> to refcnt_atomic.
> 
>  lib/librte_mbuf/rte_mbuf.c      |  1 -
>  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
>  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index ae91ae2..8a456e5 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -22,7 +22,6 @@
>  #include <rte_eal.h>
>  #include <rte_per_lcore.h>
>  #include <rte_lcore.h>
> -#include <rte_atomic.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_mempool.h>
>  #include <rte_mbuf.h>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index f8e492e..c1c0956 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -37,7 +37,6 @@
>  #include <rte_config.h>
>  #include <rte_mempool.h>
>  #include <rte_memory.h>
> -#include <rte_atomic.h>
>  #include <rte_prefetch.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_byteorder.h>
> @@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
>  static inline uint16_t
>  rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  {
> -	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
> +	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>  static inline void
>  rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  {
> -	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /* internal */
>  static inline uint16_t
>  __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
>  {
> -	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
> +	return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
> +				 __ATOMIC_ACQ_REL);
>  }
>  
>  /**
> @@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>  static inline uint16_t
>  rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
>  {
> -	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
> +	return __atomic_load_n(&shinfo->refcnt_atomic, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -481,7 +481,7 @@ static inline void
>  rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
>  	uint16_t new_value)
>  {
> -	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
> +	__atomic_store_n(&shinfo->refcnt_atomic, new_value, __ATOMIC_RELAXED);
>  }
>  
>  /**
> @@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
>  		return (uint16_t)value;
>  	}
>  
> -	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
> +	return __atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)value,
> +				 __ATOMIC_ACQ_REL);
>  }
>  
>  /** Mbuf prefetch */
> @@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>  	 * Direct usage of add primitive to avoid
>  	 * duplication of comparing with one.
>  	 */
> -	if (likely(rte_atomic16_add_return
> -			(&shinfo->refcnt_atomic, -1)))
> +	if (likely(__atomic_add_fetch(&shinfo->refcnt_atomic, (uint16_t)-1,
> +				     __ATOMIC_ACQ_REL)))
>  		return 1;
>  
>  	/* Reinitialize counter before mbuf freeing. */
> diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
> index 16600f1..d65d1c8 100644
> --- a/lib/librte_mbuf/rte_mbuf_core.h
> +++ b/lib/librte_mbuf/rte_mbuf_core.h
> @@ -679,7 +679,7 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
>  struct rte_mbuf_ext_shared_info {
>  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
>  	void *fcb_opaque;                        /**< Free callback argument */
> -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
>  };

To avoid an API breakage (i.e. currently, an application that accesses
refcnt_atomic expects that its type is rte_atomic16_t), I suggest
doing the same as in the mbuf struct:

	union {
		rte_atomic16_t refcnt_atomic;
		uint16_t refcnt;
	};

I hope the ABI checker won't complain.

It will also be better for 20.11 when the deprecated fields will be
renamed: the remaining one will be called 'refcnt' in both mbuf and
mbuf_ext_shared_info.


Olivier

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (5 preceding siblings ...)
  2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
@ 2020-07-09 12:36  2% ` Viacheslav Ovsiienko
  2020-07-09 23:47  0%   ` Ferruh Yigit
  2020-07-10 12:39  2% ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
  7 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-09 12:36 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas

Some networks have a requirement for precise traffic timing
management. The ability to send (and, generally speaking, receive)
packets at a precisely specified moment of time provides the
opportunity to support connections with Time Division Multiplexing
using a contemporary general-purpose NIC without involving auxiliary
hardware. For example, supporting the O-RAN Fronthaul interface is one
of the promising potential uses of precise time management for the
egress packets.

The main objective of this RFC is to specify the way applications can
provide the moment of time at which packet transmission must be
started, and to give a preliminary description of how this feature is
supported on the mlx5 PMD side.

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time references (initial phase) are not
explicitly defined but are always maintained the same for a given port.
Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For the packets being
sent, this value can be used by the PMD to schedule packet sending.

The device clock is an opaque entity; the units and frequency are
vendor specific and might depend on hardware capabilities and
configuration. It might (or might not) be synchronized with real time
via PTP, and might (or might not) be synchronous with the CPU clock
(for example, if the NIC and CPU share the same clock source there
might be no drift between the NIC and CPU clocks), etc.

After the PKT_RX_TIMESTAMP flag and the fixed timestamp field are
deprecated and obsoleted, this dynamic flag and field will be used to
manage the timestamps on the receiving datapath as well. Having
dedicated flags for Rx/Tx timestamps allows applications not to perform
an explicit flag reset on forwarding and not to promote received
timestamps to the transmitting datapath by default. The static
PKT_RX_TIMESTAMP flag is considered a candidate to become the dynamic
flag.

When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
it tries to synchronize the time the packet appears on the wire with
the specified packet timestamp. If the specified timestamp is in the
past it should be ignored; if it is in the distant future it should be
capped to some reasonable value (in the range of seconds). These
specific cases ("too late" and "distant future") can optionally be
reported via device xstats to assist applications in detecting
time-related problems.

No packet reordering according to timestamps is assumed, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamps can be put only in the first
packet of the burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a
timestamp with the new offload flag DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is a palliative solution and is going to be replaced with a new
eth_dev API for reporting/managing the supported dynamic flags and
their related features. That API would break ABI compatibility and
cannot be introduced at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update, the testpmd application
generates the packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

A new testpmd command is proposed to configure the sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result, the bursts of packets will be transmitted with the
specified delays between the packets within a burst and between the
bursts. rte_eth_read_clock() is supposed to be used to get the
current device clock value and provide the reference for the timestamps.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added
  v5->v6:
     - release notes are updated
---
 doc/guides/rel_notes/release_20_08.rst |  6 ++++++
 lib/librte_ethdev/rte_ethdev.c         |  1 +
 lib/librte_ethdev/rte_ethdev.h         |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 988474c..5527bab 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -200,6 +200,12 @@ New Features
   See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
   details of this parameter usage.
 
+* **Introduced send packet scheduling on the timestamps.**
+
+  Added the new mbuf dynamic field and flag to provide timestamp on what packet
+  transmitting can be synchronized. The device Tx offload flag is added to
+  indicate the PMD supports send scheduling.
+
 
 Removed Items
 -------------
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7022bd7..c48ca2a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 631b146..97313a0 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * There is no any packet reordering according to timestamps is supposed,
+ * neither for packet within the burst, nor for the whole bursts, it is
+ * an entirely application responsibility to generate packets and its
+ * timestamps in desired order. The timestamps might be put only in
+ * the first packet in the burst providing the entire burst scheduling.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 11:03  3%     ` Olivier Matz
@ 2020-07-09 13:00  3%       ` Phil Yang
  2020-07-09 13:31  0%         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Phil Yang @ 2020-07-09 13:00 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, stephen, david.marchand, drc, Honnappa Nagarahalli,
	Ruifeng Wang, nd

Hi Oliver,

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Thursday, July 9, 2020 7:04 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: dev@dpdk.org; stephen@networkplumber.org;
> david.marchand@redhat.com; drc@linux.vnet.ibm.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
> 
> Hi Phil,
> 
> On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> > Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> > ops which enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > v3:
> > 1.Fix ABI breakage.
> > 2.Simplify data type cast.
> >
> > v2:
> > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
> > to refcnt_atomic.
> >
> >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> >  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
> >  3 files changed, 11 insertions(+), 11 deletions(-)
> >
<snip>
> >
> >  	/* Reinitialize counter before mbuf freeing. */
> > diff --git a/lib/librte_mbuf/rte_mbuf_core.h
> b/lib/librte_mbuf/rte_mbuf_core.h
> > index 16600f1..d65d1c8 100644
> > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > @@ -679,7 +679,7 @@ typedef void
> (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> >  struct rte_mbuf_ext_shared_info {
> >  	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> function */
> >  	void *fcb_opaque;                        /**< Free callback argument */
> > -	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > +	uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> >  };
> 
> To avoid an API breakage (i.e. currently, an application that accesses
> to refcnt_atomic expects that its type is rte_atomic16_t), I suggest to
> do the same than in the mbuf struct:
> 
> 	union {
> 		rte_atomic16_t refcnt_atomic;
> 		uint16_t refcnt;
> 	};
> 
> I hope the ABI checker won't complain.
> 
> It will also be better for 20.11 when the deprecated fields will be
> renamed: the remaining one will be called 'refcnt' in both mbuf and
> mbuf_ext_shared_info.

Got it. I agree with you.
It should work. On my local test machine, the ABI checker is happy with this approach. 
Once the test is done, I will upstream the new patch.

Appreciate your comments.

Thanks,
Phil

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 13:00  3%       ` Phil Yang
@ 2020-07-09 13:31  0%         ` Honnappa Nagarahalli
  2020-07-09 14:10  0%           ` Phil Yang
  0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2020-07-09 13:31 UTC (permalink / raw)
  To: Phil Yang, Olivier Matz
  Cc: dev, stephen, david.marchand, drc, Ruifeng Wang, nd,
	Honnappa Nagarahalli, nd

<snip>

> >
> > Hi Phil,
> >
> > On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> > > Use C11 atomic built-ins with explicit ordering instead of
> > > rte_atomic ops which enforce unnecessary barriers on aarch64.
> > >
> > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > > v3:
> > > 1.Fix ABI breakage.
> > > 2.Simplify data type cast.
> > >
> > > v2:
> > > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt
> > > field to refcnt_atomic.
> > >
> > >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> > >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> > >  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
> > >  3 files changed, 11 insertions(+), 11 deletions(-)
> > >
> <snip>
> > >
> > >  /* Reinitialize counter before mbuf freeing. */ diff --git
> > > a/lib/librte_mbuf/rte_mbuf_core.h
> > b/lib/librte_mbuf/rte_mbuf_core.h
> > > index 16600f1..d65d1c8 100644
> > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > @@ -679,7 +679,7 @@ typedef void
> > (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> > >  struct rte_mbuf_ext_shared_info {
> > >  rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> > function */
> > >  void *fcb_opaque;                        /**< Free callback argument */
> > > -rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > > +uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> > >  };
> >
> > To avoid an API breakage (i.e. currently, an application that accesses
> > to refcnt_atomic expects that its type is rte_atomic16_t), I suggest
> > to do the same than in the mbuf struct:
> >
> > union {
> > rte_atomic16_t refcnt_atomic;
> > uint16_t refcnt;
> > };
> >
> > I hope the ABI checker won't complain.
> >
> > It will also be better for 20.11 when the deprecated fields will be
> > renamed: the remaining one will be called 'refcnt' in both mbuf and
> > mbuf_ext_shared_info.
Does this need a deprecation notice in 20.08?

> 
> Got it. I agree with you.
> It should work. In my local test machine, the ABI checker happy with this
> approach.
> Once the test is done, I will upstream the new patch.
> 
> Appreciate your comments.
> 
> Thanks,
> Phil


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 13:31  0%         ` Honnappa Nagarahalli
@ 2020-07-09 14:10  0%           ` Phil Yang
  0 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-09 14:10 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Olivier Matz
  Cc: dev, stephen, david.marchand, drc, Ruifeng Wang, nd, nd

 <snip>

> 
> > >
> > > Hi Phil,
> > >
> > > On Thu, Jul 09, 2020 at 06:10:42PM +0800, Phil Yang wrote:
> > > > Use C11 atomic built-ins with explicit ordering instead of
> > > > rte_atomic ops which enforce unnecessary barriers on aarch64.
> > > >
> > > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > > v3:
> > > > 1.Fix ABI breakage.
> > > > 2.Simplify data type cast.
> > > >
> > > > v2:
> > > > Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt
> > > > field to refcnt_atomic.
> > > >
> > > >  lib/librte_mbuf/rte_mbuf.c      |  1 -
> > > >  lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
> > > >  lib/librte_mbuf/rte_mbuf_core.h |  2 +-
> > > >  3 files changed, 11 insertions(+), 11 deletions(-)
> > > >
> > <snip>
> > > >
> > > >  /* Reinitialize counter before mbuf freeing. */ diff --git
> > > > a/lib/librte_mbuf/rte_mbuf_core.h
> > > b/lib/librte_mbuf/rte_mbuf_core.h
> > > > index 16600f1..d65d1c8 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf_core.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf_core.h
> > > > @@ -679,7 +679,7 @@ typedef void
> > > (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
> > > >  struct rte_mbuf_ext_shared_info {
> > > >  rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback
> > > function */
> > > >  void *fcb_opaque;                        /**< Free callback argument */
> > > > -rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
> > > > +uint16_t refcnt_atomic;              /**< Atomically accessed refcnt */
> > > >  };
> > >
> > > To avoid an API breakage (i.e. currently, an application that accesses
> > > to refcnt_atomic expects that its type is rte_atomic16_t), I suggest
> > > to do the same than in the mbuf struct:
> > >
> > > union {
> > > rte_atomic16_t refcnt_atomic;
> > > uint16_t refcnt;
> > > };
> > >
> > > I hope the ABI checker won't complain.
> > >
> > > It will also be better for 20.11 when the deprecated fields will be
> > > renamed: the remaining one will be called 'refcnt' in both mbuf and
> > > mbuf_ext_shared_info.
> Does this need a deprecation notice in 20.08?

Yes. We'd better do that.
I will add a notice for it in this patch. Thanks.

> 
> >
> > Got it. I agree with you.
> > It should work. In my local test machine, the ABI checker happy with this
> > approach.
> > Once the test is done, I will upstream the new patch.
> >
> > Appreciate your comments.
> >
> > Thanks,
> > Phil


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs
@ 2020-07-09 15:20  4% Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

This patchset proposes some internal and externally-visible changes to the
rawdev API. If consensus is in favour, I will submit a deprecation notice
for the changes for the 20.08 release, so that these ABI/API-breaking
changes can be merged in 20.11

The changes are in two areas:
* For any APIs which take a void * parameter for driver-specific structs,
  add an additional parameter to provide the struct length. This allows
  some runtime type-checking, as well as possible ABI-compatibility support
  in the future, since structure changes generally involve a change in the
  size of the structure.
* Ensure all APIs which can return error values have int type, rather than
  void. Since functions like info_get and queue_default_get can now do some
  typechecking, they need to be modified to allow them to return error
  codes on failure.

Bruce Richardson (5):
  rawdev: add private data length parameter to info fn
  rawdev: allow drivers to return error from info function
  rawdev: add private data length parameter to config fn
  rawdev: add private data length parameter to queue fns
  rawdev: allow queue default config query to return error

 drivers/bus/ifpga/ifpga_bus.c               |  2 +-
 drivers/raw/ifpga/ifpga_rawdev.c            | 23 +++++-----
 drivers/raw/ioat/ioat_rawdev.c              | 17 ++++---
 drivers/raw/ioat/ioat_rawdev_test.c         |  6 +--
 drivers/raw/ntb/ntb.c                       | 49 ++++++++++++++++-----
 drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c |  7 +--
 drivers/raw/octeontx2_dma/otx2_dpi_test.c   |  3 +-
 drivers/raw/octeontx2_ep/otx2_ep_rawdev.c   |  7 +--
 drivers/raw/octeontx2_ep/otx2_ep_test.c     |  2 +-
 drivers/raw/skeleton/skeleton_rawdev.c      | 34 ++++++++------
 drivers/raw/skeleton/skeleton_rawdev_test.c | 32 ++++++++------
 examples/ioat/ioatfwd.c                     |  4 +-
 examples/ntb/ntb_fwd.c                      |  7 +--
 lib/librte_rawdev/rte_rawdev.c              | 27 +++++++-----
 lib/librte_rawdev/rte_rawdev.h              | 27 ++++++++++--
 lib/librte_rawdev/rte_rawdev_pmd.h          | 22 ++++++---
 16 files changed, 178 insertions(+), 91 deletions(-)

-- 
2.25.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn
  2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
@ 2020-07-09 15:20  3% ` Bruce Richardson
  2020-07-12 14:13  0%   ` Xu, Rosen
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns Bruce Richardson
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

Currently with the rawdev API there is no way to check that the structure
passed in via the dev_private pointer in the dev_info structure is of the
correct type - it's just checked that it is non-NULL. Adding in the length
of the expected structure provides a measure of typechecking, and can also
be used for ABI compatibility in future, since ABI changes involving
structs almost always involve a change in size.

Signed-off-by:  Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/bus/ifpga/ifpga_bus.c               |  2 +-
 drivers/raw/ifpga/ifpga_rawdev.c            |  5 +++--
 drivers/raw/ioat/ioat_rawdev.c              |  5 +++--
 drivers/raw/ioat/ioat_rawdev_test.c         |  4 ++--
 drivers/raw/ntb/ntb.c                       |  8 +++++++-
 drivers/raw/skeleton/skeleton_rawdev.c      |  5 +++--
 drivers/raw/skeleton/skeleton_rawdev_test.c | 19 ++++++++++++-------
 examples/ioat/ioatfwd.c                     |  2 +-
 examples/ntb/ntb_fwd.c                      |  2 +-
 lib/librte_rawdev/rte_rawdev.c              |  6 ++++--
 lib/librte_rawdev/rte_rawdev.h              |  9 ++++++++-
 lib/librte_rawdev/rte_rawdev_pmd.h          |  5 ++++-
 12 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 6b16a20bb..bb8b3dcfb 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -162,7 +162,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
 	afu_dev->id.port      = afu_pr_conf.afu_id.port;
 
 	if (rawdev->dev_ops && rawdev->dev_ops->dev_info_get)
-		rawdev->dev_ops->dev_info_get(rawdev, afu_dev);
+		rawdev->dev_ops->dev_info_get(rawdev, afu_dev, sizeof(*afu_dev));
 
 	if (rawdev->dev_ops &&
 		rawdev->dev_ops->dev_start &&
diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c
index cc25c662b..47cfa3877 100644
--- a/drivers/raw/ifpga/ifpga_rawdev.c
+++ b/drivers/raw/ifpga/ifpga_rawdev.c
@@ -605,7 +605,8 @@ ifpga_fill_afu_dev(struct opae_accelerator *acc,
 
 static void
 ifpga_rawdev_info_get(struct rte_rawdev *dev,
-				     rte_rawdev_obj_t dev_info)
+		      rte_rawdev_obj_t dev_info,
+		      size_t dev_info_size)
 {
 	struct opae_adapter *adapter;
 	struct opae_accelerator *acc;
@@ -617,7 +618,7 @@ ifpga_rawdev_info_get(struct rte_rawdev *dev,
 
 	IFPGA_RAWDEV_PMD_FUNC_TRACE();
 
-	if (!dev_info) {
+	if (!dev_info || dev_info_size != sizeof(*afu_dev)) {
 		IFPGA_RAWDEV_PMD_ERR("Invalid request");
 		return;
 	}
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index f876ffc3f..8dd856c55 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -113,12 +113,13 @@ ioat_dev_stop(struct rte_rawdev *dev)
 }
 
 static void
-ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
+ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
 {
 	struct rte_ioat_rawdev_config *cfg = dev_info;
 	struct rte_ioat_rawdev *ioat = dev->dev_private;
 
-	if (cfg != NULL)
+	if (cfg != NULL && dev_info_size == sizeof(*cfg))
 		cfg->ring_size = ioat->ring_size;
 }
 
diff --git a/drivers/raw/ioat/ioat_rawdev_test.c b/drivers/raw/ioat/ioat_rawdev_test.c
index d99f1bd6b..90f5974cd 100644
--- a/drivers/raw/ioat/ioat_rawdev_test.c
+++ b/drivers/raw/ioat/ioat_rawdev_test.c
@@ -157,7 +157,7 @@ ioat_rawdev_test(uint16_t dev_id)
 		return TEST_SKIPPED;
 	}
 
-	rte_rawdev_info_get(dev_id, &info);
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
 	if (p.ring_size != expected_ring_size[dev_id]) {
 		printf("Error, initial ring size is not as expected (Actual: %d, Expected: %d)\n",
 				(int)p.ring_size, expected_ring_size[dev_id]);
@@ -169,7 +169,7 @@ ioat_rawdev_test(uint16_t dev_id)
 		printf("Error with rte_rawdev_configure()\n");
 		return -1;
 	}
-	rte_rawdev_info_get(dev_id, &info);
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
 	if (p.ring_size != IOAT_TEST_RINGSIZE) {
 		printf("Error, ring size is not %d (%d)\n",
 				IOAT_TEST_RINGSIZE, (int)p.ring_size);
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index e40412bb7..4676c6f8f 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -801,11 +801,17 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,
 }
 
 static void
-ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
+ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	struct ntb_dev_info *info = dev_info;
 
+	if (dev_info_size != sizeof(*info)){
+		NTB_LOG(ERR, "Invalid size parameter to %s", __func__);
+		return;
+	}
+
 	info->mw_cnt = hw->mw_cnt;
 	info->mw_size = hw->mw_size;
 
diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
index 72ece887a..dc05f3ecf 100644
--- a/drivers/raw/skeleton/skeleton_rawdev.c
+++ b/drivers/raw/skeleton/skeleton_rawdev.c
@@ -42,14 +42,15 @@ static struct queue_buffers queue_buf[SKELETON_MAX_QUEUES] = {};
 static void clear_queue_bufs(int queue_id);
 
 static void skeleton_rawdev_info_get(struct rte_rawdev *dev,
-				     rte_rawdev_obj_t dev_info)
+				     rte_rawdev_obj_t dev_info,
+				     size_t dev_info_size)
 {
 	struct skeleton_rawdev *skeldev;
 	struct skeleton_rawdev_conf *skeldev_conf;
 
 	SKELETON_PMD_FUNC_TRACE();
 
-	if (!dev_info) {
+	if (!dev_info || dev_info_size != sizeof(*skeldev_conf)) {
 		SKELETON_PMD_ERR("Invalid request");
 		return;
 	}
diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
index 9ecfdee81..9b8390dfb 100644
--- a/drivers/raw/skeleton/skeleton_rawdev_test.c
+++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
@@ -106,12 +106,12 @@ test_rawdev_info_get(void)
 	struct rte_rawdev_info rdev_info = {0};
 	struct skeleton_rawdev_conf skel_conf = {0};
 
-	ret = rte_rawdev_info_get(test_dev_id, NULL);
+	ret = rte_rawdev_info_get(test_dev_id, NULL, 0);
 	RTE_TEST_ASSERT(ret == -EINVAL, "Expected -EINVAL, %d", ret);
 
 	rdev_info.dev_private = &skel_conf;
 
-	ret = rte_rawdev_info_get(test_dev_id, &rdev_info);
+	ret = rte_rawdev_info_get(test_dev_id, &rdev_info, sizeof(skel_conf));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get raw dev info");
 
 	return TEST_SUCCESS;
@@ -142,7 +142,8 @@ test_rawdev_configure(void)
 
 	rdev_info.dev_private = &rdev_conf_get;
 	ret = rte_rawdev_info_get(test_dev_id,
-				  (rte_rawdev_obj_t)&rdev_info);
+				  (rte_rawdev_obj_t)&rdev_info,
+				  sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
@@ -170,7 +171,8 @@ test_rawdev_queue_default_conf_get(void)
 	/* Get the current configuration */
 	rdev_info.dev_private = &rdev_conf_get;
 	ret = rte_rawdev_info_get(test_dev_id,
-				  (rte_rawdev_obj_t)&rdev_info);
+				  (rte_rawdev_obj_t)&rdev_info,
+				  sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to obtain rawdev configuration (%d)",
 				ret);
 
@@ -218,7 +220,8 @@ test_rawdev_queue_setup(void)
 	/* Get the current configuration */
 	rdev_info.dev_private = &rdev_conf_get;
 	ret = rte_rawdev_info_get(test_dev_id,
-				  (rte_rawdev_obj_t)&rdev_info);
+				  (rte_rawdev_obj_t)&rdev_info,
+				  sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
@@ -327,7 +330,8 @@ test_rawdev_start_stop(void)
 	dummy_firmware = NULL;
 
 	rte_rawdev_start(test_dev_id);
-	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info);
+	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info,
+			sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
@@ -336,7 +340,8 @@ test_rawdev_start_stop(void)
 			      rdev_conf_get.device_state);
 
 	rte_rawdev_stop(test_dev_id);
-	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info);
+	ret = rte_rawdev_info_get(test_dev_id, (rte_rawdev_obj_t)&rdev_info,
+			sizeof(rdev_conf_get));
 	RTE_TEST_ASSERT_SUCCESS(ret,
 				"Failed to obtain rawdev configuration (%d)",
 				ret);
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index b66ee73bc..5c631da1b 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -757,7 +757,7 @@ assign_rawdevs(void)
 			do {
 				if (rdev_id == rte_rawdev_count())
 					goto end;
-				rte_rawdev_info_get(rdev_id++, &rdev_info);
+				rte_rawdev_info_get(rdev_id++, &rdev_info, 0);
 			} while (rdev_info.driver_name == NULL ||
 					strcmp(rdev_info.driver_name,
 						IOAT_PMD_RAWDEV_NAME_STR) != 0);
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index eba8ebf9f..11e224451 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -1389,7 +1389,7 @@ main(int argc, char **argv)
 	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME, num_queues);
 	printf("Set queue number as %u.\n", num_queues);
 	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
-	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
+	rte_rawdev_info_get(dev_id, &ntb_rawdev_info, sizeof(ntb_info));
 
 	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
 		  MEMPOOL_CACHE_SIZE;
diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
index 8f84d0b22..a57689035 100644
--- a/lib/librte_rawdev/rte_rawdev.c
+++ b/lib/librte_rawdev/rte_rawdev.c
@@ -78,7 +78,8 @@ rte_rawdev_socket_id(uint16_t dev_id)
 }
 
 int
-rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info)
+rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
+		size_t dev_private_size)
 {
 	struct rte_rawdev *rawdev;
 
@@ -89,7 +90,8 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info)
 
 	if (dev_info->dev_private != NULL) {
 		RTE_FUNC_PTR_OR_ERR_RET(*rawdev->dev_ops->dev_info_get, -ENOTSUP);
-		(*rawdev->dev_ops->dev_info_get)(rawdev, dev_info->dev_private);
+		(*rawdev->dev_ops->dev_info_get)(rawdev, dev_info->dev_private,
+				dev_private_size);
 	}
 
 	dev_info->driver_name = rawdev->driver_name;
diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
index 32f6b8bb0..cf6acfd26 100644
--- a/lib/librte_rawdev/rte_rawdev.h
+++ b/lib/librte_rawdev/rte_rawdev.h
@@ -82,13 +82,20 @@ struct rte_rawdev_info;
  *   will be returned. This can be used to safely query the type of a rawdev
  *   instance without needing to know the size of the private data to return.
  *
+ * @param dev_private_size
+ *   The length of the memory space pointed to by dev_private in dev_info.
+ *   This should be set to the size of the expected private structure to be
+ *   returned, and may be checked by drivers to ensure the expected struct
+ *   type is provided.
+ *
  * @return
  *   - 0: Success, driver updates the contextual information of the raw device
  *   - <0: Error code returned by the driver info get function.
  *
  */
 int
-rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info);
+rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
+		size_t dev_private_size);
 
 /**
  * Configure a raw device.
diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
index 4395a2182..0e72a9205 100644
--- a/lib/librte_rawdev/rte_rawdev_pmd.h
+++ b/lib/librte_rawdev/rte_rawdev_pmd.h
@@ -138,12 +138,15 @@ rte_rawdev_pmd_is_valid_dev(uint8_t dev_id)
  *   Raw device pointer
  * @param dev_info
  *   Raw device information structure
+ * @param dev_private_size
+ *   The size of the structure pointed to by dev_info->dev_private
  *
  * @return
  *   Returns 0 on success
  */
 typedef void (*rawdev_info_get_t)(struct rte_rawdev *dev,
-				  rte_rawdev_obj_t dev_info);
+				  rte_rawdev_obj_t dev_info,
+				  size_t dev_private_size);
 
 /**
  * Configure a device.
-- 
2.25.1



* [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn
  2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
@ 2020-07-09 15:20  3% ` Bruce Richardson
  2020-07-12 14:13  0%   ` Xu, Rosen
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns Bruce Richardson
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

Currently with the rawdev API there is no way to check that the structure
passed in via the dev_private pointer in the structure passed to the
configure API is of the correct type - it is only checked that it is
non-NULL. Adding in the length of the expected structure provides a measure
of type checking, and can also be used for ABI compatibility in future,
since ABI changes involving structs almost always involve a change in size.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/raw/ifpga/ifpga_rawdev.c            | 3 ++-
 drivers/raw/ioat/ioat_rawdev.c              | 5 +++--
 drivers/raw/ioat/ioat_rawdev_test.c         | 2 +-
 drivers/raw/ntb/ntb.c                       | 6 +++++-
 drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 7 ++++---
 drivers/raw/octeontx2_dma/otx2_dpi_test.c   | 3 ++-
 drivers/raw/octeontx2_ep/otx2_ep_rawdev.c   | 7 ++++---
 drivers/raw/octeontx2_ep/otx2_ep_test.c     | 2 +-
 drivers/raw/skeleton/skeleton_rawdev.c      | 5 +++--
 drivers/raw/skeleton/skeleton_rawdev_test.c | 5 +++--
 examples/ioat/ioatfwd.c                     | 2 +-
 examples/ntb/ntb_fwd.c                      | 2 +-
 lib/librte_rawdev/rte_rawdev.c              | 6 ++++--
 lib/librte_rawdev/rte_rawdev.h              | 8 +++++++-
 lib/librte_rawdev/rte_rawdev_pmd.h          | 3 ++-
 15 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/drivers/raw/ifpga/ifpga_rawdev.c b/drivers/raw/ifpga/ifpga_rawdev.c
index 32a2b96c9..a50173264 100644
--- a/drivers/raw/ifpga/ifpga_rawdev.c
+++ b/drivers/raw/ifpga/ifpga_rawdev.c
@@ -684,7 +684,8 @@ ifpga_rawdev_info_get(struct rte_rawdev *dev,
 
 static int
 ifpga_rawdev_configure(const struct rte_rawdev *dev,
-		rte_rawdev_obj_t config)
+		rte_rawdev_obj_t config,
+		size_t config_size __rte_unused)
 {
 	IFPGA_RAWDEV_PMD_FUNC_TRACE();
 
diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
index 6a336795d..b29ff983f 100644
--- a/drivers/raw/ioat/ioat_rawdev.c
+++ b/drivers/raw/ioat/ioat_rawdev.c
@@ -41,7 +41,8 @@ RTE_LOG_REGISTER(ioat_pmd_logtype, rawdev.ioat, INFO);
 #define COMPLETION_SZ sizeof(__m128i)
 
 static int
-ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct rte_ioat_rawdev_config *params = config;
 	struct rte_ioat_rawdev *ioat = dev->dev_private;
@@ -51,7 +52,7 @@ ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 	if (dev->started)
 		return -EBUSY;
 
-	if (params == NULL)
+	if (params == NULL || config_size != sizeof(*params))
 		return -EINVAL;
 
 	if (params->ring_size > 4096 || params->ring_size < 64 ||
diff --git a/drivers/raw/ioat/ioat_rawdev_test.c b/drivers/raw/ioat/ioat_rawdev_test.c
index 90f5974cd..e5b50ae9f 100644
--- a/drivers/raw/ioat/ioat_rawdev_test.c
+++ b/drivers/raw/ioat/ioat_rawdev_test.c
@@ -165,7 +165,7 @@ ioat_rawdev_test(uint16_t dev_id)
 	}
 
 	p.ring_size = IOAT_TEST_RINGSIZE;
-	if (rte_rawdev_configure(dev_id, &info) != 0) {
+	if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
 		printf("Error with rte_rawdev_configure()\n");
 		return -1;
 	}
diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index eaeb67b74..c181094d5 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -837,13 +837,17 @@ ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
 }
 
 static int
-ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct ntb_dev_config *conf = config;
 	struct ntb_hw *hw = dev->dev_private;
 	uint32_t xstats_num;
 	int ret;
 
+	if (conf == NULL || config_size != sizeof(*conf))
+		return -EINVAL;
+
 	hw->queue_pairs	= conf->num_queues;
 	hw->queue_size = conf->queue_size;
 	hw->used_mw_num = conf->mz_num;
diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
index e398abb75..5b496446c 100644
--- a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
+++ b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
@@ -294,7 +294,8 @@ otx2_dpi_rawdev_reset(struct rte_rawdev *dev)
 }
 
 static int
-otx2_dpi_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+otx2_dpi_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct dpi_rawdev_conf_s *conf = config;
 	struct dpi_vf_s *dpivf = NULL;
@@ -302,8 +303,8 @@ otx2_dpi_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
 	uintptr_t pool;
 	uint32_t gaura;
 
-	if (conf == NULL) {
-		otx2_dpi_dbg("NULL configuration");
+	if (conf == NULL || config_size != sizeof(*conf)) {
+		otx2_dpi_dbg("NULL or invalid configuration");
 		return -EINVAL;
 	}
 	dpivf = (struct dpi_vf_s *)dev->dev_private;
diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_test.c b/drivers/raw/octeontx2_dma/otx2_dpi_test.c
index 276658af0..cec6ca91b 100644
--- a/drivers/raw/octeontx2_dma/otx2_dpi_test.c
+++ b/drivers/raw/octeontx2_dma/otx2_dpi_test.c
@@ -182,7 +182,8 @@ test_otx2_dma_rawdev(uint16_t val)
 	/* Configure rawdev ports */
 	conf.chunk_pool = dpi_create_mempool();
 	rdev_info.dev_private = &conf;
-	ret = rte_rawdev_configure(i, (rte_rawdev_obj_t)&rdev_info);
+	ret = rte_rawdev_configure(i, (rte_rawdev_obj_t)&rdev_info,
+			sizeof(conf));
 	if (ret) {
 		otx2_dpi_dbg("Unable to configure DPIVF %d", i);
 		return -ENODEV;
diff --git a/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c b/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
index 0778603d5..2b78a7941 100644
--- a/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
+++ b/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
@@ -224,13 +224,14 @@ sdp_rawdev_close(struct rte_rawdev *dev)
 }
 
 static int
-sdp_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
+sdp_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
 {
 	struct sdp_rawdev_info *app_info = (struct sdp_rawdev_info *)config;
 	struct sdp_device *sdpvf;
 
-	if (app_info == NULL) {
-		otx2_err("Application config info [NULL]");
+	if (app_info == NULL || config_size != sizeof(*app_info)) {
+		otx2_err("Application config info [NULL] or incorrect size");
 		return -EINVAL;
 	}
 
diff --git a/drivers/raw/octeontx2_ep/otx2_ep_test.c b/drivers/raw/octeontx2_ep/otx2_ep_test.c
index 091f1827c..b876275f7 100644
--- a/drivers/raw/octeontx2_ep/otx2_ep_test.c
+++ b/drivers/raw/octeontx2_ep/otx2_ep_test.c
@@ -108,7 +108,7 @@ sdp_rawdev_selftest(uint16_t dev_id)
 
 	dev_info.dev_private = &app_info;
 
-	ret = rte_rawdev_configure(dev_id, &dev_info);
+	ret = rte_rawdev_configure(dev_id, &dev_info, sizeof(app_info));
 	if (ret) {
 		otx2_err("Unable to configure SDP_VF %d", dev_id);
 		rte_mempool_free(ioq_mpool);
diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
index dce300c35..531d0450c 100644
--- a/drivers/raw/skeleton/skeleton_rawdev.c
+++ b/drivers/raw/skeleton/skeleton_rawdev.c
@@ -68,7 +68,8 @@ static int skeleton_rawdev_info_get(struct rte_rawdev *dev,
 }
 
 static int skeleton_rawdev_configure(const struct rte_rawdev *dev,
-				     rte_rawdev_obj_t config)
+				     rte_rawdev_obj_t config,
+				     size_t config_size)
 {
 	struct skeleton_rawdev *skeldev;
 	struct skeleton_rawdev_conf *skeldev_conf;
@@ -77,7 +78,7 @@ static int skeleton_rawdev_configure(const struct rte_rawdev *dev,
 
 	RTE_FUNC_PTR_OR_ERR_RET(dev, -EINVAL);
 
-	if (!config) {
+	if (config == NULL || config_size != sizeof(*skeldev_conf)) {
 		SKELETON_PMD_ERR("Invalid configuration");
 		return -EINVAL;
 	}
diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
index 9b8390dfb..7dc7c7684 100644
--- a/drivers/raw/skeleton/skeleton_rawdev_test.c
+++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
@@ -126,7 +126,7 @@ test_rawdev_configure(void)
 	struct skeleton_rawdev_conf rdev_conf_get = {0};
 
 	/* Check invalid configuration */
-	ret = rte_rawdev_configure(test_dev_id, NULL);
+	ret = rte_rawdev_configure(test_dev_id, NULL, 0);
 	RTE_TEST_ASSERT(ret == -EINVAL,
 			"Null configure; Expected -EINVAL, got %d", ret);
 
@@ -137,7 +137,8 @@ test_rawdev_configure(void)
 
 	rdev_info.dev_private = &rdev_conf_set;
 	ret = rte_rawdev_configure(test_dev_id,
-				   (rte_rawdev_obj_t)&rdev_info);
+				   (rte_rawdev_obj_t)&rdev_info,
+				   sizeof(rdev_conf_set));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to configure rawdev (%d)", ret);
 
 	rdev_info.dev_private = &rdev_conf_get;
diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
index 5c631da1b..8e9513e44 100644
--- a/examples/ioat/ioatfwd.c
+++ b/examples/ioat/ioatfwd.c
@@ -734,7 +734,7 @@ configure_rawdev_queue(uint32_t dev_id)
 	struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size };
 	struct rte_rawdev_info info = { .dev_private = &dev_config };
 
-	if (rte_rawdev_configure(dev_id, &info) != 0) {
+	if (rte_rawdev_configure(dev_id, &info, sizeof(dev_config)) != 0) {
 		rte_exit(EXIT_FAILURE,
 			"Error with rte_rawdev_configure()\n");
 	}
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index 11e224451..656f73659 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -1401,7 +1401,7 @@ main(int argc, char **argv)
 	ntb_conf.num_queues = num_queues;
 	ntb_conf.queue_size = nb_desc;
 	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
-	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
+	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf, sizeof(ntb_conf));
 	if (ret)
 		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
 			"port=%u\n", ret, dev_id);
diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
index bde33763e..6c4d783cc 100644
--- a/lib/librte_rawdev/rte_rawdev.c
+++ b/lib/librte_rawdev/rte_rawdev.c
@@ -104,7 +104,8 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
 }
 
 int
-rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf)
+rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
+		size_t dev_private_size)
 {
 	struct rte_rawdev *dev;
 	int diag;
@@ -123,7 +124,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf)
 	}
 
 	/* Configure the device */
-	diag = (*dev->dev_ops->dev_configure)(dev, dev_conf->dev_private);
+	diag = (*dev->dev_ops->dev_configure)(dev, dev_conf->dev_private,
+			dev_private_size);
 	if (diag != 0)
 		RTE_RDEV_ERR("dev%d dev_configure = %d", dev_id, diag);
 	else
diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
index cf6acfd26..73e3bd5ae 100644
--- a/lib/librte_rawdev/rte_rawdev.h
+++ b/lib/librte_rawdev/rte_rawdev.h
@@ -116,13 +116,19 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
  *   driver/implementation can use to configure the device. It is also assumed
  *   that once the configuration is done, a `queue_id` type field can be used
  *   to refer to some arbitrary internal representation of a queue.
+ * @dev_private_size
+ *   The length of the memory space pointed to by dev_private in dev_info.
+ *   This should be set to the size of the expected private structure to be
+ *   used by the driver, and may be checked by drivers to ensure the expected
+ *   struct type is provided.
  *
  * @return
  *   - 0: Success, device configured.
  *   - <0: Error code returned by the driver configuration function.
  */
 int
-rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf);
+rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
+		size_t dev_private_size);
 
 
 /**
diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
index 89e46412a..050f8b029 100644
--- a/lib/librte_rawdev/rte_rawdev_pmd.h
+++ b/lib/librte_rawdev/rte_rawdev_pmd.h
@@ -160,7 +160,8 @@ typedef int (*rawdev_info_get_t)(struct rte_rawdev *dev,
  *   Returns 0 on success
  */
 typedef int (*rawdev_configure_t)(const struct rte_rawdev *dev,
-				  rte_rawdev_obj_t config);
+				  rte_rawdev_obj_t config,
+				  size_t config_size);
 
 /**
  * Start a configured device.
-- 
2.25.1



* [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns
  2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
@ 2020-07-09 15:20  3% ` Bruce Richardson
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2020-07-09 15:20 UTC (permalink / raw)
  To: Nipun Gupta, Hemant Agrawal
  Cc: dev, Rosen Xu, Tianfei zhang, Xiaoyun Li, Jingjing Wu, Satha Rao,
	Mahipal Challa, Jerin Jacob, Bruce Richardson

The queue setup and queue defaults query functions take a void * parameter
as configuration data, preventing any compile-time checking of the
parameters and limiting runtime checks. Adding in the length of the
expected structure provides a measure of type checking, and can also be used
for ABI compatibility in future, since ABI changes involving structs almost
always involve a change in size.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/raw/ntb/ntb.c                       | 25 ++++++++++++++++-----
 drivers/raw/skeleton/skeleton_rawdev.c      | 10 +++++----
 drivers/raw/skeleton/skeleton_rawdev_test.c |  8 +++----
 examples/ntb/ntb_fwd.c                      |  3 ++-
 lib/librte_rawdev/rte_rawdev.c              | 10 +++++----
 lib/librte_rawdev/rte_rawdev.h              | 10 +++++++--
 lib/librte_rawdev/rte_rawdev_pmd.h          |  6 +++--
 7 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c
index c181094d5..7c15e204c 100644
--- a/drivers/raw/ntb/ntb.c
+++ b/drivers/raw/ntb/ntb.c
@@ -249,11 +249,15 @@ ntb_dev_intr_handler(void *param)
 static void
 ntb_queue_conf_get(struct rte_rawdev *dev,
 		   uint16_t queue_id,
-		   rte_rawdev_obj_t queue_conf)
+		   rte_rawdev_obj_t queue_conf,
+		   size_t conf_size)
 {
 	struct ntb_queue_conf *q_conf = queue_conf;
 	struct ntb_hw *hw = dev->dev_private;
 
+	if (conf_size != sizeof(*q_conf))
+		return;
+
 	q_conf->tx_free_thresh = hw->tx_queues[queue_id]->tx_free_thresh;
 	q_conf->nb_desc = hw->rx_queues[queue_id]->nb_rx_desc;
 	q_conf->rx_mp = hw->rx_queues[queue_id]->mpool;
@@ -294,12 +298,16 @@ ntb_rxq_release(struct ntb_rx_queue *rxq)
 static int
 ntb_rxq_setup(struct rte_rawdev *dev,
 	      uint16_t qp_id,
-	      rte_rawdev_obj_t queue_conf)
+	      rte_rawdev_obj_t queue_conf,
+	      size_t conf_size)
 {
 	struct ntb_queue_conf *rxq_conf = queue_conf;
 	struct ntb_hw *hw = dev->dev_private;
 	struct ntb_rx_queue *rxq;
 
+	if (conf_size != sizeof(*rxq_conf))
+		return -EINVAL;
+
 	/* Allocate the rx queue data structure */
 	rxq = rte_zmalloc_socket("ntb rx queue",
 				 sizeof(struct ntb_rx_queue),
@@ -375,13 +383,17 @@ ntb_txq_release(struct ntb_tx_queue *txq)
 static int
 ntb_txq_setup(struct rte_rawdev *dev,
 	      uint16_t qp_id,
-	      rte_rawdev_obj_t queue_conf)
+	      rte_rawdev_obj_t queue_conf,
+	      size_t conf_size)
 {
 	struct ntb_queue_conf *txq_conf = queue_conf;
 	struct ntb_hw *hw = dev->dev_private;
 	struct ntb_tx_queue *txq;
 	uint16_t i, prev;
 
+	if (conf_size != sizeof(*txq_conf))
+		return -EINVAL;
+
 	/* Allocate the TX queue data structure. */
 	txq = rte_zmalloc_socket("ntb tx queue",
 				  sizeof(struct ntb_tx_queue),
@@ -439,7 +451,8 @@ ntb_txq_setup(struct rte_rawdev *dev,
 static int
 ntb_queue_setup(struct rte_rawdev *dev,
 		uint16_t queue_id,
-		rte_rawdev_obj_t queue_conf)
+		rte_rawdev_obj_t queue_conf,
+		size_t conf_size)
 {
 	struct ntb_hw *hw = dev->dev_private;
 	int ret;
@@ -447,11 +460,11 @@ ntb_queue_setup(struct rte_rawdev *dev,
 	if (queue_id >= hw->queue_pairs)
 		return -EINVAL;
 
-	ret = ntb_txq_setup(dev, queue_id, queue_conf);
+	ret = ntb_txq_setup(dev, queue_id, queue_conf, conf_size);
 	if (ret < 0)
 		return ret;
 
-	ret = ntb_rxq_setup(dev, queue_id, queue_conf);
+	ret = ntb_rxq_setup(dev, queue_id, queue_conf, conf_size);
 
 	return ret;
 }
diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
index 531d0450c..f109e4d2c 100644
--- a/drivers/raw/skeleton/skeleton_rawdev.c
+++ b/drivers/raw/skeleton/skeleton_rawdev.c
@@ -222,14 +222,15 @@ static int skeleton_rawdev_reset(struct rte_rawdev *dev)
 
 static void skeleton_rawdev_queue_def_conf(struct rte_rawdev *dev,
 					   uint16_t queue_id,
-					   rte_rawdev_obj_t queue_conf)
+					   rte_rawdev_obj_t queue_conf,
+					   size_t conf_size)
 {
 	struct skeleton_rawdev *skeldev;
 	struct skeleton_rawdev_queue *skelq;
 
 	SKELETON_PMD_FUNC_TRACE();
 
-	if (!dev || !queue_conf)
+	if (!dev || !queue_conf || conf_size != sizeof(struct skeleton_rawdev_queue))
 		return;
 
 	skeldev = skeleton_rawdev_get_priv(dev);
@@ -252,7 +253,8 @@ clear_queue_bufs(int queue_id)
 
 static int skeleton_rawdev_queue_setup(struct rte_rawdev *dev,
 				       uint16_t queue_id,
-				       rte_rawdev_obj_t queue_conf)
+				       rte_rawdev_obj_t queue_conf,
+				       size_t conf_size)
 {
 	int ret = 0;
 	struct skeleton_rawdev *skeldev;
@@ -260,7 +262,7 @@ static int skeleton_rawdev_queue_setup(struct rte_rawdev *dev,
 
 	SKELETON_PMD_FUNC_TRACE();
 
-	if (!dev || !queue_conf)
+	if (!dev || !queue_conf || conf_size != sizeof(struct skeleton_rawdev_queue))
 		return -EINVAL;
 
 	skeldev = skeleton_rawdev_get_priv(dev);
diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
index 7dc7c7684..bb4b6efe4 100644
--- a/drivers/raw/skeleton/skeleton_rawdev_test.c
+++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
@@ -185,7 +185,7 @@ test_rawdev_queue_default_conf_get(void)
 	 * depth = DEF_DEPTH
 	 */
 	for (i = 0; i < rdev_conf_get.num_queues; i++) {
-		rte_rawdev_queue_conf_get(test_dev_id, i, &q);
+		rte_rawdev_queue_conf_get(test_dev_id, i, &q, sizeof(q));
 		RTE_TEST_ASSERT_EQUAL(q.depth, SKELETON_QUEUE_DEF_DEPTH,
 				      "Invalid default depth of queue (%d)",
 				      q.depth);
@@ -235,11 +235,11 @@ test_rawdev_queue_setup(void)
 	/* Modify the queue depth for Queue 0 and attach it */
 	qset.depth = 15;
 	qset.state = SKELETON_QUEUE_ATTACH;
-	ret = rte_rawdev_queue_setup(test_dev_id, 0, &qset);
+	ret = rte_rawdev_queue_setup(test_dev_id, 0, &qset, sizeof(qset));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to setup queue (%d)", ret);
 
 	/* Now, fetching the queue 0 should show depth as 15 */
-	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget);
+	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget, sizeof(qget));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get queue config (%d)", ret);
 
 	RTE_TEST_ASSERT_EQUAL(qset.depth, qget.depth,
@@ -263,7 +263,7 @@ test_rawdev_queue_release(void)
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to release queue 0; (%d)", ret);
 
 	/* Now, fetching the queue 0 should show depth as default */
-	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget);
+	ret = rte_rawdev_queue_conf_get(test_dev_id, 0, &qget, sizeof(qget));
 	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get queue config (%d)", ret);
 
 	RTE_TEST_ASSERT_EQUAL(qget.depth, SKELETON_QUEUE_DEF_DEPTH,
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index 656f73659..5a8439b8d 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -1411,7 +1411,8 @@ main(int argc, char **argv)
 	ntb_q_conf.rx_mp = mbuf_pool;
 	for (i = 0; i < num_queues; i++) {
 		/* Setup rawdev queue */
-		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf);
+		ret = rte_rawdev_queue_setup(dev_id, i, &ntb_q_conf,
+				sizeof(ntb_q_conf));
 		if (ret < 0)
 			rte_exit(EXIT_FAILURE,
 				"Failed to setup ntb queue %u.\n", i);
diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
index 6c4d783cc..8965f2ce3 100644
--- a/lib/librte_rawdev/rte_rawdev.c
+++ b/lib/librte_rawdev/rte_rawdev.c
@@ -137,7 +137,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
 int
 rte_rawdev_queue_conf_get(uint16_t dev_id,
 			  uint16_t queue_id,
-			  rte_rawdev_obj_t queue_conf)
+			  rte_rawdev_obj_t queue_conf,
+			  size_t queue_conf_size)
 {
 	struct rte_rawdev *dev;
 
@@ -145,14 +146,15 @@ rte_rawdev_queue_conf_get(uint16_t dev_id,
 	dev = &rte_rawdevs[dev_id];
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_def_conf, -ENOTSUP);
-	(*dev->dev_ops->queue_def_conf)(dev, queue_id, queue_conf);
+	(*dev->dev_ops->queue_def_conf)(dev, queue_id, queue_conf, queue_conf_size);
 	return 0;
 }
 
 int
 rte_rawdev_queue_setup(uint16_t dev_id,
 		       uint16_t queue_id,
-		       rte_rawdev_obj_t queue_conf)
+		       rte_rawdev_obj_t queue_conf,
+		       size_t queue_conf_size)
 {
 	struct rte_rawdev *dev;
 
@@ -160,7 +162,7 @@ rte_rawdev_queue_setup(uint16_t dev_id,
 	dev = &rte_rawdevs[dev_id];
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->queue_setup, -ENOTSUP);
-	return (*dev->dev_ops->queue_setup)(dev, queue_id, queue_conf);
+	return (*dev->dev_ops->queue_setup)(dev, queue_id, queue_conf, queue_conf_size);
 }
 
 int
diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
index 73e3bd5ae..bbd63913a 100644
--- a/lib/librte_rawdev/rte_rawdev.h
+++ b/lib/librte_rawdev/rte_rawdev.h
@@ -146,6 +146,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
  *   previously supplied to rte_rawdev_configure().
  * @param[out] queue_conf
  *   The pointer to the default raw queue configuration data.
+ * @param queue_conf_size
+ *   The size of the structure pointed to by queue_conf
  * @return
  *   - 0: Success, driver updates the default raw queue configuration data.
  *   - <0: Error code returned by the driver info get function.
@@ -156,7 +158,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
 int
 rte_rawdev_queue_conf_get(uint16_t dev_id,
 			  uint16_t queue_id,
-			  rte_rawdev_obj_t queue_conf);
+			  rte_rawdev_obj_t queue_conf,
+			  size_t queue_conf_size);
 
 /**
  * Allocate and set up a raw queue for a raw device.
@@ -169,6 +172,8 @@ rte_rawdev_queue_conf_get(uint16_t dev_id,
  * @param queue_conf
  *   The pointer to the configuration data to be used for the raw queue.
  *   NULL value is allowed, in which case default configuration	used.
+ * @param queue_conf_size
+ *   The size of the structure pointed to by queue_conf
  *
  * @see rte_rawdev_queue_conf_get()
  *
@@ -179,7 +184,8 @@ rte_rawdev_queue_conf_get(uint16_t dev_id,
 int
 rte_rawdev_queue_setup(uint16_t dev_id,
 		       uint16_t queue_id,
-		       rte_rawdev_obj_t queue_conf);
+		       rte_rawdev_obj_t queue_conf,
+		       size_t queue_conf_size);
 
 /**
  * Release and deallocate a raw queue from a raw device.
diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
index 050f8b029..34eb667f6 100644
--- a/lib/librte_rawdev/rte_rawdev_pmd.h
+++ b/lib/librte_rawdev/rte_rawdev_pmd.h
@@ -218,7 +218,8 @@ typedef int (*rawdev_reset_t)(struct rte_rawdev *dev);
  */
 typedef void (*rawdev_queue_conf_get_t)(struct rte_rawdev *dev,
 					uint16_t queue_id,
-					rte_rawdev_obj_t queue_conf);
+					rte_rawdev_obj_t queue_conf,
+					size_t queue_conf_size);
 
 /**
  * Setup an raw queue.
@@ -235,7 +236,8 @@ typedef void (*rawdev_queue_conf_get_t)(struct rte_rawdev *dev,
  */
 typedef int (*rawdev_queue_setup_t)(struct rte_rawdev *dev,
 				    uint16_t queue_id,
-				    rte_rawdev_obj_t queue_conf);
+				    rte_rawdev_obj_t queue_conf,
+				    size_t queue_conf_size);
 
 /**
  * Release resources allocated by given raw queue.
-- 
2.25.1



* [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library
                     ` (3 preceding siblings ...)
  2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-09 15:42  4% ` Ruifeng Wang
  2020-07-09 15:42  2%   ` [dpdk-dev] [PATCH v9 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
  2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-09 15:42 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with LPM library.

The resource reclamation implementation was split from the original
series and is already part of the RCU library. The series is reworked
to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This gives the
user a handle to enable the RCU support integrated in the LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v9:
Cleared lpm when allocation failed. (David)

v8:
Fixed ABI issue by adding internal LPM control structure. (David)
Changed to use RFC5737 address in unit test. (Vladimir)

v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd; this allows reclamation to be triggered on every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow the user to configure the defer queue: size, reclaim threshold, max entries.
Return the defer queue handle so the user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 ++++++++--
 lib/librte_lpm/rte_lpm.h           |  53 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 1016 insertions(+), 26 deletions(-)

-- 
2.17.1



* [dpdk-dev] [PATCH v9 1/3] lib/lpm: integrate RCU QSBR
  2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-09 15:42  2%   ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-09 15:42 UTC (permalink / raw)
  To: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for the LPM
library's internal use. This struct wraps the exposed rte_lpm and also
includes members that don't need to be exposed, such as the RCU-related
config.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  53 +++++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 6 files changed, 237 insertions(+), 22 deletions(-)

diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 1609a57d0..03945904b 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
 Prefix expansion is one of the keys of this algorithm,
 since it improves the speed dramatically by adding redundancy.
 
+Deletion
+~~~~~~~~
+
+When deleting a rule, a replacement rule is searched for. A replacement rule is an existing rule that has
+the longest prefix match with the rule to be deleted, but has a shorter prefix.
+
+If a replacement rule is found, the target tbl24 and tbl8 entries are updated to have the same depth and next hop
+value as the replacement rule.
+
+If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
+
+Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
+
+After deleting a rule, the group of tbl8s that belongs to the same tbl24 entry is freed in the following cases:
+
+*   All tbl8s in the group are empty.
+
+*   All tbl8s in the group have the same value and a depth no greater than 24.
+
+Freeing of tbl8s behaves differently depending on whether RCU is used:
+
+*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
+
+*   If RCU is used, tbl8s are reclaimed when readers are in quiescent state.
+
+When the LPM is not using RCU, a tbl8 group can be freed immediately even though readers might still be using
+the tbl8 group entries. This might result in incorrect lookup results.
+
+The RCU QSBR process is integrated for safe tbl8 group reclamation. The application has certain responsibilities
+while using this feature. Please refer to the resource reclamation framework of the :ref:`RCU library <RCU_Library>`
+for more details.
+
 Lookup
 ~~~~~~
 
diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index d682785b6..6f06c5c03 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -8,7 +8,7 @@ LIB = librte_lpm.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index 021ac6d8d..6cfc083c5 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -7,3 +7,4 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 38ab512a4..2d687c372 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #include <string.h>
@@ -39,6 +40,17 @@ enum valid_flag {
 	VALID
 };
 
+/** @internal LPM structure. */
+struct __rte_lpm {
+	/* LPM metadata. */
+	struct rte_lpm lpm;
+
+	/* RCU config. */
+	struct rte_rcu_qsbr *v;		/* RCU QSBR variable. */
+	enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+	struct rte_rcu_qsbr_dq *dq;	/* RCU QSBR defer queue. */
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -122,6 +134,7 @@ rte_lpm_create(const char *name, int socket_id,
 		const struct rte_lpm_config *config)
 {
 	char mem_name[RTE_LPM_NAMESIZE];
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm *lpm = NULL;
 	struct rte_tailq_entry *te;
 	uint32_t mem_size, rules_size, tbl8s_size;
@@ -140,12 +153,6 @@ rte_lpm_create(const char *name, int socket_id,
 
 	snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
 
-	/* Determine the amount of memory to allocate. */
-	mem_size = sizeof(*lpm);
-	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
 	rte_mcfg_tailq_write_lock();
 
 	/* guarantee there's no existing */
@@ -161,6 +168,12 @@ rte_lpm_create(const char *name, int socket_id,
 		goto exit;
 	}
 
+	/* Determine the amount of memory to allocate. */
+	mem_size = sizeof(*internal_lpm);
+	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
 	/* allocate tailq entry */
 	te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
 	if (te == NULL) {
@@ -170,21 +183,23 @@ rte_lpm_create(const char *name, int socket_id,
 	}
 
 	/* Allocate memory to store the LPM data structures. */
-	lpm = rte_zmalloc_socket(mem_name, mem_size,
+	internal_lpm = rte_zmalloc_socket(mem_name, mem_size,
 			RTE_CACHE_LINE_SIZE, socket_id);
-	if (lpm == NULL) {
+	if (internal_lpm == NULL) {
 		RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
 
+	lpm = &internal_lpm->lpm;
 	lpm->rules_tbl = rte_zmalloc_socket(NULL,
 			(size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
 
 	if (lpm->rules_tbl == NULL) {
 		RTE_LOG(ERR, LPM, "LPM rules_tbl memory allocation failed\n");
-		rte_free(lpm);
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
@@ -197,7 +212,8 @@ rte_lpm_create(const char *name, int socket_id,
 	if (lpm->tbl8 == NULL) {
 		RTE_LOG(ERR, LPM, "LPM tbl8 memory allocation failed\n");
 		rte_free(lpm->rules_tbl);
-		rte_free(lpm);
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
@@ -225,6 +241,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm_list *lpm_list;
 	struct rte_tailq_entry *te;
 
@@ -246,12 +263,84 @@ rte_lpm_free(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->dq)
+		rte_rcu_qsbr_dq_delete(internal_lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
 	rte_free(te);
 }
 
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned int n)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	uint32_t tbl8_group_index = *(uint32_t *)data;
+	struct rte_lpm_tbl_entry *tbl8 = ((struct rte_lpm *)p)->tbl8;
+
+	RTE_SET_USED(n);
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq)
+{
+	struct __rte_lpm *internal_lpm;
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params = {0};
+
+	if (lpm == NULL || cfg == NULL) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->v != NULL) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	if (cfg->mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* No other things to do. */
+	} else if (cfg->mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Init QSBR defer queue. */
+		snprintf(rcu_dq_name, sizeof(rcu_dq_name),
+				"LPM_RCU_%s", lpm->name);
+		params.name = rcu_dq_name;
+		params.size = cfg->dq_size;
+		if (params.size == 0)
+			params.size = lpm->number_tbl8s;
+		params.trigger_reclaim_limit = cfg->reclaim_thd;
+		params.max_reclaim_size = cfg->reclaim_max;
+		if (params.max_reclaim_size == 0)
+			params.max_reclaim_size = RTE_LPM_RCU_DQ_RECLAIM_MAX;
+		params.esize = sizeof(uint32_t);	/* tbl8 group index */
+		params.free_fn = __lpm_rcu_qsbr_free_resource;
+		params.p = lpm;
+		params.v = cfg->v;
+		internal_lpm->dq = rte_rcu_qsbr_dq_create(&params);
+		if (internal_lpm->dq == NULL) {
+			RTE_LOG(ERR, LPM, "LPM defer queue creation failed\n");
+			return 1;
+		}
+		if (dq)
+			*dq = internal_lpm->dq;
+	} else {
+		rte_errno = EINVAL;
+		return 1;
+	}
+	internal_lpm->rcu_mode = cfg->mode;
+	internal_lpm->v = cfg->v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -394,14 +483,15 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static int32_t
-tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+_tbl8_alloc(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -427,14 +517,47 @@ tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc(struct rte_lpm *lpm)
+{
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = _tbl8_alloc(lpm);
+	if (group_idx == -ENOSPC && internal_lpm->dq != NULL) {
+		/* If there are no tbl8 groups try to reclaim one. */
+		if (rte_rcu_qsbr_dq_reclaim(internal_lpm->dq, 1,
+				NULL, NULL, NULL) == 0)
+			group_idx = _tbl8_alloc(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
-tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (internal_lpm->v == NULL) {
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* Wait for quiescent state change. */
+		rte_rcu_qsbr_synchronize(internal_lpm->v,
+			RTE_QSBR_THRID_INVALID);
+		/* Set tbl8 group invalid*/
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(internal_lpm->dq,
+				(void *)&tbl8_group_start);
+	}
 }
 
 static __rte_noinline int32_t
@@ -523,7 +646,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -569,7 +692,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -977,7 +1100,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -993,7 +1116,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index b9d49ac87..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -20,6 +21,7 @@
 #include <rte_memory.h>
 #include <rte_common.h>
 #include <rte_vect.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -62,6 +64,17 @@ extern "C" {
 /** Bitmask used to indicate successful lookup */
 #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
 
+/** @internal Default RCU defer queue entries to reclaim in one go. */
+#define RTE_LPM_RCU_DQ_RECLAIM_MAX	16
+
+/** RCU reclamation modes */
+enum rte_lpm_qsbr_mode {
+	/** Create defer queue for reclaim. */
+	RTE_LPM_QSBR_MODE_DQ = 0,
+	/** Use blocking mode reclaim. No defer queue created. */
+	RTE_LPM_QSBR_MODE_SYNC
+};
+
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 /** @internal Tbl24 entry structure. */
 __extension__
@@ -132,6 +145,22 @@ struct rte_lpm {
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
 };
 
+/** LPM RCU QSBR configuration structure. */
+struct rte_lpm_rcu_config {
+	struct rte_rcu_qsbr *v;	/* RCU QSBR variable. */
+	/* Mode of RCU QSBR. RTE_LPM_QSBR_MODE_xxx
+	 * '0' for default: create defer queue for reclaim.
+	 */
+	enum rte_lpm_qsbr_mode mode;
+	uint32_t dq_size;	/* RCU defer queue size.
+				 * default: lpm->number_tbl8s.
+				 */
+	uint32_t reclaim_thd;	/* Threshold to trigger auto reclaim. */
+	uint32_t reclaim_max;	/* Max entries to reclaim in one go.
+				 * default: RTE_LPM_RCU_DQ_RECLAIM_MAX.
+				 */
+};
+
 /**
  * Create an LPM object.
  *
@@ -179,6 +208,30 @@ rte_lpm_find_existing(const char *name);
 void
 rte_lpm_free(struct rte_lpm *lpm);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the lpm object to add RCU QSBR
+ * @param cfg
+ *   RCU QSBR configuration
+ * @param dq
+ *   handler of created RCU QSBR defer queue
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 500f58b80..bfccd7eac 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -21,3 +21,9 @@ DPDK_20.0 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
  2020-07-08 13:09  7% ` Kinsella, Ray
@ 2020-07-09 15:52  4% ` Dodji Seketeli
  2020-07-10  7:37  4% ` Kinsella, Ray
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Dodji Seketeli @ 2020-07-09 15:52 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, Ray Kinsella, Neil Horman

Hello,

David Marchand <david.marchand@redhat.com> writes:

> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing must happen but we can give
> some hints to know if this is a problem with the script calling abidiff,
> a potential ABI breakage or an unambiguous ABI breakage.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>

For what it's worth, the change looks good to me, at least from an
abidiff perspective.

Thanks.

Cheers.

-- 
		Dodji


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
  2020-07-09 11:03  3%     ` Olivier Matz
@ 2020-07-09 15:58  4%     ` Phil Yang
  2020-07-15 12:29  0%       ` David Marchand
  1 sibling, 1 reply; 200+ results
From: Phil Yang @ 2020-07-09 15:58 UTC (permalink / raw)
  To: olivier.matz, dev
  Cc: stephen, david.marchand, drc, Honnappa.Nagarahalli, Ruifeng.Wang, nd

Use C11 atomic built-ins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v4:
1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
to avoid ABI breakage. (Olivier)
2. Add notice of refcnt_atomic deprecation. (Honnappa)

v3:
1.Fix ABI breakage.
2.Simplify data type cast.

v2:
Fix ABI issue: revert the rte_mbuf_ext_shared_info struct refcnt field
to refcnt_atomic.

 lib/librte_mbuf/rte_mbuf.c      |  1 -
 lib/librte_mbuf/rte_mbuf.h      | 19 ++++++++++---------
 lib/librte_mbuf/rte_mbuf_core.h |  6 +++++-
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index ae91ae2..8a456e5 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f8e492e..7259575 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -37,7 +37,6 @@
 #include <rte_config.h>
 #include <rte_mempool.h>
 #include <rte_memory.h>
-#include <rte_atomic.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
@@ -365,7 +364,7 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return (uint16_t)(rte_atomic16_read(&m->refcnt_atomic));
+	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -378,14 +377,15 @@ rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	rte_atomic16_set(&m->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return (uint16_t)(rte_atomic16_add_return(&m->refcnt_atomic, value));
+	return __atomic_add_fetch(&m->refcnt, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /**
@@ -466,7 +466,7 @@ rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return (uint16_t)(rte_atomic16_read(&shinfo->refcnt_atomic));
+	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
 }
 
 /**
@@ -481,7 +481,7 @@ static inline void
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	rte_atomic16_set(&shinfo->refcnt_atomic, (int16_t)new_value);
+	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
 }
 
 /**
@@ -505,7 +505,8 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 		return (uint16_t)value;
 	}
 
-	return (uint16_t)rte_atomic16_add_return(&shinfo->refcnt_atomic, value);
+	return __atomic_add_fetch(&shinfo->refcnt, (uint16_t)value,
+				 __ATOMIC_ACQ_REL);
 }
 
 /** Mbuf prefetch */
@@ -1304,8 +1305,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(rte_atomic16_add_return
-			(&shinfo->refcnt_atomic, -1)))
+	if (likely(__atomic_add_fetch(&shinfo->refcnt, (uint16_t)-1,
+				     __ATOMIC_ACQ_REL)))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/librte_mbuf/rte_mbuf_core.h b/lib/librte_mbuf/rte_mbuf_core.h
index 16600f1..8cd7137 100644
--- a/lib/librte_mbuf/rte_mbuf_core.h
+++ b/lib/librte_mbuf/rte_mbuf_core.h
@@ -679,7 +679,11 @@ typedef void (*rte_mbuf_extbuf_free_callback_t)(void *addr, void *opaque);
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	rte_atomic16_t refcnt_atomic;        /**< Atomically accessed refcnt */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+	};
 };
 
 /**< Maximum number of nb_segs allowed. */
-- 
2.7.4


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
@ 2020-07-09 23:47  0%   ` Ferruh Yigit
  2020-07-10 12:32  0%     ` Slava Ovsiienko
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2020-07-09 23:47 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, dev
  Cc: matan, rasland, olivier.matz, bernard.iremonger, thomas,
	Andrew Rybchenko

On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> Some networks require precise traffic timing
> management. The ability to send (and, generally speaking, receive)
> packets at a precisely specified moment in time makes it possible
> to support connections with Time Division
> Multiplexing using a contemporary general-purpose NIC without involving
> auxiliary hardware. For example, supporting the O-RAN Fronthaul
> interface is one of the promising use cases for precise time
> management of egress packets.

Is this a HW support, or is the scheduling planned to be done in the driver?

> 
> The main objective of this RFC is to specify the way how applications

It is no more RFC.

> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from
> mlx5 PMD side.

I was about to ask this. Will there be a PMD counterpart implementation of the
feature? It would be better to have it as part of this set.
What is the plan for the PMD implementation?

> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not
> explicitly defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return
> the current device timestamp. The dynamic timestamp flag tells whether
> the field contains actual timestamp value. For the packets being sent
> this value can be used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are
> vendor specific and might depend on hardware capabilities and
> configurations. If might (or not) be synchronized with real time
> via PTP, might (or not) be synchronous with CPU clock (for example
> if NIC and CPU share the same clock source there might be no
> any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
> and obsoleting, these dynamic flag and field will be used to manage
> the timestamps on receiving datapath as well. Having the dedicated
> flags for Rx/Tx timestamps allows applications not to perform explicit
> flags reset on forwarding and not to promote received timestamps
> to the transmitting datapath by default. The static PKT_RX_TIMESTAMP
> is considered as candidate to become the dynamic flag.

Is there a deprecation notice for 'PKT_RX_TIMESTAMP'? Is this decided?

> 
> When the PMD sees "rte_dynfield_timestamp" set on a packet being sent,
> it tries to synchronize the time the packet appears on the wire with
> the specified packet timestamp. If the specified time is in the past it
> should be ignored; if it is in the distant future it should be capped
> to some reasonable value (in the range of seconds). These specific cases
> ("too late" and "distant future") can be optionally reported via
> device xstats to assist applications to detect the time-related
> problems.
> 
> No packet reordering according to timestamps is assumed, neither
> within a packet burst nor between packets; it is entirely the
> application's responsibility to generate packets and their timestamps
> in the desired order. A timestamp can be put only in the first packet
> of the burst, providing scheduling for the entire burst.
> 
> PMD reports the ability to synchronize packet sending on timestamp
> with new offload flag:
> 
> This is palliative and is going to be replaced with new eth_dev API
> about reporting/managing the supported dynamic flags and its related
> features. This API would break ABI compatibility and can't be introduced
> at the moment, so is postponed to 20.11.

Good to hear that there will be a generic API to get supported dynamic flags. I
was concerned about adding the 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP' flag, since I am
not sure any other PMD will want to use it.
The trouble is that it is hard to remove a public macro after it is introduced. In
this release I think only a single PMD (mlx) will support this feature, and in the
next release the plan is to remove the macro. In that case, what do you think about
not introducing the flag at all?

> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific
> delays between the packets within the burst and specific delay between
> the bursts. The rte_eth_get_clock is supposed to be engaged to get the

'rte_eth_read_clock()'?

> current device clock value and provide the reference for the timestamps.
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
>   v5->v6:
>      - release notes are updated
> ---
>  doc/guides/rel_notes/release_20_08.rst |  6 ++++++
>  lib/librte_ethdev/rte_ethdev.c         |  1 +
>  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> index 988474c..5527bab 100644
> --- a/doc/guides/rel_notes/release_20_08.rst
> +++ b/doc/guides/rel_notes/release_20_08.rst
> @@ -200,6 +200,12 @@ New Features
>    See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
>    details of this parameter usage.
>  
> +* **Introduced send packet scheduling on the timestamps.**
> +
> +  Added a new mbuf dynamic field and flag to provide a timestamp with which
> +  packet transmission can be synchronized. The device Tx offload flag is added
> +  to indicate that the PMD supports send scheduling.
> +

This is a core library change, can go up in the section, please check the
section comment for the ordering details.

>  
>  Removed Items
>  -------------
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 7022bd7..c48ca2a 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
>  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
>  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
>  };
>  
>  #undef RTE_TX_OFFLOAD_BIT2STR
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 631b146..97313a0 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
>  /** Device supports outer UDP checksum */
>  #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
>  
> +/** Device supports send on timestamp */
> +#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000

Please cc the ethdev maintainers.

As mentioned above my concern is if this is generic enough or are we adding a
flag to a specific PMD? And since commit log says this is temporary solution for
just this release, I repeat my question if we can remove the flag completely?


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library
                     ` (4 preceding siblings ...)
  2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-10  2:22  4% ` Ruifeng Wang
  2020-07-10  2:22  2%   ` [dpdk-dev] [PATCH v10 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
  5 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-10  2:22 UTC (permalink / raw)
  Cc: dev, mdr, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

This patchset integrates RCU QSBR support with the LPM library.

The resource reclamation implementation was split from the original
series and has already become part of the RCU library. The series is
reworked to base the LPM integration on the RCU reclamation APIs.

A new API, rte_lpm_rcu_qsbr_add, is introduced for the application to
register an RCU variable that the LPM library will use. This provides
the user a handle to enable the RCU mechanism integrated in the LPM library.

Functional tests and performance tests are added to cover the
integration with RCU.

---
v10:
Added missing Acked-by tags.

v9:
Cleared lpm when allocation failed. (David)

v8:
Fixed ABI issue by adding internal LPM control structure. (David)
Changed to use RFC5737 address in unit test. (Vladimir)

v7:
Fixed typos in document.

v6:
Remove ALLOW_EXPERIMENTAL_API from rte_lpm.c.

v5:
No default value for reclaim_thd. This allows reclamation triggering with every call.
Pass LPM pointer instead of tbl8 as argument of reclaim callback free function.
Updated group_idx check at tbl8 allocation.
Use enums instead of defines for different reclamation modes.
RCU QSBR integrated path is inside ALLOW_EXPERIMENTAL_API to avoid ABI change.

v4:
Allow user to configure defer queue: size, reclaim threshold, max entries.
Return the defer queue handle so the user can manually trigger reclamation.
Add blocking mode support. Defer queue will not be created.


Honnappa Nagarahalli (1):
  test/lpm: add RCU integration performance tests

Ruifeng Wang (2):
  lib/lpm: integrate RCU QSBR
  test/lpm: add LPM RCU integration functional tests

 app/test/test_lpm.c                | 291 ++++++++++++++++-
 app/test/test_lpm_perf.c           | 492 ++++++++++++++++++++++++++++-
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 ++++++++--
 lib/librte_lpm/rte_lpm.h           |  53 ++++
 lib/librte_lpm/rte_lpm_version.map |   6 +
 8 files changed, 1016 insertions(+), 26 deletions(-)

-- 
2.17.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v10 1/3] lib/lpm: integrate RCU QSBR
  2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
@ 2020-07-10  2:22  2%   ` Ruifeng Wang
  2020-07-10  2:29  0%     ` Ruifeng Wang
  0 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2020-07-10  2:22 UTC (permalink / raw)
  To: Bruce Richardson, Vladimir Medvedkin, John McNamara,
	Marko Kovacevic, Ray Kinsella, Neil Horman
  Cc: dev, konstantin.ananyev, honnappa.nagarahalli, nd, Ruifeng Wang

Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

The RCU QSBR process is integrated for safe tbl8 group reclamation.
Refer to the RCU documentation to understand various aspects of
integrating the RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for lpm library
internal use. This struct wraps the exposed rte_lpm and also holds
members that do not need to be exposed, such as the RCU-related
configuration.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
---
 doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++
 lib/librte_lpm/Makefile            |   2 +-
 lib/librte_lpm/meson.build         |   1 +
 lib/librte_lpm/rte_lpm.c           | 165 +++++++++++++++++++++++++----
 lib/librte_lpm/rte_lpm.h           |  53 +++++++++
 lib/librte_lpm/rte_lpm_version.map |   6 ++
 6 files changed, 237 insertions(+), 22 deletions(-)

diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
index 1609a57d0..03945904b 100644
--- a/doc/guides/prog_guide/lpm_lib.rst
+++ b/doc/guides/prog_guide/lpm_lib.rst
@@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
 Prefix expansion is one of the keys of this algorithm,
 since it improves the speed dramatically by adding redundancy.
 
+Deletion
+~~~~~~~~
+
+When deleting a rule, a replacement rule is searched for. The replacement rule is an existing rule that has
+the longest prefix match with the rule to be deleted, but has a shorter prefix.
+
+If a replacement rule is found, target tbl24 and tbl8 entries are updated to have the same depth and next hop
+value as the replacement rule.
+
+If no replacement rule can be found, target tbl24 and tbl8 entries will be cleared.
+
+Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
+
+After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry is freed in the following cases:
+
+*   All tbl8s in the group are empty.
+
+*   All tbl8s in the group have the same values, with a depth no greater than 24.
+
+Freeing of tbl8s has different behaviors:
+
+*   If RCU is not used, tbl8s are cleared and reclaimed immediately.
+
+*   If RCU is used, tbl8s are reclaimed only after readers have gone through a quiescent state.
+
+When the LPM is not using RCU, a tbl8 group can be freed immediately even though readers might still be using
+the tbl8 group entries. This might result in incorrect lookup results.
+
+The RCU QSBR process is integrated for safe tbl8 group reclamation. The application has certain responsibilities
+when using this feature. Please refer to the resource reclamation framework of the :ref:`RCU library <RCU_Library>`
+for more details.
+
 Lookup
 ~~~~~~
 
diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
index d682785b6..6f06c5c03 100644
--- a/lib/librte_lpm/Makefile
+++ b/lib/librte_lpm/Makefile
@@ -8,7 +8,7 @@ LIB = librte_lpm.a
 
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
-LDLIBS += -lrte_eal -lrte_hash
+LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
 
 EXPORT_MAP := rte_lpm_version.map
 
diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
index 021ac6d8d..6cfc083c5 100644
--- a/lib/librte_lpm/meson.build
+++ b/lib/librte_lpm/meson.build
@@ -7,3 +7,4 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
 # without worrying about which architecture we actually need
 headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
 deps += ['hash']
+deps += ['rcu']
diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
index 38ab512a4..2d687c372 100644
--- a/lib/librte_lpm/rte_lpm.c
+++ b/lib/librte_lpm/rte_lpm.c
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #include <string.h>
@@ -39,6 +40,17 @@ enum valid_flag {
 	VALID
 };
 
+/** @internal LPM structure. */
+struct __rte_lpm {
+	/* LPM metadata. */
+	struct rte_lpm lpm;
+
+	/* RCU config. */
+	struct rte_rcu_qsbr *v;		/* RCU QSBR variable. */
+	enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
+	struct rte_rcu_qsbr_dq *dq;	/* RCU QSBR defer queue. */
+};
+
 /* Macro to enable/disable run-time checks. */
 #if defined(RTE_LIBRTE_LPM_DEBUG)
 #include <rte_debug.h>
@@ -122,6 +134,7 @@ rte_lpm_create(const char *name, int socket_id,
 		const struct rte_lpm_config *config)
 {
 	char mem_name[RTE_LPM_NAMESIZE];
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm *lpm = NULL;
 	struct rte_tailq_entry *te;
 	uint32_t mem_size, rules_size, tbl8s_size;
@@ -140,12 +153,6 @@ rte_lpm_create(const char *name, int socket_id,
 
 	snprintf(mem_name, sizeof(mem_name), "LPM_%s", name);
 
-	/* Determine the amount of memory to allocate. */
-	mem_size = sizeof(*lpm);
-	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
-	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
-			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
-
 	rte_mcfg_tailq_write_lock();
 
 	/* guarantee there's no existing */
@@ -161,6 +168,12 @@ rte_lpm_create(const char *name, int socket_id,
 		goto exit;
 	}
 
+	/* Determine the amount of memory to allocate. */
+	mem_size = sizeof(*internal_lpm);
+	rules_size = sizeof(struct rte_lpm_rule) * config->max_rules;
+	tbl8s_size = (sizeof(struct rte_lpm_tbl_entry) *
+			RTE_LPM_TBL8_GROUP_NUM_ENTRIES * config->number_tbl8s);
+
 	/* allocate tailq entry */
 	te = rte_zmalloc("LPM_TAILQ_ENTRY", sizeof(*te), 0);
 	if (te == NULL) {
@@ -170,21 +183,23 @@ rte_lpm_create(const char *name, int socket_id,
 	}
 
 	/* Allocate memory to store the LPM data structures. */
-	lpm = rte_zmalloc_socket(mem_name, mem_size,
+	internal_lpm = rte_zmalloc_socket(mem_name, mem_size,
 			RTE_CACHE_LINE_SIZE, socket_id);
-	if (lpm == NULL) {
+	if (internal_lpm == NULL) {
 		RTE_LOG(ERR, LPM, "LPM memory allocation failed\n");
 		rte_free(te);
 		rte_errno = ENOMEM;
 		goto exit;
 	}
 
+	lpm = &internal_lpm->lpm;
 	lpm->rules_tbl = rte_zmalloc_socket(NULL,
 			(size_t)rules_size, RTE_CACHE_LINE_SIZE, socket_id);
 
 	if (lpm->rules_tbl == NULL) {
 		RTE_LOG(ERR, LPM, "LPM rules_tbl memory allocation failed\n");
-		rte_free(lpm);
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
@@ -197,7 +212,8 @@ rte_lpm_create(const char *name, int socket_id,
 	if (lpm->tbl8 == NULL) {
 		RTE_LOG(ERR, LPM, "LPM tbl8 memory allocation failed\n");
 		rte_free(lpm->rules_tbl);
-		rte_free(lpm);
+		rte_free(internal_lpm);
+		internal_lpm = NULL;
 		lpm = NULL;
 		rte_free(te);
 		rte_errno = ENOMEM;
@@ -225,6 +241,7 @@ rte_lpm_create(const char *name, int socket_id,
 void
 rte_lpm_free(struct rte_lpm *lpm)
 {
+	struct __rte_lpm *internal_lpm;
 	struct rte_lpm_list *lpm_list;
 	struct rte_tailq_entry *te;
 
@@ -246,12 +263,84 @@ rte_lpm_free(struct rte_lpm *lpm)
 
 	rte_mcfg_tailq_write_unlock();
 
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->dq)
+		rte_rcu_qsbr_dq_delete(internal_lpm->dq);
 	rte_free(lpm->tbl8);
 	rte_free(lpm->rules_tbl);
 	rte_free(lpm);
 	rte_free(te);
 }
 
+static void
+__lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned int n)
+{
+	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
+	uint32_t tbl8_group_index = *(uint32_t *)data;
+	struct rte_lpm_tbl_entry *tbl8 = ((struct rte_lpm *)p)->tbl8;
+
+	RTE_SET_USED(n);
+	/* Set tbl8 group invalid */
+	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
+		__ATOMIC_RELAXED);
+}
+
+/* Associate QSBR variable with an LPM object.
+ */
+int
+rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq)
+{
+	struct __rte_lpm *internal_lpm;
+	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
+	struct rte_rcu_qsbr_dq_parameters params = {0};
+
+	if (lpm == NULL || cfg == NULL) {
+		rte_errno = EINVAL;
+		return 1;
+	}
+
+	internal_lpm = container_of(lpm, struct __rte_lpm, lpm);
+	if (internal_lpm->v != NULL) {
+		rte_errno = EEXIST;
+		return 1;
+	}
+
+	if (cfg->mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* No other things to do. */
+	} else if (cfg->mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Init QSBR defer queue. */
+		snprintf(rcu_dq_name, sizeof(rcu_dq_name),
+				"LPM_RCU_%s", lpm->name);
+		params.name = rcu_dq_name;
+		params.size = cfg->dq_size;
+		if (params.size == 0)
+			params.size = lpm->number_tbl8s;
+		params.trigger_reclaim_limit = cfg->reclaim_thd;
+		params.max_reclaim_size = cfg->reclaim_max;
+		if (params.max_reclaim_size == 0)
+			params.max_reclaim_size = RTE_LPM_RCU_DQ_RECLAIM_MAX;
+		params.esize = sizeof(uint32_t);	/* tbl8 group index */
+		params.free_fn = __lpm_rcu_qsbr_free_resource;
+		params.p = lpm;
+		params.v = cfg->v;
+		internal_lpm->dq = rte_rcu_qsbr_dq_create(&params);
+		if (internal_lpm->dq == NULL) {
+			RTE_LOG(ERR, LPM, "LPM defer queue creation failed\n");
+			return 1;
+		}
+		if (dq)
+			*dq = internal_lpm->dq;
+	} else {
+		rte_errno = EINVAL;
+		return 1;
+	}
+	internal_lpm->rcu_mode = cfg->mode;
+	internal_lpm->v = cfg->v;
+
+	return 0;
+}
+
 /*
  * Adds a rule to the rule table.
  *
@@ -394,14 +483,15 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
  * Find, clean and allocate a tbl8.
  */
 static int32_t
-tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
+_tbl8_alloc(struct rte_lpm *lpm)
 {
 	uint32_t group_idx; /* tbl8 group index. */
 	struct rte_lpm_tbl_entry *tbl8_entry;
 
 	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
-	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
-		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
+	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
+		tbl8_entry = &lpm->tbl8[group_idx *
+					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
 		/* If a free tbl8 group is found clean it and set as VALID. */
 		if (!tbl8_entry->valid_group) {
 			struct rte_lpm_tbl_entry new_tbl8_entry = {
@@ -427,14 +517,47 @@ tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
 	return -ENOSPC;
 }
 
+static int32_t
+tbl8_alloc(struct rte_lpm *lpm)
+{
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
+	int32_t group_idx; /* tbl8 group index. */
+
+	group_idx = _tbl8_alloc(lpm);
+	if (group_idx == -ENOSPC && internal_lpm->dq != NULL) {
+		/* If there are no tbl8 groups try to reclaim one. */
+		if (rte_rcu_qsbr_dq_reclaim(internal_lpm->dq, 1,
+				NULL, NULL, NULL) == 0)
+			group_idx = _tbl8_alloc(lpm);
+	}
+
+	return group_idx;
+}
+
 static void
-tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
+tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
 {
-	/* Set tbl8 group invalid*/
+	struct __rte_lpm *internal_lpm = container_of(lpm,
+						struct __rte_lpm, lpm);
 	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
 
-	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
-			__ATOMIC_RELAXED);
+	if (internal_lpm->v == NULL) {
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
+		/* Wait for quiescent state change. */
+		rte_rcu_qsbr_synchronize(internal_lpm->v,
+			RTE_QSBR_THRID_INVALID);
+		/* Set tbl8 group invalid */
+		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
+				__ATOMIC_RELAXED);
+	} else if (internal_lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
+		/* Push into QSBR defer queue. */
+		rte_rcu_qsbr_dq_enqueue(internal_lpm->dq,
+				(void *)&tbl8_group_start);
+	}
 }
 
 static __rte_noinline int32_t
@@ -523,7 +646,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 
 	if (!lpm->tbl24[tbl24_index].valid) {
 		/* Search for a free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		/* Check tbl8 allocation was successful. */
 		if (tbl8_group_index < 0) {
@@ -569,7 +692,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
 	} /* If valid entry but not extended calculate the index into Table8. */
 	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
 		/* Search for free tbl8 group. */
-		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
+		tbl8_group_index = tbl8_alloc(lpm);
 
 		if (tbl8_group_index < 0) {
 			return tbl8_group_index;
@@ -977,7 +1100,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		 */
 		lpm->tbl24[tbl24_index].valid = 0;
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	} else if (tbl8_recycle_index > -1) {
 		/* Update tbl24 entry. */
 		struct rte_lpm_tbl_entry new_tbl24_entry = {
@@ -993,7 +1116,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
 		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
 				__ATOMIC_RELAXED);
 		__atomic_thread_fence(__ATOMIC_RELEASE);
-		tbl8_free(lpm->tbl8, tbl8_group_start);
+		tbl8_free(lpm, tbl8_group_start);
 	}
 #undef group_idx
 	return 0;
diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
index b9d49ac87..a9568fcdd 100644
--- a/lib/librte_lpm/rte_lpm.h
+++ b/lib/librte_lpm/rte_lpm.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2010-2014 Intel Corporation
+ * Copyright(c) 2020 Arm Limited
  */
 
 #ifndef _RTE_LPM_H_
@@ -20,6 +21,7 @@
 #include <rte_memory.h>
 #include <rte_common.h>
 #include <rte_vect.h>
+#include <rte_rcu_qsbr.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -62,6 +64,17 @@ extern "C" {
 /** Bitmask used to indicate successful lookup */
 #define RTE_LPM_LOOKUP_SUCCESS          0x01000000
 
+/** @internal Default RCU defer queue entries to reclaim in one go. */
+#define RTE_LPM_RCU_DQ_RECLAIM_MAX	16
+
+/** RCU reclamation modes */
+enum rte_lpm_qsbr_mode {
+	/** Create defer queue for reclaim. */
+	RTE_LPM_QSBR_MODE_DQ = 0,
+	/** Use blocking mode reclaim. No defer queue created. */
+	RTE_LPM_QSBR_MODE_SYNC
+};
+
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 /** @internal Tbl24 entry structure. */
 __extension__
@@ -132,6 +145,22 @@ struct rte_lpm {
 	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
 };
 
+/** LPM RCU QSBR configuration structure. */
+struct rte_lpm_rcu_config {
+	struct rte_rcu_qsbr *v;	/* RCU QSBR variable. */
+	/* Mode of RCU QSBR. RTE_LPM_QSBR_MODE_xxx
+	 * '0' for default: create defer queue for reclaim.
+	 */
+	enum rte_lpm_qsbr_mode mode;
+	uint32_t dq_size;	/* RCU defer queue size.
+				 * default: lpm->number_tbl8s.
+				 */
+	uint32_t reclaim_thd;	/* Threshold to trigger auto reclaim. */
+	uint32_t reclaim_max;	/* Max entries to reclaim in one go.
+				 * default: RTE_LPM_RCU_DQ_RECLAIM_MAX.
+				 */
+};
+
 /**
  * Create an LPM object.
  *
@@ -179,6 +208,30 @@ rte_lpm_find_existing(const char *name);
 void
 rte_lpm_free(struct rte_lpm *lpm);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Associate RCU QSBR variable with an LPM object.
+ *
+ * @param lpm
+ *   the lpm object to add RCU QSBR
+ * @param cfg
+ *   RCU QSBR configuration
+ * @param dq
+ *   handler of created RCU QSBR defer queue
+ * @return
+ *   On success - 0
+ *   On error - 1 with error code set in rte_errno.
+ *   Possible rte_errno codes are:
+ *   - EINVAL - invalid pointer
+ *   - EEXIST - already added QSBR
+ *   - ENOMEM - memory allocation failure
+ */
+__rte_experimental
+int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
+	struct rte_rcu_qsbr_dq **dq);
+
 /**
  * Add a rule to the LPM table.
  *
diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
index 500f58b80..bfccd7eac 100644
--- a/lib/librte_lpm/rte_lpm_version.map
+++ b/lib/librte_lpm/rte_lpm_version.map
@@ -21,3 +21,9 @@ DPDK_20.0 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_lpm_rcu_qsbr_add;
+};
-- 
2.17.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v10 1/3] lib/lpm: integrate RCU QSBR
  2020-07-10  2:22  2%   ` [dpdk-dev] [PATCH v10 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
@ 2020-07-10  2:29  0%     ` Ruifeng Wang
  0 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2020-07-10  2:29 UTC (permalink / raw)
  To: Ruifeng Wang, Bruce Richardson, Vladimir Medvedkin,
	John McNamara, Marko Kovacevic, Ray Kinsella, Neil Horman
  Cc: dev, konstantin.ananyev, Honnappa Nagarahalli, nd, nd

The ci/checkpatch warning is a false positive.

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Friday, July 10, 2020 10:22 AM
> To: Bruce Richardson <bruce.richardson@intel.com>; Vladimir Medvedkin
> <vladimir.medvedkin@intel.com>; John McNamara
> <john.mcnamara@intel.com>; Marko Kovacevic
> <marko.kovacevic@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Neil Horman
> <nhorman@tuxdriver.com>
> Cc: dev@dpdk.org; konstantin.ananyev@intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: [PATCH v10 1/3] lib/lpm: integrate RCU QSBR
> 
> Currently, the tbl8 group is freed even though the readers might be using the
> tbl8 group entries. The freed tbl8 group can be reallocated quickly. This
> results in incorrect lookup results.
> 
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of integrating
> RCU library into other libraries.
> 
> To avoid ABI breakage, a struct __rte_lpm is created for lpm library internal
> use. This struct wraps rte_lpm that has been exposed and also includes
> members that don't need to be exposed such as RCU related config.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> ---
>  doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++
>  lib/librte_lpm/Makefile            |   2 +-
>  lib/librte_lpm/meson.build         |   1 +
>  lib/librte_lpm/rte_lpm.c           | 165 +++++++++++++++++++++++++----
>  lib/librte_lpm/rte_lpm.h           |  53 +++++++++
>  lib/librte_lpm/rte_lpm_version.map |   6 ++
>  6 files changed, 237 insertions(+), 22 deletions(-)
> 


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] eal: use c11 atomic built-ins for interrupt status
  2020-07-09 10:30  0%     ` David Marchand
@ 2020-07-10  7:18  3%       ` Dodji Seketeli
  0 siblings, 0 replies; 200+ results
From: Dodji Seketeli @ 2020-07-10  7:18 UTC (permalink / raw)
  To: David Marchand
  Cc: Phil Yang, Ray Kinsella, Harman Kalra, dev, stefan.puiu,
	Aaron Conole, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Neil Horman

David Marchand <david.marchand@redhat.com> writes:

[...]

>> --- a/devtools/libabigail.abignore
>> +++ b/devtools/libabigail.abignore
>> @@ -48,6 +48,10 @@
>>          changed_enumerators = RTE_CRYPTO_AEAD_LIST_END
>>  [suppress_variable]
>>          name = rte_crypto_aead_algorithm_strings
>> +; Ignore updates of epoll event
>> +[suppress_type]
>> +        type_kind = struct
>> +        name = rte_epoll_event
>
> In general, ignoring all changes on a structure is risky.
> But the risk is acceptable as long as we remember this for the rest of
> the 20.08 release (and we will start from scratch for 20.11).

Right, I thought about this too when I saw that change.  If that struct
is inherently *not* part of the logically exposed ABI, the risk is
really minimal as well.  In that case, maybe a comment saying so in the
.abignore file could be useful for future reference.

[...]

Cheers,

-- 
		Dodji


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
  2020-07-08 13:09  7% ` Kinsella, Ray
  2020-07-09 15:52  4% ` Dodji Seketeli
@ 2020-07-10  7:37  4% ` Kinsella, Ray
  2020-07-10 10:58  4% ` Neil Horman
  2020-07-15 12:15 25% ` [dpdk-dev] [PATCH v2] " David Marchand
  4 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2020-07-10  7:37 UTC (permalink / raw)
  To: David Marchand, dev; +Cc: thomas, dodji, Neil Horman, Aaron Conole



On 08/07/2020 11:22, David Marchand wrote:
> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing must happen but we can give
> some hints to know if this is a problem with the script calling abidiff,
> a potential ABI breakage or an unambiguous ABI breakage.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>  devtools/check-abi.sh | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> index e17fedbd9f..521e2cce7c 100755
> --- a/devtools/check-abi.sh
> +++ b/devtools/check-abi.sh
> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>  		error=1
>  		continue
>  	fi
> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
> +		abiret=$?
>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>  		error=1
> -	fi
> +		echo
> +		if [ $(($abiret & 3)) != 0 ]; then
> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
> +		fi
> +		if [ $(($abiret & 4)) != 0 ]; then
> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
> +		fi
> +		if [ $(($abiret & 8)) != 0 ]; then
> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
> +		fi
> +		echo
> +	}
>  done
>  
>  [ -z "$error" ] || [ -n "$warnonly" ]
> 

Acked-by: Ray Kinsella <mdr@ashroe.eu>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
                   ` (2 preceding siblings ...)
  2020-07-10  7:37  4% ` Kinsella, Ray
@ 2020-07-10 10:58  4% ` Neil Horman
  2020-07-15 12:15 25% ` [dpdk-dev] [PATCH v2] " David Marchand
  4 siblings, 0 replies; 200+ results
From: Neil Horman @ 2020-07-10 10:58 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, dodji, Ray Kinsella

On Wed, Jul 08, 2020 at 12:22:12PM +0200, David Marchand wrote:
> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing must happen but we can give
> some hints to know if this is a problem with the script calling abidiff,
> a potential ABI breakage or an unambiguous ABI breakage.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
>  devtools/check-abi.sh | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
> index e17fedbd9f..521e2cce7c 100755
> --- a/devtools/check-abi.sh
> +++ b/devtools/check-abi.sh
> @@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
>  		error=1
>  		continue
>  	fi
> -	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
> +	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
> +		abiret=$?
>  		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
>  		error=1
> -	fi
> +		echo
> +		if [ $(($abiret & 3)) != 0 ]; then
> +			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, please report this to dev@dpdk.org."
> +		fi
> +		if [ $(($abiret & 4)) != 0 ]; then
> +			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
> +		fi
> +		if [ $(($abiret & 8)) != 0 ]; then
> +			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
> +		fi
> +		echo
> +	}
>  done
>  
>  [ -z "$error" ] || [ -n "$warnonly" ]
> -- 
> 2.23.0
> 
> 
this looks pretty reasonable to me, sure.
Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-09 23:47  0%   ` Ferruh Yigit
@ 2020-07-10 12:32  0%     ` Slava Ovsiienko
  0 siblings, 0 replies; 200+ results
From: Slava Ovsiienko @ 2020-07-10 12:32 UTC (permalink / raw)
  To: Ferruh Yigit, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, bernard.iremonger,
	thomas, Andrew Rybchenko

Hi, Ferruh


Thanks a lot for the review.

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Friday, July 10, 2020 2:47
> To: Slava Ovsiienko <viacheslavo@mellanox.com>; dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> bernard.iremonger@intel.com; thomas@monjalon.com; Andrew Rybchenko
> <arybchenko@solarflare.com>
> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> On 7/9/2020 1:36 PM, Viacheslav Ovsiienko wrote:
> > There is the requirement on some networks for precise traffic timing
> > management. The ability to send (and, generally speaking, receive) the
> > packets at the very precisely specified moment of time provides the
> > opportunity to support the connections with Time Division Multiplexing
> > using the contemporary general purpose NIC without involving an
> > auxiliary hardware. For example, the supporting of O-RAN Fronthaul
> > interface is one of the promising features for potentially usage of
> > the precise time management for the egress packets.
> 
> Is this a HW support, or is the scheduling planned to be done in the driver?
Yes, mlx5 PMD feature v1 is sent: http://patches.dpdk.org/patch/73714/

> 
> >
> > The main objective of this RFC is to specify the way how applications
> 
> It is no more RFC.
Oops, miscopy. Thanks.

> 
> > can provide the moment of time at what the packet transmission must be
> > started and to describe in preliminary the supporting this feature
> > from
> > mlx5 PMD side.
> 
> I was about the ask this, will there be a PMD counterpart implementation of
> the feature? It would be better to have it as part of this set.
> What is the plan for the PMD implementation?
Please, see above.
> 
> >
> > The new dynamic timestamp field is proposed, it provides some timing
> > information, the units and time references (initial phase) are not
> > explicitly defined but are maintained always the same for a given port.
> > Some devices allow to query rte_eth_read_clock() that will return the
> > current device timestamp. The dynamic timestamp flag tells whether the
> > field contains actual timestamp value. For the packets being sent this
> > value can be used by PMD to schedule packet sending.
> >
> > The device clock is opaque entity, the units and frequency are vendor
> > specific and might depend on hardware capabilities and configurations.
> > If might (or not) be synchronized with real time via PTP, might (or
> > not) be synchronous with CPU clock (for example if NIC and CPU share
> > the same clock source there might be no any drift between the NIC and
> > CPU clocks), etc.
> >
> > After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation and
> > obsoleting, these dynamic flag and field will be used to manage the
> > timestamps on receiving datapath as well. Having the dedicated flags
> > for Rx/Tx timestamps allows applications not to perform explicit flags
> > reset on forwarding and not to promote received timestamps to the
> > transmitting datapath by default. The static PKT_RX_TIMESTAMP is
> > considered as candidate to become the dynamic flag.
> 
> Is there a deprecation notice for 'PKT_RX_TIMESTAMP'? Is this decided?
No, we are going to discuss that; the Rx timestamp is a good candidate to be
moved out of the first mbuf cacheline to a dynamic field.
There is a good chance we will deprecate the fixed Rx timestamp flag/field,
that's why we'd prefer not to rely on them anymore.

> 
> >
> > When PMD sees the "rte_dynfield_timestamp" set on the packet being
> > sent it tries to synchronize the time of packet appearing on the wire
> > with the specified packet timestamp. If the specified one is in the
> > past it should be ignored, if one is in the distant future it should
> > be capped with some reasonable value (in range of seconds). These
> > specific cases ("too late" and "distant future") can be optionally
> > reported via device xstats to assist applications to detect the
> > time-related problems.
> >
> > There is no any packet reordering according timestamps is supposed,
> > neither within packet burst, nor between packets, it is an entirely
> > application responsibility to generate packets and its timestamps in
> > desired order. The timestamps can be put only in the first packet in
> > the burst providing the entire burst scheduling.
> >
> > PMD reports the ability to synchronize packet sending on timestamp
> > with new offload flag:
> >
> > This is palliative and is going to be replaced with new eth_dev API
> > about reporting/managing the supported dynamic flags and its related
> > features. This API would break ABI compatibility and can't be
> > introduced at the moment, so is postponed to 20.11.
> 
> Good to hear that there will be a generic API to get supported dynamic flags.
> I was concerned about adding 'DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP'
> flag, since not sure if there will be any other PMD that will want to use it.
> The trouble is it is hard to remove a public macro after it is introduced, in this
> release I think only single PMD (mlx) will support this feature, and in next
> release the plan is to remove the macro. In this case what do you think to
> not introduce the flag at all?

Currently there is no way to report/control the port caps/cfg other than these xx_OFFLOAD_xx flags.
If a new side-channel API to report/control very specific PMD caps is introduced,
it should be consistent with the OFFLOAD flags, i.e., if a cap is disabled via the new API
it will be reflected in the OFFLOAD flags as well. The new API is questionable; the OFFLOAD
flags are not a scarce resource, the offload field can be extended and we are still far
from exhausting the existing one. So, I replaced "will" with "might" in the commit
message. Not sure we should remove this flag; we can keep this consistent.

> 
> >
> > For testing purposes it is proposed to update testpmd "txonly"
> > forwarding mode routine. With this update testpmd application
> > generates the packets and sets the dynamic timestamps according to
> > specified time pattern if it sees the "rte_dynfield_timestamp" is registered.
> >
> > The new testpmd command is proposed to configure sending pattern:
> >
> > set tx_times <burst_gap>,<intra_gap>
> >
> > <intra_gap> - the delay between the packets within the burst
> >               specified in the device clock units. The number
> >               of packets in the burst is defined by txburst parameter
> >
> > <burst_gap> - the delay between the bursts in the device clock units
> >
> > As the result the bursts of packet will be transmitted with specific
> > delays between the packets within the burst and specific delay between
> > the bursts. The rte_eth_get_clock is supposed to be engaged to get the
> 
> 'rte_eth_read_clock()'?
Sure, my bad.

> 
> > current device clock value and provide the reference for the timestamps.
> >
> > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > Acked-by: Olivier Matz <olivier.matz@6wind.com>
> >
> > ---
> >   v1->v4:
> >      - dedicated dynamic Tx timestamp flag instead of shared with Rx
> >   v4->v5:
> >      - elaborated commit message
> >      - more words about device clocks added,
> >      - note about dedicated Rx/Tx timestamp flags added
> >   v5->v6:
> >      - release notes are updated
> > ---
> >  doc/guides/rel_notes/release_20_08.rst |  6 ++++++
> >  lib/librte_ethdev/rte_ethdev.c         |  1 +
> >  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
> >  lib/librte_mbuf/rte_mbuf_dyn.h         | 31
> +++++++++++++++++++++++++++++++
> >  4 files changed, 42 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_20_08.rst
> > b/doc/guides/rel_notes/release_20_08.rst
> > index 988474c..5527bab 100644
> > --- a/doc/guides/rel_notes/release_20_08.rst
> > +++ b/doc/guides/rel_notes/release_20_08.rst
> > @@ -200,6 +200,12 @@ New Features
> >    See the :doc:`../sample_app_ug/l2_forward_real_virtual` for more
> >    details of this parameter usage.
> >
> > +* **Introduced send packet scheduling on the timestamps.**
> > +
> > +  Added the new mbuf dynamic field and flag to provide timestamp on
> > + what packet  transmitting can be synchronized. The device Tx offload
> > + flag is added to  indicate the PMD supports send scheduling.
> > +
> 
> This is a core library change, can go up in the section, please check the
> section comment for the ordering details.
> 
Done.

> >
> >  Removed Items
> >  -------------
> > diff --git a/lib/librte_ethdev/rte_ethdev.c
> > b/lib/librte_ethdev/rte_ethdev.c index 7022bd7..c48ca2a 100644
> > --- a/lib/librte_ethdev/rte_ethdev.c
> > +++ b/lib/librte_ethdev/rte_ethdev.c
> > @@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
> >  	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
> >  	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
> > +	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
> >  };
> >
> >  #undef RTE_TX_OFFLOAD_BIT2STR
> > diff --git a/lib/librte_ethdev/rte_ethdev.h
> > b/lib/librte_ethdev/rte_ethdev.h index 631b146..97313a0 100644
> > --- a/lib/librte_ethdev/rte_ethdev.h
> > +++ b/lib/librte_ethdev/rte_ethdev.h
> > @@ -1178,6 +1178,10 @@ struct rte_eth_conf {
> >  /** Device supports outer UDP checksum */  #define
> > DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
> >
> > +/** Device supports send on timestamp */ #define
> > +DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
> 
> Please cc the ethdev maintainers.
> 
> As mentioned above my concern is if this is generic enough or are we adding
> a flag to a specific PMD? And since commit log says this is temporary
> solution for just this release, I repeat my question if we can remove the flag
> completely?
Will remove "temporary" and replace it with "might". And now I do not think this flag
will actually be removed. As this feature's development proved, it is in the right place
and is easy to use by PMD and application in a standardized (for offloads) way.

With best regards, Slava


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling
                     ` (6 preceding siblings ...)
  2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
@ 2020-07-10 12:39  2% ` Viacheslav Ovsiienko
  2020-07-10 15:46  0%   ` Slava Ovsiienko
  7 siblings, 1 reply; 200+ results
From: Viacheslav Ovsiienko @ 2020-07-10 12:39 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, olivier.matz, arybchenko, thomas, ferruh.yigit

There is a requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
packets at a very precisely specified moment of time provides
the opportunity to support connections with Time Division
Multiplexing using a contemporary general purpose NIC without involving
auxiliary hardware. For example, support for the O-RAN Fronthaul
interface is one of the promising features for potential usage of
precise time management for egress packets.

The main objective of this patchset is to specify the way applications
can provide the moment of time at which packet transmission must be
started, and to give a preliminary description of the support for this
feature on the mlx5 PMD side [1].

A new dynamic timestamp field is proposed. It provides some timing
information; the units and time references (initial phase) are not
explicitly defined but are always maintained the same for a given port.
Some devices allow querying rte_eth_read_clock(), which returns
the current device timestamp. The dynamic timestamp flag tells whether
the field contains an actual timestamp value. For the packets being sent
this value can be used by the PMD to schedule packet sending.

The device clock is an opaque entity; the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. It might (or might not) be synchronized with real time
via PTP, might (or might not) be synchronous with the CPU clock (for
example, if the NIC and CPU share the same clock source there might be
no drift between the NIC and CPU clocks), etc.

After the supposed deprecation and obsoleting of the PKT_RX_TIMESTAMP
flag and the fixed timestamp field, this dynamic flag and field might be
used to manage timestamps on the receiving datapath as well. Having
dedicated flags for Rx/Tx timestamps allows applications not
to perform explicit flag resets on forwarding and not to promote
received timestamps to the transmitting datapath by default.
The static PKT_RX_TIMESTAMP is considered a candidate to become
a dynamic flag, and this move should be discussed.

When a PMD sees the "rte_dynfield_timestamp" field set on a packet being
sent, it tries to synchronize the time of the packet appearing on the
wire with the specified packet timestamp. If the specified timestamp is
in the past it should be ignored; if it is in the distant future it
should be capped with some reasonable value (in the range of seconds).
These specific cases ("too late" and "distant future") can be optionally
reported via device xstats to assist applications in detecting
time-related problems.

No packet reordering according to the timestamps is supposed, neither
within a packet burst nor between packets; it is entirely the
application's responsibility to generate packets and their timestamps
in the desired order. The timestamp can be put only in the first packet
of a burst, providing scheduling for the entire burst.

The PMD reports the ability to synchronize packet sending on a
timestamp with a new offload flag, DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP.

This is palliative and might be replaced with a new eth_dev API for
reporting/managing the supported dynamic flags and their related
features. That API would break ABI compatibility and can't be introduced
at the moment, so it is postponed to 20.11.

For testing purposes it is proposed to update the testpmd "txonly"
forwarding mode routine. With this update the testpmd application
generates packets and sets the dynamic timestamps according to the
specified time pattern if it sees that "rte_dynfield_timestamp" is
registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As a result the bursts of packets will be transmitted with specific
delays between the packets within a burst and a specific delay between
the bursts. rte_eth_read_clock() is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

[1] http://patches.dpdk.org/patch/73714/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
  v1->v4:
     - dedicated dynamic Tx timestamp flag instead of shared with Rx
  v4->v5:
     - elaborated commit message
     - more words about device clocks added,
     - note about dedicated Rx/Tx timestamp flags added
  v5->v6:
     - release notes are updated
  v6->v7:
     - commit message is updated
     - testpmd checks the supported offloads before registering
       dynamic timestamp flag/field
---
 doc/guides/rel_notes/release_20_08.rst |  7 +++++++
 lib/librte_ethdev/rte_ethdev.c         |  1 +
 lib/librte_ethdev/rte_ethdev.h         |  4 ++++
 lib/librte_mbuf/rte_mbuf_dyn.h         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index 988474c..bdea389 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -81,6 +81,13 @@ New Features
   Added the RegEx library which provides an API for offload of regular
   expressions search operations to hardware or software accelerator devices.
 
+
+* **Introduced send packet scheduling on the timestamps.**
+
+   Added a new mbuf dynamic field and flag to provide a timestamp on
+   which packet transmission can be synchronized. A device Tx offload
+   flag is added to indicate that the PMD supports send scheduling.
+
 * **Updated PCAP driver.**
 
   Updated PCAP driver with new features and improvements, including:
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 7022bd7..c48ca2a 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -160,6 +160,7 @@ struct rte_eth_xstats_name_off {
 	RTE_TX_OFFLOAD_BIT2STR(UDP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(IP_TNL_TSO),
 	RTE_TX_OFFLOAD_BIT2STR(OUTER_UDP_CKSUM),
+	RTE_TX_OFFLOAD_BIT2STR(SEND_ON_TIMESTAMP),
 };
 
 #undef RTE_TX_OFFLOAD_BIT2STR
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 631b146..97313a0 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1178,6 +1178,10 @@ struct rte_eth_conf {
 /** Device supports outer UDP checksum */
 #define DEV_TX_OFFLOAD_OUTER_UDP_CKSUM  0x00100000
 
+/** Device supports send on timestamp */
+#define DEV_TX_OFFLOAD_SEND_ON_TIMESTAMP 0x00200000
+
+
 #define RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP 0x00000001
 /**< Device supports Rx queue setup after device started*/
 #define RTE_ETH_DEV_CAPA_RUNTIME_TX_QUEUE_SETUP 0x00000002
diff --git a/lib/librte_mbuf/rte_mbuf_dyn.h b/lib/librte_mbuf/rte_mbuf_dyn.h
index 96c3631..8407230 100644
--- a/lib/librte_mbuf/rte_mbuf_dyn.h
+++ b/lib/librte_mbuf/rte_mbuf_dyn.h
@@ -250,4 +250,35 @@ int rte_mbuf_dynflag_lookup(const char *name,
 #define RTE_MBUF_DYNFIELD_METADATA_NAME "rte_flow_dynfield_metadata"
 #define RTE_MBUF_DYNFLAG_METADATA_NAME "rte_flow_dynflag_metadata"
 
+/**
+ * The timestamp dynamic field provides some timing information, the
+ * units and time references (initial phase) are not explicitly defined
+ * but are maintained always the same for a given port. Some devices allow
+ * to query rte_eth_read_clock() that will return the current device
+ * timestamp. The dynamic Tx timestamp flag tells whether the field contains
+ * actual timestamp value for the packets being sent, this value can be
+ * used by PMD to schedule packet sending.
+ *
+ * After PKT_RX_TIMESTAMP flag and fixed timestamp field deprecation
+ * and obsoleting, the dedicated Rx timestamp flag is supposed to be
+ * introduced and the shared dynamic timestamp field will be used
+ * to handle the timestamps on receiving datapath as well.
+ */
+#define RTE_MBUF_DYNFIELD_TIMESTAMP_NAME "rte_dynfield_timestamp"
+
+/**
+ * When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME flag set on the
+ * packet being sent it tries to synchronize the time of packet appearing
+ * on the wire with the specified packet timestamp. If the specified one
+ * is in the past it should be ignored, if one is in the distant future
+ * it should be capped with some reasonable value (in range of seconds).
+ *
+ * No packet reordering according to the timestamps is supposed,
+ * neither for packets within a burst nor between bursts; it is
+ * entirely the application's responsibility to generate packets and
+ * their timestamps in the desired order. The timestamp might be put
+ * only in the first packet of a burst, scheduling the entire burst.
+ */
+#define RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME "rte_dynflag_tx_timestamp"
+
 #endif
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev
  @ 2020-07-10 14:20  0%   ` Thomas Monjalon
  2020-07-10 16:17  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2020-07-10 14:20 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, David Marchand,
	Andrew Rybchenko, Kinsella, Ray

26/06/2020 10:49, Kinsella, Ray:
> On 23/06/2020 14:49, Ferruh Yigit wrote:
> > The APIs are marked in the doxygen comment but better to mark the
> > symbols too. This is planned for v20.11 release.
> > 
> > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > ---
> > +* ethdev: Some internal APIs for driver usage are exported in the .map file.
> > +  Now DPDK has ``__rte_internal`` marker so we can mark internal APIs and move
> > +  them to the INTERNAL block in .map. Although these APIs are internal it will
> > +  break the ABI checks, that is why change is planned for 20.11.
> > +  The list of internal APIs are mainly ones listed in ``rte_ethdev_driver.h``.
> > +
> 
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> 
> A bunch of other folks have already annotated "internal" APIs, and added entries to 
> libabigail.abignore to suppress warnings. If you are 100% certain these are never used 
> by end applications, you could do likewise.
> 
> That said, depreciation notice and completing in 20.11 is definitely the better approach. 
> See https://git.dpdk.org/dpdk/tree/devtools/libabigail.abignore#n53

I agree we can wait 20.11.

Acked-by: Thomas Monjalon <thomas@monjalon.net>



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 00/10] rename blacklist/whitelist to block/allow
  @ 2020-07-10 15:06  3% ` David Marchand
  2020-07-14  4:43  0%   ` Stephen Hemminger
    1 sibling, 1 reply; 200+ results
From: David Marchand @ 2020-07-10 15:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, techboard, Luca Boccassi, Mcnamara, John

On Sat, Jun 13, 2020 at 2:01 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> The terms blacklist and whitelist are often seen as reminders
> of the divisions in society. Instead, use more exact terms for
> handling of which devices are used in DPDK.
>
> This is a proposed change for DPDK 20.08 to replace the names
> blacklist and whitelist in API and command lines.
>
> The first three patches fix some other unnecessary use of
> blacklist/whitelist and have no user visible impact.
>
> The rest change the PCI blacklist to be blocklist and
> whitelist to be allowlist.

Thanks for working on this.

I agree, the first patches can go in right now.

But I have some concerns about the rest.

New options in EAL are not consistent with "allow"/"block" list:
+    "b:" /* pci-skip-probe */
+    "w:" /* pci-only-probe */
+#define OPT_PCI_SKIP_PROBE     "pci-skip-probe"
+    OPT_PCI_SKIP_PROBE_NUM  = 'b',
+#define OPT_PCI_ONLY_PROBE     "pci-only-probe"
+    OPT_PCI_ONLY_PROBE_NUM  = 'w',

The CI flagged the series as failing, because the unit test for EAL
flags is unaligned:
+#define pci_allowlist "--pci-allowlist"
https://travis-ci.com/github/ovsrobot/dpdk/jobs/348668299#L5657


The ABI check complains about the enum update:
https://travis-ci.com/github/ovsrobot/dpdk/jobs/348668301#L2400
Either we deal with this, or we need a libabigail exception rule.


About deprecating existing API/EAL flags in this release, this should
go through the standard deprecation process.
I would go with introducing new options + full compatibility + a
deprecation notice in the 20.08 release.
The actual deprecation/API flagging will go in 20.11.
Removal will come later.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] lib/librte_timer:fix corruption with reset
  @ 2020-07-10 15:19  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-10 15:19 UTC (permalink / raw)
  To: Sarosh Arif; +Cc: rsanford, erik.g.carrillo, dev, stable, h.mikita89

On Fri, 10 Jul 2020 11:59:54 +0500
Sarosh Arif <sarosh.arif@emumba.com> wrote:

> If the user tries to reset/stop some other timer in its callback
> function, which is also about to expire, using
> rte_timer_reset_sync/rte_timer_stop_sync, the application goes into
> an infinite loop. This happens because
> rte_timer_reset_sync/rte_timer_stop_sync loop until the timer
> resets/stops, and there is a check inside timer_set_config_state which
> prevents a running timer from being reset/stopped by anything other
> than its own timer_cb. Therefore timer_set_config_state returns -1,
> due to which rte_timer_reset returns -1 and rte_timer_reset_sync goes
> into an infinite loop.
> 
> The solution to this problem is to return -1 from
> rte_timer_reset_sync/rte_timer_stop_sync in case the user tries to
> reset/stop some other timer in its callback function.
> 
> Bugzilla ID: 491
> Fixes: 20d159f20543 ("timer: fix corruption with reset")
> Cc: h.mikita89@gmail.com
> Signed-off-by: Sarosh Arif <sarosh.arif@emumba.com>
> ---
> v2: remove line continuations
> v3: separate code and declarations

If you want to change the return value, you need to go through the steps
in the API/ABI policy. Maybe even symbol versioning.

Sorry, I know it is painful but we committed to the rules.
And changing the return value can never go to stable.


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-10 12:39  2% ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
@ 2020-07-10 15:46  0%   ` Slava Ovsiienko
  2020-07-10 22:07  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Slava Ovsiienko @ 2020-07-10 15:46 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, arybchenko,
	Thomas Monjalon, ferruh.yigit



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Friday, July 10, 2020 15:40
> To: dev@dpdk.org
> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
> <rasland@mellanox.com>; olivier.matz@6wind.com;
> arybchenko@solarflare.com; Thomas Monjalon <thomas@monjalon.net>;
> ferruh.yigit@intel.com
> Subject: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx
> scheduling
> 
> There is the requirement on some networks for precise traffic timing
> management. The ability to send (and, generally speaking, receive) the
> packets at the very precisely specified moment of time provides the
> opportunity to support the connections with Time Division Multiplexing using
> the contemporary general purpose NIC without involving an auxiliary
> hardware. For example, the supporting of O-RAN Fronthaul interface is one
> of the promising features for potentially usage of the precise time
> management for the egress packets.
> 
> The main objective of this patchset is to specify the way how applications
> can provide the moment of time at what the packet transmission must be
> started and to describe in preliminary the supporting this feature from mlx5
> PMD side [1].
> 
> The new dynamic timestamp field is proposed, it provides some timing
> information, the units and time references (initial phase) are not explicitly
> defined but are maintained always the same for a given port.
> Some devices allow to query rte_eth_read_clock() that will return the current
> device timestamp. The dynamic timestamp flag tells whether the field
> contains actual timestamp value. For the packets being sent this value can be
> used by PMD to schedule packet sending.
> 
> The device clock is opaque entity, the units and frequency are vendor specific
> and might depend on hardware capabilities and configurations. If might (or
> not) be synchronized with real time via PTP, might (or not) be synchronous
> with CPU clock (for example if NIC and CPU share the same clock source
> there might be no any drift between the NIC and CPU clocks), etc.
> 
> After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
> deprecation and obsoleting, these dynamic flag and field might be used to
> manage the timestamps on receiving datapath as well. Having the dedicated
> flags for Rx/Tx timestamps allows applications not to perform explicit flags
> reset on forwarding and not to promote received timestamps to the
> transmitting datapath by default.
> The static PKT_RX_TIMESTAMP is considered as candidate to become the
> dynamic flag and this move should be discussed.
> 
> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
> tries to synchronize the time of packet appearing on the wire with the
> specified packet timestamp. If the specified one is in the past it should be
> ignored, if one is in the distant future it should be capped with some
> reasonable value (in range of seconds). These specific cases ("too late" and
> "distant future") can be optionally reported via device xstats to assist
> applications to detect the time-related problems.
> 
> There is no any packet reordering according timestamps is supposed, neither
> within packet burst, nor between packets, it is an entirely application
> responsibility to generate packets and its timestamps in desired order. The
> timestamps can be put only in the first packet in the burst providing the
> entire burst scheduling.
> 
> PMD reports the ability to synchronize packet sending on timestamp with
> new offload flag:
> 
> This is palliative and might be replaced with new eth_dev API about
> reporting/managing the supported dynamic flags and its related features.
> This API would break ABI compatibility and can't be introduced at the
> moment, so is postponed to 20.11.
> 
> For testing purposes it is proposed to update testpmd "txonly"
> forwarding mode routine. With this update testpmd application generates
> the packets and sets the dynamic timestamps according to specified time
> pattern if it sees the "rte_dynfield_timestamp" is registered.
> 
> The new testpmd command is proposed to configure sending pattern:
> 
> set tx_times <burst_gap>,<intra_gap>
> 
> <intra_gap> - the delay between the packets within the burst
>               specified in the device clock units. The number
>               of packets in the burst is defined by txburst parameter
> 
> <burst_gap> - the delay between the bursts in the device clock units
> 
> As the result the bursts of packet will be transmitted with specific delays
> between the packets within the burst and specific delay between the bursts.
> The rte_eth_read_clock is supposed to be engaged to get the current device
> clock value and provide the reference for the timestamps.
> 
> [1]
> http://patches.dpdk.org/patch/73714/
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

 Promote the Acked-by from the previous patch version to maintain
 patchwork status accordingly.

Acked-by: Olivier Matz <olivier.matz@6wind.com>

> 
> ---
>   v1->v4:
>      - dedicated dynamic Tx timestamp flag instead of shared with Rx
>   v4->v5:
>      - elaborated commit message
>      - more words about device clocks added,
>      - note about dedicated Rx/Tx timestamp flags added
>   v5->v6:
>      - release notes are updated
>   v6->v7:
>      - commit message is updated
>      - testpmd checks the supported offloads before registering
>        dynamic timestamp flag/field
> ---
>  doc/guides/rel_notes/release_20_08.rst |  7 +++++++
>  lib/librte_ethdev/rte_ethdev.c         |  1 +
>  lib/librte_ethdev/rte_ethdev.h         |  4 ++++
>  lib/librte_mbuf/rte_mbuf_dyn.h         | 31
> +++++++++++++++++++++++++++++++
>  4 files changed, 43 insertions(+)
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev
  2020-07-10 14:20  0%   ` Thomas Monjalon
@ 2020-07-10 16:17  0%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-10 16:17 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, David Marchand,
	Andrew Rybchenko, Kinsella, Ray

On 7/10/2020 3:20 PM, Thomas Monjalon wrote:
> 26/06/2020 10:49, Kinsella, Ray:
>> On 23/06/2020 14:49, Ferruh Yigit wrote:
>>> The APIs are marked in the doxygen comment but better to mark the
>>> symbols too. This is planned for v20.11 release.
>>>
>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>> ---
>>> +* ethdev: Some internal APIs for driver usage are exported in the .map file.
>>> +  Now DPDK has ``__rte_internal`` marker so we can mark internal APIs and move
>>> +  them to the INTERNAL block in .map. Although these APIs are internal it will
>>> +  break the ABI checks, that is why change is planned for 20.11.
>>> +  The list of internal APIs are mainly ones listed in ``rte_ethdev_driver.h``.
>>> +
>>
>> Acked-by: Ray Kinsella <mdr@ashroe.eu>
>>
>> A bunch of other folks have already annotated "internal" APIs, and added entries to 
>> libabigail.abignore to suppress warnings. If you are 100% certain these are never used 
>> by end applications, you could do likewise.
>>
>> That said, depreciation notice and completing in 20.11 is definitely the better approach. 
>> See https://git.dpdk.org/dpdk/tree/devtools/libabigail.abignore#n53
> 
> I agree we can wait 20.11.
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
> 

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx scheduling
  2020-07-10 15:46  0%   ` Slava Ovsiienko
@ 2020-07-10 22:07  0%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2020-07-10 22:07 UTC (permalink / raw)
  To: Slava Ovsiienko, dev
  Cc: Matan Azrad, Raslan Darawsheh, olivier.matz, arybchenko, Thomas Monjalon

On 7/10/2020 4:46 PM, Slava Ovsiienko wrote:
> 
> 
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
>> Sent: Friday, July 10, 2020 15:40
>> To: dev@dpdk.org
>> Cc: Matan Azrad <matan@mellanox.com>; Raslan Darawsheh
>> <rasland@mellanox.com>; olivier.matz@6wind.com;
>> arybchenko@solarflare.com; Thomas Monjalon <thomas@monjalon.net>;
>> ferruh.yigit@intel.com
>> Subject: [dpdk-dev] [PATCH v7 1/2] mbuf: introduce accurate packet Tx
>> scheduling
>>
>> There is the requirement on some networks for precise traffic timing
>> management. The ability to send (and, generally speaking, receive) the
>> packets at the very precisely specified moment of time provides the
>> opportunity to support the connections with Time Division Multiplexing using
>> the contemporary general purpose NIC without involving an auxiliary
>> hardware. For example, the supporting of O-RAN Fronthaul interface is one
>> of the promising features for potentially usage of the precise time
>> management for the egress packets.
>>
>> The main objective of this patchset is to specify the way how applications
>> can provide the moment of time at what the packet transmission must be
>> started and to describe in preliminary the supporting this feature from mlx5
>> PMD side [1].
>>
>> The new dynamic timestamp field is proposed, it provides some timing
>> information, the units and time references (initial phase) are not explicitly
>> defined but are maintained always the same for a given port.
>> Some devices allow to query rte_eth_read_clock() that will return the current
>> device timestamp. The dynamic timestamp flag tells whether the field
>> contains actual timestamp value. For the packets being sent this value can be
>> used by PMD to schedule packet sending.
>>
>> The device clock is opaque entity, the units and frequency are vendor specific
>> and might depend on hardware capabilities and configurations. If might (or
>> not) be synchronized with real time via PTP, might (or not) be synchronous
>> with CPU clock (for example if NIC and CPU share the same clock source
>> there might be no any drift between the NIC and CPU clocks), etc.
>>
>> After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
>> deprecation and obsoleting, these dynamic flag and field might be used to
>> manage the timestamps on receiving datapath as well. Having the dedicated
>> flags for Rx/Tx timestamps allows applications not to perform explicit flags
>> reset on forwarding and not to promote received timestamps to the
>> transmitting datapath by default.
>> The static PKT_RX_TIMESTAMP is considered as candidate to become the
>> dynamic flag and this move should be discussed.
>>
>> When PMD sees the "rte_dynfield_timestamp" set on the packet being sent it
>> tries to synchronize the time of packet appearing on the wire with the
>> specified packet timestamp. If the specified one is in the past it should be
>> ignored, if one is in the distant future it should be capped with some
>> reasonable value (in range of seconds). These specific cases ("too late" and
>> "distant future") can be optionally reported via device xstats to assist
>> applications to detect the time-related problems.
>>
>> There is no any packet reordering according timestamps is supposed, neither
>> within packet burst, nor between packets, it is an entirely application
>> responsibility to generate packets and its timestamps in desired order. The
>> timestamps can be put only in the first packet in the burst providing the
>> entire burst scheduling.
>>
>> PMD reports the ability to synchronize packet sending on timestamp with
>> new offload flag:
>>
>> This is palliative and might be replaced with new eth_dev API about
>> reporting/managing the supported dynamic flags and its related features.
>> This API would break ABI compatibility and can't be introduced at the
>> moment, so is postponed to 20.11.
>>
>> For testing purposes it is proposed to update testpmd "txonly"
>> forwarding mode routine. With this update testpmd application generates
>> the packets and sets the dynamic timestamps according to specified time
>> pattern if it sees the "rte_dynfield_timestamp" is registered.
>>
>> The new testpmd command is proposed to configure sending pattern:
>>
>> set tx_times <burst_gap>,<intra_gap>
>>
>> <intra_gap> - the delay between the packets within the burst
>>               specified in the device clock units. The number
>>               of packets in the burst is defined by txburst parameter
>>
>> <burst_gap> - the delay between the bursts in the device clock units
>>
>> As the result the bursts of packet will be transmitted with specific delays
>> between the packets within the burst and specific delay between the bursts.
>> The rte_eth_read_clock is supposed to be engaged to get the current device
>> clock value and provide the reference for the timestamps.
>>
>> [1]
>> http://patches.dpdk.org/patch/73714/
>> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
>  promote Acked-bt from previous patch version to maintain patchwork 
>  status accordingly
> 
> Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 

For series,
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/2] rte_flow: add eCPRI key fields to flow API
  @ 2020-07-12 13:17  3%         ` Olivier Matz
  2020-07-12 14:28  0%           ` Bing Zhao
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-12 13:17 UTC (permalink / raw)
  To: Bing Zhao
  Cc: Ori Kam, john.mcnamara, marko.kovacevic, Thomas Monjalon,
	ferruh.yigit, arybchenko, akhil.goyal, dev, wenzhuo.lu,
	beilei.xing, bernard.iremonger

Hi Bing,

On Sat, Jul 11, 2020 at 04:25:49AM +0000, Bing Zhao wrote:
> Hi Olivier,
> Many thanks for your comments.

[...]

> > > +/**
> > > + * eCPRI Common Header
> > > + */
> > > +RTE_STD_C11
> > > +struct rte_ecpri_common_hdr {
> > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > +	uint32_t size:16;		/**< Payload Size */
> > > +	uint32_t type:8;		/**< Message Type */
> > > +	uint32_t c:1;			/**< Concatenation Indicator
> > */
> > > +	uint32_t res:3;			/**< Reserved */
> > > +	uint32_t revision:4;		/**< Protocol Revision */
> > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > +	uint32_t revision:4;		/**< Protocol Revision */
> > > +	uint32_t res:3;			/**< Reserved */
> > > +	uint32_t c:1;			/**< Concatenation Indicator
> > */
> > > +	uint32_t type:8;		/**< Message Type */
> > > +	uint32_t size:16;		/**< Payload Size */
> > > +#endif
> > > +} __rte_packed;
> > 
> > Does it really need to be packed? Why do the next types not need it?
> > It looks like only those which have bitfields are.
> > 
> 
> Nice catch, thanks. For the common header, there is no need to use
> the packed attribute, because it is a u32 and the bitfields will be
> aligned.
> I checked all the definitions again. Only "Type #4: Remote Memory Access"
> needs to use the packed attribute.
> For the other sub-types, the "sub-header" part of the message payload is
> naturally aligned. For example, a u16 after a u16, or a u8 after a u16,
> should be OK.
> But in type #4, the address is 48 bits wide, with a 16-bit MSB part and a
> 32-bit LSB part (there is no detailed description in the specification;
> correct me if anything is wrong). Usually, a 48-bit address is divided this
> way in a system, and there is no 48-bit C type at all. So we need to define
> it in two parts: a 32-bit LSB following a 16-bit MSB. A u32 after a u16
> needs the packed attribute. Thanks

What about using a bitfield into a uint64_t ? I mean:

	struct rte_ecpri_msg_rm_access {
#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
		...
		uint64_t length:16;		/**< number of bytes */
		uint64_t addr:48;		/**< address */
#else
		...
		uint64_t addr:48;		/**< address */
		uint64_t length:16;		/**< number of bytes */
#endif
	};


> 
> > 
> > I wonder if the 'dw0' could be in this definition instead of in struct
> > rte_ecpri_msg_hdr?
> > 
> > Something like this:
> > 
> > struct rte_ecpri_common_hdr {
> > 	union {
> > 		uint32_t u32;
> > 		struct {
> > 			...
> > 		};
> > 	};
> > };
> > 
> > I see 2 advantages:
> > 
> > - someone that only uses the common_hdr struct can use the .u32
> >   in its software
> > - when using it in messages, it looks clearer to me:
> >     msg.common_hdr.u32 = value;
> >   instead of:
> >     msg.dw0 = value;
> > 
> > What do you think?
> 
> Thanks for the suggestion, this is much better, I will change it.
> Indeed, in my original version, no DW (u32) was defined for the header.
> After that, I noticed that statically casting the bitfields to a u32 would
> not be easy (the compiler will complain), and it would not be clear how to
> swap the endianness if the user wants to use this header. I added this DW
> (u32) to simplify the usage of this header. But yes, if I do not add it here,
> it would not be easy or clear for users who just use this header structure.
> I will change it. Is it OK if I use the name "dw0"?

In my opinion, u32 is more usual than dw0.

> 
> > 
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #0: IQ Data  */ struct
> > > +rte_ecpri_msg_iq_data {
> > > +	rte_be16_t pc_id;		/**< Physical channel ID */
> > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #1: Bit Sequence  */ struct
> > > +rte_ecpri_msg_bit_seq {
> > > +	rte_be16_t pc_id;		/**< Physical channel ID */
> > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #2: Real-Time Control Data  */
> > struct
> > > +rte_ecpri_msg_rtc_ctrl {
> > > +	rte_be16_t rtc_id;		/**< Real-Time Control Data ID
> > */
> > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #3: Generic Data Transfer  */
> > struct
> > > +rte_ecpri_msg_gen_data {
> > > +	rte_be32_t pc_id;		/**< Physical channel ID */
> > > +	rte_be32_t seq_id;		/**< Sequence ID */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #4: Remote Memory Access  */
> > > +RTE_STD_C11
> > > +struct rte_ecpri_msg_rm_access {
> > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > +	uint32_t ele_id:16;		/**< Element ID */
> > > +	uint32_t rr:4;			/**< Req/Resp */
> > > +	uint32_t rw:4;			/**< Read/Write */
> > > +	uint32_t rma_id:8;		/**< Remote Memory Access
> > ID */
> > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > +	uint32_t rma_id:8;		/**< Remote Memory Access
> > ID */
> > > +	uint32_t rw:4;			/**< Read/Write */
> > > +	uint32_t rr:4;			/**< Req/Resp */
> > > +	uint32_t ele_id:16;		/**< Element ID */
> > > +#endif
> > > +	rte_be16_t addr_m;		/**< 48-bits address (16 MSB)
> > */
> > > +	rte_be32_t addr_l;		/**< 48-bits address (32 LSB) */
> > > +	rte_be16_t length;		/**< number of bytes */
> > > +} __rte_packed;
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #5: One-Way Delay Measurement
> > */
> > > +struct rte_ecpri_msg_delay_measure {
> > > +	uint8_t msr_id;			/**< Measurement ID */
> > > +	uint8_t act_type;		/**< Action Type */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #6: Remote Reset  */ struct
> > > +rte_ecpri_msg_remote_reset {
> > > +	rte_be16_t rst_id;		/**< Reset ID */
> > > +	uint8_t rst_op;			/**< Reset Code Op */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header of Type #7: Event Indication  */ struct
> > > +rte_ecpri_msg_event_ind {
> > > +	uint8_t evt_id;			/**< Event ID */
> > > +	uint8_t evt_type;		/**< Event Type */
> > > +	uint8_t seq;			/**< Sequence Number */
> > > +	uint8_t number;			/**< Number of
> > Faults/Notif */
> > > +};
> > > +
> > > +/**
> > > + * eCPRI Message Header Format: Common Header + Message
> > Types  */
> > > +RTE_STD_C11
> > > +struct rte_ecpri_msg_hdr {
> > > +	union {
> > > +		struct rte_ecpri_common_hdr common;
> > > +		uint32_t dw0;
> > > +	};
> > > +	union {
> > > +		struct rte_ecpri_msg_iq_data type0;
> > > +		struct rte_ecpri_msg_bit_seq type1;
> > > +		struct rte_ecpri_msg_rtc_ctrl type2;
> > > +		struct rte_ecpri_msg_bit_seq type3;
> > > +		struct rte_ecpri_msg_rm_access type4;
> > > +		struct rte_ecpri_msg_delay_measure type5;
> > > +		struct rte_ecpri_msg_remote_reset type6;
> > > +		struct rte_ecpri_msg_event_ind type7;
> > > +		uint32_t dummy[3];
> > > +	};
> > > +};
> > 
> > What is the point in having this struct?
> > 
> > From a software point of view, I think it is a bit risky, because its size is
> > the size of the largest message. This is probably what you want in your
> > case, but when a software will rx or tx such packet, I think they
> > shouldn't use this one. My understanding is that you only need this
> > structure for the mask in rte_flow.
> > 
> > Also, I'm not sure to understand the purpose of dummy[3], even after
> > reading your answer to Akhil's question.
> > 
> 
> Basically YES and no. To my understanding, the eCPRI message format is something
> like the ICMP packet format. The message (packet) itself will be parsed into
> different formats based on the type of the common header. In the message
> payload part, there is no distinct definition of the "sub-header". We can divide
> them into the sub-header and data parts based on the specification.
> E.g. physical channel ID / real-time control ID / Event ID + type are the parts
> that datapath forwarding will only care about. The following timestamp or user data
> parts are the parts that the higher layer in the application will use.
> 1. If an application wants to create some offload flow, or even handle it in the SW, the
> common header + first several bytes in the payload should be enough. BUT YES, it is
> not good or safe to use it in the higher layer of the application.
> 2. A higher layer of the application should have its own definition of the whole payload
> of a specific sub-type, including the parsing of the user data part after the "sub-header".
> It is better for them to just skip the first 4 bytes of the eCPRI message, or a known offset.
> We do not need to cover the upper layers. 

Let me explain my vision of how an application would use the
structures (these are completely dummy examples, as I don't know the eCPRI
protocol at all).

Rx:

	int ecpri_input(struct rte_mbuf *m)
	{
		struct rte_ecpri_common_hdr hdr_copy, *hdr;
		struct rte_ecpri_msg_event_ind event_copy, *event;
		struct app_specific app_copy, *app;

		hdr = rte_pktmbuf_read(m, 0, sizeof(*hdr), &hdr_copy);
		if (unlikely(hdr == NULL))
			return -1;
		switch (hdr->type) {
		...
		case RTE_ECPRI_EVT_IND_NTFY_IND:
			event = rte_pktmbuf_read(m, sizeof(*hdr), sizeof(*event),
				&event_copy);
			if (unlikely(event == NULL))
				return -1;
			...
			app = rte_pktmbuf_read(m, sizeof(*hdr) + sizeof(*event),
				sizeof(*app), &app_copy);
			...

Tx:

	int ecpri_output(void)
	{
		struct rte_ecpri_common_hdr *hdr;
		struct rte_ecpri_msg_event_ind *event;
		struct app_specific *app;

		m = rte_pktmbuf_alloc(mp);
		if (unlikely(m == NULL))
			return -1;

		app = rte_pktmbuf_append(m, sizeof(*app));
		if (app == NULL)
			...
		app->... = ...;
		...
		event = rte_pktmbuf_prepend(m, sizeof(*event));
		if (event == NULL)
			...
		event->... = ...;
		...
		hdr = rte_pktmbuf_prepend(m, sizeof(*hdr));
		if (hdr == NULL)
			...
		hdr->... = ...;

		return packet_send(m);
	}

In these 2 examples, we never need the unioned structure (struct
rte_ecpri_msg_hdr).

Using it does not look possible to me, because its size corresponds
to the largest message, not to the one we are parsing/building.

> I think some comments could be added here, is it OK?
> 3. Regarding this structure, I added it because I do not want to introduce a lot of new items
> in rte_flow: new items with structures, new enum types. I prefer that one single structure
> cover most of the cases (subtypes). What do you think?
> 4. About the *dummy* u32, I calculated all the "subheaders" and chose the maximal
> length. Two purposes (same as the u32 in the common header):
>   a. easy to swap the endianness, but not quite useful, because some parts are u16 and u8
>     and should not be swapped as a u32 (some physical channel IDs and the address LSB are 32 bits wide).
>     But if some HW parses the header u32 by u32, then it would be helpful.
>   b. easy for checking in the flow API, if the user wants to insert a flow. Some checking should
>       be done to confirm whether it is a wildcard flow (all eCPRI messages, or eCPRI messages of some
>       specific type) or a precise flow (to match the pc id or rtc id, for example). With these fields,
>       only 3 DW of the masks need to be checked before continuing. Otherwise, the code needs to check
>       the type in a lot of switch-case conditions and go through all the different members of each header.

Thanks for the clarification.

I'll tend to say that if the rte_ecpri_msg_hdr structure is only
useful for rte_flow, it should be defined inside rte_flow.

However, I have some fears about the dummy[3]. You said it could be
enlarged later: I think it is dangerous to change the size of a
structure that may be used to parse data (and this would be an ABI
change). Also, it seems dummy[3] cannot be used easily to swap
endianness, so is it really useful?


Thanks,
Olivier


> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_ECPRI_H_ */
> > > diff --git a/lib/librte_net/rte_ether.h b/lib/librte_net/rte_ether.h
> > > index 0ae4e75..184a3f9 100644
> > > --- a/lib/librte_net/rte_ether.h
> > > +++ b/lib/librte_net/rte_ether.h
> > > @@ -304,6 +304,7 @@ struct rte_vlan_hdr {  #define
> > RTE_ETHER_TYPE_LLDP
> > > 0x88CC /**< LLDP Protocol. */  #define RTE_ETHER_TYPE_MPLS
> > 0x8847 /**<
> > > MPLS ethertype. */  #define RTE_ETHER_TYPE_MPLSM 0x8848 /**<
> > MPLS
> > > multicast ethertype. */
> > > +#define RTE_ETHER_TYPE_ECPRI 0xAEFE /**< eCPRI ethertype (.1Q
> > > +supported). */
> > >
> > >  /**
> > >   * Extract VLAN tag information into mbuf
> > > --
> > > 1.8.3.1
> > >
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
@ 2020-07-12 14:13  0%   ` Xu, Rosen
  0 siblings, 0 replies; 200+ results
From: Xu, Rosen @ 2020-07-12 14:13 UTC (permalink / raw)
  To: Richardson, Bruce, Nipun Gupta, Hemant Agrawal
  Cc: dev, Zhang, Tianfei, Li, Xiaoyun, Wu, Jingjing, Satha Rao,
	Mahipal Challa, Jerin Jacob

Hi,

Reviewed-by: Rosen Xu <rosen.xu@intel.com>

> -----Original Message-----
> From: Richardson, Bruce <bruce.richardson@intel.com>
> Sent: Thursday, July 09, 2020 23:21
> To: Nipun Gupta <nipun.gupta@nxp.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>
> Cc: dev@dpdk.org; Xu, Rosen <rosen.xu@intel.com>; Zhang, Tianfei
> <tianfei.zhang@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Satha Rao <skoteshwar@marvell.com>; Mahipal
> Challa <mchalla@marvell.com>; Jerin Jacob <jerinj@marvell.com>;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [PATCH 20.11 1/5] rawdev: add private data length parameter to info
> fn
> 
> Currently with the rawdev API there is no way to check that the structure
> passed in via the dev_private pointer in the dev_info structure is of the
> correct type - it's just checked that it is non-NULL. Adding in the length of the
> expected structure provides a measure of typechecking, and can also be
> used for ABI compatibility in future, since ABI changes involving structs
> almost always involve a change in size.
> 
> Signed-off-by:  Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/bus/ifpga/ifpga_bus.c               |  2 +-
>  drivers/raw/ifpga/ifpga_rawdev.c            |  5 +++--
>  drivers/raw/ioat/ioat_rawdev.c              |  5 +++--
>  drivers/raw/ioat/ioat_rawdev_test.c         |  4 ++--
>  drivers/raw/ntb/ntb.c                       |  8 +++++++-
>  drivers/raw/skeleton/skeleton_rawdev.c      |  5 +++--
>  drivers/raw/skeleton/skeleton_rawdev_test.c | 19 ++++++++++++-------
>  examples/ioat/ioatfwd.c                     |  2 +-
>  examples/ntb/ntb_fwd.c                      |  2 +-
>  lib/librte_rawdev/rte_rawdev.c              |  6 ++++--
>  lib/librte_rawdev/rte_rawdev.h              |  9 ++++++++-
>  lib/librte_rawdev/rte_rawdev_pmd.h          |  5 ++++-
>  12 files changed, 49 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
> index 6b16a20bb..bb8b3dcfb 100644
> --- a/drivers/bus/ifpga/ifpga_bus.c
> +++ b/drivers/bus/ifpga/ifpga_bus.c
> @@ -162,7 +162,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
>  	afu_dev->id.port      = afu_pr_conf.afu_id.port;
> 
>  	if (rawdev->dev_ops && rawdev->dev_ops->dev_info_get)
> -		rawdev->dev_ops->dev_info_get(rawdev, afu_dev);
> +		rawdev->dev_ops->dev_info_get(rawdev, afu_dev,
> sizeof(*afu_dev));
> 
>  	if (rawdev->dev_ops &&
>  		rawdev->dev_ops->dev_start &&
> diff --git a/drivers/raw/ifpga/ifpga_rawdev.c
> b/drivers/raw/ifpga/ifpga_rawdev.c
> index cc25c662b..47cfa3877 100644
> --- a/drivers/raw/ifpga/ifpga_rawdev.c
> +++ b/drivers/raw/ifpga/ifpga_rawdev.c
> @@ -605,7 +605,8 @@ ifpga_fill_afu_dev(struct opae_accelerator *acc,
> 
>  static void
>  ifpga_rawdev_info_get(struct rte_rawdev *dev,
> -				     rte_rawdev_obj_t dev_info)
> +		      rte_rawdev_obj_t dev_info,
> +		      size_t dev_info_size)
>  {
>  	struct opae_adapter *adapter;
>  	struct opae_accelerator *acc;
> @@ -617,7 +618,7 @@ ifpga_rawdev_info_get(struct rte_rawdev *dev,
> 
>  	IFPGA_RAWDEV_PMD_FUNC_TRACE();
> 
> -	if (!dev_info) {
> +	if (!dev_info || dev_info_size != sizeof(*afu_dev)) {
>  		IFPGA_RAWDEV_PMD_ERR("Invalid request");
>  		return;
>  	}
> diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
> index f876ffc3f..8dd856c55 100644
> --- a/drivers/raw/ioat/ioat_rawdev.c
> +++ b/drivers/raw/ioat/ioat_rawdev.c
> @@ -113,12 +113,13 @@ ioat_dev_stop(struct rte_rawdev *dev)  }
> 
>  static void
> -ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
> +ioat_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
> +		size_t dev_info_size)
>  {
>  	struct rte_ioat_rawdev_config *cfg = dev_info;
>  	struct rte_ioat_rawdev *ioat = dev->dev_private;
> 
> -	if (cfg != NULL)
> +	if (cfg != NULL && dev_info_size == sizeof(*cfg))
>  		cfg->ring_size = ioat->ring_size;
>  }
> 
> diff --git a/drivers/raw/ioat/ioat_rawdev_test.c
> b/drivers/raw/ioat/ioat_rawdev_test.c
> index d99f1bd6b..90f5974cd 100644
> --- a/drivers/raw/ioat/ioat_rawdev_test.c
> +++ b/drivers/raw/ioat/ioat_rawdev_test.c
> @@ -157,7 +157,7 @@ ioat_rawdev_test(uint16_t dev_id)
>  		return TEST_SKIPPED;
>  	}
> 
> -	rte_rawdev_info_get(dev_id, &info);
> +	rte_rawdev_info_get(dev_id, &info, sizeof(p));
>  	if (p.ring_size != expected_ring_size[dev_id]) {
>  		printf("Error, initial ring size is not as expected (Actual: %d,
> Expected: %d)\n",
>  				(int)p.ring_size, expected_ring_size[dev_id]);
> @@ -169,7 +169,7 @@ ioat_rawdev_test(uint16_t dev_id)
>  		printf("Error with rte_rawdev_configure()\n");
>  		return -1;
>  	}
> -	rte_rawdev_info_get(dev_id, &info);
> +	rte_rawdev_info_get(dev_id, &info, sizeof(p));
>  	if (p.ring_size != IOAT_TEST_RINGSIZE) {
>  		printf("Error, ring size is not %d (%d)\n",
>  				IOAT_TEST_RINGSIZE, (int)p.ring_size); diff --
> git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index
> e40412bb7..4676c6f8f 100644
> --- a/drivers/raw/ntb/ntb.c
> +++ b/drivers/raw/ntb/ntb.c
> @@ -801,11 +801,17 @@ ntb_dequeue_bufs(struct rte_rawdev *dev,  }
> 
>  static void
> -ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info)
> +ntb_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
> +		size_t dev_info_size)
>  {
>  	struct ntb_hw *hw = dev->dev_private;
>  	struct ntb_dev_info *info = dev_info;
> 
> +	if (dev_info_size != sizeof(*info)){
> +		NTB_LOG(ERR, "Invalid size parameter to %s", __func__);
> +		return;
> +	}
> +
>  	info->mw_cnt = hw->mw_cnt;
>  	info->mw_size = hw->mw_size;
> 
> diff --git a/drivers/raw/skeleton/skeleton_rawdev.c
> b/drivers/raw/skeleton/skeleton_rawdev.c
> index 72ece887a..dc05f3ecf 100644
> --- a/drivers/raw/skeleton/skeleton_rawdev.c
> +++ b/drivers/raw/skeleton/skeleton_rawdev.c
> @@ -42,14 +42,15 @@ static struct queue_buffers
> queue_buf[SKELETON_MAX_QUEUES] = {};  static void clear_queue_bufs(int
> queue_id);
> 
>  static void skeleton_rawdev_info_get(struct rte_rawdev *dev,
> -				     rte_rawdev_obj_t dev_info)
> +				     rte_rawdev_obj_t dev_info,
> +				     size_t dev_info_size)
>  {
>  	struct skeleton_rawdev *skeldev;
>  	struct skeleton_rawdev_conf *skeldev_conf;
> 
>  	SKELETON_PMD_FUNC_TRACE();
> 
> -	if (!dev_info) {
> +	if (!dev_info || dev_info_size != sizeof(*skeldev_conf)) {
>  		SKELETON_PMD_ERR("Invalid request");
>  		return;
>  	}
> diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c
> b/drivers/raw/skeleton/skeleton_rawdev_test.c
> index 9ecfdee81..9b8390dfb 100644
> --- a/drivers/raw/skeleton/skeleton_rawdev_test.c
> +++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
> @@ -106,12 +106,12 @@ test_rawdev_info_get(void)
>  	struct rte_rawdev_info rdev_info = {0};
>  	struct skeleton_rawdev_conf skel_conf = {0};
> 
> -	ret = rte_rawdev_info_get(test_dev_id, NULL);
> +	ret = rte_rawdev_info_get(test_dev_id, NULL, 0);
>  	RTE_TEST_ASSERT(ret == -EINVAL, "Expected -EINVAL, %d", ret);
> 
>  	rdev_info.dev_private = &skel_conf;
> 
> -	ret = rte_rawdev_info_get(test_dev_id, &rdev_info);
> +	ret = rte_rawdev_info_get(test_dev_id, &rdev_info,
> sizeof(skel_conf));
>  	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to get raw dev info");
> 
>  	return TEST_SUCCESS;
> @@ -142,7 +142,8 @@ test_rawdev_configure(void)
> 
>  	rdev_info.dev_private = &rdev_conf_get;
>  	ret = rte_rawdev_info_get(test_dev_id,
> -				  (rte_rawdev_obj_t)&rdev_info);
> +				  (rte_rawdev_obj_t)&rdev_info,
> +				  sizeof(rdev_conf_get));
>  	RTE_TEST_ASSERT_SUCCESS(ret,
>  				"Failed to obtain rawdev configuration (%d)",
>  				ret);
> @@ -170,7 +171,8 @@ test_rawdev_queue_default_conf_get(void)
>  	/* Get the current configuration */
>  	rdev_info.dev_private = &rdev_conf_get;
>  	ret = rte_rawdev_info_get(test_dev_id,
> -				  (rte_rawdev_obj_t)&rdev_info);
> +				  (rte_rawdev_obj_t)&rdev_info,
> +				  sizeof(rdev_conf_get));
>  	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to obtain rawdev
> configuration (%d)",
>  				ret);
> 
> @@ -218,7 +220,8 @@ test_rawdev_queue_setup(void)
>  	/* Get the current configuration */
>  	rdev_info.dev_private = &rdev_conf_get;
>  	ret = rte_rawdev_info_get(test_dev_id,
> -				  (rte_rawdev_obj_t)&rdev_info);
> +				  (rte_rawdev_obj_t)&rdev_info,
> +				  sizeof(rdev_conf_get));
>  	RTE_TEST_ASSERT_SUCCESS(ret,
>  				"Failed to obtain rawdev configuration (%d)",
>  				ret);
> @@ -327,7 +330,8 @@ test_rawdev_start_stop(void)
>  	dummy_firmware = NULL;
> 
>  	rte_rawdev_start(test_dev_id);
> -	ret = rte_rawdev_info_get(test_dev_id,
> (rte_rawdev_obj_t)&rdev_info);
> +	ret = rte_rawdev_info_get(test_dev_id,
> (rte_rawdev_obj_t)&rdev_info,
> +			sizeof(rdev_conf_get));
>  	RTE_TEST_ASSERT_SUCCESS(ret,
>  				"Failed to obtain rawdev configuration (%d)",
>  				ret);
> @@ -336,7 +340,8 @@ test_rawdev_start_stop(void)
>  			      rdev_conf_get.device_state);
> 
>  	rte_rawdev_stop(test_dev_id);
> -	ret = rte_rawdev_info_get(test_dev_id,
> (rte_rawdev_obj_t)&rdev_info);
> +	ret = rte_rawdev_info_get(test_dev_id,
> (rte_rawdev_obj_t)&rdev_info,
> +			sizeof(rdev_conf_get));
>  	RTE_TEST_ASSERT_SUCCESS(ret,
>  				"Failed to obtain rawdev configuration (%d)",
>  				ret);
> diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c index
> b66ee73bc..5c631da1b 100644
> --- a/examples/ioat/ioatfwd.c
> +++ b/examples/ioat/ioatfwd.c
> @@ -757,7 +757,7 @@ assign_rawdevs(void)
>  			do {
>  				if (rdev_id == rte_rawdev_count())
>  					goto end;
> -				rte_rawdev_info_get(rdev_id++,
> &rdev_info);
> +				rte_rawdev_info_get(rdev_id++, &rdev_info,
> 0);
>  			} while (rdev_info.driver_name == NULL ||
>  					strcmp(rdev_info.driver_name,
> 
> 	IOAT_PMD_RAWDEV_NAME_STR) != 0);
> diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c index
> eba8ebf9f..11e224451 100644
> --- a/examples/ntb/ntb_fwd.c
> +++ b/examples/ntb/ntb_fwd.c
> @@ -1389,7 +1389,7 @@ main(int argc, char **argv)
>  	rte_rawdev_set_attr(dev_id, NTB_QUEUE_NUM_NAME,
> num_queues);
>  	printf("Set queue number as %u.\n", num_queues);
>  	ntb_rawdev_info.dev_private = (rte_rawdev_obj_t)(&ntb_info);
> -	rte_rawdev_info_get(dev_id, &ntb_rawdev_info);
> +	rte_rawdev_info_get(dev_id, &ntb_rawdev_info, sizeof(ntb_info));
> 
>  	nb_mbuf = nb_desc * num_queues * 2 * 2 + rte_lcore_count() *
>  		  MEMPOOL_CACHE_SIZE;
> diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
> index 8f84d0b22..a57689035 100644
> --- a/lib/librte_rawdev/rte_rawdev.c
> +++ b/lib/librte_rawdev/rte_rawdev.c
> @@ -78,7 +78,8 @@ rte_rawdev_socket_id(uint16_t dev_id)  }
> 
>  int
> -rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info)
> +rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
> +		size_t dev_private_size)
>  {
>  	struct rte_rawdev *rawdev;
> 
> @@ -89,7 +90,8 @@ rte_rawdev_info_get(uint16_t dev_id, struct
> rte_rawdev_info *dev_info)
> 
>  	if (dev_info->dev_private != NULL) {
>  		RTE_FUNC_PTR_OR_ERR_RET(*rawdev->dev_ops-
> >dev_info_get, -ENOTSUP);
> -		(*rawdev->dev_ops->dev_info_get)(rawdev, dev_info-
> >dev_private);
> +		(*rawdev->dev_ops->dev_info_get)(rawdev, dev_info-
> >dev_private,
> +				dev_private_size);
>  	}
> 
>  	dev_info->driver_name = rawdev->driver_name; diff --git
> a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h index
> 32f6b8bb0..cf6acfd26 100644
> --- a/lib/librte_rawdev/rte_rawdev.h
> +++ b/lib/librte_rawdev/rte_rawdev.h
> @@ -82,13 +82,20 @@ struct rte_rawdev_info;
>   *   will be returned. This can be used to safely query the type of a rawdev
>   *   instance without needing to know the size of the private data to return.
>   *
> + * @param dev_private_size
> + *   The length of the memory space pointed to by dev_private in dev_info.
> + *   This should be set to the size of the expected private structure to be
> + *   returned, and may be checked by drivers to ensure the expected struct
> + *   type is provided.
> + *
>   * @return
>   *   - 0: Success, driver updates the contextual information of the raw device
>   *   - <0: Error code returned by the driver info get function.
>   *
>   */
>  int
> -rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info);
> +rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
> +		size_t dev_private_size);
> 
>  /**
>   * Configure a raw device.
> diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h
> b/lib/librte_rawdev/rte_rawdev_pmd.h
> index 4395a2182..0e72a9205 100644
> --- a/lib/librte_rawdev/rte_rawdev_pmd.h
> +++ b/lib/librte_rawdev/rte_rawdev_pmd.h
> @@ -138,12 +138,15 @@ rte_rawdev_pmd_is_valid_dev(uint8_t dev_id)
>   *   Raw device pointer
>   * @param dev_info
>   *   Raw device information structure
> + * @param dev_private_size
> + *   The size of the structure pointed to by dev_info->dev_private
>   *
>   * @return
>   *   Returns 0 on success
>   */
>  typedef void (*rawdev_info_get_t)(struct rte_rawdev *dev,
> -				  rte_rawdev_obj_t dev_info);
> +				  rte_rawdev_obj_t dev_info,
> +				  size_t dev_private_size);
> 
>  /**
>   * Configure a device.
> --
> 2.25.1


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn
  2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
@ 2020-07-12 14:13  0%   ` Xu, Rosen
  0 siblings, 0 replies; 200+ results
From: Xu, Rosen @ 2020-07-12 14:13 UTC (permalink / raw)
  To: Richardson, Bruce, Nipun Gupta, Hemant Agrawal
  Cc: dev, Zhang, Tianfei, Li, Xiaoyun, Wu, Jingjing, Satha Rao,
	Mahipal Challa, Jerin Jacob

Hi,

Reviewed-by: Rosen Xu <rosen.xu@intel.com>

> -----Original Message-----
> From: Richardson, Bruce <bruce.richardson@intel.com>
> Sent: Thursday, July 09, 2020 23:21
> To: Nipun Gupta <nipun.gupta@nxp.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>
> Cc: dev@dpdk.org; Xu, Rosen <rosen.xu@intel.com>; Zhang, Tianfei
> <tianfei.zhang@intel.com>; Li, Xiaoyun <xiaoyun.li@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Satha Rao <skoteshwar@marvell.com>; Mahipal
> Challa <mchalla@marvell.com>; Jerin Jacob <jerinj@marvell.com>;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [PATCH 20.11 3/5] rawdev: add private data length parameter to
> config fn
> 
> Currently with the rawdev API there is no way to check that the structure
> passed in via the dev_private pointer in the structure passed to configure API
> is of the correct type - it's just checked that it is non-NULL. Adding in the
> length of the expected structure provides a measure of typechecking, and
> can also be used for ABI compatibility in future, since ABI changes involving
> structs almost always involve a change in size.
> 
> Signed-off-by:  Bruce Richardson <bruce.richardson@intel.com>
> ---
>  drivers/raw/ifpga/ifpga_rawdev.c            | 3 ++-
>  drivers/raw/ioat/ioat_rawdev.c              | 5 +++--
>  drivers/raw/ioat/ioat_rawdev_test.c         | 2 +-
>  drivers/raw/ntb/ntb.c                       | 6 +++++-
>  drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c | 7 ++++---
>  drivers/raw/octeontx2_dma/otx2_dpi_test.c   | 3 ++-
>  drivers/raw/octeontx2_ep/otx2_ep_rawdev.c   | 7 ++++---
>  drivers/raw/octeontx2_ep/otx2_ep_test.c     | 2 +-
>  drivers/raw/skeleton/skeleton_rawdev.c      | 5 +++--
>  drivers/raw/skeleton/skeleton_rawdev_test.c | 5 +++--
>  examples/ioat/ioatfwd.c                     | 2 +-
>  examples/ntb/ntb_fwd.c                      | 2 +-
>  lib/librte_rawdev/rte_rawdev.c              | 6 ++++--
>  lib/librte_rawdev/rte_rawdev.h              | 8 +++++++-
>  lib/librte_rawdev/rte_rawdev_pmd.h          | 3 ++-
>  15 files changed, 43 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/raw/ifpga/ifpga_rawdev.c
> b/drivers/raw/ifpga/ifpga_rawdev.c
> index 32a2b96c9..a50173264 100644
> --- a/drivers/raw/ifpga/ifpga_rawdev.c
> +++ b/drivers/raw/ifpga/ifpga_rawdev.c
> @@ -684,7 +684,8 @@ ifpga_rawdev_info_get(struct rte_rawdev *dev,
> 
>  static int
>  ifpga_rawdev_configure(const struct rte_rawdev *dev,
> -		rte_rawdev_obj_t config)
> +		rte_rawdev_obj_t config,
> +		size_t config_size __rte_unused)
>  {
>  	IFPGA_RAWDEV_PMD_FUNC_TRACE();
> 
> diff --git a/drivers/raw/ioat/ioat_rawdev.c b/drivers/raw/ioat/ioat_rawdev.c
> index 6a336795d..b29ff983f 100644
> --- a/drivers/raw/ioat/ioat_rawdev.c
> +++ b/drivers/raw/ioat/ioat_rawdev.c
> @@ -41,7 +41,8 @@ RTE_LOG_REGISTER(ioat_pmd_logtype, rawdev.ioat,
> INFO);  #define COMPLETION_SZ sizeof(__m128i)
> 
>  static int
> -ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
> +ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t
> config,
> +		size_t config_size)
>  {
>  	struct rte_ioat_rawdev_config *params = config;
>  	struct rte_ioat_rawdev *ioat = dev->dev_private; @@ -51,7 +52,7
> @@ ioat_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t
> config)
>  	if (dev->started)
>  		return -EBUSY;
> 
> -	if (params == NULL)
> +	if (params == NULL || config_size != sizeof(*params))
>  		return -EINVAL;
> 
>  	if (params->ring_size > 4096 || params->ring_size < 64 || diff --git
> a/drivers/raw/ioat/ioat_rawdev_test.c
> b/drivers/raw/ioat/ioat_rawdev_test.c
> index 90f5974cd..e5b50ae9f 100644
> --- a/drivers/raw/ioat/ioat_rawdev_test.c
> +++ b/drivers/raw/ioat/ioat_rawdev_test.c
> @@ -165,7 +165,7 @@ ioat_rawdev_test(uint16_t dev_id)
>  	}
> 
>  	p.ring_size = IOAT_TEST_RINGSIZE;
> -	if (rte_rawdev_configure(dev_id, &info) != 0) {
> +	if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
>  		printf("Error with rte_rawdev_configure()\n");
>  		return -1;
>  	}
> diff --git a/drivers/raw/ntb/ntb.c b/drivers/raw/ntb/ntb.c index
> eaeb67b74..c181094d5 100644
> --- a/drivers/raw/ntb/ntb.c
> +++ b/drivers/raw/ntb/ntb.c
> @@ -837,13 +837,17 @@ ntb_dev_info_get(struct rte_rawdev *dev,
> rte_rawdev_obj_t dev_info,  }
> 
>  static int
> -ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
> +ntb_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t
> config,
> +		size_t config_size)
>  {
>  	struct ntb_dev_config *conf = config;
>  	struct ntb_hw *hw = dev->dev_private;
>  	uint32_t xstats_num;
>  	int ret;
> 
> +	if (conf == NULL || config_size != sizeof(*conf))
> +		return -EINVAL;
> +
>  	hw->queue_pairs	= conf->num_queues;
>  	hw->queue_size = conf->queue_size;
>  	hw->used_mw_num = conf->mz_num;
> diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
> b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
> index e398abb75..5b496446c 100644
> --- a/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
> +++ b/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c
> @@ -294,7 +294,8 @@ otx2_dpi_rawdev_reset(struct rte_rawdev *dev)  }
> 
>  static int
> -otx2_dpi_rawdev_configure(const struct rte_rawdev *dev,
> rte_rawdev_obj_t config)
> +otx2_dpi_rawdev_configure(const struct rte_rawdev *dev,
> rte_rawdev_obj_t config,
> +		size_t config_size)
>  {
>  	struct dpi_rawdev_conf_s *conf = config;
>  	struct dpi_vf_s *dpivf = NULL;
> @@ -302,8 +303,8 @@ otx2_dpi_rawdev_configure(const struct rte_rawdev
> *dev, rte_rawdev_obj_t config)
>  	uintptr_t pool;
>  	uint32_t gaura;
> 
> -	if (conf == NULL) {
> -		otx2_dpi_dbg("NULL configuration");
> +	if (conf == NULL || config_size != sizeof(*conf)) {
> +		otx2_dpi_dbg("NULL or invalid configuration");
>  		return -EINVAL;
>  	}
>  	dpivf = (struct dpi_vf_s *)dev->dev_private;
> diff --git a/drivers/raw/octeontx2_dma/otx2_dpi_test.c b/drivers/raw/octeontx2_dma/otx2_dpi_test.c
> index 276658af0..cec6ca91b 100644
> --- a/drivers/raw/octeontx2_dma/otx2_dpi_test.c
> +++ b/drivers/raw/octeontx2_dma/otx2_dpi_test.c
> @@ -182,7 +182,8 @@ test_otx2_dma_rawdev(uint16_t val)
>  	/* Configure rawdev ports */
>  	conf.chunk_pool = dpi_create_mempool();
>  	rdev_info.dev_private = &conf;
> -	ret = rte_rawdev_configure(i, (rte_rawdev_obj_t)&rdev_info);
> +	ret = rte_rawdev_configure(i, (rte_rawdev_obj_t)&rdev_info,
> +			sizeof(conf));
>  	if (ret) {
>  		otx2_dpi_dbg("Unable to configure DPIVF %d", i);
>  		return -ENODEV;
> diff --git a/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c b/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
> index 0778603d5..2b78a7941 100644
> --- a/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
> +++ b/drivers/raw/octeontx2_ep/otx2_ep_rawdev.c
> @@ -224,13 +224,14 @@ sdp_rawdev_close(struct rte_rawdev *dev)
>  }
> 
>  static int
> -sdp_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config)
> +sdp_rawdev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
> +		size_t config_size)
>  {
>  	struct sdp_rawdev_info *app_info = (struct sdp_rawdev_info *)config;
>  	struct sdp_device *sdpvf;
> 
> -	if (app_info == NULL) {
> -		otx2_err("Application config info [NULL]");
> +	if (app_info == NULL || config_size != sizeof(*app_info)) {
> +		otx2_err("Application config info [NULL] or incorrect size");
>  		return -EINVAL;
>  	}
> 
> diff --git a/drivers/raw/octeontx2_ep/otx2_ep_test.c b/drivers/raw/octeontx2_ep/otx2_ep_test.c
> index 091f1827c..b876275f7 100644
> --- a/drivers/raw/octeontx2_ep/otx2_ep_test.c
> +++ b/drivers/raw/octeontx2_ep/otx2_ep_test.c
> @@ -108,7 +108,7 @@ sdp_rawdev_selftest(uint16_t dev_id)
> 
>  	dev_info.dev_private = &app_info;
> 
> -	ret = rte_rawdev_configure(dev_id, &dev_info);
> +	ret = rte_rawdev_configure(dev_id, &dev_info, sizeof(app_info));
>  	if (ret) {
>  		otx2_err("Unable to configure SDP_VF %d", dev_id);
>  		rte_mempool_free(ioq_mpool);
> diff --git a/drivers/raw/skeleton/skeleton_rawdev.c b/drivers/raw/skeleton/skeleton_rawdev.c
> index dce300c35..531d0450c 100644
> --- a/drivers/raw/skeleton/skeleton_rawdev.c
> +++ b/drivers/raw/skeleton/skeleton_rawdev.c
> @@ -68,7 +68,8 @@ static int skeleton_rawdev_info_get(struct rte_rawdev *dev,
>  }
> 
>  static int skeleton_rawdev_configure(const struct rte_rawdev *dev,
> -				     rte_rawdev_obj_t config)
> +				     rte_rawdev_obj_t config,
> +				     size_t config_size)
>  {
>  	struct skeleton_rawdev *skeldev;
>  	struct skeleton_rawdev_conf *skeldev_conf;
> @@ -77,7 +78,7 @@ static int skeleton_rawdev_configure(const struct rte_rawdev *dev,
> 
>  	RTE_FUNC_PTR_OR_ERR_RET(dev, -EINVAL);
> 
> -	if (!config) {
> +	if (config == NULL || config_size != sizeof(*skeldev_conf)) {
>  		SKELETON_PMD_ERR("Invalid configuration");
>  		return -EINVAL;
>  	}
> diff --git a/drivers/raw/skeleton/skeleton_rawdev_test.c b/drivers/raw/skeleton/skeleton_rawdev_test.c
> index 9b8390dfb..7dc7c7684 100644
> --- a/drivers/raw/skeleton/skeleton_rawdev_test.c
> +++ b/drivers/raw/skeleton/skeleton_rawdev_test.c
> @@ -126,7 +126,7 @@ test_rawdev_configure(void)
>  	struct skeleton_rawdev_conf rdev_conf_get = {0};
> 
>  	/* Check invalid configuration */
> -	ret = rte_rawdev_configure(test_dev_id, NULL);
> +	ret = rte_rawdev_configure(test_dev_id, NULL, 0);
>  	RTE_TEST_ASSERT(ret == -EINVAL,
>  			"Null configure; Expected -EINVAL, got %d", ret);
> 
> @@ -137,7 +137,8 @@ test_rawdev_configure(void)
> 
>  	rdev_info.dev_private = &rdev_conf_set;
>  	ret = rte_rawdev_configure(test_dev_id,
> -				   (rte_rawdev_obj_t)&rdev_info);
> +				   (rte_rawdev_obj_t)&rdev_info,
> +				   sizeof(rdev_conf_set));
>  	RTE_TEST_ASSERT_SUCCESS(ret, "Failed to configure rawdev (%d)",
> ret);
> 
>  	rdev_info.dev_private = &rdev_conf_get;
> diff --git a/examples/ioat/ioatfwd.c b/examples/ioat/ioatfwd.c
> index 5c631da1b..8e9513e44 100644
> --- a/examples/ioat/ioatfwd.c
> +++ b/examples/ioat/ioatfwd.c
> @@ -734,7 +734,7 @@ configure_rawdev_queue(uint32_t dev_id)
>  	struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size };
>  	struct rte_rawdev_info info = { .dev_private = &dev_config };
> 
> -	if (rte_rawdev_configure(dev_id, &info) != 0) {
> +	if (rte_rawdev_configure(dev_id, &info, sizeof(dev_config)) != 0) {
>  		rte_exit(EXIT_FAILURE,
>  			"Error with rte_rawdev_configure()\n");
>  	}
> diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
> index 11e224451..656f73659 100644
> --- a/examples/ntb/ntb_fwd.c
> +++ b/examples/ntb/ntb_fwd.c
> @@ -1401,7 +1401,7 @@ main(int argc, char **argv)
>  	ntb_conf.num_queues = num_queues;
>  	ntb_conf.queue_size = nb_desc;
>  	ntb_rawdev_conf.dev_private = (rte_rawdev_obj_t)(&ntb_conf);
> -	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf);
> +	ret = rte_rawdev_configure(dev_id, &ntb_rawdev_conf, sizeof(ntb_conf));
>  	if (ret)
>  		rte_exit(EXIT_FAILURE, "Can't config ntb dev: err=%d, "
>  			"port=%u\n", ret, dev_id);
> diff --git a/lib/librte_rawdev/rte_rawdev.c b/lib/librte_rawdev/rte_rawdev.c
> index bde33763e..6c4d783cc 100644
> --- a/lib/librte_rawdev/rte_rawdev.c
> +++ b/lib/librte_rawdev/rte_rawdev.c
> @@ -104,7 +104,8 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
>  }
> 
>  int
> -rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf)
> +rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
> +		size_t dev_private_size)
>  {
>  	struct rte_rawdev *dev;
>  	int diag;
> @@ -123,7 +124,8 @@ rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf)
>  	}
> 
>  	/* Configure the device */
> -	diag = (*dev->dev_ops->dev_configure)(dev, dev_conf->dev_private);
> +	diag = (*dev->dev_ops->dev_configure)(dev, dev_conf->dev_private,
> +			dev_private_size);
>  	if (diag != 0)
>  		RTE_RDEV_ERR("dev%d dev_configure = %d", dev_id, diag);
>  	else
> diff --git a/lib/librte_rawdev/rte_rawdev.h b/lib/librte_rawdev/rte_rawdev.h
> index cf6acfd26..73e3bd5ae 100644
> --- a/lib/librte_rawdev/rte_rawdev.h
> +++ b/lib/librte_rawdev/rte_rawdev.h
> @@ -116,13 +116,19 @@ rte_rawdev_info_get(uint16_t dev_id, struct rte_rawdev_info *dev_info,
>   *   driver/implementation can use to configure the device. It is also assumed
>   *   that once the configuration is done, a `queue_id` type field can be used
>   *   to refer to some arbitrary internal representation of a queue.
> + * @dev_private_size
> + *   The length of the memory space pointed to by dev_private in dev_info.
> + *   This should be set to the size of the expected private structure to be
> + *   used by the driver, and may be checked by drivers to ensure the expected
> + *   struct type is provided.
>   *
>   * @return
>   *   - 0: Success, device configured.
>   *   - <0: Error code returned by the driver configuration function.
>   */
>  int
> -rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf);
> +rte_rawdev_configure(uint16_t dev_id, struct rte_rawdev_info *dev_conf,
> +		size_t dev_private_size);
> 
> 
>  /**
> diff --git a/lib/librte_rawdev/rte_rawdev_pmd.h b/lib/librte_rawdev/rte_rawdev_pmd.h
> index 89e46412a..050f8b029 100644
> --- a/lib/librte_rawdev/rte_rawdev_pmd.h
> +++ b/lib/librte_rawdev/rte_rawdev_pmd.h
> @@ -160,7 +160,8 @@ typedef int (*rawdev_info_get_t)(struct rte_rawdev *dev,
>   *   Returns 0 on success
>   */
>  typedef int (*rawdev_configure_t)(const struct rte_rawdev *dev,
> -				  rte_rawdev_obj_t config);
> +				  rte_rawdev_obj_t config,
> +				  size_t config_size);
> 
>  /**
>   * Start a configured device.
> --
> 2.25.1


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/2] rte_flow: add eCPRI key fields to flow API
  2020-07-12 13:17  3%         ` Olivier Matz
@ 2020-07-12 14:28  0%           ` Bing Zhao
  2020-07-12 14:43  0%             ` Olivier Matz
  0 siblings, 1 reply; 200+ results
From: Bing Zhao @ 2020-07-12 14:28 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Ori Kam, john.mcnamara, marko.kovacevic, Thomas Monjalon,
	ferruh.yigit, arybchenko, akhil.goyal, dev, wenzhuo.lu,
	beilei.xing, bernard.iremonger

Hi Olivier,
Thanks

BR. Bing

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Sunday, July 12, 2020 9:18 PM
> To: Bing Zhao <bingz@mellanox.com>
> Cc: Ori Kam <orika@mellanox.com>; john.mcnamara@intel.com;
> marko.kovacevic@intel.com; Thomas Monjalon
> <thomas@monjalon.net>; ferruh.yigit@intel.com;
> arybchenko@solarflare.com; akhil.goyal@nxp.com; dev@dpdk.org;
> wenzhuo.lu@intel.com; beilei.xing@intel.com;
> bernard.iremonger@intel.com
> Subject: Re: [PATCH v5 1/2] rte_flow: add eCPRI key fields to flow API
> 
> Hi Bing,
> 
> On Sat, Jul 11, 2020 at 04:25:49AM +0000, Bing Zhao wrote:
> > Hi Olivier,
> > Many thanks for your comments.
> 
> [...]
> 
> > > > +/**
> > > > + * eCPRI Common Header
> > > > + */
> > > > +RTE_STD_C11
> > > > +struct rte_ecpri_common_hdr {
> > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > +	uint32_t size:16;		/**< Payload Size */
> > > > +	uint32_t type:8;		/**< Message Type */
> > > > +	uint32_t c:1;			/**< Concatenation Indicator */
> > > > +	uint32_t res:3;			/**< Reserved */
> > > > +	uint32_t revision:4;		/**< Protocol Revision */
> > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > > +	uint32_t revision:4;		/**< Protocol Revision */
> > > > +	uint32_t res:3;			/**< Reserved */
> > > > +	uint32_t c:1;			/**< Concatenation Indicator */
> > > > +	uint32_t type:8;		/**< Message Type */
> > > > +	uint32_t size:16;		/**< Payload Size */
> > > > +#endif
> > > > +} __rte_packed;
> > >
> > > Does it really need to be packed? Why next types do not need it?
> > > It looks only those which have bitfields are.
> > >
> >
> > Nice catch, thanks. For the common header, there is no need to use
> the
> > packed attribute, because it is a u32 and the bitfields will be
> > aligned.
> > I checked all the definitions again. Only " Type #4: Remote Memory
> Access"
> > needs to use the packed attribute.
> > For other sub-types, "sub-header" part of the message payload will
> get
> > aligned by nature. For example, u16 after u16, u8 after u16, these
> > should be OK.
> > But in type #4, the address is 48bits wide, with 16bits MSB and 32bits
> > LSB (no detailed description in the specification, correct me if
> > > anything wrong.) Usually, the 48bits address will be divided as this
> > in a system. And there is no 48-bits type at all. So we need to define
> two parts for it: 32b LSB follows 16b MSB.
> > u32 after u16 should be with packed attribute. Thanks
> 
> What about using a bitfield into a uint64_t ? I mean:
> 
> 	struct rte_ecpri_msg_rm_access {
> #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> 		...
> 		uint64_t length:16;		/**< number of bytes
> */
> 		uint64_t addr:48;		/**< address */
> #else
> 		...
> 		uint64_t addr:48;		/**< address */
> 		uint64_t length:16;		/**< number of bytes
> */
> #endif
> 	};
> 

Thanks for your suggestion.
https://stackoverflow.com/questions/10906238/warning-when-using-bitfield-with-unsigned-char
AFAIK (from this page), bitfields are only guaranteed to support bool and int. uint64_t is some
flavor of "long", and most compilers should accept it as a bitfield base type, but I am not sure
it is a standard-conformant implementation.
> 
> >
> > >
> > > I wonder if the 'dw0' could be in this definition instead of in
> > > struct rte_ecpri_msg_hdr?
> > >
> > > Something like this:
> > >
> > > struct rte_ecpri_common_hdr {
> > > 	union {
> > > 		uint32_t u32;
> > > 		struct {
> > > 			...
> > > 		};
> > > 	};
> > > };
> > >
> > > I see 2 advantages:
> > >
> > > - someone that only uses the common_hdr struct can use the .u32
> > >   in its software
> > > - when using it in messages, it looks clearer to me:
> > >     msg.common_hdr.u32 = value;
> > >   instead of:
> > >     msg.dw0 = value;
> > >
> > > What do you think?
> >
> > Thanks for the suggestion, this is much better, I will change it.
> > Indeed, in my original version, no DW(u32) is defined for the header.
> > After that, I noticed that it would not be easy for the static casting
> > to a u32 from bitfield(the compiler will complain), and it would not
> > be clear to swap the endian if the user wants to use this header. I
> > added this DW(u32) to simplify the usage of this header. But yes, if I
> > do not add it here, it would be not easy or clear for users who just
> use this header structure.
> > I will change it. Is it OK if I use the name "dw0"?
> 
> In my opinion, u32 is more usual than dw0.
> 

I sent patch set v6 with this change, thanks.

> >
> > >
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #0: IQ Data  */ struct
> > > > +rte_ecpri_msg_iq_data {
> > > > +	rte_be16_t pc_id;		/**< Physical channel ID */
> > > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #1: Bit Sequence  */ struct
> > > > +rte_ecpri_msg_bit_seq {
> > > > +	rte_be16_t pc_id;		/**< Physical channel ID */
> > > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #2: Real-Time Control Data  */
> > > struct
> > > > +rte_ecpri_msg_rtc_ctrl {
> > > > +	rte_be16_t rtc_id;		/**< Real-Time Control Data ID */
> > > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #3: Generic Data Transfer  */
> > > struct
> > > > +rte_ecpri_msg_gen_data {
> > > > +	rte_be32_t pc_id;		/**< Physical channel ID */
> > > > +	rte_be32_t seq_id;		/**< Sequence ID */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #4: Remote Memory Access
> */
> > > > +RTE_STD_C11
> > > > +struct rte_ecpri_msg_rm_access {
> > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > +	uint32_t ele_id:16;		/**< Element ID */
> > > > +	uint32_t rr:4;			/**< Req/Resp */
> > > > +	uint32_t rw:4;			/**< Read/Write */
> > > > +	uint32_t rma_id:8;		/**< Remote Memory Access ID */
> > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > > +	uint32_t rma_id:8;		/**< Remote Memory Access ID */
> > > > +	uint32_t rw:4;			/**< Read/Write */
> > > > +	uint32_t rr:4;			/**< Req/Resp */
> > > > +	uint32_t ele_id:16;		/**< Element ID */
> > > > +#endif
> > > > +	rte_be16_t addr_m;		/**< 48-bits address (16 MSB) */
> > > > +	rte_be32_t addr_l;		/**< 48-bits address (32 LSB) */
> > > > +	rte_be16_t length;		/**< number of bytes */
> > > > +} __rte_packed;
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #5: One-Way Delay
> Measurement
> > > */
> > > > +struct rte_ecpri_msg_delay_measure {
> > > > +	uint8_t msr_id;			/**< Measurement ID */
> > > > +	uint8_t act_type;		/**< Action Type */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #6: Remote Reset  */ struct
> > > > +rte_ecpri_msg_remote_reset {
> > > > +	rte_be16_t rst_id;		/**< Reset ID */
> > > > +	uint8_t rst_op;			/**< Reset Code Op */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header of Type #7: Event Indication  */ struct
> > > > +rte_ecpri_msg_event_ind {
> > > > +	uint8_t evt_id;			/**< Event ID */
> > > > +	uint8_t evt_type;		/**< Event Type */
> > > > +	uint8_t seq;			/**< Sequence Number */
> > > > +	uint8_t number;			/**< Number of
> > > Faults/Notif */
> > > > +};
> > > > +
> > > > +/**
> > > > + * eCPRI Message Header Format: Common Header + Message
> > > Types  */
> > > > +RTE_STD_C11
> > > > +struct rte_ecpri_msg_hdr {
> > > > +	union {
> > > > +		struct rte_ecpri_common_hdr common;
> > > > +		uint32_t dw0;
> > > > +	};
> > > > +	union {
> > > > +		struct rte_ecpri_msg_iq_data type0;
> > > > +		struct rte_ecpri_msg_bit_seq type1;
> > > > +		struct rte_ecpri_msg_rtc_ctrl type2;
> > > > +		struct rte_ecpri_msg_bit_seq type3;
> > > > +		struct rte_ecpri_msg_rm_access type4;
> > > > +		struct rte_ecpri_msg_delay_measure type5;
> > > > +		struct rte_ecpri_msg_remote_reset type6;
> > > > +		struct rte_ecpri_msg_event_ind type7;
> > > > +		uint32_t dummy[3];
> > > > +	};
> > > > +};
> > >
> > > What is the point in having this struct?
> > >
> > > From a software point of view, I think it is a bit risky, because
> > > its size is the size of the largest message. This is probably what
> > > you want in your case, but when a software will rx or tx such
> > > packet, I think they shouldn't use this one. My understanding is
> > > that you only need this structure for the mask in rte_flow.
> > >
> > > Also, I'm not sure to understand the purpose of dummy[3], even
> after
> > > reading your answer to Akhil's question.
> > >
> >
> > Basically YES and no. To my understanding, the eCPRI message
> format is
> > something like the ICMP packet format. The message (packet) itself
> > will be parsed into different formats based on the type of the
> common
> > header. In the message payload part, there is no distinct definition
> > of the "sub-header". We can divide them into the sub-header and
> data parts based on the specification.
> > E.g. physical channel ID / real-time control ID / Event ID + type are
> > the parts that datapath forwarding will only care about. The
> following
> > timestamp or user data parts are the parts that the higher layer in
> the application will use.
> > 1. If an application wants to create some offload flow, or even
> handle
> > it in the SW, the common header + first several bytes in the payload
> > should be enough. BUT YES, it is not good or safe to use it in the
> higher layer of the application.
> > 2. A higher layer of the application should have its own definition of
> > the whole payload of a specific sub-type, including the parsing of the
> user data part after the "sub-header".
> > It is better for them just skip the first 4 bytes of the eCPRI message or
> a known offset.
> > We do not need to cover the upper layers.
> 
> Let me explain what is my vision of how an application would use the
> structures (these are completely dummy examples, as I don't know
> ecpri protocol at all).
> 
> Rx:
> 
> 	int ecpri_input(struct rte_mbuf *m)
> 	{
> 		struct rte_ecpri_common_hdr hdr_copy, *hdr;
> 		struct rte_ecpri_msg_event_ind event_copy, *event;
> 		struct app_specific app_copy, *app;
> 
> 		hdr = rte_pktmbuf_read(m, 0, sizeof(*hdr),
> &hdr_copy);
> 		if (unlikely(hdr == NULL))
> 			return -1;
> 		switch (hdr->type) {
> 		...
> 		case RTE_ECPRI_EVT_IND_NTFY_IND:
> 			event = rte_pktmbuf_read(m, sizeof(*hdr),
> sizeof(*event),
> 				&event_copy);
> 			if (unlikely(event == NULL))
> 				return -1;
> 			...
> 			app = rte_pktmbuf_read(m, sizeof(*hdr) + sizeof(*event),
> 				sizeof(*app), &app_copy);
> 			...
> 
> Tx:
> 
> 	int ecpri_output(void)
> 	{
> 		struct rte_ecpri_common_hdr *hdr;
> 		struct rte_ecpri_msg_event_ind *event;
> 		struct app_specific *app;
> 
> 		m = rte_pktmbuf_alloc(mp);
> 		if (unlikely(m == NULL))
> 			return -1;
> 
> 		app = rte_pktmbuf_append(m, sizeof(*app));
> 		if (app == NULL)
> 			...
> 		app->... = ...;
> 		...
> 		event = rte_pktmbuf_prepend(m, sizeof(*event));
> 		if (event == NULL)
> 			...
> 		event->... = ...;
> 		...
> 		hdr = rte_pktmbuf_prepend(m, sizeof(*hdr));
> 		if (hdr == NULL)
> 			...
> 		hdr->... = ...;
> 
> 		return packet_send(m);
> 	}
> 
> In these 2 examples, we never need the unioned structure (struct
> rte_ecpri_msg_hdr).
> 
> Using it does not look possible to me, because its size corresponds to
> the largest message, not to the one we are parsing/building.
> 

Yes, in these cases we do not need the unioned structure at all.
Since the common header is always 4 bytes, an application can use the
sub-type structures starting from an offset of 4 into the eCPRI layer, as in
your example. That covers the datapath. My original purpose is the "control
path", typically the flow API (not sure if any other new lib implementation
would use it), where the union can be used without treating the common header
and the message payload header separately and then combining them. In that
case, only the first several bytes are accessed and checked, the message
itself is not changed, and it is then passed to the datapath for further
handling as in your example.
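
As a rough illustration of that control-path usage (the structure is a simplified stand-in for
the unioned header, and the names are placeholders, not the rte_flow item definition):

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified stand-in for the unioned eCPRI header discussed above:
 * one DW for the common header plus three raw DWs covering the
 * largest message payload sub-header. */
struct ecpri_hdr_sketch {
	uint32_t u32;       /* common header as a single DW */
	uint32_t dummy[3];  /* payload sub-header as raw DWs */
};

/* A mask is a wildcard (matches any message payload) when none of the
 * payload DWs is significant -- at most three 32-bit compares, with no
 * per-type switch over the different sub-header members. */
static inline bool
ecpri_mask_is_wildcard(const struct ecpri_hdr_sketch *mask)
{
	return mask->dummy[0] == 0 && mask->dummy[1] == 0 &&
	       mask->dummy[2] == 0;
}
```
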

> > I think some comments could be added here, is it OK?
> > 3. Regarding this structure, I add it because I do not want to
> > introduce a lot of new items in the rte_flow: new items with
> > structures, new enum types. I prefer one single structure will cover
> most of the cases (subtypes). What do you think?
> > 4. About the *dummy* u32, I calculated all the "subheaders" and
> choose
> > the maximal value of the length. Two purposes (same as the u32 in
> the common header):
> >   a. easy to swap the endianness, but not quite useful. Because some
> parts are u16 and u8,
> >     and should not be swapped in a u32. (some physical channel ID
> and address LSB have 32bits width)
> >     But if some HW parsed the header u32 by u32, then it would be
> helpful.
> >   b. easy for checking in flow API, if the user wants to insert a flow.
> Some checking should
> >       be done to confirm if it is wildcard flow (all eCPRI messages or
> eCPRI message in some specific type),
> >       or some precise flow (to match the pc id or rtc id, for example).
> With these fields, 3 DW
> >       of the masks only need to be check before continuing. Or else, the
> code needs to check the type
> >       and a lot of switch-case conditions and go through all different
> members of each header.
> 
> Thanks for the clarification.
> 
> I'll tend to say that if the rte_ecpri_msg_hdr structure is only useful for
> rte_flow, it should be defined inside rte_flow.
> 

Right now, yes, it will be used in rte_flow. But I checked the current
implementations of the other items, and almost all the headers are defined in
their own protocol files. So in v6 I changed its name, in order not to
confuse the users of this API. Would that be OK?
Thanks

> However, I have some fears about the dummy[3]. You said it could be
> enlarged later: I think it is dangerous to change the size of a structure
> that may be used to parse data (and this would be an ABI change).
> Also, it seems dummy[3] cannot be used easily to swap endianness, so
> is it really useful?
> 

It might be enlarged later, but not for now, not until a new revision of the
specification. All the subtypes it has right now will remain the same as
today; only new types will be added. So after several years, we may consider
changing it in an LTS release. Is that OK?
In most cases the endianness swap is easy: we swap each DW / u32. To me, the
exception is a field that crosses the u32 boundary, like the address in
type #4, which we may treat separately. And the most useful case is the mask
checking: it can simplify the check to at most three (u32 == 0?) comparisons
without going through each member of the different types.
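
For the common case, the per-DW swap mentioned above can be sketched as follows (in DPDK
proper this would simply be rte_be_to_cpu_32(); a portable stand-in is shown here only to
make the idea concrete):

```c
#include <stdint.h>

/* Swap one DW: since the eCPRI common header is a single 32-bit word,
 * one swap covers all of its bitfields at once. */
static inline uint32_t
ecpri_dw_swap(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}
```

Fields that cross the u32 boundary, like the type #4 address, would still need separate
handling, as noted above.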

And v6 has already been sent; I changed the code based on your suggestion.
Would you please help review and comment on it as well?

I appreciate your help and suggestions.

> 
> Thanks,
> Olivier
> 
> 
> > > > +
> > > > +#ifdef __cplusplus
> > > > +}
> > > > +#endif
> > > > +
> > > > +#endif /* _RTE_ECPRI_H_ */
> > > > diff --git a/lib/librte_net/rte_ether.h
> > > > b/lib/librte_net/rte_ether.h index 0ae4e75..184a3f9 100644
> > > > --- a/lib/librte_net/rte_ether.h
> > > > +++ b/lib/librte_net/rte_ether.h
> > > > @@ -304,6 +304,7 @@ struct rte_vlan_hdr {  #define
> > > RTE_ETHER_TYPE_LLDP
> > > > 0x88CC /**< LLDP Protocol. */  #define RTE_ETHER_TYPE_MPLS
> > > 0x8847 /**<
> > > > MPLS ethertype. */  #define RTE_ETHER_TYPE_MPLSM 0x8848
> /**<
> > > MPLS
> > > > multicast ethertype. */
> > > > +#define RTE_ETHER_TYPE_ECPRI 0xAEFE /**< eCPRI ethertype
> (.1Q
> > > > +supported). */
> > > >
> > > >  /**
> > > >   * Extract VLAN tag information into mbuf
> > > > --
> > > > 1.8.3.1
> > > >
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v5 1/2] rte_flow: add eCPRI key fields to flow API
  2020-07-12 14:28  0%           ` Bing Zhao
@ 2020-07-12 14:43  0%             ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2020-07-12 14:43 UTC (permalink / raw)
  To: Bing Zhao
  Cc: Ori Kam, john.mcnamara, marko.kovacevic, Thomas Monjalon,
	ferruh.yigit, arybchenko, akhil.goyal, dev, wenzhuo.lu,
	beilei.xing, bernard.iremonger

On Sun, Jul 12, 2020 at 02:28:03PM +0000, Bing Zhao wrote:
> Hi Olivier,
> Thanks
> 
> BR. Bing
> 
> > -----Original Message-----
> > From: Olivier Matz <olivier.matz@6wind.com>
> > Sent: Sunday, July 12, 2020 9:18 PM
> > To: Bing Zhao <bingz@mellanox.com>
> > Cc: Ori Kam <orika@mellanox.com>; john.mcnamara@intel.com;
> > marko.kovacevic@intel.com; Thomas Monjalon
> > <thomas@monjalon.net>; ferruh.yigit@intel.com;
> > arybchenko@solarflare.com; akhil.goyal@nxp.com; dev@dpdk.org;
> > wenzhuo.lu@intel.com; beilei.xing@intel.com;
> > bernard.iremonger@intel.com
> > Subject: Re: [PATCH v5 1/2] rte_flow: add eCPRI key fields to flow API
> > 
> > Hi Bing,
> > 
> > On Sat, Jul 11, 2020 at 04:25:49AM +0000, Bing Zhao wrote:
> > > Hi Olivier,
> > > Many thanks for your comments.
> > 
> > [...]
> > 
> > > > > +/**
> > > > > + * eCPRI Common Header
> > > > > + */
> > > > > +RTE_STD_C11
> > > > > +struct rte_ecpri_common_hdr {
> > > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > > +	uint32_t size:16;		/**< Payload Size */
> > > > > +	uint32_t type:8;		/**< Message Type */
> > > > > +	uint32_t c:1;			/**< Concatenation Indicator */
> > > > > +	uint32_t res:3;			/**< Reserved */
> > > > > +	uint32_t revision:4;		/**< Protocol Revision */
> > > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > > > +	uint32_t revision:4;		/**< Protocol Revision */
> > > > > +	uint32_t res:3;			/**< Reserved */
> > > > > +	uint32_t c:1;			/**< Concatenation Indicator */
> > > > > +	uint32_t type:8;		/**< Message Type */
> > > > > +	uint32_t size:16;		/**< Payload Size */
> > > > > +#endif
> > > > > +} __rte_packed;
> > > >
> > > > Does it really need to be packed? Why next types do not need it?
> > > > It looks only those which have bitfields are.
> > > >
> > >
> > > Nice catch, thanks. For the common header, there is no need to use
> > the
> > > packed attribute, because it is a u32 and the bitfields will be
> > > aligned.
> > > I checked all the definitions again. Only " Type #4: Remote Memory
> > Access"
> > > needs to use the packed attribute.
> > > For other sub-types, "sub-header" part of the message payload will
> > get
> > > aligned by nature. For example, u16 after u16, u8 after u16, these
> > > should be OK.
> > > But in type #4, the address is 48bits wide, with 16bits MSB and 32bits
> > > LSB (no detailed description in the specification, correct me if
> > > anything wrong.) Usually, the 48bits address will be divided as this
> > > in a system. And there is no 48-bits type at all. So we need to define
> > two parts for it: 32b LSB follows 16b MSB.
> > > u32 after u16 should be with packed attribute. Thanks
> > 
> > What about using a bitfield into a uint64_t ? I mean:
> > 
> > 	struct rte_ecpri_msg_rm_access {
> > #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > 		...
> > 		uint64_t length:16;		/**< number of bytes
> > */
> > 		uint64_t addr:48;		/**< address */
> > #else
> > 		...
> > 		uint64_t addr:48;		/**< address */
> > 		uint64_t length:16;		/**< number of bytes
> > */
> > #endif
> > 	};
> > 
> 
> Thanks for your suggestion.
> https://stackoverflow.com/questions/10906238/warning-when-using-bitfield-with-unsigned-char
> AFAIK (from this page), bitfields are only guaranteed to support bool and int. uint64_t is some
> flavor of "long", and most compilers should accept it as a bitfield base type, but I am not sure
> it is a standard-conformant implementation.

The uint8_t[6], as in your v6, is a good idea.


> > >
> > > >
> > > > I wonder if the 'dw0' could be in this definition instead of in
> > > > struct rte_ecpri_msg_hdr?
> > > >
> > > > Something like this:
> > > >
> > > > struct rte_ecpri_common_hdr {
> > > > 	union {
> > > > 		uint32_t u32;
> > > > 		struct {
> > > > 			...
> > > > 		};
> > > > 	};
> > > > };
> > > >
> > > > I see 2 advantages:
> > > >
> > > > - someone that only uses the common_hdr struct can use the .u32
> > > >   in its software
> > > > - when using it in messages, it looks clearer to me:
> > > >     msg.common_hdr.u32 = value;
> > > >   instead of:
> > > >     msg.dw0 = value;
> > > >
> > > > What do you think?
> > >
> > > Thanks for the suggestion, this is much better, I will change it.
> > > Indeed, in my original version, no DW(u32) is defined for the header.
> > > After that, I noticed that it would not be easy for the static casting
> > > to a u32 from bitfield(the compiler will complain), and it would not
> > > be clear to swap the endian if the user wants to use this header. I
> > > added this DW(u32) to simplify the usage of this header. But yes, if I
> > > do not add it here, it would be not easy or clear for users who just
> > use this header structure.
> > > I will change it. Is it OK if I use the name "dw0"?
> > 
> > In my opinion, u32 is more usual than dw0.
> > 
> 
> I sent patch set v6 with this change, thanks.
> 
> > >
> > > >
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #0: IQ Data  */ struct
> > > > > +rte_ecpri_msg_iq_data {
> > > > > +	rte_be16_t pc_id;		/**< Physical channel ID */
> > > > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #1: Bit Sequence  */ struct
> > > > > +rte_ecpri_msg_bit_seq {
> > > > > +	rte_be16_t pc_id;		/**< Physical channel ID */
> > > > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #2: Real-Time Control Data  */
> > > > struct
> > > > > +rte_ecpri_msg_rtc_ctrl {
> > > > > +	rte_be16_t rtc_id;		/**< Real-Time Control Data ID */
> > > > > +	rte_be16_t seq_id;		/**< Sequence ID */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #3: Generic Data Transfer  */
> > > > struct
> > > > > +rte_ecpri_msg_gen_data {
> > > > > +	rte_be32_t pc_id;		/**< Physical channel ID */
> > > > > +	rte_be32_t seq_id;		/**< Sequence ID */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #4: Remote Memory Access
> > */
> > > > > +RTE_STD_C11
> > > > > +struct rte_ecpri_msg_rm_access {
> > > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > > +	uint32_t ele_id:16;		/**< Element ID */
> > > > > +	uint32_t rr:4;			/**< Req/Resp */
> > > > > +	uint32_t rw:4;			/**< Read/Write */
> > > > > +	uint32_t rma_id:8;		/**< Remote Memory Access ID */
> > > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > > > +	uint32_t rma_id:8;		/**< Remote Memory Access ID */
> > > > > +	uint32_t rw:4;			/**< Read/Write */
> > > > > +	uint32_t rr:4;			/**< Req/Resp */
> > > > > +	uint32_t ele_id:16;		/**< Element ID */
> > > > > +#endif
> > > > > +	rte_be16_t addr_m;		/**< 48-bits address (16 MSB) */
> > > > > +	rte_be32_t addr_l;		/**< 48-bits address (32 LSB) */
> > > > > +	rte_be16_t length;		/**< number of bytes */
> > > > > +} __rte_packed;
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #5: One-Way Delay
> > Measurement
> > > > */
> > > > > +struct rte_ecpri_msg_delay_measure {
> > > > > +	uint8_t msr_id;			/**< Measurement ID */
> > > > > +	uint8_t act_type;		/**< Action Type */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #6: Remote Reset  */ struct
> > > > > +rte_ecpri_msg_remote_reset {
> > > > > +	rte_be16_t rst_id;		/**< Reset ID */
> > > > > +	uint8_t rst_op;			/**< Reset Code Op */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header of Type #7: Event Indication  */ struct
> > > > > +rte_ecpri_msg_event_ind {
> > > > > +	uint8_t evt_id;			/**< Event ID */
> > > > > +	uint8_t evt_type;		/**< Event Type */
> > > > > +	uint8_t seq;			/**< Sequence Number */
> > > > > +	uint8_t number;			/**< Number of
> > > > Faults/Notif */
> > > > > +};
> > > > > +
> > > > > +/**
> > > > > + * eCPRI Message Header Format: Common Header + Message
> > > > Types  */
> > > > > +RTE_STD_C11
> > > > > +struct rte_ecpri_msg_hdr {
> > > > > +	union {
> > > > > +		struct rte_ecpri_common_hdr common;
> > > > > +		uint32_t dw0;
> > > > > +	};
> > > > > +	union {
> > > > > +		struct rte_ecpri_msg_iq_data type0;
> > > > > +		struct rte_ecpri_msg_bit_seq type1;
> > > > > +		struct rte_ecpri_msg_rtc_ctrl type2;
> > > > > +		struct rte_ecpri_msg_bit_seq type3;
> > > > > +		struct rte_ecpri_msg_rm_access type4;
> > > > > +		struct rte_ecpri_msg_delay_measure type5;
> > > > > +		struct rte_ecpri_msg_remote_reset type6;
> > > > > +		struct rte_ecpri_msg_event_ind type7;
> > > > > +		uint32_t dummy[3];
> > > > > +	};
> > > > > +};
> > > >
> > > > What is the point in having this struct?
> > > >
> > > > From a software point of view, I think it is a bit risky, because
> > > > its size is the size of the largest message. This is probably what
> > > > you want in your case, but when software receives or transmits such a
> > > > packet, I think it shouldn't use this one. My understanding is
> > > > that you only need this structure for the mask in rte_flow.
> > > >
> > > > Also, I'm not sure to understand the purpose of dummy[3], even
> > after
> > > > reading your answer to Akhil's question.
> > > >
> > >
> > > Basically YES and no. To my understanding, the eCPRI message
> > format is
> > > something like the ICMP packet format. The message (packet) itself
> > > will be parsed into different formats based on the type of the
> > common
> > > header. In the message payload part, there is no distinct definition
> > > of the "sub-header". We can divide them into the sub-header and
> > data parts based on the specification.
> > > E.g. physical channel ID / real-time control ID / Event ID + type are
> > > the only parts that datapath forwarding will care about. The
> > following
> > > timestamp or user data parts are the parts that the higher layer in
> > the application will use.
> > > 1. If an application wants to create some offload flow, or even
> > handle
> > > it in the SW, the common header + first several bytes in the payload
> > > should be enough. BUT YES, it is not good or safe to use it in the
> > higher layer of the application.
> > > 2. A higher layer of the application should have its own definition of
> > > the whole payload of a specific sub-type, including the parsing of the
> > user data part after the "sub-header".
> > > It is better for them to just skip the first 4 bytes of the eCPRI message or
> > a known offset.
> > > We do not need to cover the upper layers.
> > 
> > Let me explain what is my vision of how an application would use the
> > structures (these are completly dummy examples, as I don't know
> > ecpri protocol at all).
> > 
> > Rx:
> > 
> > 	int ecpri_input(struct rte_mbuf *m)
> > 	{
> > 		struct rte_ecpri_common_hdr hdr_copy, *hdr;
> > 		struct rte_ecpri_msg_event_ind event_copy, *event;
> > 		struct app_specific app_copy, *app;
> > 
> > 		hdr = rte_pktmbuf_read(m, 0, sizeof(*hdr),
> > &hdr_copy);
> > 		if (unlikely(hdr == NULL))
> > 			return -1;
> > 		switch (hdr->type) {
> > 		...
> > 		case RTE_ECPRI_EVT_IND_NTFY_IND:
> > 			event = rte_pktmbuf_read(m, sizeof(*hdr),
> > sizeof(*event),
> > 				&event_copy);
> > 			if (unlikely(event == NULL))
> > 				return -1;
> > 			...
> > 			app = rte_pktmbuf_read(m,
> > 				sizeof(*hdr) + sizeof(*event),
> > 				sizeof(*app), &app_copy);
> > 			...
> > 
> > Tx:
> > 
> > 	int ecpri_output(void)
> > 	{
> > 		struct rte_ecpri_common_hdr *hdr;
> > 		struct rte_ecpri_msg_event_ind *event;
> > 		struct app_specific *app;
> > 
> > 		m = rte_pktmbuf_alloc(mp);
> > 		if (unlikely(m == NULL))
> > 			return -1;
> > 
> > 		app = rte_pktmbuf_append(m, sizeof(*app));
> > 		if (app == NULL)
> > 			...
> > 		app->... = ...;
> > 		...
> > 		event = rte_pktmbuf_prepend(m, sizeof(*event));
> > 		if (event == NULL)
> > 			...
> > 		event->... = ...;
> > 		...
> > 		hdr = rte_pktmbuf_prepend(m, sizeof(*hdr));
> > 		if (hdr == NULL)
> > 			...
> > 		hdr->... = ...;
> > 
> > 		return packet_send(m);
> > 	}
> > 
> > In these 2 examples, we never need the unioned structure (struct
> > rte_ecpri_msg_hdr).
> > 
> > Using it does not look possible to me, because its size corresponds to
> > the largest message, not to the one we are parsing/building.
> > 
> 
> Yes, in these cases, we do not need the unioned structures at all.
> Since the common header is always 4 bytes, an application could use the
> sub-type structures starting at an offset of 4 in the eCPRI layer, as in your example.
> This is in the datapath. My original purpose is for some "control path", typically
> the flow API (not sure about any other new lib implementation), where the union
> could be used without treating the common header and message payload
> header separately and then combining them together. In this case, only
> the first several bytes will be accessed and checked, there will be no change
> to the message itself, and then it is just passed to the datapath for further handling as
> in your example.
> 
> > > I think some comments could be added here; is that OK?
> > > 3. Regarding this structure, I added it because I do not want to
> > > introduce a lot of new items in rte_flow: new items with
> > > structures, new enum types. I prefer that one single structure cover
> > most of the cases (subtypes). What do you think?
> > > 4. About the *dummy* u32, I calculated all the "subheaders" and
> > chose
> > > the maximal length. Two purposes (same as the u32 in
> > the common header):
> > >   a. easy to swap the endianness, but not always useful, because some
> > parts are u16 and u8
> > >     and should not be swapped in a u32. (some physical channel ID
> > and address LSB have 32bits width)
> > >     But if some HW parsed the header u32 by u32, then it would be
> > helpful.
> > >   b. easy for checking in the flow API, if the user wants to insert a flow.
> > Some checking should
> > >       be done to confirm whether it is a wildcard flow (all eCPRI messages or
> > eCPRI messages of some specific type),
> > >       or some precise flow (to match the pc id or rtc id, for example).
> > With these fields, 3 DW
> > >       of the masks only need to be checked before continuing. Otherwise, the
> > code needs to check the type
> > >       and go through a lot of switch-case conditions and all the different
> > members of each header.
> > 
> > Thanks for the clarification.
> > 
> > I'll tend to say that if the rte_ecpri_msg_hdr structure is only useful for
> > rte_flow, it should be defined inside rte_flow.
> > 
> 
> Right now, yes, it will be used in rte_flow. But I checked the current implementations
> of each item, and almost all the headers are defined in their own protocol files. So in v6,
> I changed its name in order not to confuse the users of this API; would that be OK?
> Thanks

OK

> 
> > However, I have some fears about the dummy[3]. You said it could be
> > enlarged later: I think it is dangerous to change the size of a structure
> > that may be used to parse data (and this would be an ABI change).
> > Also, it seems dummy[3] cannot be used easily to swap endianness, so
> > is it really useful?
> > 
> 
> It might be enlarged, but not until a new revision of this specification. For
> all the subtypes it has right now, the specification will keep them the same as today.
> Only new types will be added then. So after several years, we may consider changing it
> in an LTS release. Is that OK?

OK, I think in this case it may even be another structure

> In most cases, the endianness swap could be easy: we will swap each DW / u32. To me,
> the exception is when some field crosses the u32 boundary, like the address in type#4; we may
> treat it separately. And the most useful case is the mask checking: it could simplify the
> checking to at most 3 (u32 == 0?) comparisons without going through each member of different types.
> 
> And v6 has already been sent; I changed the code based on your suggestion. Would you please
> help review it and give some comments as well?
> 
> I appreciate your help and suggestions.
> 
> > 
> > Thanks,
> > Olivier
> > 
> > 
> > > > > +
> > > > > +#ifdef __cplusplus
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > +#endif /* _RTE_ECPRI_H_ */
> > > > > diff --git a/lib/librte_net/rte_ether.h
> > > > > b/lib/librte_net/rte_ether.h index 0ae4e75..184a3f9 100644
> > > > > --- a/lib/librte_net/rte_ether.h
> > > > > +++ b/lib/librte_net/rte_ether.h
> > > > > @@ -304,6 +304,7 @@ struct rte_vlan_hdr {  #define
> > > > RTE_ETHER_TYPE_LLDP
> > > > > 0x88CC /**< LLDP Protocol. */  #define RTE_ETHER_TYPE_MPLS
> > > > 0x8847 /**<
> > > > > MPLS ethertype. */  #define RTE_ETHER_TYPE_MPLSM 0x8848
> > /**<
> > > > MPLS
> > > > > multicast ethertype. */
> > > > > +#define RTE_ETHER_TYPE_ECPRI 0xAEFE /**< eCPRI ethertype
> > (.1Q
> > > > > +supported). */
> > > > >
> > > > >  /**
> > > > >   * Extract VLAN tag information into mbuf
> > > > > --
> > > > > 1.8.3.1
> > > > >
> > >

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] The mbuf API needs some cleaning up
@ 2020-07-13  9:57  3% Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2020-07-13  9:57 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev

The MBUF library exposes some macros and constants without the RTE_ prefix. I propose cleaning these up, so that better names get into the coming LTS release.

The worst is:
#define MBUF_INVALID_PORT UINT16_MAX

I say it's the worst because when we were looking for the official "invalid" port value for our application, we didn't find this one. (Probably because its documentation is wrong.)

MBUF_INVALID_PORT is defined in rte_mbuf_core.h without any description, and in rte_mbuf.h, where it is injected between the rte_pktmbuf_reset() function and its description, so the API documentation shows the function's description for the constant, and no description for the function.

I propose keeping it at a sensible location in rte_mbuf_core.h only, adding a description, and renaming it to:
#define RTE_PORT_INVALID UINT16_MAX

For backwards compatibility, we could add:
/* this old name is deprecated */
#define MBUF_INVALID_PORT RTE_PORT_INVALID

I also wonder why there are no compiler warnings about the double definition.


There are also the data buffer location constants:
#define EXT_ATTACHED_MBUF    (1ULL << 61)
and
#define IND_ATTACHED_MBUF    (1ULL << 62)

There are already macros (with good names) for reading these, so simply adding the RTE_ prefix to these two constants suffices.


And all the packet offload flags, such as:
#define PKT_RX_VLAN          (1ULL << 0)

They are supposed to be used by applications, so I guess we should keep them unchanged for ABI stability reasons.


And the local macro:
#define MBUF_RAW_ALLOC_CHECK(m) do { \

This might as well be an internal inline function:
/* internal */
static inline void
__rte_mbuf_raw_alloc_check(const struct rte_mbuf *m)

Or we could keep it a macro and move it next to
__rte_mbuf_sanity_check(), keeping it clear that it is only relevant when RTE_LIBRTE_MBUF_DEBUG is set. But rename it to lower case, similar to the __rte_mbuf_sanity_check() macro.


Med venlig hilsen / kind regards
- Morten Brørup


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 2/2] doc: add deprecation notice for change of rawdev APIs
  @ 2020-07-13 12:31  5% ` Bruce Richardson
  2020-07-13 12:48  5%   ` Hemant Agrawal
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2020-07-13 12:31 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, Nipun Gupta, Hemant Agrawal

Add to the documentation for 20.08 a notice about the changes of rawdev
APIs proposed by patchset [1].

[1] http://inbox.dpdk.org/dev/20200709152047.167730-1-bruce.richardson@intel.com/

Cc: Nipun Gupta <nipun.gupta@nxp.com>
Cc: Hemant Agrawal <hemant.agrawal@nxp.com>

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index ead7cbe43..21b00103e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -117,6 +117,13 @@ Deprecation Notices
   break the ABI checks, that is why change is planned for 20.11.
   The list of internal APIs are mainly ones listed in ``rte_ethdev_driver.h``.
 
+* rawdev: The rawdev APIs which take a device-specific structure as
+  parameter directly, or indirectly via a "private" pointer inside another
+  structure, will be modified to take an additional parameter of the
+  structure size. The affected APIs will include ``rte_rawdev_info_get``,
+  ``rte_rawdev_configure``, ``rte_rawdev_queue_conf_get`` and
+  ``rte_rawdev_queue_setup``.
+
 * traffic manager: All traffic manager API's in ``rte_tm.h`` were mistakenly made
   ABI stable in the v19.11 release. The TM maintainer and other contributors have
   agreed to keep the TM APIs as experimental in expectation of additional spec
-- 
2.25.1


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH 2/2] doc: add deprecation notice for change of rawdev APIs
  2020-07-13 12:31  5% ` [dpdk-dev] [PATCH 2/2] doc: add deprecation notice for change of rawdev APIs Bruce Richardson
@ 2020-07-13 12:48  5%   ` Hemant Agrawal
  0 siblings, 0 replies; 200+ results
From: Hemant Agrawal @ 2020-07-13 12:48 UTC (permalink / raw)
  To: Bruce Richardson, dev; +Cc: Nipun Gupta

Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>

-----Original Message-----
From: Bruce Richardson <bruce.richardson@intel.com> 
Sent: Monday, July 13, 2020 6:01 PM
To: dev@dpdk.org
Cc: Bruce Richardson <bruce.richardson@intel.com>; Nipun Gupta <nipun.gupta@nxp.com>; Hemant Agrawal <hemant.agrawal@nxp.com>
Subject: [PATCH 2/2] doc: add deprecation notice for change of rawdev APIs
Importance: High

Add to the documentation for 20.08 a notice about the changes of rawdev APIs proposed by patchset [1].

[1] http://inbox.dpdk.org/dev/20200709152047.167730-1-bruce.richardson@intel.com/

Cc: Nipun Gupta <nipun.gupta@nxp.com>
Cc: Hemant Agrawal <hemant.agrawal@nxp.com>

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/rel_notes/deprecation.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index ead7cbe43..21b00103e 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -117,6 +117,13 @@ Deprecation Notices
   break the ABI checks, that is why change is planned for 20.11.
   The list of internal APIs are mainly ones listed in ``rte_ethdev_driver.h``.
 
+* rawdev: The rawdev APIs which take a device-specific structure as
+  parameter directly, or indirectly via a "private" pointer inside another
+  structure, will be modified to take an additional parameter of the
+  structure size. The affected APIs will include ``rte_rawdev_info_get``,
+  ``rte_rawdev_configure``, ``rte_rawdev_queue_conf_get`` and
+  ``rte_rawdev_queue_setup``.
+
 * traffic manager: All traffic manager API's in ``rte_tm.h`` were mistakenly made
   ABI stable in the v19.11 release. The TM maintainer and other contributors have
   agreed to keep the TM APIs as experimental in expectation of additional spec
--
2.25.1


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2] mempool/ring: add support for new ring sync modes
  @ 2020-07-13 15:00  3%                 ` Olivier Matz
  2020-07-13 16:29  0%                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2020-07-13 15:00 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev, arybchenko, jielong.zjl, Eads, Gage

On Mon, Jul 13, 2020 at 02:46:35PM +0000, Ananyev, Konstantin wrote:
> Hi Olivier,
> 
> > Hi Konstantin,
> > 
> > On Fri, Jul 10, 2020 at 03:20:12PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > >
> > > > Hi Olivier,
> > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > On Thu, Jul 09, 2020 at 05:55:30PM +0000, Ananyev, Konstantin wrote:
> > > > > > Hi Olivier,
> > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > On Mon, Jun 29, 2020 at 05:10:24PM +0100, Konstantin Ananyev wrote:
> > > > > > > > v2:
> > > > > > > >  - update Release Notes (as per comments)
> > > > > > > >
> > > > > > > > Two new sync modes were introduced into rte_ring:
> > > > > > > > relaxed tail sync (RTS) and head/tail sync (HTS).
> > > > > > > > This change provides user with ability to select these
> > > > > > > > modes for ring based mempool via mempool ops API.
> > > > > > > >
> > > > > > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > > > > > Acked-by: Gage Eads <gage.eads@intel.com>
> > > > > > > > ---
> > > > > > > >  doc/guides/rel_notes/release_20_08.rst  |  6 ++
> > > > > > > >  drivers/mempool/ring/rte_mempool_ring.c | 97 ++++++++++++++++++++++---
> > > > > > > >  2 files changed, 94 insertions(+), 9 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> > > > > > > > index eaaf11c37..7bdcf3aac 100644
> > > > > > > > --- a/doc/guides/rel_notes/release_20_08.rst
> > > > > > > > +++ b/doc/guides/rel_notes/release_20_08.rst
> > > > > > > > @@ -84,6 +84,12 @@ New Features
> > > > > > > >    * Dump ``rte_flow`` memory consumption.
> > > > > > > >    * Measure packet per second forwarding.
> > > > > > > >
> > > > > > > > +* **Added support for new sync modes into mempool ring driver.**
> > > > > > > > +
> > > > > > > > +  Added ability to select new ring synchronisation modes:
> > > > > > > > +  ``relaxed tail sync (ring_mt_rts)`` and ``head/tail sync (ring_mt_hts)``
> > > > > > > > +  via mempool ops API.
> > > > > > > > +
> > > > > > > >
> > > > > > > >  Removed Items
> > > > > > > >  -------------
> > > > > > > > diff --git a/drivers/mempool/ring/rte_mempool_ring.c b/drivers/mempool/ring/rte_mempool_ring.c
> > > > > > > > index bc123fc52..15ec7dee7 100644
> > > > > > > > --- a/drivers/mempool/ring/rte_mempool_ring.c
> > > > > > > > +++ b/drivers/mempool/ring/rte_mempool_ring.c
> > > > > > > > @@ -25,6 +25,22 @@ common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
> > > > > > > >  			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static int
> > > > > > > > +rts_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
> > > > > > > > +	unsigned int n)
> > > > > > > > +{
> > > > > > > > +	return rte_ring_mp_rts_enqueue_bulk(mp->pool_data,
> > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int
> > > > > > > > +hts_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
> > > > > > > > +	unsigned int n)
> > > > > > > > +{
> > > > > > > > +	return rte_ring_mp_hts_enqueue_bulk(mp->pool_data,
> > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static int
> > > > > > > >  common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
> > > > > > > >  {
> > > > > > > > @@ -39,17 +55,30 @@ common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
> > > > > > > >  			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static int
> > > > > > > > +rts_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned int n)
> > > > > > > > +{
> > > > > > > > +	return rte_ring_mc_rts_dequeue_bulk(mp->pool_data,
> > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int
> > > > > > > > +hts_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned int n)
> > > > > > > > +{
> > > > > > > > +	return rte_ring_mc_hts_dequeue_bulk(mp->pool_data,
> > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static unsigned
> > > > > > > >  common_ring_get_count(const struct rte_mempool *mp)
> > > > > > > >  {
> > > > > > > >  	return rte_ring_count(mp->pool_data);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -
> > > > > > > >  static int
> > > > > > > > -common_ring_alloc(struct rte_mempool *mp)
> > > > > > > > +ring_alloc(struct rte_mempool *mp, uint32_t rg_flags)
> > > > > > > >  {
> > > > > > > > -	int rg_flags = 0, ret;
> > > > > > > > +	int ret;
> > > > > > > >  	char rg_name[RTE_RING_NAMESIZE];
> > > > > > > >  	struct rte_ring *r;
> > > > > > > >
> > > > > > > > @@ -60,12 +89,6 @@ common_ring_alloc(struct rte_mempool *mp)
> > > > > > > >  		return -rte_errno;
> > > > > > > >  	}
> > > > > > > >
> > > > > > > > -	/* ring flags */
> > > > > > > > -	if (mp->flags & MEMPOOL_F_SP_PUT)
> > > > > > > > -		rg_flags |= RING_F_SP_ENQ;
> > > > > > > > -	if (mp->flags & MEMPOOL_F_SC_GET)
> > > > > > > > -		rg_flags |= RING_F_SC_DEQ;
> > > > > > > > -
> > > > > > > >  	/*
> > > > > > > >  	 * Allocate the ring that will be used to store objects.
> > > > > > > >  	 * Ring functions will return appropriate errors if we are
> > > > > > > > @@ -82,6 +105,40 @@ common_ring_alloc(struct rte_mempool *mp)
> > > > > > > >  	return 0;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static int
> > > > > > > > +common_ring_alloc(struct rte_mempool *mp)
> > > > > > > > +{
> > > > > > > > +	uint32_t rg_flags;
> > > > > > > > +
> > > > > > > > +	rg_flags = 0;
> > > > > > >
> > > > > > > Maybe it could go on the same line
> > > > > > >
> > > > > > > > +
> > > > > > > > +	/* ring flags */
> > > > > > >
> > > > > > > Not sure we need to keep this comment
> > > > > > >
> > > > > > > > +	if (mp->flags & MEMPOOL_F_SP_PUT)
> > > > > > > > +		rg_flags |= RING_F_SP_ENQ;
> > > > > > > > +	if (mp->flags & MEMPOOL_F_SC_GET)
> > > > > > > > +		rg_flags |= RING_F_SC_DEQ;
> > > > > > > > +
> > > > > > > > +	return ring_alloc(mp, rg_flags);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int
> > > > > > > > +rts_ring_alloc(struct rte_mempool *mp)
> > > > > > > > +{
> > > > > > > > +	if ((mp->flags & (MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET)) != 0)
> > > > > > > > +		return -EINVAL;
> > > > > > >
> > > > > > > Why do we need this? It is a problem to allow sc/sp in this mode (even
> > > > > > > if it's not optimal)?
> > > > > >
> > > > > > These new sync modes (RTS, HTS) are for MT.
> > > > > > For SP/SC - there is simply no point to use MT sync modes.
> > > > > > I suppose there are few choices:
> > > > > > 1. Make F_SP_PUT/F_SC_GET flags silently override expected ops behaviour
> > > > > >    and create actual ring with ST sync mode for prod/cons.
> > > > > > 2. Report an error.
> > > > > > 3. Silently ignore these flags.
> > > > > >
> > > > > > As I can see for  "ring_mp_mc" ops, we doing #1,
> > > > > > while for "stack" we are doing #3.
> > > > > > For RTS/HTS I chosoe #2, as it seems cleaner to me.
> > > > > > Any thoughts from your side what preferable behaviour should be?
> > > > >
> > > > > The F_SP_PUT/F_SC_GET are only used in rte_mempool_create() to select
> > > > > the default ops among (ring_sp_sc, ring_mp_sc, ring_sp_mc,
> > > > > ring_mp_mc).
> > > >
> > > > As I understand, nothing prevents user from doing:
> > > >
> > > > mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
> > > >                  sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
> > >
> > > Apologies, hit send accidently.
> > > I meant user can do:
> > >
> > > mp = rte_mempool_create_empty(..., F_SP_PUT | F_SC_GET);
> > > rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
> > >
> > > An in that case, he'll get SP/SC ring underneath.
> > 
> > It looks it's not the case. Since commit 449c49b93a6b ("mempool: support
> > handler operations"), the flags SP_PUT/SC_GET are converted into a call
> > to rte_mempool_set_ops_byname() in rte_mempool_create() only.
> > 
> > In rte_mempool_create_empty(), these flags are ignored. It is expected
> > that the user calls rte_mempool_set_ops_byname() by itself.
> 
> As I understand the code - not exactly.
> rte_mempool_create_empty() doesn't take any specific actions based on the 'flags' value,
> but it does store its value inside mp->flags.
> Later, when mempool_ops_alloc_once() is called these flags will be used by
> common_ring_alloc() and might override selected by ops ring behaviour.
> 
> > 
> > I don't think it is a good behavior:
> > 
> > 1/ The documentation of rte_mempool_create_empty() does not say that the
> >    flags are ignored, and a user can expect that F_SP_PUT | F_SC_GET
> >    sets the default ops like rte_mempool_create().
> > 
> > 2/ If rte_mempool_set_ops_byname() is not called after
> >    rte_mempool_create_empty() (and it looks it happens in dpdk's code),
> >    the default ops are the ones registered at index 0. This depends on
> >    the link order.
> > 
> > So I propose to move the following code in
> > rte_mempool_create_empty().
> > 
> > 	if ((flags & MEMPOOL_F_SP_PUT) && (flags & MEMPOOL_F_SC_GET))
> > 		ret = rte_mempool_set_ops_byname(mp, "ring_sp_sc", NULL);
> > 	else if (flags & MEMPOOL_F_SP_PUT)
> > 		ret = rte_mempool_set_ops_byname(mp, "ring_sp_mc", NULL);
> > 	else if (flags & MEMPOOL_F_SC_GET)
> > 		ret = rte_mempool_set_ops_byname(mp, "ring_mp_sc", NULL);
> > 	else
> > 		ret = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
> > 
> > What do you think?
> 
> I think it will be a good thing, as in that case we'll always have
> "ring_mp_mc" selected as the default.
> As another thought, it probably would be good to deprecate and later remove
> MEMPOOL_F_SP_PUT and MEMPOOL_F_SC_GET completely.
> These days the user can select this behaviour via mempool ops, and such dualism
> just makes things more error-prone and harder to maintain.
> Especially as we don't have a clear policy on what should be the higher priority
> for sync mode selection: mempool ops or flags.
> 

I'll tend to agree; however, it would mean deprecating rte_mempool_create()
too, because we wouldn't be able to set ops with it. Or we would have to
add a 12th (!) argument to the function, to set the ops name.

I don't like having that many arguments to this function, but it seems
it is widely used, probably because it is just one function call (vs
create_empty + set_ops + populate). So adding an "ops_name" argument is
maybe the right thing to do, given we can keep ABI compatibility.


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] mempool/ring: add support for new ring sync modes
  2020-07-13 15:00  3%                 ` Olivier Matz
@ 2020-07-13 16:29  0%                   ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2020-07-13 16:29 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, arybchenko, jielong.zjl, Eads, Gage


> On Mon, Jul 13, 2020 at 02:46:35PM +0000, Ananyev, Konstantin wrote:
> > Hi Olivier,
> >
> > > Hi Konstantin,
> > >
> > > On Fri, Jul 10, 2020 at 03:20:12PM +0000, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > >
> > > > > Hi Olivier,
> > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > On Thu, Jul 09, 2020 at 05:55:30PM +0000, Ananyev, Konstantin wrote:
> > > > > > > Hi Olivier,
> > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > On Mon, Jun 29, 2020 at 05:10:24PM +0100, Konstantin Ananyev wrote:
> > > > > > > > > v2:
> > > > > > > > >  - update Release Notes (as per comments)
> > > > > > > > >
> > > > > > > > > Two new sync modes were introduced into rte_ring:
> > > > > > > > > relaxed tail sync (RTS) and head/tail sync (HTS).
> > > > > > > > > This change provides user with ability to select these
> > > > > > > > > modes for ring based mempool via mempool ops API.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > > > > > > Acked-by: Gage Eads <gage.eads@intel.com>
> > > > > > > > > ---
> > > > > > > > >  doc/guides/rel_notes/release_20_08.rst  |  6 ++
> > > > > > > > >  drivers/mempool/ring/rte_mempool_ring.c | 97 ++++++++++++++++++++++---
> > > > > > > > >  2 files changed, 94 insertions(+), 9 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
> > > > > > > > > index eaaf11c37..7bdcf3aac 100644
> > > > > > > > > --- a/doc/guides/rel_notes/release_20_08.rst
> > > > > > > > > +++ b/doc/guides/rel_notes/release_20_08.rst
> > > > > > > > > @@ -84,6 +84,12 @@ New Features
> > > > > > > > >    * Dump ``rte_flow`` memory consumption.
> > > > > > > > >    * Measure packet per second forwarding.
> > > > > > > > >
> > > > > > > > > +* **Added support for new sync modes into mempool ring driver.**
> > > > > > > > > +
> > > > > > > > > +  Added ability to select new ring synchronisation modes:
> > > > > > > > > +  ``relaxed tail sync (ring_mt_rts)`` and ``head/tail sync (ring_mt_hts)``
> > > > > > > > > +  via mempool ops API.
> > > > > > > > > +
> > > > > > > > >
> > > > > > > > >  Removed Items
> > > > > > > > >  -------------
> > > > > > > > > diff --git a/drivers/mempool/ring/rte_mempool_ring.c b/drivers/mempool/ring/rte_mempool_ring.c
> > > > > > > > > index bc123fc52..15ec7dee7 100644
> > > > > > > > > --- a/drivers/mempool/ring/rte_mempool_ring.c
> > > > > > > > > +++ b/drivers/mempool/ring/rte_mempool_ring.c
> > > > > > > > > @@ -25,6 +25,22 @@ common_ring_sp_enqueue(struct rte_mempool *mp, void * const *obj_table,
> > > > > > > > >  			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static int
> > > > > > > > > +rts_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
> > > > > > > > > +	unsigned int n)
> > > > > > > > > +{
> > > > > > > > > +	return rte_ring_mp_rts_enqueue_bulk(mp->pool_data,
> > > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int
> > > > > > > > > +hts_ring_mp_enqueue(struct rte_mempool *mp, void * const *obj_table,
> > > > > > > > > +	unsigned int n)
> > > > > > > > > +{
> > > > > > > > > +	return rte_ring_mp_hts_enqueue_bulk(mp->pool_data,
> > > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static int
> > > > > > > > >  common_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
> > > > > > > > >  {
> > > > > > > > > @@ -39,17 +55,30 @@ common_ring_sc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned n)
> > > > > > > > >  			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static int
> > > > > > > > > +rts_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned int n)
> > > > > > > > > +{
> > > > > > > > > +	return rte_ring_mc_rts_dequeue_bulk(mp->pool_data,
> > > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int
> > > > > > > > > +hts_ring_mc_dequeue(struct rte_mempool *mp, void **obj_table, unsigned int n)
> > > > > > > > > +{
> > > > > > > > > +	return rte_ring_mc_hts_dequeue_bulk(mp->pool_data,
> > > > > > > > > +			obj_table, n, NULL) == 0 ? -ENOBUFS : 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static unsigned
> > > > > > > > >  common_ring_get_count(const struct rte_mempool *mp)
> > > > > > > > >  {
> > > > > > > > >  	return rte_ring_count(mp->pool_data);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > -
> > > > > > > > >  static int
> > > > > > > > > -common_ring_alloc(struct rte_mempool *mp)
> > > > > > > > > +ring_alloc(struct rte_mempool *mp, uint32_t rg_flags)
> > > > > > > > >  {
> > > > > > > > > -	int rg_flags = 0, ret;
> > > > > > > > > +	int ret;
> > > > > > > > >  	char rg_name[RTE_RING_NAMESIZE];
> > > > > > > > >  	struct rte_ring *r;
> > > > > > > > >
> > > > > > > > > @@ -60,12 +89,6 @@ common_ring_alloc(struct rte_mempool *mp)
> > > > > > > > >  		return -rte_errno;
> > > > > > > > >  	}
> > > > > > > > >
> > > > > > > > > -	/* ring flags */
> > > > > > > > > -	if (mp->flags & MEMPOOL_F_SP_PUT)
> > > > > > > > > -		rg_flags |= RING_F_SP_ENQ;
> > > > > > > > > -	if (mp->flags & MEMPOOL_F_SC_GET)
> > > > > > > > > -		rg_flags |= RING_F_SC_DEQ;
> > > > > > > > > -
> > > > > > > > >  	/*
> > > > > > > > >  	 * Allocate the ring that will be used to store objects.
> > > > > > > > >  	 * Ring functions will return appropriate errors if we are
> > > > > > > > > @@ -82,6 +105,40 @@ common_ring_alloc(struct rte_mempool *mp)
> > > > > > > > >  	return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static int
> > > > > > > > > +common_ring_alloc(struct rte_mempool *mp)
> > > > > > > > > +{
> > > > > > > > > +	uint32_t rg_flags;
> > > > > > > > > +
> > > > > > > > > +	rg_flags = 0;
> > > > > > > >
> > > > > > > > Maybe it could go on the same line
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +	/* ring flags */
> > > > > > > >
> > > > > > > > Not sure we need to keep this comment
> > > > > > > >
> > > > > > > > > +	if (mp->flags & MEMPOOL_F_SP_PUT)
> > > > > > > > > +		rg_flags |= RING_F_SP_ENQ;
> > > > > > > > > +	if (mp->flags & MEMPOOL_F_SC_GET)
> > > > > > > > > +		rg_flags |= RING_F_SC_DEQ;
> > > > > > > > > +
> > > > > > > > > +	return ring_alloc(mp, rg_flags);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int
> > > > > > > > > +rts_ring_alloc(struct rte_mempool *mp)
> > > > > > > > > +{
> > > > > > > > > +	if ((mp->flags & (MEMPOOL_F_SP_PUT | MEMPOOL_F_SC_GET)) != 0)
> > > > > > > > > +		return -EINVAL;
> > > > > > > >
> > > > > > > > Why do we need this? Is it a problem to allow sc/sp in this mode
> > > > > > > > (even if it's not optimal)?
> > > > > > >
> > > > > > > These new sync modes (RTS, HTS) are for MT.
> > > > > > > For SP/SC there is simply no point in using MT sync modes.
> > > > > > > I suppose there are a few choices:
> > > > > > > 1. Make F_SP_PUT/F_SC_GET flags silently override expected ops behaviour
> > > > > > >    and create actual ring with ST sync mode for prod/cons.
> > > > > > > 2. Report an error.
> > > > > > > 3. Silently ignore these flags.
> > > > > > >
> > > > > > > As I can see, for "ring_mp_mc" ops we are doing #1,
> > > > > > > while for "stack" we are doing #3.
> > > > > > > For RTS/HTS I chose #2, as it seems cleaner to me.
> > > > > > > Any thoughts from your side what preferable behaviour should be?
> > > > > >
> > > > > > The F_SP_PUT/F_SC_GET are only used in rte_mempool_create() to select
> > > > > > the default ops among (ring_sp_sc, ring_mp_sc, ring_sp_mc,
> > > > > > ring_mp_mc).
> > > > >
> > > > > As I understand, nothing prevents a user from doing:
> > > > >
> > > > > mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
> > > > >                  sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
> > > >
> > > > Apologies, hit send accidently.
> > > > I meant user can do:
> > > >
> > > > mp = rte_mempool_create_empty(..., F_SP_PUT | F_SC_GET);
> > > > rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
> > > >
> > > > And in that case, he'll get an SP/SC ring underneath.
> > >
> > > It looks it's not the case. Since commit 449c49b93a6b ("mempool: support
> > > handler operations"), the flags SP_PUT/SC_GET are converted into a call
> > > to rte_mempool_set_ops_byname() in rte_mempool_create() only.
> > >
> > > In rte_mempool_create_empty(), these flags are ignored. It is expected
> > > that the user calls rte_mempool_set_ops_byname() by itself.
> >
> > As I understand the code - not exactly.
> > rte_mempool_create_empty() doesn't take any specific action based on the 'flags' value,
> > but it does store its value inside mp->flags.
> > Later, when mempool_ops_alloc_once() is called, these flags will be used by
> > common_ring_alloc() and might override the ring behaviour selected by the ops.
> >
> > >
> > > I don't think it is a good behavior:
> > >
> > > 1/ The documentation of rte_mempool_create_empty() does not say that the
> > >    flags are ignored, and a user can expect that F_SP_PUT | F_SC_GET
> > >    sets the default ops like rte_mempool_create().
> > >
> > > 2/ If rte_mempool_set_ops_byname() is not called after
> > >    rte_mempool_create_empty() (and it looks it happens in dpdk's code),
> > >    the default ops are the ones registered at index 0. This depends on
> > >    the link order.
> > >
> > > So I propose to move the following code in
> > > rte_mempool_create_empty().
> > >
> > > 	if ((flags & MEMPOOL_F_SP_PUT) && (flags & MEMPOOL_F_SC_GET))
> > > 		ret = rte_mempool_set_ops_byname(mp, "ring_sp_sc", NULL);
> > > 	else if (flags & MEMPOOL_F_SP_PUT)
> > > 		ret = rte_mempool_set_ops_byname(mp, "ring_sp_mc", NULL);
> > > 	else if (flags & MEMPOOL_F_SC_GET)
> > > 		ret = rte_mempool_set_ops_byname(mp, "ring_mp_sc", NULL);
> > > 	else
> > > 		ret = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
> > >
> > > What do you think?
> >
> > I think it will be a good thing - as in that case we'll always have
> > "ring_mp_mc" selected as the default one.
> > As another thought, it would probably be good to deprecate and later remove
> > MEMPOOL_F_SP_PUT and MEMPOOL_F_SC_GET completely.
> > These days a user can select this behaviour via mempool ops, and such dualism
> > just makes things more error-prone and harder to maintain.
> > Especially as we don't have a clear policy on what should take priority
> > for sync mode selection: mempool ops or flags.
> >
> 
> I'll tend to agree; however, it would mean deprecating rte_mempool_create()
> too, because we wouldn't be able to set ops with it. Or we would have to
> add a 12th (!) argument to the function, to set the ops name.
> 
> I don't like having that many arguments to this function, but it seems
> it is widely used, probably because it is just one function call (vs
> create_empty + set_ops + populate). So adding a "ops_name" argument is
> maybe the right thing to do, given we can keep abi compat.

My thought was - just keep the rte_mempool_create()
parameter list as it is, and always set ops to "ring_mp_mc" for it.
Users who'd like some other ops would be forced to use
create_empty+set_ops+populate.
That's pretty much the same as what we have right now;
the only difference will be the ring with SP/SC mode.
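As an aside for readers of the archive, the flag-to-ops mapping proposed earlier in the thread (selecting a default ring ops name from MEMPOOL_F_SP_PUT / MEMPOOL_F_SC_GET) can be checked in isolation. This is a sketch; the flag values are copied from DPDK's rte_mempool.h, but the function name is illustrative:

```c
#include <assert.h>
#include <string.h>

/* Flag values mirror DPDK's rte_mempool.h. */
#define MEMPOOL_F_SP_PUT 0x0004
#define MEMPOOL_F_SC_GET 0x0008

/* Sketch of the default-ops selection proposed for
 * rte_mempool_create_empty(): map the SP/SC flags to a ring ops
 * name, falling back to the thread-safe "ring_mp_mc". */
static const char *
default_ops_name(unsigned int flags)
{
	if ((flags & MEMPOOL_F_SP_PUT) && (flags & MEMPOOL_F_SC_GET))
		return "ring_sp_sc";
	else if (flags & MEMPOOL_F_SP_PUT)
		return "ring_sp_mc";
	else if (flags & MEMPOOL_F_SC_GET)
		return "ring_mp_sc";
	return "ring_mp_mc";
}
```

With this mapping in rte_mempool_create_empty(), the default is always "ring_mp_mc" unless the caller explicitly asks for a single-producer or single-consumer ring.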

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 00/10] rename blacklist/whitelist to block/allow
  2020-07-10 15:06  3% ` David Marchand
@ 2020-07-14  4:43  0%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-14  4:43 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, techboard, Luca Boccassi, Mcnamara, John

On Fri, 10 Jul 2020 17:06:11 +0200
David Marchand <david.marchand@redhat.com> wrote:

> On Sat, Jun 13, 2020 at 2:01 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > The terms blacklist and whitelist are often seen as reminders
> > of the divisions in society. Instead, use more exact terms for
> > handling of which devices are used in DPDK.
> >
> > This is a proposed change for DPDK 20.08 to replace the names
> > blacklist and whitelist in API and command lines.
> >
> > The first three patches fix some other unnecessary use of
> > blacklist/whitelist and have no user visible impact.
> >
> > The rest change the PCI blacklist to be blocklist and
> > whitelist to be allowlist.  
> 
> Thanks for working on this.
> 
> I agree, the first patches can go in right now.
> 
> But I have some concerns about the rest.
> 
> New options in EAL are not consistent with "allow"/"block" list:
> +    "b:" /* pci-skip-probe */
> +    "w:" /* pci-only-probe */
> +#define OPT_PCI_SKIP_PROBE     "pci-skip-probe"
> +    OPT_PCI_SKIP_PROBE_NUM  = 'b',
> +#define OPT_PCI_ONLY_PROBE     "pci-only-probe"
> +    OPT_PCI_ONLY_PROBE_NUM  = 'w',
> 
> The CI flagged the series as failing, because the unit test for EAL
> flags is unaligned:
> +#define pci_allowlist "--pci-allowlist"
> https://travis-ci.com/github/ovsrobot/dpdk/jobs/348668299#L5657
> 
> 
> The ABI check complains about the enum update:
> https://travis-ci.com/github/ovsrobot/dpdk/jobs/348668301#L2400
> Either we deal with this, or we need a libabigail exception rule.
> 
> 
> About deprecating existing API/EAL flags in this release, this should
> go through the standard deprecation process.
> I would go with introducing new options + full compatibility + a
> deprecation notice in the 20.08 release.
> The actual deprecation/API flagging will go in 20.11.
> Removal will come later.
> 
> 

The next version will use different flags, and the old flags will cause
a runtime deprecation warning.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 09/11] doc: add note about blacklist/whitelist changes
  @ 2020-07-14  5:39  4%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-14  5:39 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Luca Boccassi

The blacklist/whitelist API changes will not be a breaking
change for applications in this release, but it is worth adding
a note to encourage migration.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Luca Boccassi <bluca@debian.org>
---
 doc/guides/rel_notes/release_20_08.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index f19b748728e4..b9509f657b30 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -261,6 +261,12 @@ API Changes
 * vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
   queue and not per device, a qid parameter was added to the arguments list.
 
+* eal: The definitions related to including and excluding devices
+  have been changed from blacklist/whitelist to include/exclude.
+  There are compatibility macros and a command-line mapping to accept
+  the old values, but applications and scripts are strongly encouraged
+  to migrate to the new names.
+
 
 ABI Changes
 -----------
-- 
2.26.2


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] devtools: give some hints for ABI errors
  2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
                   ` (3 preceding siblings ...)
  2020-07-10 10:58  4% ` Neil Horman
@ 2020-07-15 12:15 25% ` David Marchand
  2020-07-15 12:48  4%   ` Aaron Conole
  4 siblings, 1 reply; 200+ results
From: David Marchand @ 2020-07-15 12:15 UTC (permalink / raw)
  To: dev; +Cc: thomas, mdr, nhorman, dodji, aconole

abidiff can provide some more information about the ABI difference it
detected.
In all cases, a discussion on the mailing list must happen, but we can
give some hints about whether this is a problem with the script calling
abidiff, a potential ABI breakage, or an unambiguous ABI breakage.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
---
Changes since v1:
- used arithmetic test,
- updated error message for generic errors,

---
 devtools/check-abi.sh | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/devtools/check-abi.sh b/devtools/check-abi.sh
index e17fedbd9f..172e934382 100755
--- a/devtools/check-abi.sh
+++ b/devtools/check-abi.sh
@@ -50,10 +50,22 @@ for dump in $(find $refdir -name "*.dump"); do
 		error=1
 		continue
 	fi
-	if ! abidiff $ABIDIFF_OPTIONS $dump $dump2; then
+	abidiff $ABIDIFF_OPTIONS $dump $dump2 || {
+		abiret=$?
 		echo "Error: ABI issue reported for 'abidiff $ABIDIFF_OPTIONS $dump $dump2'"
 		error=1
-	fi
+		echo
+		if [ $(($abiret & 3)) -ne 0 ]; then
+			echo "ABIDIFF_ERROR|ABIDIFF_USAGE_ERROR, this could be a script or environment issue."
+		fi
+		if [ $(($abiret & 4)) -ne 0 ]; then
+			echo "ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue)."
+		fi
+		if [ $(($abiret & 8)) -ne 0 ]; then
+			echo "ABIDIFF_ABI_INCOMPATIBLE_CHANGE, this change breaks the ABI."
+		fi
+		echo
+	}
 done
 
 [ -z "$error" ] || [ -n "$warnonly" ]
-- 
2.23.0
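The bit tests in the patch follow abidiff's documented exit-status bitmask. A standalone sketch of the same decoding logic (the helper name and messages here are illustrative, not from the patch):

```shell
# abidiff exit status is a bitmask:
#   1 = ABIDIFF_ERROR, 2 = ABIDIFF_USAGE_ERROR,
#   4 = ABIDIFF_ABI_CHANGE, 8 = ABIDIFF_ABI_INCOMPATIBLE_CHANGE
decode_abiret() {
	ret=$1
	# bits 1|2 (mask 3): the tool itself failed
	[ $((ret & 3)) -ne 0 ] && echo "tool or usage error"
	# bit 4: a change abidiff flags as potentially problematic
	[ $((ret & 4)) -ne 0 ] && echo "ABI change: needs review"
	# bit 8: an unambiguous break
	[ $((ret & 8)) -ne 0 ] && echo "incompatible ABI change"
	return 0
}

decode_abiret 12
```

Since the status is a bitmask, a single run can report several conditions at once (e.g. 12 = 4 | 8 reports both a change needing review and an incompatible change).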


^ permalink raw reply	[relevance 25%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
@ 2020-07-15 12:29  0%       ` David Marchand
  2020-07-15 12:49  0%         ` Aaron Conole
                           ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: David Marchand @ 2020-07-15 12:29 UTC (permalink / raw)
  To: Phil Yang
  Cc: Olivier Matz, dev, Stephen Hemminger, David Christensen,
	Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli, Aaron Conole

On Thu, Jul 9, 2020 at 5:59 PM Phil Yang <phil.yang@arm.com> wrote:
>
> Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> ops which enforce unnecessary barriers on aarch64.
>
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v4:
> 1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
> to avoid ABI breakage. (Olivier)
> 2. Add notice of refcnt_atomic deprecation. (Honnappa)
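For reference, the kind of substitution discussed in the quoted changelog looks roughly like this. It is a sketch only: the function name is illustrative and the memory ordering shown here may differ from what the actual mbuf patch uses:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: a 16-bit refcount updated with C11 atomic built-ins and
 * an explicit memory order, instead of rte_atomic16 ops (which
 * imply full barriers on aarch64). */
static inline uint16_t
refcnt_update(uint16_t *refcnt, int16_t value)
{
	/* Cast to uint16_t: a negative delta wraps modulo 2^16,
	 * which decrements the counter as intended. */
	return __atomic_add_fetch(refcnt, (uint16_t)value,
				  __ATOMIC_ACQ_REL);
}
```

The point of the series is exactly this: picking an ordering no stronger than required, rather than paying for the full barriers the legacy rte_atomic API implies.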

v4 does not pass the checks (in both my env, and Travis).
https://travis-ci.com/github/ovsrobot/dpdk/jobs/359393389#L2405

It seems the robot had a hiccup as I can't see a report in the test-report ml.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] devtools: give some hints for ABI errors
  2020-07-15 12:15 25% ` [dpdk-dev] [PATCH v2] " David Marchand
@ 2020-07-15 12:48  4%   ` Aaron Conole
  2020-07-16  7:29  4%     ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Aaron Conole @ 2020-07-15 12:48 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, mdr, nhorman, dodji

David Marchand <david.marchand@redhat.com> writes:

> abidiff can provide some more information about the ABI difference it
> detected.
> In all cases, a discussion on the mailing list must happen, but we can
> give some hints about whether this is a problem with the script calling
> abidiff, a potential ABI breakage, or an unambiguous ABI breakage.
>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> Acked-by: Ray Kinsella <mdr@ashroe.eu>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> ---
> Changes since v1:
> - used arithmetic test,
> - updated error message for generic errors,
>

Acked-by: Aaron Conole <aconole@redhat.com>


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-15 12:29  0%       ` David Marchand
@ 2020-07-15 12:49  0%         ` Aaron Conole
  2020-07-15 16:29  0%         ` Phil Yang
  2020-07-16  4:16  0%         ` Phil Yang
  2 siblings, 0 replies; 200+ results
From: Aaron Conole @ 2020-07-15 12:49 UTC (permalink / raw)
  To: David Marchand
  Cc: Phil Yang, Olivier Matz, dev, Stephen Hemminger,
	David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	nd, Dodji Seketeli

David Marchand <david.marchand@redhat.com> writes:

> On Thu, Jul 9, 2020 at 5:59 PM Phil Yang <phil.yang@arm.com> wrote:
>>
>> Use C11 atomic built-ins with explicit ordering instead of rte_atomic
>> ops which enforce unnecessary barriers on aarch64.
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>> v4:
>> 1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
>> to avoid ABI breakage. (Olivier)
>> 2. Add notice of refcnt_atomic deprecation. (Honnappa)
>
> v4 does not pass the checks (in both my env, and Travis).
> https://travis-ci.com/github/ovsrobot/dpdk/jobs/359393389#L2405
>
> It seems the robot had a hiccup as I can't see a report in the test-report ml.

Hrrm... that has been happening quite a bit.  I'll investigate.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-15 12:29  0%       ` David Marchand
  2020-07-15 12:49  0%         ` Aaron Conole
@ 2020-07-15 16:29  0%         ` Phil Yang
  2020-07-16  4:16  0%         ` Phil Yang
  2 siblings, 0 replies; 200+ results
From: Phil Yang @ 2020-07-15 16:29 UTC (permalink / raw)
  To: David Marchand
  Cc: Olivier Matz, dev, Stephen Hemminger, David Christensen,
	Honnappa Nagarahalli, Ruifeng Wang, nd, Dodji Seketeli,
	Aaron Conole, nd

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, July 15, 2020 8:29 PM
> To: Phil Yang <Phil.Yang@arm.com>
> Cc: Olivier Matz <olivier.matz@6wind.com>; dev <dev@dpdk.org>; Stephen
> Hemminger <stephen@networkplumber.org>; David Christensen
> <drc@linux.vnet.ibm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; Dodji Seketeli
> <dodji@redhat.com>; Aaron Conole <aconole@redhat.com>
> Subject: Re: [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt
> operations
> 
> On Thu, Jul 9, 2020 at 5:59 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> > ops which enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > v4:
> > 1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
> > to avoid ABI breakage. (Olivier)
> > 2. Add notice of refcnt_atomic deprecation. (Honnappa)
> 
> v4 does not pass the checks (in both my env, and Travis).
> https://travis-ci.com/github/ovsrobot/dpdk/jobs/359393389#L2405

I had tested with test-meson-builds in my env and it didn't give any error message.  The reference version is v20.05.
$  DPDK_ABI_REF_DIR=$PWD/reference  DPDK_ABI_REF_VERSION=v20.05 ./devtools/test-meson-builds.sh

It seems to be a problem with my test environment.
I will fix this problem as soon as possible.


> 
> It seems the robot had a hiccup as I can't see a report in the test-report ml.
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC PATCH 0/2] Enable dynamic configuration of subport bandwidth profile
@ 2020-07-15 18:27  3% Savinay Dharmappa
  2020-07-16  8:14  0% ` Singh, Jasvinder
  0 siblings, 1 reply; 200+ results
From: Savinay Dharmappa @ 2020-07-15 18:27 UTC (permalink / raw)
  To: savinay.dharmappa, jasvinder.singh, dev

The DPDK sched library allows runtime configuration of the pipe profiles for the
pipes of a subport once the scheduler hierarchy is constructed. However, to
change the subport-level bandwidth, the existing hierarchy needs to be dismantled
and the whole process of building the hierarchy under the subport nodes needs to
be repeated, which might result in router downtime. Furthermore, due to the lack
of dynamic subport bandwidth profile configuration
(shaper and traffic class rates), the user application is unable to dynamically
re-distribute the excess bandwidth of one subport among other subports in the
scheduler hierarchy. Therefore, it is also not possible to adjust the subport
bandwidth profile in sync with dynamic changes in the pipe profiles of subscribers
who want to consume higher bandwidth opportunistically.

This RFC proposes dynamic configuration of the subport bandwidth profile to
handle the runtime situation where a group of subscribers is not using the
allotted bandwidth and dynamic bandwidth re-distribution is needed, without
making any structural changes in the hierarchy.

The implementation work includes refactoring the existing data structures
defined at the port and subport level, and new APIs for adding subport-level
bandwidth profiles that can be used at runtime, which causes an API/ABI change.
Therefore, a deprecation notice will be sent out soon.
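As a rough illustration of the idea (all names here are hypothetical, not the RFC's actual API), dynamic re-profiling amounts to re-pointing a subport at a new bandwidth profile instead of tearing down and rebuilding the hierarchy:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch only: keep subport bandwidth profiles in a
 * table and swap which profile a subport references at runtime.
 * None of these names come from the actual patch set. */
struct bw_profile {
	uint64_t tb_rate; /* shaper (token bucket) rate, bytes/sec */
	uint64_t tc_rate; /* traffic-class rate, bytes/sec */
};

struct subport {
	const struct bw_profile *profile;
};

static void
subport_apply_profile(struct subport *s, const struct bw_profile *p)
{
	/* Runtime re-configuration: no pipes are torn down, so the
	 * router keeps forwarding while bandwidth is re-distributed. */
	s->profile = p;
}
```

Excess bandwidth can then be re-distributed by moving subports between profiles in the table as subscriber demand changes.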

Savinay Dharmappa (2):
  sched: add dynamic config of subport bandwidth profile
  example/qos_sched: subport bandwidth dynamic conf

 examples/qos_sched/cfg_file.c          | 158 ++++++-----
 examples/qos_sched/cfg_file.h          |   4 +
 examples/qos_sched/init.c              |  24 +-
 examples/qos_sched/main.h              |   1 +
 examples/qos_sched/profile.cfg         |   3 +
 lib/librte_sched/rte_sched.c           | 486 ++++++++++++++++++++++++---------
 lib/librte_sched/rte_sched.h           |  82 +++++-
 lib/librte_sched/rte_sched_version.map |   2 +
 8 files changed, 544 insertions(+), 216 deletions(-)

-- 
2.7.4


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 9/9] doc: replace references to blacklist/whitelist
  @ 2020-07-15 23:02  1%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2020-07-15 23:02 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Luca Boccassi

The terms blacklist and whitelist are no longer used.

Most of this was automatic replacement, but in a couple of
places the language was awkward before, and I have tried to
improve the readability.

The blacklist/whitelist API changes will not be a breaking
change for applications in this release, but it is worth adding
a note to encourage migration.

Update examples to the new config options:
replace -w with -i and -b with -x to reflect the new usage.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Luca Boccassi <bluca@debian.org>
---
 doc/guides/cryptodevs/dpaa2_sec.rst           |  4 ++--
 doc/guides/cryptodevs/dpaa_sec.rst            |  4 ++--
 doc/guides/cryptodevs/qat.rst                 |  6 ++---
 doc/guides/eventdevs/octeontx2.rst            | 20 ++++++++---------
 doc/guides/freebsd_gsg/build_sample_apps.rst  |  2 +-
 doc/guides/linux_gsg/build_sample_apps.rst    |  2 +-
 doc/guides/linux_gsg/eal_args.include.rst     | 14 ++++++------
 doc/guides/linux_gsg/linux_drivers.rst        |  4 ++--
 doc/guides/mempool/octeontx2.rst              |  4 ++--
 doc/guides/nics/bnxt.rst                      |  6 ++---
 doc/guides/nics/cxgbe.rst                     | 12 +++++-----
 doc/guides/nics/dpaa.rst                      |  4 ++--
 doc/guides/nics/dpaa2.rst                     |  4 ++--
 doc/guides/nics/enic.rst                      | 12 +++++-----
 doc/guides/nics/fail_safe.rst                 | 22 +++++++++----------
 doc/guides/nics/features.rst                  |  2 +-
 doc/guides/nics/i40e.rst                      | 12 +++++-----
 doc/guides/nics/ice.rst                       | 18 +++++++--------
 doc/guides/nics/mlx4.rst                      | 16 +++++++-------
 doc/guides/nics/mlx5.rst                      | 12 +++++-----
 doc/guides/nics/octeontx2.rst                 | 22 +++++++++----------
 doc/guides/nics/sfc_efx.rst                   |  2 +-
 doc/guides/nics/tap.rst                       | 10 ++++-----
 doc/guides/nics/thunderx.rst                  |  4 ++--
 .../prog_guide/env_abstraction_layer.rst      |  7 +++---
 doc/guides/prog_guide/multi_proc_support.rst  |  4 ++--
 doc/guides/rel_notes/known_issues.rst         |  4 ++--
 doc/guides/rel_notes/release_20_08.rst        |  6 +++++
 doc/guides/rel_notes/release_2_1.rst          |  2 +-
 doc/guides/sample_app_ug/bbdev_app.rst        |  6 ++---
 doc/guides/sample_app_ug/ipsec_secgw.rst      |  6 ++---
 doc/guides/sample_app_ug/l3_forward.rst       |  2 +-
 .../sample_app_ug/l3_forward_access_ctrl.rst  |  2 +-
 .../sample_app_ug/l3_forward_power_man.rst    |  2 +-
 doc/guides/sample_app_ug/vdpa.rst             |  2 +-
 doc/guides/tools/cryptoperf.rst               |  6 ++---
 doc/guides/tools/flow-perf.rst                |  2 +-
 37 files changed, 137 insertions(+), 132 deletions(-)

diff --git a/doc/guides/cryptodevs/dpaa2_sec.rst b/doc/guides/cryptodevs/dpaa2_sec.rst
index 3053636b8295..363c52f0422f 100644
--- a/doc/guides/cryptodevs/dpaa2_sec.rst
+++ b/doc/guides/cryptodevs/dpaa2_sec.rst
@@ -134,10 +134,10 @@ Supported DPAA2 SoCs
 * LS2088A/LS2048A
 * LS1088A/LS1048A
 
-Whitelisting & Blacklisting
+Allowlisting & Blocklisting
 ---------------------------
 
-For blacklisting a DPAA2 SEC device, following commands can be used.
+The DPAA2 SEC device can be blocked with the following:
 
  .. code-block:: console
 
diff --git a/doc/guides/cryptodevs/dpaa_sec.rst b/doc/guides/cryptodevs/dpaa_sec.rst
index db3c8e918945..295164523d22 100644
--- a/doc/guides/cryptodevs/dpaa_sec.rst
+++ b/doc/guides/cryptodevs/dpaa_sec.rst
@@ -82,10 +82,10 @@ Supported DPAA SoCs
 * LS1046A/LS1026A
 * LS1043A/LS1023A
 
-Whitelisting & Blacklisting
+Allowlisting & Blocklisting
 ---------------------------
 
-For blacklisting a DPAA device, following commands can be used.
+A DPAA device can be blocked with the following commands.
 
  .. code-block:: console
 
diff --git a/doc/guides/cryptodevs/qat.rst b/doc/guides/cryptodevs/qat.rst
index 7f4036c3210e..38e5b0a96206 100644
--- a/doc/guides/cryptodevs/qat.rst
+++ b/doc/guides/cryptodevs/qat.rst
@@ -126,7 +126,7 @@ Limitations
   optimisations in the GEN3 device. And if a GCM session is initialised on a
   GEN3 device, then attached to an op sent to a GEN1/GEN2 device, it will not be
   enqueued to the device and will be marked as failed. The simplest way to
-  mitigate this is to use the bdf whitelist to avoid mixing devices of different
+  mitigate this is to use the bdf allowlist to avoid mixing devices of different
   generations in the same process if planning to use for GCM.
 * The mixed algo feature on GEN2 is not supported by all kernel drivers. Check
   the notes under the Available Kernel Drivers table below for specific details.
@@ -261,7 +261,7 @@ adjusted to the number of VFs which the QAT common code will need to handle.
         QAT VF may expose two crypto devices, sym and asym, it may happen that the
         number of devices will be bigger than MAX_DEVS and the process will show an error
         during PMD initialisation. To avoid this problem CONFIG_RTE_CRYPTO_MAX_DEVS may be
-        increased or -w, pci-whitelist domain:bus:devid:func option may be used.
+        increased or -w, pci-allowlist domain:bus:devid:func option may be used.
 
 
 QAT compression PMD needs intermediate buffers to support Deflate compression
@@ -299,7 +299,7 @@ return 0 (thereby avoiding an MMIO) if the device is congested and number of pac
 possible to enqueue is smaller.
 To use this feature the user must set the parameter on process start as a device additional parameter::
 
-  -w 03:01.1,qat_sym_enq_threshold=32,qat_comp_enq_threshold=16
+  -i 03:01.1,qat_sym_enq_threshold=32,qat_comp_enq_threshold=16
 
 All parameters can be used with the same device regardless of order. Parameters are separated
 by comma. When the same parameter is used more than once first occurrence of the parameter
diff --git a/doc/guides/eventdevs/octeontx2.rst b/doc/guides/eventdevs/octeontx2.rst
index 6502f6415fb4..470ea5450432 100644
--- a/doc/guides/eventdevs/octeontx2.rst
+++ b/doc/guides/eventdevs/octeontx2.rst
@@ -66,7 +66,7 @@ Runtime Config Options
   upper limit for in-flight events.
   For example::
 
-    -w 0002:0e:00.0,xae_cnt=16384
+    -i 0002:0e:00.0,xae_cnt=16384
 
 - ``Force legacy mode``
 
@@ -74,7 +74,7 @@ Runtime Config Options
   single workslot mode in SSO and disable the default dual workslot mode.
   For example::
 
-    -w 0002:0e:00.0,single_ws=1
+    -i 0002:0e:00.0,single_ws=1
 
 - ``Event Group QoS support``
 
@@ -89,7 +89,7 @@ Runtime Config Options
   default.
   For example::
 
-    -w 0002:0e:00.0,qos=[1-50-50-50]
+    -i 0002:0e:00.0,qos=[1-50-50-50]
 
 - ``Selftest``
 
@@ -98,7 +98,7 @@ Runtime Config Options
   The tests are run once the vdev creation is successfully complete.
   For example::
 
-    -w 0002:0e:00.0,selftest=1
+    -i 0002:0e:00.0,selftest=1
 
 - ``TIM disable NPA``
 
@@ -107,7 +107,7 @@ Runtime Config Options
   parameter disables NPA and uses software mempool to manage chunks
   For example::
 
-    -w 0002:0e:00.0,tim_disable_npa=1
+    -i 0002:0e:00.0,tim_disable_npa=1
 
 - ``TIM modify chunk slots``
 
@@ -118,7 +118,7 @@ Runtime Config Options
   to SSO. The default value is 255 and the max value is 4095.
   For example::
 
-    -w 0002:0e:00.0,tim_chnk_slots=1023
+    -i 0002:0e:00.0,tim_chnk_slots=1023
 
 - ``TIM enable arm/cancel statistics``
 
@@ -126,7 +126,7 @@ Runtime Config Options
   event timer adapter.
   For example::
 
-    -w 0002:0e:00.0,tim_stats_ena=1
+    -i 0002:0e:00.0,tim_stats_ena=1
 
 - ``TIM limit max rings reserved``
 
@@ -136,7 +136,7 @@ Runtime Config Options
   rings.
   For example::
 
-    -w 0002:0e:00.0,tim_rings_lmt=5
+    -i 0002:0e:00.0,tim_rings_lmt=5
 
 - ``TIM ring control internal parameters``
 
@@ -146,7 +146,7 @@ Runtime Config Options
   default values.
   For Example::
 
-    -w 0002:0e:00.0,tim_ring_ctl=[2-1023-1-0]
+    -i 0002:0e:00.0,tim_ring_ctl=[2-1023-1-0]
 
 - ``Lock NPA contexts in NDC``
 
@@ -156,7 +156,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:0e:00.0,npa_lock_mask=0xf
+      -i 0002:0e:00.0,npa_lock_mask=0xf
 
 Debugging Options
 ~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/freebsd_gsg/build_sample_apps.rst b/doc/guides/freebsd_gsg/build_sample_apps.rst
index 2a68f5fc3820..4fba671e4f5b 100644
--- a/doc/guides/freebsd_gsg/build_sample_apps.rst
+++ b/doc/guides/freebsd_gsg/build_sample_apps.rst
@@ -67,7 +67,7 @@ DPDK application. Some of the EAL options for FreeBSD are as follows:
     is a list of cores to use instead of a core mask.
 
 *   ``-b <domain:bus:devid.func>``:
-    Blacklisting of ports; prevent EAL from using specified PCI device
+    Blocklisting of ports; prevent EAL from using specified PCI device
     (multiple ``-b`` options are allowed).
 
 *   ``--use-device``:
diff --git a/doc/guides/linux_gsg/build_sample_apps.rst b/doc/guides/linux_gsg/build_sample_apps.rst
index 2f606535c374..ebc6e3e02d74 100644
--- a/doc/guides/linux_gsg/build_sample_apps.rst
+++ b/doc/guides/linux_gsg/build_sample_apps.rst
@@ -102,7 +102,7 @@ The EAL options are as follows:
   Number of memory channels per processor socket.
 
 * ``-b <domain:bus:devid.func>``:
-  Blacklisting of ports; prevent EAL from using specified PCI device
+  Blocklisting of ports; prevent EAL from using specified PCI device
   (multiple ``-b`` options are allowed).
 
 * ``--use-device``:
diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
index 0fe44579689b..41f399ccd608 100644
--- a/doc/guides/linux_gsg/eal_args.include.rst
+++ b/doc/guides/linux_gsg/eal_args.include.rst
@@ -44,20 +44,20 @@ Lcore-related options
 Device-related options
 ~~~~~~~~~~~~~~~~~~~~~~
 
-*   ``-b, --pci-blacklist <[domain:]bus:devid.func>``
+*   ``-b, --pci-skip-probe <[domain:]bus:devid.func>``
 
-    Blacklist a PCI device to prevent EAL from using it. Multiple -b options are
-    allowed.
+    Skip probing a PCI device to prevent EAL from using it.
+    Multiple -b options are allowed.
 
 .. Note::
-    PCI blacklist cannot be used with ``-w`` option.
+    PCI skip probe cannot be used with the only list ``-w`` option.
 
-*   ``-w, --pci-whitelist <[domain:]bus:devid.func>``
+*   ``-w, --pci-only-list <[domain:]bus:devid.func>``
 
-    Add a PCI device in white list.
+    Add a PCI device to the list of probed devices.
 
 .. Note::
-    PCI whitelist cannot be used with ``-b`` option.
+    PCI only list cannot be used with the skip probe ``-b`` option.
 
 *   ``--vdev <device arguments>``
 
diff --git a/doc/guides/linux_gsg/linux_drivers.rst b/doc/guides/linux_gsg/linux_drivers.rst
index 4eda3d5bf4fe..0c6f9f8572ee 100644
--- a/doc/guides/linux_gsg/linux_drivers.rst
+++ b/doc/guides/linux_gsg/linux_drivers.rst
@@ -104,11 +104,11 @@ parameter ``--vfio-vf-token``.
     3. echo 2 > /sys/bus/pci/devices/0000:86:00.0/sriov_numvfs
 
     4. Start the PF:
-        ./x86_64-native-linux-gcc/app/testpmd -l 22-25 -n 4 -w 86:00.0 \
+        ./x86_64-native-linux-gcc/app/testpmd -l 22-25 -n 4 -i 86:00.0 \
          --vfio-vf-token=14d63f20-8445-11ea-8900-1f9ce7d5650d --file-prefix=pf -- -i
 
     5. Start the VF:
-        ./x86_64-native-linux-gcc/app/testpmd -l 26-29 -n 4 -w 86:02.0 \
+        ./x86_64-native-linux-gcc/app/testpmd -l 26-29 -n 4 -i 86:02.0 \
          --vfio-vf-token=14d63f20-8445-11ea-8900-1f9ce7d5650d --file-prefix=vf0 -- -i
 
 Also, to use VFIO, both kernel and BIOS must support and be configured to use IO virtualization (such as Intel® VT-d).
diff --git a/doc/guides/mempool/octeontx2.rst b/doc/guides/mempool/octeontx2.rst
index 49b45a04e8ec..507591d809c6 100644
--- a/doc/guides/mempool/octeontx2.rst
+++ b/doc/guides/mempool/octeontx2.rst
@@ -50,7 +50,7 @@ Runtime Config Options
   for the application.
   For example::
 
-    -w 0002:02:00.0,max_pools=512
+    -i 0002:02:00.0,max_pools=512
 
   With the above configuration, the driver will set up only 512 mempools for
   the given application to save HW resources.
@@ -69,7 +69,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,npa_lock_mask=0xf
+      -i 0002:02:00.0,npa_lock_mask=0xf
 
 Debugging Options
 ~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/nics/bnxt.rst b/doc/guides/nics/bnxt.rst
index 6ff75d0a25e9..716d02beba3c 100644
--- a/doc/guides/nics/bnxt.rst
+++ b/doc/guides/nics/bnxt.rst
@@ -259,7 +259,7 @@ Unicast MAC Filter
 ^^^^^^^^^^^^^^^^^^
 
 The application adds (or removes) MAC addresses to enable (or disable)
-whitelist filtering to accept packets.
+allowlist filtering to accept packets.
 
 .. code-block:: console
 
@@ -270,7 +270,7 @@ Multicast MAC Filter
 ^^^^^^^^^^^^^^^^^^^^
 
 Application adds (or removes) Multicast addresses to enable (or disable)
-whitelist filtering to accept packets.
+allowlist filtering to accept packets.
 
 .. code-block:: console
 
@@ -278,7 +278,7 @@ whitelist filtering to accept packets.
     testpmd> mcast_addr (add|remove) (port_id) (XX:XX:XX:XX:XX:XX)
 
 Application adds (or removes) Multicast addresses to enable (or disable)
-whitelist filtering to accept packets.
+allowlist filtering to accept packets.
 
 Note that the BNXT PMD supports up to 16 MC MAC filters. if the user adds more
 than 16 MC MACs, the BNXT PMD puts the port into the Allmulticast mode.
diff --git a/doc/guides/nics/cxgbe.rst b/doc/guides/nics/cxgbe.rst
index 54a4c138998c..870904cfd9b0 100644
--- a/doc/guides/nics/cxgbe.rst
+++ b/doc/guides/nics/cxgbe.rst
@@ -40,8 +40,8 @@ expose a single PCI bus address, thus, librte_pmd_cxgbe registers
 itself as a PCI driver that allocates one Ethernet device per detected
 port.
 
-For this reason, one cannot whitelist/blacklist a single port without
-whitelisting/blacklisting the other ports on the same device.
+For this reason, one cannot allowlist/blocklist a single port without
+allowlisting/blocklisting the other ports on the same device.
 
 .. _t5-nics:
 
@@ -112,7 +112,7 @@ be passed as part of EAL arguments. For example,
 
 .. code-block:: console
 
-   testpmd -w 02:00.4,keep_ovlan=1 -- -i
+   testpmd -i 02:00.4,keep_ovlan=1 -- -i
 
 Common Runtime Options
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -317,7 +317,7 @@ CXGBE PF Only Runtime Options
 
   .. code-block:: console
 
-     testpmd -w 02:00.4,filtermode=0x88 -- -i
+     testpmd -i 02:00.4,filtermode=0x88 -- -i
 
 - ``filtermask`` (default **0**)
 
@@ -344,7 +344,7 @@ CXGBE PF Only Runtime Options
 
   .. code-block:: console
 
-     testpmd -w 02:00.4,filtermode=0x88,filtermask=0x80 -- -i
+     testpmd -i 02:00.4,filtermode=0x88,filtermask=0x80 -- -i
 
 .. _driver-compilation:
 
@@ -776,7 +776,7 @@ devices managed by librte_pmd_cxgbe in FreeBSD operating system.
 
    .. code-block:: console
 
-      ./x86_64-native-freebsd-clang/app/testpmd -l 0-3 -n 4 -w 0000:02:00.4 -- -i
+      ./x86_64-native-freebsd-clang/app/testpmd -l 0-3 -n 4 -i 0000:02:00.4 -- -i
 
    Example output:
 
diff --git a/doc/guides/nics/dpaa.rst b/doc/guides/nics/dpaa.rst
index 17839a920e60..efcbb7207734 100644
--- a/doc/guides/nics/dpaa.rst
+++ b/doc/guides/nics/dpaa.rst
@@ -162,10 +162,10 @@ Manager.
   this pool.
 
 
-Whitelisting & Blacklisting
+Allowlisting & Blocklisting
 ---------------------------
 
-For blacklisting a DPAA device, following commands can be used.
+To block a DPAA device, the following commands can be used.
 
  .. code-block:: console
 
diff --git a/doc/guides/nics/dpaa2.rst b/doc/guides/nics/dpaa2.rst
index fdfa6fdd5aea..91b5c59f8c0f 100644
--- a/doc/guides/nics/dpaa2.rst
+++ b/doc/guides/nics/dpaa2.rst
@@ -527,10 +527,10 @@ which are lower than logging ``level``.
 Using ``pmd.net.dpaa2`` as log matching criteria, all PMD logs can be enabled
 which are lower than logging ``level``.
 
-Whitelisting & Blacklisting
+Allowlisting & Blocklisting
 ---------------------------
 
-For blacklisting a DPAA2 device, following commands can be used.
+To block a DPAA2 device, the following commands can be used.
 
  .. code-block:: console
 
diff --git a/doc/guides/nics/enic.rst b/doc/guides/nics/enic.rst
index a28a7f4e477a..a67f169a87a8 100644
--- a/doc/guides/nics/enic.rst
+++ b/doc/guides/nics/enic.rst
@@ -187,14 +187,14 @@ or ``vfio`` in non-IOMMU mode.
 
 In the VM, the kernel enic driver may be automatically bound to the VF during
 boot. Unbinding it currently hangs due to a known issue with the driver. To
-work around the issue, blacklist the enic module as follows.
+work around the issue, blocklist the enic module as follows.
 Please see :ref:`Limitations <enic_limitations>` for limitations in
 the use of SR-IOV.
 
 .. code-block:: console
 
      # cat /etc/modprobe.d/enic.conf
      blacklist enic
 
      # dracut --force
 
@@ -312,7 +312,7 @@ enables overlay offload, it prints the following message on the console.
 By default, PMD enables overlay offload if hardware supports it. To disable
 it, set ``devargs`` parameter ``disable-overlay=1``. For example::
 
-    -w 12:00.0,disable-overlay=1
+    -i 12:00.0,disable-overlay=1
 
 By default, the NIC uses 4789 as the VXLAN port. The user may change
 it through ``rte_eth_dev_udp_tunnel_port_{add,delete}``. However, as
@@ -378,7 +378,7 @@ vectorized handler, take the following steps.
   PMD consider the vectorized handler when selecting the receive handler.
   For example::
 
-    -w 12:00.0,enable-avx2-rx=1
+    -i 12:00.0,enable-avx2-rx=1
 
   As the current implementation is intended for field trials, by default, the
   vectorized handler is not considered (``enable-avx2-rx=0``).
@@ -427,7 +427,7 @@ DPDK as untagged packets. In this case mbuf->vlan_tci and the PKT_RX_VLAN and
 PKT_RX_VLAN_STRIPPED mbuf flags would not be set. This mode is enabled with the
 ``devargs`` parameter ``ig-vlan-rewrite=untag``. For example::
 
-    -w 12:00.0,ig-vlan-rewrite=untag
+    -i 12:00.0,ig-vlan-rewrite=untag
 
 - **SR-IOV**
 
@@ -437,7 +437,7 @@ PKT_RX_VLAN_STRIPPED mbuf flags would not be set. This mode is enabled with the
   - VF devices are not usable directly from the host. They can  only be used
     as assigned devices on VM instances.
   - Currently, unbind of the ENIC kernel mode driver 'enic.ko' on the VM
-    instance may hang. As a workaround, enic.ko should be blacklisted or removed
+    instance may hang. As a workaround, enic.ko should be blocklisted or removed
     from the boot process.
   - pci_generic cannot be used as the uio module in the VM. igb_uio or
     vfio in non-IOMMU mode can be used.
diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
index b4a92f663b17..46d1224b048f 100644
--- a/doc/guides/nics/fail_safe.rst
+++ b/doc/guides/nics/fail_safe.rst
@@ -68,11 +68,11 @@ Fail-safe command line parameters
 
 .. note::
 
-   In case where the sub-device is also used as a whitelist device, using ``-w``
+   In the case where the sub-device is also used as an allowlist device, using ``-i``
    on the EAL command line, the fail-safe PMD will use the device with the
    options provided to the EAL instead of its own parameters.
 
-   When trying to use a PCI device automatically probed by the blacklist mode,
+   When trying to use a PCI device automatically probed by the blocklist mode,
    the name for the fail-safe sub-device must be the full PCI id:
    Domain:Bus:Device.Function, *i.e.* ``00:00:00.0`` instead of ``00:00.0``,
    as the second form is historically accepted by the DPDK.
@@ -123,28 +123,28 @@ This section shows some example of using **testpmd** with a fail-safe PMD.
 #. To build a PMD and configure DPDK, refer to the document
    :ref:`compiling and testing a PMD for a NIC <pmd_build_and_test>`.
 
-#. Start testpmd. The sub-device ``84:00.0`` should be blacklisted from normal EAL
-   operations to avoid probing it twice, as the PCI bus is in blacklist mode.
+#. Start testpmd. The sub-device ``84:00.0`` should be blocklisted from normal EAL
+   operations to avoid probing it twice, as the PCI bus is in blocklist mode.
 
    .. code-block:: console
 
       $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
          --vdev 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)' \
-         -b 84:00.0 -b 00:04.0 -- -i
+         -x 84:00.0 -x 00:04.0 -- -i
 
-   If the sub-device ``84:00.0`` is not blacklisted, it will be probed by the
+   If the sub-device ``84:00.0`` is not blocklisted, it will be probed by the
    EAL first. When the fail-safe then tries to initialize it the probe operation
    fails.
 
-   Note that PCI blacklist mode is the default PCI operating mode.
+   Note that PCI blocklist mode is the default PCI operating mode.
 
-#. Alternatively, it can be used alongside any other device in whitelist mode.
+#. Alternatively, it can be used alongside any other device in allowlist mode.
 
    .. code-block:: console
 
       $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
          --vdev 'net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0)' \
-         -w 81:00.0 -- -i
+         -i 81:00.0 -- -i
 
 #. Start testpmd using a flexible device definition
 
@@ -155,9 +155,9 @@ This section shows some example of using **testpmd** with a fail-safe PMD.
 
 #. Start testpmd, automatically probing the device 84:00.0 and using it with
    the fail-safe.
- 
+
    .. code-block:: console
- 
+
       $RTE_TARGET/build/app/testpmd -c 0xff -n 4 \
          --vdev 'net_failsafe0,dev(0000:84:00.0),dev(net_ring0)' -- -i
 
diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
index edd21c4d8e9d..6aecead6e019 100644
--- a/doc/guides/nics/features.rst
+++ b/doc/guides/nics/features.rst
@@ -247,7 +247,7 @@ Supports enabling/disabling receiving multicast frames.
 Unicast MAC filter
 ------------------
 
-Supports adding MAC addresses to enable whitelist filtering to accept packets.
+Supports adding MAC addresses to enable allowlist filtering to accept packets.
 
 * **[implements] eth_dev_ops**: ``mac_addr_set``, ``mac_addr_add``, ``mac_addr_remove``.
 * **[implements] rte_eth_dev_data**: ``mac_addrs``.
diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
index cf1ae2d0b043..4f4072b3bf6f 100644
--- a/doc/guides/nics/i40e.rst
+++ b/doc/guides/nics/i40e.rst
@@ -203,7 +203,7 @@ Runtime Config Options
   Adapter with both Linux kernel and DPDK PMD. To fix this issue, ``devargs``
   parameter ``support-multi-driver`` is introduced, for example::
 
-    -w 84:00.0,support-multi-driver=1
+    -i 84:00.0,support-multi-driver=1
 
   With the above configuration, DPDK PMD will not change global registers, and
   will switch PF interrupt from IntN to Int0 to avoid interrupt conflict between
@@ -230,7 +230,7 @@ Runtime Config Options
   since it can get better perf in some real work loading cases. So ``devargs`` param
   ``use-latest-supported-vec`` is introduced, for example::
 
-  -w 84:00.0,use-latest-supported-vec=1
+  -i 84:00.0,use-latest-supported-vec=1
 
 - ``Enable validation for VF message`` (default ``not enabled``)
 
@@ -240,7 +240,7 @@ Runtime Config Options
   Format -- "maximal-message@period-seconds:ignore-seconds"
   For example::
 
-  -w 84:00.0,vf_msg_cfg=80@120:180
+  -i 84:00.0,vf_msg_cfg=80@120:180
 
 Vector RX Pre-conditions
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -475,7 +475,7 @@ no physical uplink on the associated NIC port.
 To enable this feature, the user should pass a ``devargs`` parameter to the
 EAL, for example::
 
-    -w 84:00.0,enable_floating_veb=1
+    -i 84:00.0,enable_floating_veb=1
 
 In this configuration the PMD will use the floating VEB feature for all the
 VFs created by this PF device.
@@ -483,7 +483,7 @@ VFs created by this PF device.
 Alternatively, the user can specify which VFs need to connect to this floating
 VEB using the ``floating_veb_list`` argument::
 
-    -w 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4
+    -i 84:00.0,enable_floating_veb=1,floating_veb_list=1;3-4
 
 In this example ``VF1``, ``VF3`` and ``VF4`` connect to the floating VEB,
 while other VFs connect to the normal VEB.
@@ -809,7 +809,7 @@ See :numref:`figure_intel_perf_test_setup` for the performance test setup.
 
 7. The command line of running l3fwd would be something like the following::
 
-      ./l3fwd -l 18-21 -n 4 -w 82:00.0 -w 85:00.0 \
+      ./l3fwd -l 18-21 -n 4 -i 82:00.0 -i 85:00.0 \
               -- -p 0x3 --config '(0,0,18),(0,1,19),(1,0,20),(1,1,21)'
 
    This means that the application uses core 18 for port 0, queue pair 0 forwarding, core 19 for port 0, queue pair 1 forwarding,
diff --git a/doc/guides/nics/ice.rst b/doc/guides/nics/ice.rst
index 9a9f4a6bb093..93eb2f0c2264 100644
--- a/doc/guides/nics/ice.rst
+++ b/doc/guides/nics/ice.rst
@@ -50,7 +50,7 @@ Runtime Config Options
   But if user intend to use the device without OS package, user can take ``devargs``
   parameter ``safe-mode-support``, for example::
 
-    -w 80:00.0,safe-mode-support=1
+    -i 80:00.0,safe-mode-support=1
 
   Then the driver will be initialized successfully and the device will enter Safe Mode.
   NOTE: In Safe mode, only very limited features are available, features like RSS,
@@ -61,7 +61,7 @@ Runtime Config Options
   In pipeline mode, a flow can be set at one specific stage by setting parameter
   ``priority``. Currently, we support two stages: priority = 0 or !0. Flows with
   priority 0 located at the first pipeline stage which typically be used as a firewall
-  to drop the packet on a blacklist(we called it permission stage). At this stage,
+  to drop the packet on a blocklist (we call it the permission stage). At this stage,
   flow rules are created for the device's exact match engine: switch. Flows with priority
   !0 located at the second stage, typically packets are classified here and be steered to
   specific queue or queue group (we called it distribution stage), At this stage, flow
@@ -73,7 +73,7 @@ Runtime Config Options
   use pipeline mode by setting ``devargs`` parameter ``pipeline-mode-support``,
   for example::
 
-    -w 80:00.0,pipeline-mode-support=1
+    -i 80:00.0,pipeline-mode-support=1
 
 - ``Flow Mark Support`` (default ``0``)
 
@@ -85,7 +85,7 @@ Runtime Config Options
   2) a new offload like RTE_DEV_RX_OFFLOAD_FLOW_MARK be introduced as a standard way to hint.
   Example::
 
-    -w 80:00.0,flow-mark-support=1
+    -i 80:00.0,flow-mark-support=1
 
 - ``Protocol extraction for per queue``
 
@@ -94,8 +94,8 @@ Runtime Config Options
 
   The argument format is::
 
-      -w 18:00.0,proto_xtr=<queues:protocol>[<queues:protocol>...]
-      -w 18:00.0,proto_xtr=<protocol>
+      -i 18:00.0,proto_xtr=<queues:protocol>[<queues:protocol>...]
+      -i 18:00.0,proto_xtr=<protocol>
 
   Queues are grouped by ``(`` and ``)`` within the group. The ``-`` character
   is used as a range separator and ``,`` is used as a single number separator.
@@ -106,14 +106,14 @@ Runtime Config Options
 
   .. code-block:: console
 
-    testpmd -w 18:00.0,proto_xtr='[(1,2-3,8-9):tcp,10-13:vlan]'
+    testpmd -i 18:00.0,proto_xtr='[(1,2-3,8-9):tcp,10-13:vlan]'
 
   This setting means queues 1, 2-3, 8-9 are TCP extraction, queues 10-13 are
   VLAN extraction, other queues run with no protocol extraction.
 
   .. code-block:: console
 
-    testpmd -w 18:00.0,proto_xtr=vlan,proto_xtr='[(1,2-3,8-9):tcp,10-23:ipv6]'
+    testpmd -i 18:00.0,proto_xtr=vlan,proto_xtr='[(1,2-3,8-9):tcp,10-23:ipv6]'
 
   This setting means queues 1, 2-3, 8-9 are TCP extraction, queues 10-23 are
   IPv6 extraction, other queues use the default VLAN extraction.
@@ -253,7 +253,7 @@ responses for the same from PF.
 
 #. Bind the VF0,  and run testpmd with 'cap=dcf' devarg::
 
-      testpmd -l 22-25 -n 4 -w 18:01.0,cap=dcf -- -i
+      testpmd -l 22-25 -n 4 -i 18:01.0,cap=dcf -- -i
 
 #. Monitor the VF2 interface network traffic::
 
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index 1f1e2f6c7767..f445dd51a65f 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -29,8 +29,8 @@ Most Mellanox ConnectX-3 devices provide two ports but expose a single PCI
 bus address, thus unlike most drivers, librte_pmd_mlx4 registers itself as a
 PCI driver that allocates one Ethernet device per detected port.
 
-For this reason, one cannot white/blacklist a single port without also
-white/blacklisting the others on the same device.
+For this reason, one cannot allowlist/blocklist a single port without also
+allowlisting/blocklisting the others on the same device.
 
 Besides its dependency on libibverbs (that implies libmlx4 and associated
 kernel support), librte_pmd_mlx4 relies heavily on system calls for control
@@ -422,7 +422,7 @@ devices managed by librte_pmd_mlx4.
       eth4
       eth5
 
-#. Optionally, retrieve their PCI bus addresses for whitelisting::
+#. Optionally, retrieve their PCI bus addresses for allowlisting::
 
       {
           for intf in eth2 eth3 eth4 eth5;
@@ -434,10 +434,10 @@ devices managed by librte_pmd_mlx4.
 
    Example output::
 
-      -w 0000:83:00.0
-      -w 0000:83:00.0
-      -w 0000:84:00.0
-      -w 0000:84:00.0
+      -i 0000:83:00.0
+      -i 0000:83:00.0
+      -i 0000:84:00.0
+      -i 0000:84:00.0
 
    .. note::
 
@@ -450,7 +450,7 @@ devices managed by librte_pmd_mlx4.
 
 #. Start testpmd with basic parameters::
 
-      testpmd -l 8-15 -n 4 -w 0000:83:00.0 -w 0000:84:00.0 -- --rxq=2 --txq=2 -i
+      testpmd -l 8-15 -n 4 -i 0000:83:00.0 -i 0000:84:00.0 -- --rxq=2 --txq=2 -i
 
    Example output::
 
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 4b6d8fb4d55b..bafabba518b5 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1454,7 +1454,7 @@ ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices managed by librte_pmd_mlx5.
       eth32
       eth33
 
-#. Optionally, retrieve their PCI bus addresses for whitelisting::
+#. Optionally, retrieve their PCI bus addresses for allowlisting::
 
       {
           for intf in eth2 eth3 eth4 eth5;
@@ -1466,10 +1466,10 @@ ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices managed by librte_pmd_mlx5.
 
    Example output::
 
-      -w 0000:05:00.1
-      -w 0000:06:00.0
-      -w 0000:06:00.1
-      -w 0000:05:00.0
+      -i 0000:05:00.1
+      -i 0000:06:00.0
+      -i 0000:06:00.1
+      -i 0000:05:00.0
 
 #. Request huge pages::
 
@@ -1477,7 +1477,7 @@ ConnectX-4/ConnectX-5/ConnectX-6/BlueField devices managed by librte_pmd_mlx5.
 
 #. Start testpmd with basic parameters::
 
-      testpmd -l 8-15 -n 4 -w 05:00.0 -w 05:00.1 -w 06:00.0 -w 06:00.1 -- --rxq=2 --txq=2 -i
+      testpmd -l 8-15 -n 4 -i 05:00.0 -i 05:00.1 -i 06:00.0 -i 06:00.1 -- --rxq=2 --txq=2 -i
 
    Example output::
 
diff --git a/doc/guides/nics/octeontx2.rst b/doc/guides/nics/octeontx2.rst
index bb591a8b7e65..3d382446d1d1 100644
--- a/doc/guides/nics/octeontx2.rst
+++ b/doc/guides/nics/octeontx2.rst
@@ -74,7 +74,7 @@ use arm64-octeontx2-linux-gcc as target.
 
    .. code-block:: console
 
-      ./build/app/testpmd -c 0x300 -w 0002:02:00.0 -- --portmask=0x1 --nb-cores=1 --port-topology=loop --rxq=1 --txq=1
+      ./build/app/testpmd -c 0x300 -i 0002:02:00.0 -- --portmask=0x1 --nb-cores=1 --port-topology=loop --rxq=1 --txq=1
       EAL: Detected 24 lcore(s)
       EAL: Detected 1 NUMA nodes
       EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
@@ -127,7 +127,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,reta_size=256
+      -i 0002:02:00.0,reta_size=256
 
    With the above configuration, reta table of size 256 is populated.
 
@@ -138,7 +138,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,flow_max_priority=10
+      -i 0002:02:00.0,flow_max_priority=10
 
    With the above configuration, priority level was set to 10 (0-9). Max
    priority level supported is 32.
@@ -150,7 +150,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,flow_prealloc_size=4
+      -i 0002:02:00.0,flow_prealloc_size=4
 
    With the above configuration, pre alloc size was set to 4. Max pre alloc
    size supported is 32.
@@ -162,7 +162,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,max_sqb_count=64
+      -i 0002:02:00.0,max_sqb_count=64
 
    With the above configuration, each send queue's decscriptor buffer count is
    limited to a maximum of 64 buffers.
@@ -174,7 +174,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,switch_header="higig2"
+      -i 0002:02:00.0,switch_header="higig2"
 
    With the above configuration, higig2 will be enabled on that port and the
    traffic on this port should be higig2 traffic only. Supported switch header
@@ -196,7 +196,7 @@ Runtime Config Options
 
    For example to select the legacy mode(RSS tag adder as XOR)::
 
-      -w 0002:02:00.0,tag_as_xor=1
+      -i 0002:02:00.0,tag_as_xor=1
 
 - ``Max SPI for inbound inline IPsec`` (default ``1``)
 
@@ -205,7 +205,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,ipsec_in_max_spi=128
+      -i 0002:02:00.0,ipsec_in_max_spi=128
 
    With the above configuration, application can enable inline IPsec processing
    on 128 SAs (SPI 0-127).
@@ -216,7 +216,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,lock_rx_ctx=1
+      -i 0002:02:00.0,lock_rx_ctx=1
 
 - ``Lock Tx contexts in NDC cache``
 
@@ -224,7 +224,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,lock_tx_ctx=1
+      -i 0002:02:00.0,lock_tx_ctx=1
 
 .. note::
 
@@ -240,7 +240,7 @@ Runtime Config Options
 
    For example::
 
-      -w 0002:02:00.0,npa_lock_mask=0xf
+      -i 0002:02:00.0,npa_lock_mask=0xf
 
 .. _otx2_tmapi:
 
diff --git a/doc/guides/nics/sfc_efx.rst b/doc/guides/nics/sfc_efx.rst
index be1c2fe1d67e..44115a666a94 100644
--- a/doc/guides/nics/sfc_efx.rst
+++ b/doc/guides/nics/sfc_efx.rst
@@ -290,7 +290,7 @@ Per-Device Parameters
 ~~~~~~~~~~~~~~~~~~~~~
 
 The following per-device parameters can be passed via EAL PCI device
-whitelist option like "-w 02:00.0,arg1=value1,...".
+allowlist option like "-i 02:00.0,arg1=value1,...".
 
 Case-insensitive 1/y/yes/on or 0/n/no/off may be used to specify
 boolean parameters value.
diff --git a/doc/guides/nics/tap.rst b/doc/guides/nics/tap.rst
index 7e44f846206c..b5a0f51988aa 100644
--- a/doc/guides/nics/tap.rst
+++ b/doc/guides/nics/tap.rst
@@ -183,15 +183,15 @@ following::
 
     sudo ./app/app/x86_64-native-linux-gcc/app/pktgen -l 1-5 -n 4        \
      --proc-type auto --log-level debug --socket-mem 512,512 --file-prefix pg   \
-     --vdev=net_tap0 --vdev=net_tap1 -b 05:00.0 -b 05:00.1                  \
-     -b 04:00.0 -b 04:00.1 -b 04:00.2 -b 04:00.3                            \
-     -b 81:00.0 -b 81:00.1 -b 81:00.2 -b 81:00.3                            \
-     -b 82:00.0 -b 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1                   \
+     --vdev=net_tap0 --vdev=net_tap1 -x 05:00.0 -x 05:00.1                  \
+     -x 04:00.0 -x 04:00.1 -x 04:00.2 -x 04:00.3                            \
+     -x 81:00.0 -x 81:00.1 -x 81:00.2 -x 81:00.3                            \
+     -x 82:00.0 -x 83:00.0 -- -T -P -m [2:3].0 -m [4:5].1                   \
      -f themes/black-yellow.theme
 
 .. Note:
 
-   Change the ``-b`` options to blacklist all of your physical ports. The
+   Change the ``-x`` options to exclude all of your physical ports. The
    following command line is all one line.
 
    Also, ``-f themes/black-yellow.theme`` is optional if the default colors
diff --git a/doc/guides/nics/thunderx.rst b/doc/guides/nics/thunderx.rst
index f42133e5464d..f1b27a3f269c 100644
--- a/doc/guides/nics/thunderx.rst
+++ b/doc/guides/nics/thunderx.rst
@@ -178,7 +178,7 @@ This section provides instructions to configure SR-IOV with Linux OS.
 
    .. code-block:: console
 
-      ./arm64-thunderx-linux-gcc/app/testpmd -l 0-3 -n 4 -w 0002:01:00.2 \
+      ./arm64-thunderx-linux-gcc/app/testpmd -l 0-3 -n 4 -i 0002:01:00.2 \
         -- -i --no-flush-rx \
         --port-topology=loop
 
@@ -398,7 +398,7 @@ This scheme is useful when application would like to insert vlan header without
 Example:
    .. code-block:: console
 
-      -w 0002:01:00.2,skip_data_bytes=8
+      -i 0002:01:00.2,skip_data_bytes=8
 
 Limitations
 -----------
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index f64ae953d106..5965c15baa43 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -407,12 +407,11 @@ device having emitted a Device Removal Event. In such case, calling
 callback. Care must be taken not to close the device from the interrupt handler
 context. It is necessary to reschedule such closing operation.
 
-Blacklisting
+Blocklisting
 ~~~~~~~~~~~~
 
-The EAL PCI device blacklist functionality can be used to mark certain NIC ports as blacklisted,
-so they are ignored by the DPDK.
-The ports to be blacklisted are identified using the PCIe* description (Domain:Bus:Device.Function).
+The EAL PCI device blocklist functionality can be used to mark certain NIC ports as unavailable, so they are ignored by the DPDK.
+The ports to be blocklisted are identified using the PCIe* description (Domain:Bus:Device.Function).
 
 Misc Functions
 ~~~~~~~~~~~~~~
diff --git a/doc/guides/prog_guide/multi_proc_support.rst b/doc/guides/prog_guide/multi_proc_support.rst
index a84083b96c8a..14cb6db85661 100644
--- a/doc/guides/prog_guide/multi_proc_support.rst
+++ b/doc/guides/prog_guide/multi_proc_support.rst
@@ -30,7 +30,7 @@ after a primary process has already configured the hugepage shared memory for th
     Secondary processes should run alongside primary process with same DPDK version.
 
     Secondary processes which requires access to physical devices in Primary process, must
-    be passed with the same whitelist and blacklist options.
+    be passed with the same allowlist and blocklist options.
 
 To support these two process types, and other multi-process setups described later,
 two additional command-line parameters are available to the EAL:
@@ -131,7 +131,7 @@ can use).
 .. note::
 
     Independent DPDK instances running side-by-side on a single machine cannot share any network ports.
-    Any network ports being used by one process should be blacklisted in every other process.
+    Any network ports being used by one process should be blocklisted in every other process.
 
 Running Multiple Independent Groups of DPDK Applications
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index de0782136d3c..83a3b38c0ae0 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -523,8 +523,8 @@ Devices bound to igb_uio with VT-d enabled do not work on Linux kernel 3.15-3.17
       DMAR:[fault reason 02] Present bit in context entry is clear
 
 **Resolution/Workaround**:
-   Use earlier or later kernel versions, or avoid driver binding on boot by blacklisting the driver modules.
+   Use earlier or later kernel versions, or avoid driver binding on boot by blocklisting the driver modules.
    I.e., in the case of ``ixgbe``, we can pass the kernel command line option: ``modprobe.blacklist=ixgbe``.
    This way we do not need to unbind the device to bind it to igb_uio.
 
 **Affected Environment/Platform**:
diff --git a/doc/guides/rel_notes/release_20_08.rst b/doc/guides/rel_notes/release_20_08.rst
index f19b748728e4..b9509f657b30 100644
--- a/doc/guides/rel_notes/release_20_08.rst
+++ b/doc/guides/rel_notes/release_20_08.rst
@@ -261,6 +261,12 @@ API Changes
 * vhost: The API of ``rte_vhost_host_notifier_ctrl`` was changed to be per
   queue and not per device, a qid parameter was added to the arguments list.
 
+* eal: The definitions related to including and excluding devices
+  have been changed from blacklist/whitelist to include/exclude.
+  There are compatibility macros and a command line mapping to accept
+  the old values, but applications and scripts are strongly encouraged
+  to migrate to the new names.
+
 
 ABI Changes
 -----------
diff --git a/doc/guides/rel_notes/release_2_1.rst b/doc/guides/rel_notes/release_2_1.rst
index beadc51ba438..6339172c64fa 100644
--- a/doc/guides/rel_notes/release_2_1.rst
+++ b/doc/guides/rel_notes/release_2_1.rst
@@ -472,7 +472,7 @@ Resolved Issues
 
 * **devargs: Fix crash on failure.**
 
-  This problem occurred when passing an invalid PCI id to the blacklist API in
+  This problem occurred when passing an invalid PCI id to the blocklist API in
   devargs.
 
 
diff --git a/doc/guides/sample_app_ug/bbdev_app.rst b/doc/guides/sample_app_ug/bbdev_app.rst
index 405e706a46e4..b722d0263772 100644
--- a/doc/guides/sample_app_ug/bbdev_app.rst
+++ b/doc/guides/sample_app_ug/bbdev_app.rst
@@ -79,7 +79,7 @@ This means that HW baseband device/s must be bound to a DPDK driver or
 a SW baseband device/s (virtual BBdev) must be created (using --vdev).
 
 To run the application in linux environment with the turbo_sw baseband device
-using the whitelisted port running on 1 encoding lcore and 1 decoding lcore
+using the allowlisted port running on 1 encoding lcore and 1 decoding lcore
 issue the command:
 
 .. code-block:: console
@@ -90,7 +90,7 @@ issue the command:
 where, NIC0PCIADDR is the PCI address of the Rx port
 
 This command creates one virtual bbdev devices ``baseband_turbo_sw`` where the
-device gets linked to a corresponding ethernet port as whitelisted by
+device gets linked to a corresponding ethernet port as allowlisted by
 the parameter -w.
 3 cores are allocated to the application, and assigned as:
 
@@ -111,7 +111,7 @@ Using Packet Generator with baseband device sample application
 To allow the bbdev sample app to do the loopback, an influx of traffic is required.
 This can be done by using DPDK Pktgen to burst traffic on two ethernet ports, and
 it will print the transmitted along with the looped-back traffic on Rx ports.
-Executing the command below will generate traffic on the two whitelisted ethernet
+Executing the command below will generate traffic on the two allowlisted ethernet
 ports.
 
 .. code-block:: console
diff --git a/doc/guides/sample_app_ug/ipsec_secgw.rst b/doc/guides/sample_app_ug/ipsec_secgw.rst
index 81c5d4360615..fac75819a762 100644
--- a/doc/guides/sample_app_ug/ipsec_secgw.rst
+++ b/doc/guides/sample_app_ug/ipsec_secgw.rst
@@ -329,15 +329,15 @@ This means that if the application is using a single core and both hardware
 and software crypto devices are detected, hardware devices will be used.
 
 A way to achieve the case where you want to force the use of virtual crypto
-devices is to whitelist the Ethernet devices needed and therefore implicitly
-blacklisting all hardware crypto devices.
+devices is to allowlist the Ethernet devices needed and therefore implicitly
+blocklisting all hardware crypto devices.
 
 For example, something like the following command line:
 
 .. code-block:: console
 
     ./build/ipsec-secgw -l 20,21 -n 4 --socket-mem 0,2048 \
-            -w 81:00.0 -w 81:00.1 -w 81:00.2 -w 81:00.3 \
+            -i 81:00.0 -i 81:00.1 -i 81:00.2 -i 81:00.3 \
             --vdev "crypto_aesni_mb" --vdev "crypto_null" \
 	    -- \
             -p 0xf -P -u 0x3 --config="(0,0,20),(1,0,20),(2,0,21),(3,0,21)" \
diff --git a/doc/guides/sample_app_ug/l3_forward.rst b/doc/guides/sample_app_ug/l3_forward.rst
index 07c8d44936d6..69a29ab1314e 100644
--- a/doc/guides/sample_app_ug/l3_forward.rst
+++ b/doc/guides/sample_app_ug/l3_forward.rst
@@ -148,7 +148,7 @@ or
 
 In this command:
 
-*   -w option whitelist the event device supported by platform. Way to pass this device may vary based on platform.
+*   -w option allowlist the event device supported by platform. Way to pass this device may vary based on platform.
 
 *   The --mode option defines PMD to be used for packet I/O.
 
diff --git a/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst b/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
index a44fbcd52c3a..473326275e49 100644
--- a/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
+++ b/doc/guides/sample_app_ug/l3_forward_access_ctrl.rst
@@ -18,7 +18,7 @@ The application loads two types of rules at initialization:
 
 *   Route information rules, which are used for L3 forwarding
 
-*   Access Control List (ACL) rules that blacklist (or block) packets with a specific characteristic
+*   Access Control List (ACL) rules that blocklist (or block) packets with a specific characteristic
 
 When packets are received from a port,
 the application extracts the necessary information from the TCP/IP header of the received packet and
diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst
index 0cc6f2e62e75..4cc55004cca8 100644
--- a/doc/guides/sample_app_ug/l3_forward_power_man.rst
+++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst
@@ -378,7 +378,7 @@ See :doc:`Power Management<../prog_guide/power_man>` chapter in the DPDK Program
 
 .. code-block:: console
 
-    ./l3fwd-power -l xxx   -n 4   -w 0000:xx:00.0 -w 0000:xx:00.1 -- -p 0x3 -P --config="(0,0,xx),(1,0,xx)" --empty-poll="0,0,0" -l 14 -m 9 -h 1
+    ./l3fwd-power -l xxx   -n 4   -i 0000:xx:00.0 -i 0000:xx:00.1 -- -p 0x3 -P --config="(0,0,xx),(1,0,xx)" --empty-poll="0,0,0" -l 14 -m 9 -h 1
 
 Where,
 
diff --git a/doc/guides/sample_app_ug/vdpa.rst b/doc/guides/sample_app_ug/vdpa.rst
index d66a724827af..e388c738a1e3 100644
--- a/doc/guides/sample_app_ug/vdpa.rst
+++ b/doc/guides/sample_app_ug/vdpa.rst
@@ -52,7 +52,7 @@ Take IFCVF driver for example:
 .. code-block:: console
 
         ./vdpa -c 0x2 -n 4 --socket-mem 1024,1024 \
-                -w 0000:06:00.3,vdpa=1 -w 0000:06:00.4,vdpa=1 \
+                -i 0000:06:00.3,vdpa=1 -i 0000:06:00.4,vdpa=1 \
                 -- --interactive
 
 .. note::
diff --git a/doc/guides/tools/cryptoperf.rst b/doc/guides/tools/cryptoperf.rst
index 28b729dbda8b..334a4f558abd 100644
--- a/doc/guides/tools/cryptoperf.rst
+++ b/doc/guides/tools/cryptoperf.rst
@@ -417,7 +417,7 @@ Call application for performance throughput test of single Aesni MB PMD
 for cipher encryption aes-cbc and auth generation sha1-hmac,
 one million operations, burst size 32, packet size 64::
 
-   dpdk-test-crypto-perf -l 6-7 --vdev crypto_aesni_mb -w 0000:00:00.0 --
+   dpdk-test-crypto-perf -l 6-7 --vdev crypto_aesni_mb -i 0000:00:00.0 --
    --ptest throughput --devtype crypto_aesni_mb --optype cipher-then-auth
    --cipher-algo aes-cbc --cipher-op encrypt --cipher-key-sz 16 --auth-algo
    sha1-hmac --auth-op generate --auth-key-sz 64 --digest-sz 12
@@ -427,7 +427,7 @@ Call application for performance latency test of two Aesni MB PMD executed
 on two cores for cipher encryption aes-cbc, ten operations in silent mode::
 
    dpdk-test-crypto-perf -l 4-7 --vdev crypto_aesni_mb1
-   --vdev crypto_aesni_mb2 -w 0000:00:00.0 -- --devtype crypto_aesni_mb
+   --vdev crypto_aesni_mb2 -i 0000:00:00.0 -- --devtype crypto_aesni_mb
    --cipher-algo aes-cbc --cipher-key-sz 16 --cipher-iv-sz 16
    --cipher-op encrypt --optype cipher-only --silent
    --ptest latency --total-ops 10
@@ -437,7 +437,7 @@ for cipher encryption aes-gcm and auth generation aes-gcm,ten operations
 in silent mode, test vector provide in file "test_aes_gcm.data"
 with packet verification::
 
-   dpdk-test-crypto-perf -l 4-7 --vdev crypto_openssl -w 0000:00:00.0 --
+   dpdk-test-crypto-perf -l 4-7 --vdev crypto_openssl -i 0000:00:00.0 --
    --devtype crypto_openssl --aead-algo aes-gcm --aead-key-sz 16
    --aead-iv-sz 16 --aead-op encrypt --aead-aad-sz 16 --digest-sz 16
    --optype aead --silent --ptest verify --total-ops 10
diff --git a/doc/guides/tools/flow-perf.rst b/doc/guides/tools/flow-perf.rst
index cdedaf9a97d4..c03681525e60 100644
--- a/doc/guides/tools/flow-perf.rst
+++ b/doc/guides/tools/flow-perf.rst
@@ -61,7 +61,7 @@ with a ``--`` separator:
 
 .. code-block:: console
 
-	sudo ./dpdk-test-flow_perf -n 4 -w 08:00.0 -- --ingress --ether --ipv4 --queue --flows-count=1000000
+	sudo ./dpdk-test-flow_perf -n 4 -i 08:00.0 -- --ingress --ether --ipv4 --queue --flows-count=1000000
 
 The command line options are:
 
-- 
2.27.0


^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-15 12:29  0%       ` David Marchand
  2020-07-15 12:49  0%         ` Aaron Conole
  2020-07-15 16:29  0%         ` Phil Yang
@ 2020-07-16  4:16  0%         ` Phil Yang
  2020-07-16 11:30  4%           ` David Marchand
  2 siblings, 1 reply; 200+ results
From: Phil Yang @ 2020-07-16  4:16 UTC (permalink / raw)
  To: David Marchand, Olivier Matz
  Cc: dev, Stephen Hemminger, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang, nd, Dodji Seketeli, Aaron Conole, nd

David Marchand <david.marchand@redhat.com> writes:

> Subject: Re: [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt
> operations
> 
> On Thu, Jul 9, 2020 at 5:59 PM Phil Yang <phil.yang@arm.com> wrote:
> >
> > Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> > ops which enforce unnecessary barriers on aarch64.
> >
> > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > v4:
> > 1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
> > to avoid ABI breakage. (Olivier)
> > 2. Add notice of refcnt_atomic deprecation. (Honnappa)
> 
> v4 does not pass the checks (in both my env, and Travis).
> https://travis-ci.com/github/ovsrobot/dpdk/jobs/359393389#L2405

I think we need an exception in 'libabigail.abignore' for this change.
Is that OK with you?

Thanks,
Phil
> 
> It seems the robot had a hiccup as I can't see a report in the test-report ml.
> 
> 
> --
> David Marchand
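
[Editor's note] For readers unfamiliar with the built-ins under discussion: the
series replaces rte_atomic16 helpers with GCC/Clang __atomic built-ins carrying
explicit memory ordering. A minimal stand-alone sketch (names and the wrapper
function are illustrative, not the actual DPDK code):

```c
#include <stdint.h>

/* Stand-in for a shared-info style reference count. */
static uint16_t refcnt = 1;

/* Atomically add 'value' and return the new count.
 * __ATOMIC_ACQ_REL gives the acquire/release semantics a refcnt
 * needs without the full barriers that the legacy rte_atomic16 ops
 * imply on some architectures (the aarch64 motivation cited in the
 * patch). */
static inline uint16_t
refcnt_update(int16_t value)
{
	return __atomic_add_fetch(&refcnt, (uint16_t)value,
				  __ATOMIC_ACQ_REL);
}
```

A decrement is simply refcnt_update(-1); the unsigned wrap-around of the cast
is well defined, so the same helper covers both directions.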


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] devtools: give some hints for ABI errors
  2020-07-15 12:48  4%   ` Aaron Conole
@ 2020-07-16  7:29  4%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-07-16  7:29 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Ray Kinsella, Neil Horman, Dodji Seketeli,
	Aaron Conole

On Wed, Jul 15, 2020 at 2:49 PM Aaron Conole <aconole@redhat.com> wrote:
> David Marchand <david.marchand@redhat.com> writes:
>
> > abidiff can provide some more information about the ABI difference it
> > detected.
> > In all cases, a discussion on the mailing must happen but we can give
> > some hints to know if this is a problem with the script calling abidiff,
> > a potential ABI breakage or an unambiguous ABI breakage.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> > Acked-by: Ray Kinsella <mdr@ashroe.eu>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Acked-by: Aaron Conole <aconole@redhat.com>

Applied.


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC PATCH 0/2] Enable dynamic configuration of subport bandwidth profile
  2020-07-15 18:27  3% [dpdk-dev] [RFC PATCH 0/2] Enable dynamic configuration of subport bandwidth profile Savinay Dharmappa
@ 2020-07-16  8:14  0% ` Singh, Jasvinder
  0 siblings, 0 replies; 200+ results
From: Singh, Jasvinder @ 2020-07-16  8:14 UTC (permalink / raw)
  To: Dharmappa, Savinay, dev, Dumitrescu, Cristian



> -----Original Message-----
> From: Dharmappa, Savinay <savinay.dharmappa@intel.com>
> Sent: Wednesday, July 15, 2020 7:28 PM
> To: Dharmappa, Savinay <savinay.dharmappa@intel.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>; dev@dpdk.org
> Subject: [RFC PATCH 0/2] Enable dynamic configuration of subport
> bandwidth profile
> 
> The DPDK sched library allows runtime configuration of the pipe profiles of a
> subport once the scheduler hierarchy is constructed. However, to change the
> subport-level bandwidth, the existing hierarchy needs to be dismantled and the
> whole process of building the hierarchy under the subport nodes repeated,
> which might result in router downtime. Furthermore, due to the lack of dynamic
> configuration of the subport bandwidth profile (shaper and traffic class
> rates), the user application is unable to dynamically re-distribute the excess
> bandwidth of one subport among other subports in the scheduler hierarchy.
> Therefore, it is also not possible to adjust the subport bandwidth profile in
> sync with dynamic changes in the pipe profiles of subscribers who want to
> consume higher bandwidth opportunistically.
> 
> This RFC proposes dynamic configuration of the subport bandwidth profile to
> overcome the runtime situation when a group of subscribers is not using the
> allotted bandwidth and dynamic bandwidth re-distribution is needed without
> making any structural changes in the hierarchy.
> 
> The implementation work includes refactoring the existing data structures
> defined at the port and subport level, plus new APIs for adding subport-level
> bandwidth profiles that can be used at runtime, which causes an API/ABI
> change. Therefore, a deprecation notice will be sent out soon.
> 
> Savinay Dharmappa (2):
>   sched: add dynamic config of subport bandwidth profile
>   example/qos_sched: subport bandwidth dynamic conf
> 
>  examples/qos_sched/cfg_file.c          | 158 ++++++-----
>  examples/qos_sched/cfg_file.h          |   4 +
>  examples/qos_sched/init.c              |  24 +-
>  examples/qos_sched/main.h              |   1 +
>  examples/qos_sched/profile.cfg         |   3 +
>  lib/librte_sched/rte_sched.c           | 486 ++++++++++++++++++++++++---------
>  lib/librte_sched/rte_sched.h           |  82 +++++-
>  lib/librte_sched/rte_sched_version.map |   2 +
>  8 files changed, 544 insertions(+), 216 deletions(-)
> 
> --
> 2.7.4

+ Cristian

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt operations
  2020-07-16  4:16  0%         ` Phil Yang
@ 2020-07-16 11:30  4%           ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2020-07-16 11:30 UTC (permalink / raw)
  To: Phil Yang, Olivier Matz, Dodji Seketeli, Ray Kinsella
  Cc: dev, Stephen Hemminger, David Christensen, Honnappa Nagarahalli,
	Ruifeng Wang, nd, Aaron Conole

On Thu, Jul 16, 2020 at 6:16 AM Phil Yang <Phil.Yang@arm.com> wrote:
>
> David Marchand <david.marchand@redhat.com> writes:
>
> > Subject: Re: [PATCH v4 1/2] mbuf: use C11 atomic built-ins for refcnt
> > operations
> >
> > On Thu, Jul 9, 2020 at 5:59 PM Phil Yang <phil.yang@arm.com> wrote:
> > >
> > > Use C11 atomic built-ins with explicit ordering instead of rte_atomic
> > > ops which enforce unnecessary barriers on aarch64.
> > >
> > > Signed-off-by: Phil Yang <phil.yang@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > > v4:
> > > 1. Add union for refcnt_atomic and refcnt in rte_mbuf_ext_shared_info
> > > to avoid ABI breakage. (Olivier)
> > > 2. Add notice of refcnt_atomic deprecation. (Honnappa)
> >
> > v4 does not pass the checks (in both my env, and Travis).
> > https://travis-ci.com/github/ovsrobot/dpdk/jobs/359393389#L2405
>
> I think we need an exception in 'libabigail.abignore' for this change.
> Is that OK with you?

Testing the series with libabigail 1.7.0:

Functions changes summary: 0 Removed, 1 Changed (6 filtered out), 0
Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

1 function with some indirect sub-type change:

  [C]'function unsigned int rte_reorder_drain(rte_reorder_buffer*,
rte_mbuf**, unsigned int)' at rte_reorder.c:367:1 has some indirect
sub-type changes:
    parameter 2 of type 'rte_mbuf**' has sub-type changes:
      in pointed to type 'rte_mbuf*':
        in pointed to type 'struct rte_mbuf' at rte_mbuf_core.h:469:1:
          type size hasn't changed
          1 data member changes (1 filtered):
           type of 'rte_mbuf_ext_shared_info* rte_mbuf::shinfo' changed:
             in pointed to type 'struct rte_mbuf_ext_shared_info' at
rte_mbuf_core.h:679:1:
               type size hasn't changed
               1 data member change:
                data member rte_atomic16_t
rte_mbuf_ext_shared_info::refcnt_atomic at offset 128 (in bits) became
anonymous data member 'union {rte_atomic16_t refcnt_atomic; uint16_t
refcnt;}'



Error: ABI issue reported for 'abidiff --suppr
/home/dmarchan/dpdk/devtools/../devtools/libabigail.abignore
--no-added-syms --headers-dir1
/home/dmarchan/abi/v20.05/build-gcc-static/usr/local/include
--headers-dir2 /home/dmarchan/builds/build-gcc-static/install/usr/local/include
/home/dmarchan/abi/v20.05/build-gcc-static/dump/librte_reorder.dump
/home/dmarchan/builds/build-gcc-static/install/dump/librte_reorder.dump'

ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
this as a potential issue).
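
[Editor's note] What abidiff reports here can be reproduced in isolation:
replacing a named member with an anonymous union keeps the layout identical
even though the type description changes. A simplified sketch with stand-in
types (the real definitions live in rte_mbuf_core.h):

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for rte_atomic16_t. */
typedef struct { volatile int16_t cnt; } atomic16_t;

/* Layout before the patch. */
struct shinfo_old {
	void *free_cb;
	void *fcb_opaque;
	atomic16_t refcnt_atomic;
};

/* Layout after the patch: the anonymous union preserves both the
 * size and the offset of the member, so the binary layout is
 * unchanged even though abidiff flags the type difference. */
struct shinfo_new {
	void *free_cb;
	void *fcb_opaque;
	union {
		atomic16_t refcnt_atomic; /* deprecated name */
		uint16_t refcnt;          /* plain field for __atomic use */
	};
};
```

Since size and offsets are identical, the change is source-visible but not
ABI-visible, which is why a libabigail.abignore suppression (as proposed
below) is appropriate here.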



We will have no other update on mbuf for 20.08, so the following rule
can do the job for 20.08 and we will remove it in 20.11.

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index daa4631bf..b35f91257 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -52,6 +52,10 @@
 [suppress_type]
         type_kind = struct
         name = rte_epoll_event
+; Ignore updates of rte_mbuf_ext_shared_info
+[suppress_type]
+        type_kind = struct
+        name = rte_mbuf_ext_shared_info

 ;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till DPDK 20.11


Olivier, Dodji, Ray?


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

-- links below jump to the message on this page --
2018-01-15 16:16     [dpdk-dev] [PATCH v6] sched: make RED scaling configurable alangordondewar
2019-04-08  8:53     ` [dpdk-dev] [PATCH v7] " Thomas Monjalon
2019-04-08 13:29       ` Dumitrescu, Cristian
2020-07-06 23:09  3%     ` Thomas Monjalon
2019-09-06  9:45     [dpdk-dev] [PATCH v2 0/6] RCU integration with LPM library Ruifeng Wang
2020-06-29  8:02  3% ` [dpdk-dev] [PATCH v5 0/3] " Ruifeng Wang
2020-06-29  8:02       ` [dpdk-dev] [PATCH v5 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-06-29 11:56         ` David Marchand
2020-06-29 12:55           ` Bruce Richardson
2020-06-30 10:35  3%         ` Kinsella, Ray
2020-07-04 17:00  3%       ` Ruifeng Wang
2020-07-07 14:40  3% ` [dpdk-dev] [PATCH v6 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-07 15:15  3% ` [dpdk-dev] [PATCH v7 " Ruifeng Wang
2020-07-07 15:15       ` [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-08 14:30  2%     ` David Marchand
2020-07-08 15:34  5%       ` Ruifeng Wang
2020-07-09  8:02  4% ` [dpdk-dev] [PATCH v8 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-09  8:02  2%   ` [dpdk-dev] [PATCH v8 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-09 15:42  4% ` [dpdk-dev] [PATCH v9 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-09 15:42  2%   ` [dpdk-dev] [PATCH v9 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-10  2:22  4% ` [dpdk-dev] [PATCH v10 0/3] RCU integration with LPM library Ruifeng Wang
2020-07-10  2:22  2%   ` [dpdk-dev] [PATCH v10 1/3] lib/lpm: integrate RCU QSBR Ruifeng Wang
2020-07-10  2:29  0%     ` Ruifeng Wang
2020-03-05  4:33     [dpdk-dev] [RFC v1 1/1] vfio: set vf token and gain vf device access vattunuru
2020-07-03 14:57  4% ` [dpdk-dev] [PATCH v17 0/2] support for VFIO-PCI VF token interface Haiyue Wang
2020-03-20 16:41     [dpdk-dev] [RFC] ring: make ring implementation non-inlined Konstantin Ananyev
2020-03-25 21:09     ` Jerin Jacob
2020-03-26  8:04       ` Morten Brørup
2020-03-31 23:25         ` Thomas Monjalon
2020-06-30 23:15  0%       ` Honnappa Nagarahalli
2020-07-01  7:27  0%         ` Morten Brørup
2020-07-01 12:21  0%           ` Ananyev, Konstantin
2020-07-01 14:11  0%             ` Honnappa Nagarahalli
2020-04-21  2:04     [dpdk-dev] [PATCH] devtools: remove useless files from ABI reference Thomas Monjalon
2020-05-24 17:43     ` [dpdk-dev] [PATCH v2] " Thomas Monjalon
2020-05-28 13:16       ` David Marchand
2020-07-03  9:08  4%     ` David Marchand
2020-05-21 13:20     [dpdk-dev] [PATCH 20.08] mempool/ring: add support for new ring sync modes Konstantin Ananyev
2020-06-29 16:10     ` [dpdk-dev] [PATCH v2] " Konstantin Ananyev
2020-07-09 16:18       ` Olivier Matz
2020-07-09 17:55         ` Ananyev, Konstantin
2020-07-10 12:52           ` Olivier Matz
2020-07-10 15:15             ` Ananyev, Konstantin
2020-07-10 15:20               ` Ananyev, Konstantin
2020-07-13 13:30                 ` Olivier Matz
2020-07-13 14:46                   ` Ananyev, Konstantin
2020-07-13 15:00  3%                 ` Olivier Matz
2020-07-13 16:29  0%                   ` Ananyev, Konstantin
2020-05-22  6:58     [dpdk-dev] [PATCH 0/3] Experimental/internal libraries cleanup David Marchand
2020-06-26  8:16     ` [dpdk-dev] [PATCH v3 " David Marchand
2020-07-05 19:55  3%   ` Thomas Monjalon
2020-07-06  8:02  3%     ` [dpdk-dev] [dpdk-techboard] " Bruce Richardson
2020-07-06  8:12  0%       ` Thomas Monjalon
2020-07-06 16:57  0%     ` [dpdk-dev] " Medvedkin, Vladimir
2020-05-22 13:23     [dpdk-dev] [PATCH 20.08 0/9] adding support for python 3 only Louise Kilheeney
2020-07-02 10:37     ` [dpdk-dev] [PATCH v3 0/9] adding " Louise Kilheeney
2020-07-02 10:37  4%   ` [dpdk-dev] [PATCH v3 8/9] devtools: support python3 only Louise Kilheeney
2020-05-31 14:43     [dpdk-dev] [RFC] ethdev: add fragment attribute to IPv6 item Dekel Peled
2020-06-02 14:32     ` Andrew Rybchenko
2020-06-02 18:28       ` Ori Kam
2020-06-02 19:04         ` Adrien Mazarguil
2020-06-03  8:16           ` Ori Kam
2020-06-03 12:10             ` Dekel Peled
2020-06-18  6:58               ` Dekel Peled
2020-06-28 14:52  0%             ` Dekel Peled
2020-07-05 13:13  0%       ` Andrew Rybchenko
2020-06-04 21:02     [dpdk-dev] [RFC] doc: change to diverse and inclusive language Stephen Hemminger
2020-07-01 19:46     ` [dpdk-dev] [PATCH v3 00/27] Replace references to master and slave Stephen Hemminger
2020-07-01 19:46  1%   ` [dpdk-dev] [PATCH v3 22/27] doc: update references to master/slave lcore in documentation Stephen Hemminger
2020-07-01 20:23     ` [dpdk-dev] [PATCH v4 00/27] Replace references to master and slave Stephen Hemminger
2020-07-01 20:23  1%   ` [dpdk-dev] [PATCH v4 22/27] doc: update references to master/slave lcore in documentation Stephen Hemminger
2020-06-07 17:01     [dpdk-dev] [PATCH 0/9] Rename blacklist/whitelist to blocklist/allowlist Stephen Hemminger
2020-07-15 23:02     ` [dpdk-dev] [PATCH v5 0/9] rename blacklist/whitelist to exclude/include Stephen Hemminger
2020-07-15 23:02  1%   ` [dpdk-dev] [PATCH v5 9/9] doc: replace references to blacklist/whitelist Stephen Hemminger
2020-06-10  6:38     [dpdk-dev] [RFC] mbuf: accurate packet Tx scheduling Viacheslav Ovsiienko
2020-06-10 13:33     ` Harman Kalra
2020-06-10 15:16       ` Slava Ovsiienko
2020-06-17 15:57         ` [dpdk-dev] [EXT] " Harman Kalra
2020-07-01 15:46  0%       ` Slava Ovsiienko
2020-07-01 15:36  2% ` [dpdk-dev] [PATCH 1/2] mbuf: introduce " Viacheslav Ovsiienko
2020-07-07 11:50  0%   ` Olivier Matz
2020-07-07 12:46  0%     ` Slava Ovsiienko
2020-07-07 12:59  2% ` [dpdk-dev] [PATCH v2 " Viacheslav Ovsiienko
2020-07-07 13:08  2% ` [dpdk-dev] [PATCH v3 " Viacheslav Ovsiienko
2020-07-07 14:32  0%   ` Olivier Matz
2020-07-07 14:57  2% ` [dpdk-dev] [PATCH v4 " Viacheslav Ovsiienko
2020-07-07 15:23  0%   ` Olivier Matz
2020-07-08 14:16  0%   ` [dpdk-dev] [PATCH v4 1/2] mbuf: introduce accurate packet Tx scheduling Morten Brørup
2020-07-08 14:54  0%     ` Slava Ovsiienko
2020-07-08 15:27  0%       ` Morten Brørup
2020-07-08 15:51  0%         ` Slava Ovsiienko
2020-07-08 15:47  2% ` [dpdk-dev] [PATCH v5 1/2] mbuf: introduce accurate packet Tx scheduling Viacheslav Ovsiienko
2020-07-08 16:05  0%   ` Slava Ovsiienko
2020-07-09 12:36  2% ` [dpdk-dev] [PATCH v6 " Viacheslav Ovsiienko
2020-07-09 23:47  0%   ` Ferruh Yigit
2020-07-10 12:32  0%     ` Slava Ovsiienko
2020-07-10 12:39  2% ` [dpdk-dev] [PATCH v7 " Viacheslav Ovsiienko
2020-07-10 15:46  0%   ` Slava Ovsiienko
2020-07-10 22:07  0%     ` Ferruh Yigit
2020-06-10 14:44     [dpdk-dev] [PATCH 0/7] Register external threads as lcore David Marchand
2020-06-26 14:47     ` [dpdk-dev] [PATCH v4 0/9] Register non-EAL " David Marchand
2020-06-26 14:47       ` [dpdk-dev] [PATCH v4 2/9] eal: fix multiple definition of per lcore thread id David Marchand
2020-06-30  9:34  0%     ` Olivier Matz
2020-06-26 14:47       ` [dpdk-dev] [PATCH v4 4/9] eal: introduce thread uninit helper David Marchand
2020-06-26 15:00         ` Jerin Jacob
2020-06-29  9:07  0%       ` David Marchand
2020-06-29  8:59  0%     ` [dpdk-dev] [EXT] " Sunil Kumar Kori
2020-06-30  9:42  0%     ` [dpdk-dev] " Olivier Matz
2020-07-06 14:15     ` [dpdk-dev] [PATCH v5 00/10] Register non-EAL threads as lcore David Marchand
2020-07-06 14:15  3%   ` [dpdk-dev] [PATCH v5 02/10] eal: fix multiple definition of per lcore thread id David Marchand
2020-07-06 14:16  3%   ` [dpdk-dev] [PATCH v5 04/10] eal: introduce thread uninit helper David Marchand
2020-07-06 20:52     ` [dpdk-dev] [PATCH v6 00/10] Register non-EAL threads as lcore David Marchand
2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 02/10] eal: fix multiple definition of per lcore thread id David Marchand
2020-07-06 20:52  3%   ` [dpdk-dev] [PATCH v6 04/10] eal: introduce thread uninit helper David Marchand
2020-06-10 17:17     [dpdk-dev] [RFC PATCH 1/6] eal: introduce macros for getting value for bit Parav Pandit
2020-06-21 19:11     ` [dpdk-dev] [PATCH v2 0/6] Improve mlx5 PMD common driver framework for multiple classes Parav Pandit
2020-06-21 19:11       ` [dpdk-dev] [PATCH v2 4/6] bus/mlx5_pci: register a PCI driver Parav Pandit
2020-06-29 15:49  2%     ` Gaëtan Rivet
2020-06-11 10:24     [dpdk-dev] [PATCH 1/2] eal: remove redundant code Phil Yang
2020-06-11 10:24     ` [dpdk-dev] [PATCH 2/2] eal: use c11 atomics for interrupt status Phil Yang
2020-07-08 12:29  3%   ` David Marchand
2020-07-08 13:43  0%     ` Aaron Conole
2020-07-08 15:04  0%     ` Kinsella, Ray
2020-07-09  5:21  0%       ` Phil Yang
2020-07-09  6:46  3% ` [dpdk-dev] [PATCH v2] eal: use c11 atomic built-ins " Phil Yang
2020-07-09  8:02  0%   ` Stefan Puiu
2020-07-09  8:34  2%   ` [dpdk-dev] [PATCH v3] " Phil Yang
2020-07-09 10:30  0%     ` David Marchand
2020-07-10  7:18  3%       ` Dodji Seketeli
2020-06-11 10:26     [dpdk-dev] [PATCH] mbuf: use c11 atomics for refcnt operations Phil Yang
2020-07-03 15:38  3% ` David Marchand
2020-07-06  8:03  3%   ` Phil Yang
2020-07-07 10:10  3% ` [dpdk-dev] [PATCH v2] mbuf: use C11 " Phil Yang
2020-07-08  5:11  3%   ` Phil Yang
2020-07-08 11:44  0%   ` Olivier Matz
2020-07-09 10:00  3%     ` Phil Yang
2020-07-09 10:10  4%   ` [dpdk-dev] [PATCH v3] mbuf: use C11 atomic built-ins " Phil Yang
2020-07-09 11:03  3%     ` Olivier Matz
2020-07-09 13:00  3%       ` Phil Yang
2020-07-09 13:31  0%         ` Honnappa Nagarahalli
2020-07-09 14:10  0%           ` Phil Yang
2020-07-09 15:58  4%     ` [dpdk-dev] [PATCH v4 1/2] " Phil Yang
2020-07-15 12:29  0%       ` David Marchand
2020-07-15 12:49  0%         ` Aaron Conole
2020-07-15 16:29  0%         ` Phil Yang
2020-07-16  4:16  0%         ` Phil Yang
2020-07-16 11:30  4%           ` David Marchand
2020-06-12 11:19     [dpdk-dev] [PATCH 1/3] eventdev: fix race condition on timer list counter Phil Yang
2020-07-02  5:26     ` [dpdk-dev] [PATCH v2 1/4] " Phil Yang
2020-07-02  5:26       ` [dpdk-dev] [PATCH v2 4/4] eventdev: relax smp barriers with c11 atomics Phil Yang
2020-07-06 10:04  4%     ` Thomas Monjalon
2020-07-06 15:32  0%       ` Phil Yang
2020-07-06 15:40  0%         ` Thomas Monjalon
2020-07-07 11:13       ` [dpdk-dev] [PATCH v3 1/4] eventdev: fix race condition on timer list counter Phil Yang
2020-07-07 11:13  4%     ` [dpdk-dev] [PATCH v3 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
2020-07-07 14:29  0%       ` Jerin Jacob
2020-07-07 15:56  0%         ` Phil Yang
2020-07-07 15:54         ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Phil Yang
2020-07-07 15:54  4%       ` [dpdk-dev] [PATCH v4 4/4] eventdev: relax smp barriers with C11 atomics Phil Yang
2020-07-08 13:30  4%       ` [dpdk-dev] [PATCH v4 1/4] eventdev: fix race condition on timer list counter Jerin Jacob
2020-07-08 15:01  0%         ` Thomas Monjalon
2020-06-13  0:00     [dpdk-dev] [PATCH v3 00/10] rename blacklist/whitelist to block/allow Stephen Hemminger
2020-07-10 15:06  3% ` David Marchand
2020-07-14  4:43  0%   ` Stephen Hemminger
2020-07-14  5:39     ` [dpdk-dev] [PATCH v4 00/11] rename blacklist/whitelist to exclude/include Stephen Hemminger
2020-07-14  5:39  4%   ` [dpdk-dev] [PATCH v4 09/11] doc: add note about blacklist/whitelist changes Stephen Hemminger
2020-06-14 22:57     [dpdk-dev] [PATCH 0/4] add PPC and Windows to meson test Thomas Monjalon
2020-06-15 22:22     ` [dpdk-dev] [PATCH v2 0/4] add PPC and Windows cross-compilation " Thomas Monjalon
2020-06-29 23:15  0%   ` Thomas Monjalon
2020-06-20 21:05     [dpdk-dev] [PATCH 0/7] cmdline: support Windows Dmitry Kozlyuk
2020-06-20 21:05     ` [dpdk-dev] [PATCH 6/7] " Dmitry Kozlyuk
2020-06-28 14:20       ` Fady Bader
2020-06-29  6:23         ` Ranjit Menon
2020-06-29  7:42  3%       ` Dmitry Kozlyuk
2020-06-29  8:12  0%         ` Tal Shnaiderman
2020-06-29 23:56  0%           ` Dmitry Kozlyuk
2020-07-08  1:09  0%             ` Dmitry Kozlyuk
2020-06-23 13:49     [dpdk-dev] [PATCH] doc: mark internal symbols in ethdev Ferruh Yigit
2020-06-26  8:49     ` Kinsella, Ray
2020-07-10 14:20  0%   ` Thomas Monjalon
2020-07-10 16:17  0%     ` Ferruh Yigit
2020-06-24  8:28     [dpdk-dev] [PATCH v9 00/10] Windows bus/pci support talshn
2020-06-29 12:37     ` [dpdk-dev] [PATCH v10 " talshn
2020-06-29 12:37  4%   ` [dpdk-dev] [PATCH v10 10/10] build: generate version.map file for MinGW on Windows talshn
2020-06-24  9:36     [dpdk-dev] [PATCH 20.11] eal: simplify exit functions Thomas Monjalon
2020-06-30 10:26  0% ` Kinsella, Ray
2020-06-25 13:38     [dpdk-dev] [PATCH v2 0/5] vhost: improve ready state Matan Azrad
2020-06-29 14:08     ` [dpdk-dev] [PATCH v3 0/6] " Matan Azrad
2020-06-29 14:08  4%   ` [dpdk-dev] [PATCH v3 1/6] vhost: support host notifier queue configuration Matan Azrad
2020-06-26 23:14     [dpdk-dev] [20.11, PATCH] bbdev: remove experimental tag from API Nicolas Chautru
2020-06-30  7:30  4% ` David Marchand
2020-06-30  7:35  3%   ` Akhil Goyal
2020-07-02 17:54  0%     ` Akhil Goyal
2020-07-02 18:02  3%       ` Chautru, Nicolas
2020-07-02 18:09  4%         ` Akhil Goyal
2020-06-27  4:37     [dpdk-dev] [PATCH 00/27] event/dlb Intel DLB PMD Tim McDaniel
2020-06-27  4:37     ` [dpdk-dev] [PATCH 01/27] eventdev: dlb upstream prerequisites Tim McDaniel
2020-06-27  7:44  5%   ` Jerin Jacob
2020-06-29 19:30  4%     ` McDaniel, Timothy
2020-06-30  4:21  0%       ` Jerin Jacob
2020-06-30 15:37  0%         ` McDaniel, Timothy
2020-06-30 15:57  0%           ` Jerin Jacob
2020-06-30 19:26  0%             ` McDaniel, Timothy
2020-06-30 20:40  0%               ` Pavan Nikhilesh Bhagavatula
2020-06-30 21:07  0%                 ` McDaniel, Timothy
2020-07-01  4:50  3%               ` Jerin Jacob
2020-07-01 16:48  0%                 ` McDaniel, Timothy
2020-06-30 11:22  0%     ` Kinsella, Ray
2020-06-30 11:30  0%       ` Jerin Jacob
2020-06-30 11:36  0%         ` Kinsella, Ray
2020-06-30 12:14  0%           ` Jerin Jacob
2020-07-02 15:21  0%             ` Kinsella, Ray
2020-07-02 16:35  3%               ` McDaniel, Timothy
2020-06-29 22:36  3% [dpdk-dev] [dpdk-announce] DPDK Userspace CFP now open; help celebrate 10 years of DPDK Jill Lovato
2020-07-02  6:19  4% [dpdk-dev] [PATCH (v20.11) 1/2] eventdev: reserve space in config structs for extension pbhagavatula
2020-07-02  6:19  4% ` [dpdk-dev] [PATCH (v20.11) 2/2] eventdev: reserve space in timer " pbhagavatula
2020-07-02 14:58  4% [dpdk-dev] DPDK Release Status Meeting 2/07/2020 Ferruh Yigit
2020-07-03 10:26     [dpdk-dev] [PATCH 0/3] ring clean up Feifei Wang
2020-07-03 10:26     ` [dpdk-dev] [PATCH 1/3] ring: remove experimental tag for ring reset API Feifei Wang
2020-07-03 16:16  4%   ` Kinsella, Ray
2020-07-03 18:46  3%     ` Honnappa Nagarahalli
2020-07-06  6:23  3%       ` Kinsella, Ray
2020-07-07  3:19  3%         ` Feifei Wang
2020-07-07  7:40  0%           ` Kinsella, Ray
2020-07-03 10:26     ` [dpdk-dev] [PATCH 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
2020-07-03 16:17  3%   ` Kinsella, Ray
2020-07-03 17:15  4% [dpdk-dev] [PATCH] doc: add sample for ABI checks in contribution guide Ferruh Yigit
2020-07-05  3:41     [dpdk-dev] [pull-request] next-eventdev 20.08 RC1 Jerin Jacob Kollanukkaran
2020-07-06  9:57  3% ` Thomas Monjalon
2020-07-05 11:46     [dpdk-dev] [PATCH v5 0/3] build mempool on Windows Fady Bader
2020-07-05 13:47     ` [dpdk-dev] [PATCH v6 " Fady Bader
2020-07-05 13:47       ` [dpdk-dev] [PATCH v6 1/3] eal: disable function versioning " Fady Bader
2020-07-05 20:23  4%     ` Thomas Monjalon
2020-07-06  7:02  0%       ` Fady Bader
2020-07-06 11:32       ` [dpdk-dev] [PATCH v7 0/3] build mempool " Fady Bader
2020-07-06 11:32  5%     ` [dpdk-dev] [PATCH v7 1/3] eal: disable function versioning " Fady Bader
2020-07-06 12:22  0%       ` Bruce Richardson
2020-07-06 23:16  0%         ` Thomas Monjalon
2020-07-07  9:03     [dpdk-dev] [PATCH] lib/librte_timer:fix corruption with reset Sarosh Arif
2020-07-10  6:59     ` [dpdk-dev] [PATCH v3] " Sarosh Arif
2020-07-10 15:19  3%   ` Stephen Hemminger
2020-07-07 14:45  8% [dpdk-dev] [PATCH v1 0/2] doc: minor abi policy fixes Ray Kinsella
2020-07-07 14:45 24% ` [dpdk-dev] [PATCH v1 1/2] doc: reword abi policy for windows Ray Kinsella
2020-07-07 15:23  7%   ` Thomas Monjalon
2020-07-07 16:33  4%     ` Kinsella, Ray
2020-07-07 14:45 12% ` [dpdk-dev] [PATCH v1 2/2] doc: clarify alias to experimental period Ray Kinsella
2020-07-07 15:26  0%   ` Thomas Monjalon
2020-07-07 16:35  3%     ` Kinsella, Ray
2020-07-07 16:36  0%       ` Thomas Monjalon
2020-07-07 16:37  0%         ` Kinsella, Ray
2020-07-07 16:55  0%           ` Honnappa Nagarahalli
2020-07-07 17:00  0%             ` Thomas Monjalon
2020-07-07 17:01  0%               ` Kinsella, Ray
2020-07-07 16:57  0%           ` Thomas Monjalon
2020-07-07 17:01  4%             ` Kinsella, Ray
2020-07-07 17:08  0%               ` Thomas Monjalon
2020-07-07 17:50  8% ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Ray Kinsella
2020-07-07 17:51 24%   ` [dpdk-dev] [PATCH v2 1/2] doc: reword abi policy for windows Ray Kinsella
2020-07-07 17:51 12%   ` [dpdk-dev] [PATCH v2 2/2] doc: clarify alias to experimental period Ray Kinsella
2020-07-07 18:44  0%     ` Honnappa Nagarahalli
2020-07-08 10:32  7%   ` [dpdk-dev] [PATCH v2 0/2] doc: minor abi policy fixes Thomas Monjalon
2020-07-08 12:02  4%     ` Kinsella, Ray
2020-07-07 15:36     [dpdk-dev] [PATCH v4 0/2] rte_flow: introduce eCPRI item for rte_flow Bing Zhao
2020-07-10  8:45     ` [dpdk-dev] [PATCH v5 " Bing Zhao
2020-07-10  8:45       ` [dpdk-dev] [PATCH v5 1/2] rte_flow: add eCPRI key fields to flow API Bing Zhao
2020-07-10 14:31         ` Olivier Matz
2020-07-11  4:25           ` Bing Zhao
2020-07-12 13:17  3%         ` Olivier Matz
2020-07-12 14:28  0%           ` Bing Zhao
2020-07-12 14:43  0%             ` Olivier Matz
2020-07-08 10:22 25% [dpdk-dev] [PATCH] devtools: give some hints for ABI errors David Marchand
2020-07-08 13:09  7% ` Kinsella, Ray
2020-07-08 13:15  4%   ` David Marchand
2020-07-08 13:22  4%     ` Kinsella, Ray
2020-07-08 13:45  7%   ` Aaron Conole
2020-07-08 14:01  4%     ` Kinsella, Ray
2020-07-09 15:52  4% ` Dodji Seketeli
2020-07-10  7:37  4% ` Kinsella, Ray
2020-07-10 10:58  4% ` Neil Horman
2020-07-15 12:15 25% ` [dpdk-dev] [PATCH v2] " David Marchand
2020-07-15 12:48  4%   ` Aaron Conole
2020-07-16  7:29  4%     ` David Marchand
     [not found]     <20200703102651.8918-1>
2020-07-09  6:12     ` [dpdk-dev] [PATCH v2 0/3] ring clean up Feifei Wang
2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 1/3] ring: remove experimental tag for ring reset API Feifei Wang
2020-07-09  6:12  3%   ` [dpdk-dev] [PATCH v2 2/3] ring: remove experimental tag for ring element APIs Feifei Wang
2020-07-09  6:53  4% [dpdk-dev] [PATCH] devtools: fix ninja break under default DESTDIR path Phil Yang
2020-07-09 15:20  4% [dpdk-dev] [PATCH 20.11 0/5] Enhance rawdev APIs Bruce Richardson
2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 1/5] rawdev: add private data length parameter to info fn Bruce Richardson
2020-07-12 14:13  0%   ` Xu, Rosen
2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 3/5] rawdev: add private data length parameter to config fn Bruce Richardson
2020-07-12 14:13  0%   ` Xu, Rosen
2020-07-09 15:20  3% ` [dpdk-dev] [PATCH 20.11 4/5] rawdev: add private data length parameter to queue fns Bruce Richardson
2020-07-13  9:57  3% [dpdk-dev] The mbuf API needs some cleaning up Morten Brørup
2020-07-13 12:31     [dpdk-dev] [PATCH 0/2] Deprecation notice updates Bruce Richardson
2020-07-13 12:31  5% ` [dpdk-dev] [PATCH 2/2] doc: add deprecation notice for change of rawdev APIs Bruce Richardson
2020-07-13 12:48  5%   ` Hemant Agrawal
2020-07-15 18:27  3% [dpdk-dev] [RFC PATCH 0/2] Enable dyynamic configuration of subport bandwidth profile Savinay Dharmappa
2020-07-16  8:14  0% ` Singh, Jasvinder